In this article, I’ll take you on a journey to find out what blockchain oracles are, what they’re for, and see the technical aspects behind them. We’ll go through base oracle flow, external data sources, authenticity proof (proof that a party doesn’t tamper with data), as well as oracle verification. Fasten your seatbelts and let’s begin!
Update 23/1/2019: While we in the Espeo Blockchain development team use some of the blockchain oracles already available on the market, we weren’t completely satisfied. So we decided to build a solution for ourselves and to share it with you. We’ve just launched a free, open-source oracle called Gardener. Read more about the project here, or find it on GitHub. Enjoy!
What’s the issue with blockchain oracles?
Blockchains function in a closed, trustless environment and can’t get any information from outside the blockchain due to security reasons or so-called sandboxing. You can treat everything within the node network as a single source of truth, secured by the consensus protocol. Following the consensus, all nodes in the network agree to accept only one version of their managed state of the world. Think of it like blinders on a horse — useful, but not much perspective.
However, sometimes the information available in the network isn’t enough. Let’s say I need to know what the price of gold is on a blockchain-based derivatives trading app. Using only data from inside the blockchain we have no way of knowing that. Because the smart contract lives in the sandboxed environment it has no option to retrieve that data by itself, the only viable alternative is to request that data and wait for some external party we trust to send it back. That’s where the utility of blockchain oracles come in.
There are two components that any sensibly working oracle in the blockchain has to incorporate. One is a data source handling component that retrieves requested data from the reliable data feed sources. These data sources can be data stores or databases, APIs of various types, even internal data stored in some ERP or CRM enterprise systems. The second component is the off-chain monitoring mechanism. It looks for the requests from smart contract and retrieves the required data from the data source handling component. Then, it feeds it back to the smart contract using some unique identification data to communicate which request the submitted data is related to.
When do we need oracles?
I’ve already discussed how the data can be provided back to the requesting smart contract. We also need to consider the timing of when the data should be acquired from its source and at what moment it should be sent back to the smart contract. Let’s first consider in what situations a smart contract may need access to the blockchain oracles. There are endless cases and even more solutions to them, so let’s explore a handful of them:
- Derivative platforms that need to know the pricing feeds for the underlying assets, such as one we worked on called CloseCross
- Prediction markets, when the ultimate result of the event has to be established reliably,
- Solutions which need provable randomness (Ethereum is a deterministic platform),
- Information from other blockchains,
- Heavy computations, which don’t fit within block gas limits (or even if they fit they’re extremely expensive)
- Complex mathematical equations (using f.ex. WolframAlpha)
- Retrieval of some data from IPFS or other data storage
Implementing the concept
I already wrote what sort of data you can retrieve, and how and for what reasons. Now, let’s dive into more details on how you can implement the whole oracle concept in more practical terms. Because part of the blockchain oracles is an off-chain mechanism, it could be developed using any modern programming language.
Of course, any language which has frameworks that allow it to communicate with the blockchain. It constantly searches (listens) for events emitted by a relevant smart contract and checks if they’re requests for external data. If so, it accesses that data from some data source and processes it according to the specified rules. One option is to use an external data provider we trust. Due to some external factors (agreements), we know it would never cheat.
On the other hand, if we use the data provider that we can’t trust or even force the smart contract clients to use our internal data, we can cause a lot of disruption in the client’s operation. For example, if the data finally provided cannot be trusted or isn’t reliable – as we can put literally any data we want there. We can try and cheat to force our contract to behave according to our expectations even though the truth about the external world’s conditions is different. To sum up, choosing the right data provider is half the battle.
An improvement to that could be to choose a few separate sources of the data, but then a problem related to data accuracy would become apparent. For example, when we want to get exchange rates for EUR/USD from a few different exchange rates agencies or exchanges, it’s almost guaranteed that the values would be slightly different for each of them. On the surface, the simple task of providing the data back to the smart contract appears to be, in general, quite a hard problem to solve correctly and reliably.
Once we have our data inside the oracle software, it would be good to prove that we didn’t manipulate it. The most basic uses don’t include any proof. Our users have to believe that we just pass on what we get. But there are stronger proofs. Oraclize.it, the leader in Ethereum blockchain oracles at the moment, uses the TLSNotary proof, Android proof and Ledger proof. Let’s briefly check how they differ.
The first one – the TLSNotary proof – leverages features of the TLS protocol to split the TLS master key between three parties: the server, an auditee and an auditor. Oraclize is the auditee, while a special locked-down AWS instance acts as the auditor. The TLSNotary protocol is open sourced and you can read more about it on the project site: https://tlsnotary.org/.
The black boxes are steps from the standard TLS protocol. The green ones are added in TLSNotary. Basically, what we achieved is that the auditor can get the original data and check if it’s not been tampered with. However, he doesn’t know the auditee’s credentials so he can’t perform the action on his behalf.
The next one is the Android proof. It uses two technologies developed by Google: SafetyNet and Android Hardware Attestation. SafetyNet validates if the Android application is running on a safe and not rooted physical device. It also checks if the application code hash isn’t tampered with. Because the application is open source, it can be easily audited and any changes to it would change the code hash.
On the other hand, Android Hardware Attestation checks if the device is running on the latest OS version to prevent any potential exploits. Both technologies together ensure that the device is a provably secure environment where we can make untampered HTTPS connections with a remote data source.
The last one from Oraclize is the Ledger proof. It uses hardware wallets from the French company Ledger (mainly known for Ledger Nano S or Ledger Blue). These devices encompass the STMicroelectronics secure element, a controller and an operating system called BOLOS. Via BOLOS SDK developers can write their own applications which they can install on the hardware just like cryptocurrency wallets. BOLOS exposes kernel-level API and some operations from that are about cryptography and attestation.
The last one is especially useful here. Via the API, we can ask the kernel to produce a signed hash from the binary. It is signed by a special attestation key, which is controlled by the kernel and out of reach of the application developers. Thanks to this, we can make code attestation as well as device attestation. Currently the Ledger Proof is used for providing untampered entropy to smart-contracts.
Another solution – TownCrier – offers Intel SGX, which is a new capability of some new Intel CPUs. SGX is an acronym for Software Guard Extensions and its architecture extensions. It was designed to increase the security of application codes and data. It is achieved by the introduction of enclaves – protected areas of execution in the memory. There, the code is executed by special instructions and other processes or areas don’t have access to them.
The image above shows how it works. The contract User calls the TC contract, which emits an event cought by the TC server. Then the TC server, via the TLS connection, connects to the data store and feeds the data back to contract. Because all of this happens in the TC server enclave, even the operator of the server can’t peek into the enclave or modify its behaviour, while the TLS prevents tampering or eavesdropping on the communication.
A word of caution
Keep in mind, however, that even though each of these solutions provides a way to prove data integrity, no one offers a verifiable on-chain method. You can either trust a big company (like Intel) or make a separate verification off-chain, but even then we notice tampering only after the first successful occurrence.
The last thing I haven’t mentioned yet is how the oracle contract verifies who to accept responses from. This solution is rather simple (at least in most cases). Every account in Ethereum, and the off-chain servers has private and public keys pairs, which identify each of them uniquely (as long as nobody steals it from the server).
To sum up, here’s how the business and technical aspects of oracle constructions work. I started with business needs and use cases, next switched to an oracle description and then started examining exactly how it works. I talked about data sources, authenticity proofs, and server identifications. All this knowledge should give you a general overview of blockchain oracles.
If you’d like to use one of the current solutions or feel that none of them meet your expectations for blockchain oracles, you can ask us for help 🙂 And by the way, here’s my previous article on Ethereum-related issues (gas costs).