
Automatic Jira Task Estimation based on the Azure Machine Learning Predictive Model

Machine Learning has been entering more and more areas of our lives in recent years: from photo and voice recognition to intelligent hints and suggestions online and even in our cars. According to a study by mckinsey.com, by 2030 machines will take over 30% of the work that people currently do, and in software companies the share will be even higher. The worst-case scenario is that you’re afraid they’ll make you redundant from your favorite job and replace you with machines. Don’t think that. The machine will rather take over the basic work, leaving you free to focus on more innovative things. And this is how AI can support our daily administrative tasks. Let’s take a closer look at Jira task estimation based on an Azure Machine Learning predictive model.

AI-powered planning with Jira

We spend 2 hours twice a month estimating the backlog tasks. For a team of 5 people that is 10 hours per session, or 20 hours of planning a month. It’s quite a lot, given that a programmer, hopefully, earns on average 30 USD per hour: annually that adds up to roughly 7,200 USD spent on planning alone. In our case, 4 additional people from different departments also take part in the planning, as they are responsible for the whole process of delivering the software to the client.

On the one hand, without planning the project would certainly not look as it should, and the level of complexity and structure would inevitably decline over time. On the other hand, few programmers like the so-called “Meeting Day” – the day when you think of all the things you could be doing, but can’t, because you’re at a meeting.

Jira is a well-structured tool in which many processes can be simplified, but maybe Jira could give us something more by itself. You have a backlog of 200 tasks. You read the content of a task and after 5 seconds you know, more or less, how much time it will take you to complete it (as long as the content is written legibly, concisely, and clearly). Clear situation – the task is estimated, you move on to the next one. And it’s yet another task in a row that you’ve estimated at 3 Story Points; you have already done 20 similar ones.

Azure Machine Learning Planning – Model Configuration

The first step to include Azure Machine Learning in our planning process is to export the current tasks from Jira to CSV so that they can be analyzed. Unfortunately, Jira exports the CSV file in a way that is not compatible with what we expect. The file is very dirty (spaces, tabs, line breaks, HTML code), and Azure ML cannot import it properly either. Additionally, the team estimated the tasks using the Fibonacci sequence on a scale of 0, 1, 2, 3, 5, 8, 13. For our calculations that is too fine-grained – we will simplify it to EASY (1), MEDIUM (3), DIFFICULT (5).
We export the data to an HTML file and then parse it using NodeJS into a format that we can accept.
https://gist.github.com/ssuperczynski/b08d87843674eb4be64cb0fe7f658456
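If you don’t want to open the gist, here’s a minimal sketch in TypeScript (not the actual script) of the kind of cleanup it performs: stripping HTML, collapsing whitespace and mapping the Fibonacci story points onto the simplified 1/3/5 scale. The field names and the mapping cut-offs are assumptions made for illustration.

```typescript
// A minimal sketch (not the gist above) of the kind of cleanup the export needs:
// strip HTML, collapse whitespace and map Fibonacci story points onto 1/3/5.
// Field names and the mapping cut-offs are assumptions for illustration only.

interface JiraTask {
  summary: string;
  description: string;
  storyPoints: number; // original scale: 0, 1, 2, 3, 5, 8, 13
}

// Assumed mapping: 0-1 -> EASY(1), 2-3 -> MEDIUM(3), 5+ -> DIFFICULT(5).
function simplifyStoryPoints(sp: number): 1 | 3 | 5 {
  if (sp <= 1) return 1;
  if (sp <= 3) return 3;
  return 5;
}

function cleanText(raw: string): string {
  return raw
    .replace(/<[^>]*>/g, " ")   // drop HTML tags
    .replace(/[\r\n\t]+/g, " ") // drop line breaks and tabs
    .replace(/\s{2,}/g, " ")    // collapse repeated spaces
    .trim();
}

function toCsvRow(task: JiraTask): string {
  const text = cleanText(`${task.summary} ${task.description}`).replace(/"/g, '""');
  return `"${text}",${simplifyStoryPoints(task.storyPoints)}`;
}
```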
After importing a new CSV file to Azure ML we get the following task distribution.

CSV file to Azure ML

Our JS script only pre-processed the file, so we still have to prepare it further for analysis.
Steps we have already taken:

  • merge the title with the description,
  • drop HTML tags (this step will be explained later),
  • replace numbers,
  • remove special characters.

Steps we need to take now (a code sketch follows the list):

  • remove duplicated characters,
  • convert to lower case,
  • stem the words: driver, drive, drove, driven, drives, driving all become drive,
  • remove stopwords (that, did, and, should…).
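A rough sketch of these remaining steps in plain TypeScript might look as follows. The stopword list is deliberately tiny and the stemmer is a naive suffix-stripper, so treat both as placeholders for the real text-preprocessing modules:

```typescript
// A rough sketch of the remaining preprocessing steps. The stopword list is
// deliberately tiny and the stemmer is a naive suffix-stripper - both are
// placeholders for the real text-preprocessing modules.

const STOPWORDS = new Set(["that", "did", "and", "should", "this", "to", "the", "a", "on", "with"]);

function preprocess(text: string): string {
  return text
    .toLowerCase()
    .replace(/(.)\1+/g, "$1") // collapse duplicated characters
    .split(/\s+/)
    .filter(word => word.length > 0 && !STOPWORDS.has(word))
    .map(stem)
    .join(" ");
}

// Very naive stemming: strip a few common English suffixes. The
// "creat tabl log user login..." example later in the article comes
// from a proper stemmer, not from this placeholder.
function stem(word: string): string {
  return word.replace(/(ing|ed|es|s|ion)$/, "") || word;
}
```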

Verification process

There are further steps that I haven’t taken yet, but they can significantly influence the final result. To verify how each of them affects the estimate, they should be applied one at a time and the result re-checked after each change.

  • Do not cut out the HTML tags (the project has frontend tasks, where HTML tags sometimes carry key information for estimating the task).
  • Some of the tasks contain French words – it is questionable whether they are significant; here we should make changes at the project-management level in Jira.
  • Separate frontend and backend tasks. Currently, even though each task has its own EPIC, we do not include it, so frontend and backend tasks are combined into one backlog.
  • The number of tasks is too small – the more tasks, the better the model.

 
The current Azure ML scheme is as follows
Azure ML scheme
After making all the necessary modifications, the sentence that initially looked like this:
Create table to log user last logins. This omit issue with locks on user table
is changed to:
creat tabl log user login omit issu lock user tabl
Below is a graph of word statistics for tasks larger than 5 SP.

jira estimation

The next step is to analyze the words from two angles.
Unigram – a method that counts all single occurrences of words. In our case this method may prove ineffective: “Change font under application form” is estimated at 1 SP, while “Add two new child forms under application form” is estimated at 5 SP, yet both point to the same phrase “application form”, which once carries 1 SP and another time 5 SP.
Bigram, trigram, N-gram – based on statistics of N consecutive words, where N is 2, 3 and so on.
I chose the N-gram method, which turned out to be much more effective.
In the N-gram analysis we stop comparing strings and switch to hashes – hash comparison works faster, and since our database will keep growing over time, this matters more and more.
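To make the idea concrete, here is a minimal sketch of N-gram extraction combined with feature hashing: every N-gram is hashed into one of a fixed number of buckets, so the model works on integer bucket counts instead of raw strings. The hash function (FNV-1a) and the 1024-bucket vector size are arbitrary choices for illustration.

```typescript
// N-gram extraction plus feature hashing: each N-gram is hashed into one of a
// fixed number of buckets, so the model compares integer bucket counts instead
// of raw strings. FNV-1a and the 1024-bucket size are arbitrary illustrative choices.

function ngrams(tokens: string[], n: number): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

// Simple FNV-1a string hash - any stable hash function would do here.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function hashFeatures(text: string, n = 2, buckets = 1024): number[] {
  const vector = new Array(buckets).fill(0);
  const tokens = text.split(/\s+/).filter(Boolean);
  for (const gram of ngrams(tokens, n)) {
    vector[fnv1a(gram) % buckets] += 1; // colliding N-grams share a bucket, as feature hashing intends
  }
  return vector;
}

// hashFeatures("add two new child forms under application form")
// -> a 1024-dimensional count vector the model can be trained on
```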

feature hashing
jira task estimation

Once the N-gram analysis is created, we can create and train our model, using 70% of the data for training and 30% for testing.
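Azure ML’s Split Data module does this for us, but conceptually the 70/30 shuffle-split is nothing more than the following sketch (the Example type pairing a hashed feature vector with the simplified story-point label is assumed):

```typescript
// Conceptually, the 70/30 split performed inside Azure ML is just this.
// The Example type (hashed feature vector + simplified story-point label) is assumed.

interface Example {
  features: number[];
  label: 1 | 3 | 5;
}

function shuffle<T>(items: T[]): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // Fisher-Yates shuffle
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

function trainTestSplit(data: Example[], trainRatio = 0.7): { train: Example[]; test: Example[] } {
  const shuffled = shuffle(data);
  const cut = Math.floor(shuffled.length * trainRatio);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}

// const { train, test } = trainTestSplit(examples);
// -> train the classifier on `train`, score its accuracy on `test`
```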

n grams
 
The last step is to expose our scheme as a web service, so we can submit new task content for analysis and get back the difficulty level predicted by the model.

Azure ML scheme web services
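A hedged sketch of how a client could call the published web service over HTTP is shown below. The endpoint URL, the API key handling and the exact JSON payload shape are placeholders – Azure ML generates the precise request format on the web service’s request/response help page – and the snippet assumes Node 18+ (or a fetch polyfill).

```typescript
// Hedged sketch of calling the published Azure ML web service over HTTP.
// ENDPOINT_URL, the API key handling and the JSON payload shape are placeholders:
// Azure ML generates the exact request format on the service's request/response help page.
// Assumes Node 18+ (global fetch) or a fetch polyfill.

const ENDPOINT_URL = "https://<region>.services.azureml.net/.../execute?api-version=2.0"; // placeholder
const API_KEY = process.env.AZURE_ML_API_KEY ?? "";

async function estimateTask(taskText: string): Promise<unknown> {
  const response = await fetch(ENDPOINT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    // Illustrative payload only - copy the real shape from the Azure ML portal.
    body: JSON.stringify({ Inputs: { input1: { ColumnNames: ["text"], Values: [[taskText]] } } }),
  });
  if (!response.ok) {
    throw new Error(`Scoring request failed: ${response.status}`);
  }
  return response.json(); // contains the predicted EASY/MEDIUM/DIFFICULT label
}

// estimateTask("oAuth refresh token doesn't work").then(console.log);
```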

For the tests, my colleagues from the project gave me the content of real tasks to analyze.
Here are the results:

  • oAuth refresh token doesn’t work – Azure ML: easy – colleagues: easy
  • Add BBCode feature to the form – Azure ML: easy – colleagues: easy
  • Fix the styles for the upload button – Azure ML: easy – colleagues: easy
  • Message form refactor – Azure ML: difficult – colleagues: difficult
  • Random string method has many implementations – unify them into one – Azure ML: easy – colleagues: easy
request response 1
request response 2

Summing up

As you can see in the above 5 examples, the accuracy of our program was 100%, even though there are still places where we can improve the model. Based on our tests, the overall accuracy was around 80%.

At the moment it can be used during planning as a sanity check, but in the near future it could tell the customer, even before planning, how much a task will cost and whether it should be split into smaller pieces.

The next step is to build a Jira plugin and include it right next to the task description.


Decentralized AI: Blockchain's bright future

Blockchain and artificial intelligence are driving technological innovation worldwide and both have profound implications for the future of business as well as our personal data. How can the two technologies merge? I’ll discuss the opportunities which could arise from decentralized AI.

Before we look at the possible merging of blockchain and AI into decentralized AI, let’s look at the two separately. Let’s look at the benefits of Artificial Intelligence and blockchain.

Artificial intelligence (AI) is a field of computer science dedicated to creating intelligent machines. Through techniques such as machine learning, AI gives machines skills traditionally reserved for humans: problem solving, speech recognition, planning, and learning, among others.

Meanwhile, blockchain is a decentralized technology: a global network of computers maintaining a shared ledger. It provides a robust platform on which blocks of related information can be stored across the network.
PwC predicts that by 2030 AI will add up to $15.7 trillion to the world economy, and as a result, global GDP will rise by 14%. According to Gartner’s prediction, business value added by blockchain technology will increase to $3.1 trillion by the same year. Currently, the cryptocurrency sector makes the most use of blockchain tech. So, is the integration of blockchain and AI possible? Can both merge into one and enter other sectors? Actually, that’s already happening and some businesses are beginning to see the potential of integrating blockchain and AI.

Advantages of blockchain technology

Here are some of the advantages of blockchain technology:

  • Blockchain is decentralized. It allows data to be shared without a central unit. This keeps transactions on a blockchain verifiable and processable independently of a central authority.
  • Blockchain is durable and consistent due to its decentralized nature. It can resist malicious attacks on its systems because it does not have a central point vulnerable to attack.
  • Information, timelines, and authenticity supplied by blockchain technology are all accurate.

Benefits of Artificial Intelligence (AI)

AI, or machine intelligence, has a lower error rate compared to humans when coding. As a result, AI offers a greater level of accuracy, speed and precision.

  • AI is totally logical: it has no emotions and thus makes rational, error-free decisions.
  • Machines don’t get tired and can thrive in hazardous conditions. This enables them to carry out dangerous tasks, such as space exploration or even mining.
  • Trusting AI with data analysis is one of the best decisions a company can make. AI can easily process unstructured data and give results in real time, ensuring accuracy in data analytics.

Previous collaboration between blockchain and AI

There’s been notable integration between AI and blockchain already. Examples include the Singularity.Net blockchain and AI program, which was created to enhance smart contract testing. The supply chain firm Nahame has also combined blockchain technology and AI to help companies with auditing. A peer-to-peer car rental company has also made public its plans to run a fleet of self-driving cars on blockchain technology.

Decentralized AI – where AI and blockchain could intersect

The best way to use two of the biggest technologies out there today is to capitalize on each one’s strengths to support the other.

Data protection

Artificial intelligence largely depends on our data and uses it to improve itself through machine learning. What’s particularly relevant to AI is the gathering of data about human interactions and other details. Blockchain is a technology that allows encryption of data storage on a decentralized system, and it runs a totally secured and protected database only authorized users can access. So when we integrate blockchain and AI, it means we have a protected decentralized AI system for sensitive data such as financial or even medical data. Therefore, blockchain technology is a great security advantage.

Let’s take a look at Spotify – it uses users’ data to recommend music based on their recent searches and preferences. Most of the time we aren’t concerned about the information as it isn’t particularly sensitive. However, when it comes to our sensitive information stored in the cloud of a company, we would be more concerned about privacy and the guarantee of that privacy.

Ensuring security

When an AI system runs as a centralized service on a single processor, hackers or malware can infiltrate it and alter its instructions. With blockchain though, before any information is accepted and processed on a blockchain platform, it must pass through several nodes of the network. The more nodes a blockchain-based system has, the more difficult it becomes to hack. Although not impossible, it would be far more difficult to hack a blockchain-based, decentralized AI platform.

Trustworthiness

There is greater trust in the system. In order to have credibility, a system must be trustworthy. Blockchain is a more transparent technology than a closed AI system. Blockchains protect data through encryption — only authorized users can access it. This makes it impossible for unauthorized parties to view anything.

In the case of blockchain applications in the healthcare sector, patients don’t want their medical information to be accessible to any unauthorized viewers. Medical information remains encrypted to prevent unauthorized third parties from accessing it. Keeping medical information on a blockchain would also allow healthcare providers to easily access patients’ files so they can provide medical aid in case of an emergency. Adding AI on top would improve performance, making unstructured data stored on the blockchain easier to access.

Benefits of Artificial Intelligence & blockchain in the long run

There are many benefits businesses can gain from integrating blockchain with AI. Porsche, in partnership with XAIN AG, is already working on decentralized AI applications in its advanced vehicles. JD.com, a leader in developing AI-based applications, has already started using this integration to build decentralized business applications. So it’s worth considering blockchain and AI as an integrated technology. It’s not a problem if you already use only blockchain or only AI in your business – you can integrate the other technology through your existing website API.
Here are some benefits of Artificial Intelligence merging with blockchain:

Decentralized Intelligence

This is an obvious result of the integration: blockchain is a decentralized system, while AI is an intelligent one. Combining them would enable organizations to set up a blockchain-based architecture with AI built into its design – for example, a peer-to-peer network with image recognition or language-processing capabilities.

Energy-saving and cost-efficient IT architecture

A 2016 report from Deloitte estimated that the annual cost of authenticating transactions on a blockchain is $600 million, most of which goes into mining operations. An AI-integrated blockchain will help organizations reduce their energy consumption. Since AI can predict and speedily calculate data, it would also make it possible for cryptocurrency miners to know when they are performing a less important transaction. This would also allow enterprises to execute transactions faster.

In fact, as AI becomes more developed, and after the integration of AI and blockchain technology becomes more common, AI may take over the mining process on blockchains. Given the fact that AI learns and adapts to its environment, combined with blockchain, there’s no doubt that it will learn the process and the architecture of the blockchain network.

Flexible AI

AI integration with blockchain could pave the way for the development of an artificial general intelligence (AGI) platform. The blockchain model can provide a distributed foundation for developing an AGI.

The integration of blockchain and AI has yet to take off fully. Combining the two technologies into decentralized AI has deep potential to use data in novel ways. A successful integration of both technologies will allow quicker and smoother data management, verification of transactions, identification of illegitimate documents, etc. Therefore, if you’re contemplating the integration of both technologies for your business, don’t hesitate, do it!


Staff-efficient Large Scale Server Management using Open Source Tools (Part 1)

Server management on behalf of the client is a fairly common service today, complementing the portfolio of many software houses and outsourcing companies. There are still, however, two types of companies on the market: those that provide high-quality services (with real competences) at an attractive price (achieved thanks to synergy, not crude cost-cutting), and… others. If you’re a user of such services, have you ever wondered which type of supplier you currently use?

Question of quality

All of us are, and have been, customers many times, and we have a better or worse idea of what high quality is. Often we simply identify it with satisfaction with the service. The problem arises with more advanced services, or those that we have only been using for a short time, or even for the first time – we don’t know what we should really expect from a professional service, and what is merely solid mediocrity.

Let’s think about the criteria of professionalism in the case of server management services – in 2018, not in 2008 or 1998. The first answer that comes to mind is, of course, “customer satisfaction.” The thing is, this satisfaction is subjective and relative, for example, to support reaction times or other variable parameters derived from the purchased hosting plan – or even to subjective feelings from conversations with a support engineer (people can feel sympathy for each other or not).

In 2018, another, completely objective parameter is absolutely critical: security. In fact, this is why server management is entrusted to specialists rather than, for example, a full-time programmer who, after all, also knows how to install Linux – so that our clients’ data stays safe.

server management quality

How to provide high-quality services

The question arises, however, on the supplier’s side: how do you provide high-quality services (and therefore with maximum focus on security) at competitive market prices, while paying relatively high wages to employees (in Poland today we have an employee’s market, especially in IT, which pushes rates up), and still make money on these services?

The answer is very simple and complicated at the same time. It’s simply choosing the right tools that fit your real business model. This seemingly simple answer becomes complicated, however, when we begin to delve into the details.

The key to the correct selection of tools is understanding your own business model at the operational level – not at the level of contracts and money flow, or even at the level of marketing and sales strategy, but at the level of actual work hours and of possible synergies between analogous activities for different clients. Server management doesn’t have the characteristics of a production line, where all activities are fully reproducible – the trick is to find certain patterns and dependencies on the basis of which one can build synergy and choose the right tools.

Instead, unfortunately, most companies that provide good-quality server administration services go to one of two extremes that prevent them from building this synergy, which leads to high costs. As a result, these services aren’t perceived by their boards as promising and in time become only a smaller or larger add-on to the development services. This, in the long run, leads to good-quality services pushing themselves out of the market in favor of services of dubious quality. But let’s go back to the extremes mentioned. A company can go in one of the following directions:

  1. Proprietary, usually closed (or sometimes “shared source”) software for managing services. At the beginning it meets the company’s needs perfectly; over time, however, these needs change, because the technology changes very quickly and the company itself evolves. As a result, after 3-4 years the company is left with a system that isn’t attractive to potential employees (the experience gained in such a company isn’t transferable to any other company) and that requires constant and increasing expenditure on maintenance and “small development.”
  2. Widely-known software, often used and liked, or at least recognized by many IT people, only… it fits someone’s idea of their business model instead of the real one. Why? The reason is very simple: most popular tools are written either for large companies managing homogeneous IT infrastructure (meaning that many servers are used for common purposes, have common users, etc.), or for hosting companies (serving different clients but offering strictly defined services).

Open source tools

Interestingly, as of 2018, there are still no widely-known, open-source tools for heterogeneous infrastructure management, with support for different owners, different configurations, installed services and applications, and, above all, completely different business goals and performance indicators. Presumably this is because the authors of such tools have no interest in publishing them as open source and decreasing their potential profit. All globally used tools (e.g. Puppet, Chef, Ansible, Salt and others) are designed to manage homogeneous infrastructure. Of course, you can run a separate instance of one of these tools for each client, but it won’t scale to many clients and won’t build any synergy or competitive advantage.

At this point, it’s worth mentioning how we dealt with this in Espeo. Espeo Software provides software development services to over 100 clients from around the world. For several dozen clients, these services are supplemented by the management of both production and dev/test servers, and overall DevOps support. It’s a very specific business model, completely different from e.g. a web hosting company or a company that manages servers for one big client – at least at the operational level.

Therefore, one should ask what the key factors for building synergies are in such a business model – and, above all, what this synergy should be built on, so that it doesn’t come at the expense of professionalism. In the case of Espeo, we decided on a dual-stack model in which, in simplified terms, server support is divided into an infrastructure level and an application level. This division is conceptual rather than rigid, since the two levels overlap in many aspects.

This division, however, provides the basis for building synergies at the infrastructure level, where, unlike at the application level, the needs of very different clients are similar: security. At the infrastructure level, we use the open-source micro-framework Server Farmer, which is actually a collection of over 80 separate solutions, closely related to the security of the Linux system and various aspects of heterogeneous infrastructure management based on this system.

The Server Farmer’s physical architecture is very similar to that of the Ansible framework, which we use at the application level. Thanks to the similar architecture of both tools, it’s possible, for example, to use the same network architecture on both levels and for all clients. Most of all, however, we’re able to build huge synergy in the area of security, by switching from managing separate contracts (which, in the era of tens of thousands of machines scanning the whole Internet for vulnerabilities and automatically infecting computers and servers, is simply a weak solution) to a production-line model that ensures the right level of security for all clients and all servers.

server management services

Building synergy

An example can be taken from the process of regularly updating the system software on servers, which in 2018 is absolutely necessary if we want to talk about any reasonable level of security at all. Modern Linux distributions have automatic update mechanisms; however, the only software elements updated automatically are those that won’t cause disruptions in the operation of the services. All the rest should be updated manually (i.e. through appropriate tools, but under the supervision of a person).

And here is the often-repeated problem in many companies: using tools that are known and liked by employees, even if these tools don’t fit the company’s business model and don’t speed up this simple operation in any way.

Let’s imagine a company that supports IT infrastructure for, say, 50 clients, using 50 separate installations of Puppet, Chef, Ansible or, even worse, a combination of these tools. As a result, we manage the same group of employee-administrators 50 times, we plan the system architecture 50 times, we analyze logs 50 times, and so on. It’s, of course, feasible and in itself doesn’t lower the security level. However, in such a model it’s impossible to use the employees’ time effectively, because with 50 separate installations most of this time is consumed by simple, repetitive, easy-to-automate activities and by configuring the same elements, just in different places. It follows that any business conducted this way isn’t scalable and leads to gradual self-marginalization.

This mistake, however, isn’t due to poor orientation or bad intentions on the part of these companies. It’s simply because open-source tools of appropriate quality for managing heterogeneous infrastructure are relatively specialized software, and as a result, knowledge of such software among potential employees is quite rare. What’s more, many companies decide to create dedicated instances of Puppet or Ansible for each client, because their undeniable advantage from the employee’s perspective is the transferability of experience between successive employers – even if for the employer it means a lack of scalability of the process.

If we look at it from the point of view of building synergy and, as a result, a permanent business advantage, however, selecting tools only to satisfy current, short-term HR needs is a weak idea. A much better approach is to strike a compromise between “employability” and the scalability of the employees’ work. This is why, in our dual-stack approach, with the help of the Server Farmer, each employee responsible for infrastructure-level management can manage approx. 200 servers per daily-hour (an hour spent each working day).

That means one theoretical job position, understood as the full 8 hours worked each working day (i.e. a full 168 hours per month), can support approx. 1,600 servers in the entire process of maintaining a high level of security (including daily review of logs, uploading software updates, user and permission management, and many other everyday activities). Of course, the real, effective workday of a typical IT employee is closer to five than eight hours a day; nevertheless, the theoretical eight hours is the basis for all comparisons. If you already use server management services, ask your current provider how many servers each employee is able to support without a loss of quality…


Security isn’t everything

But, of course, security isn’t everything. After all, nobody pays for maintaining servers just so they are secure as such, but to make money on them. And money is earned at the application level, which is by nature quite different for each client. These differences make building synergies between individual clients’ activities so hard that, in practice, there is no sense in doing it – easy hiring is more important than micro-synergies. That’s why at this level we use Ansible at Espeo: it’s compatible with the Server Farmer and, at the same time, it’s widely known and ensures an inflow of employees.

Of course, for such a dual-stack solution to work properly, it’s necessary to set clear boundaries of responsibility, so that on the basis of these boundaries – as well as SLA levels or other parameters bought by individual customers – it’s possible to build specific application solutions without the work of individual employees overlapping. Only then will it be possible to build effective capacity management processes (modeled after ITIL), providing each customer with high-quality services in a repeatable and predictable manner.

capacity management

In part two we’ll describe, in more technical detail, what particular solutions we use, in what architecture, how these solutions map to business processes, and what processes we use for PCI DSS clients.



Blockchain oracles: Can blockchain talk to the world?

In this article, I’ll take you on a journey to find out what blockchain oracles are, what they’re for, and see the technical aspects behind them. We’ll go through base oracle flow, external data sources, authenticity proof (proof that a party doesn’t tamper with data), as well as oracle verification. Fasten your seatbelts and let’s begin!

Update 23/1/2019: While we in the Espeo Blockchain development team use some of the blockchain oracles already available on the market, we weren’t completely satisfied. So we decided to build a solution for ourselves and to share it with you. We’ve just launched a free, open-source oracle called Gardener. Read more about the project here, or find it on GitHub. Enjoy!

What’s the issue with blockchain oracles?

Blockchains function in a closed, trustless environment and can’t get any information from outside the blockchain due to security reasons or so-called sandboxing. You can treat everything within the node network as a single source of truth, secured by the consensus protocol. Following the consensus, all nodes in the network agree to accept only one version of their managed state of the world. Think of it like blinders on a horse — useful, but not much perspective.

However, sometimes the information available in the network isn’t enough. Let’s say I need to know the price of gold in a blockchain-based derivatives trading app. Using only data from inside the blockchain, we have no way of knowing it. Because the smart contract lives in a sandboxed environment, it has no way to retrieve that data by itself; the only viable alternative is to request the data and wait for some external party we trust to send it back. That’s where blockchain oracles come in.

Two components

There are two components that any sensibly working blockchain oracle has to incorporate. One is a data source handling component that retrieves the requested data from reliable data feeds. These data sources can be data stores or databases, APIs of various types, or even internal data stored in some ERP or CRM enterprise system. The second component is the off-chain monitoring mechanism. It watches for requests from the smart contract and retrieves the required data through the data source handling component. Then it feeds the data back to the smart contract, using a unique identifier to communicate which request the submitted data relates to.
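As a conceptual sketch (with made-up names and types), the two components could be described like this:

```typescript
// A conceptual sketch of the two components - names and types are made up for illustration.

// Component 1: fetches the requested value from an external feed (API, database, ERP/CRM...).
interface DataSourceHandler {
  fetchValue(query: string): Promise<string>;
}

// Component 2: watches the chain for requests and feeds answers back,
// tagging each answer with the request id so the contract knows what it refers to.
interface OffChainMonitor {
  onRequest(handler: (requestId: string, query: string) => void): void;
  submitResult(requestId: string, value: string): Promise<void>;
}
```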

When do we need oracles?

I’ve already discussed how the data can be provided back to the requesting smart contract. We also need to consider the timing of when the data should be acquired from its source and at what moment it should be sent back to the smart contract. Let’s first consider in what situations a smart contract may need access to the blockchain oracles. There are endless cases and even more solutions to them, so let’s explore a handful of them:

  • Derivative platforms that need pricing feeds for the underlying assets, such as one we worked on called CloseCross,
  • Prediction markets, where the final result of an event has to be established reliably,
  • Solutions which need provable randomness (Ethereum is a deterministic platform),
  • Information from other blockchains,
  • Heavy computations which don’t fit within block gas limits (or are extremely expensive even if they fit),
  • Complex mathematical equations (using e.g. WolframAlpha),
  • Retrieval of some data from IPFS or other data storage.

Implementing the concept

I already wrote what sort of data you can retrieve, how, and for what reasons. Now, let’s dive into the details of how you can implement the whole oracle concept in more practical terms. Because part of a blockchain oracle is an off-chain mechanism, it can be developed in any modern programming language – any language, of course, that has frameworks allowing it to communicate with the blockchain.

The off-chain part constantly listens for events emitted by the relevant smart contract and checks whether they’re requests for external data. If so, it fetches that data from some data source and processes it according to the specified rules. One option is to use an external data provider we trust – due to some external factors (agreements), we know it would never cheat.
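Here’s a minimal sketch of such an off-chain listener in TypeScript using ethers.js v6. The DataRequested event and the fulfill function are hypothetical – a real oracle contract defines its own ABI – and the snippet assumes Node 18+ for the global fetch:

```typescript
import { ethers } from "ethers";

// The DataRequested event and fulfill() function are hypothetical - a real oracle
// contract defines its own ABI. Assumes ethers.js v6 and Node 18+ (global fetch).
const ORACLE_ABI = [
  "event DataRequested(bytes32 indexed id, string url)",
  "function fulfill(bytes32 id, string value)",
];

const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
const wallet = new ethers.Wallet(process.env.ORACLE_PRIVATE_KEY!, provider);
const oracle = new ethers.Contract(process.env.ORACLE_ADDRESS!, ORACLE_ABI, wallet);

// Listen for requests, fetch the data off-chain, and send the result back on-chain.
oracle.on("DataRequested", async (id: string, url: string) => {
  try {
    const response = await fetch(url);          // data source handling component
    const value = await response.text();
    const tx = await oracle.fulfill(id, value); // answer tagged with the request id
    await tx.wait();
  } catch (err) {
    console.error(`Failed to fulfill request ${id}:`, err);
  }
});
```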

On the other hand, if we use a data provider that can’t be trusted, or even force the smart contract clients to use our own internal data, we can cause a lot of disruption in the client’s operation: the data finally provided cannot be trusted, because we can put literally any data we want there. We could cheat and force the contract to behave according to our expectations even though the actual state of the external world is different. To sum up, choosing the right data provider is half the battle.

An improvement could be to use a few separate sources of the data, but then a problem of data accuracy becomes apparent. For example, when we want to get EUR/USD exchange rates from a few different agencies or exchanges, it’s almost guaranteed that each of them will report a slightly different value. What looks on the surface like the simple task of providing data back to a smart contract turns out, in general, to be quite a hard problem to solve correctly and reliably.
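One common mitigation, sketched below, is to query several independent feeds and report the median, so that a single outlier (or a single lying source) has limited impact. The feed functions are placeholders for real exchange-rate APIs.

```typescript
// Query several independent feeds and report the median, so a single outlier
// (or a single lying source) has limited impact. The feeds are placeholders
// for real exchange-rate APIs.

type RateFeed = () => Promise<number>;

async function aggregatedRate(feeds: RateFeed[]): Promise<number> {
  const settled = await Promise.allSettled(feeds.map(feed => feed()));
  const values = settled
    .filter((r): r is PromiseFulfilledResult<number> => r.status === "fulfilled")
    .map(r => r.value)
    .sort((a, b) => a - b);
  if (values.length === 0) throw new Error("No feed answered");
  const mid = Math.floor(values.length / 2);
  return values.length % 2 === 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}
```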

Proofs

Once we have our data inside the oracle software, it would be good to prove that we didn’t manipulate it. The most basic uses don’t include any proof. Our users have to believe that we just pass on what we get. But there are stronger proofs. Oraclize.it, the leader in Ethereum blockchain oracles at the moment, uses the TLSNotary proof, Android proof and Ledger proof. Let’s briefly check how they differ.

TLSNotary proof

The first one – the TLSNotary proof – leverages features of the TLS protocol to split the TLS master key between three parties: the server, an auditee and an auditor. Oraclize is the auditee, while a special locked-down AWS instance acts as the auditor. The TLSNotary protocol is open sourced and you can read more about it on the project site: https://tlsnotary.org/.

The black boxes are steps from the standard TLS protocol; the green ones are added by TLSNotary. Basically, what we achieve is that the auditor can see the original data and check that it hasn’t been tampered with. However, the auditor doesn’t know the auditee’s credentials, so he can’t perform any action on the auditee’s behalf.

Android proof

The next one is the Android proof. It uses two technologies developed by Google: SafetyNet and Android Hardware Attestation. SafetyNet validates that the Android application is running on a safe, non-rooted physical device. It also checks that the application’s code hash hasn’t been tampered with. Because the application is open source, it can be easily audited, and any changes to it would change the code hash.

On the other hand, Android Hardware Attestation checks if the device is running on the latest OS version to prevent any potential exploits. Both technologies together ensure that the device is a provably secure environment where we can make untampered HTTPS connections with a remote data source.

Ledger proof

The last one from Oraclize is the Ledger proof. It uses hardware wallets from the French company Ledger (mainly known for the Ledger Nano S and Ledger Blue). These devices contain an STMicroelectronics secure element, a controller, and an operating system called BOLOS. Via the BOLOS SDK, developers can write their own applications and install them on the hardware just like cryptocurrency wallets. BOLOS exposes a kernel-level API, and some of its operations deal with cryptography and attestation.

The latter is especially useful here. Via the API, we can ask the kernel to produce a signed hash of the application binary. It is signed by a special attestation key, which is controlled by the kernel and out of reach of the application developers. Thanks to this, we can perform code attestation as well as device attestation. Currently, the Ledger proof is used to provide untampered entropy to smart contracts.

TownCrier

Another solution – TownCrier – relies on Intel SGX, a capability of newer Intel CPUs. SGX stands for Software Guard Extensions, a set of architecture extensions designed to increase the security of application code and data. This is achieved by introducing enclaves – protected areas of execution in memory, where code is executed using special instructions and other processes or memory areas have no access to it.

Source: http://www.town-crier.org/get-started.html

The image above shows how it works. The user contract calls the TC contract, which emits an event caught by the TC server. The TC server then connects to the data store over a TLS connection and feeds the data back to the contract. Because all of this happens in the TC server’s enclave, even the operator of the server can’t peek into the enclave or modify its behaviour, while TLS prevents tampering or eavesdropping on the communication.

A word of caution

Keep in mind, however, that even though each of these solutions provides a way to prove data integrity, none of them offers a verifiable on-chain method. You can either trust a big company (like Intel) or perform a separate verification off-chain, but even then we notice tampering only after the first successful occurrence.

The last thing I haven’t mentioned yet is how the oracle contract verifies whose responses to accept. The solution is rather simple (at least in most cases): every account in Ethereum, including the off-chain server’s, has a private/public key pair that identifies it uniquely (as long as nobody steals the private key from the server).
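On-chain this usually boils down to the oracle contract checking that the response transaction comes from the one oracle address it was configured with. Off-chain, the same idea can be sketched with message signatures (assuming ethers.js v6; the trusted address below is a placeholder):

```typescript
import { ethers } from "ethers";

// Off-chain sketch of the same idea (assuming ethers.js v6): the oracle server signs
// its response, and anyone can recover the signer's address and compare it with the
// single trusted address the oracle contract was configured with.

const TRUSTED_ORACLE_ADDRESS = "0x0000000000000000000000000000000000000000"; // placeholder

async function signResponse(oracleWallet: ethers.Wallet, payload: string): Promise<string> {
  return oracleWallet.signMessage(payload);
}

function isFromTrustedOracle(payload: string, signature: string): boolean {
  const recovered = ethers.verifyMessage(payload, signature);
  return recovered.toLowerCase() === TRUSTED_ORACLE_ADDRESS.toLowerCase();
}
```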

Conclusion

To sum up, that’s how the business and technical aspects of oracle construction work. I started with business needs and use cases, then switched to a description of an oracle, and then examined exactly how it works. I talked about data sources, authenticity proofs, and server identification. All this should give you a general overview of blockchain oracles.

If you’d like to use one of the current solutions or feel that none of them meet your expectations for blockchain oracles, you can ask us for help 🙂 And by the way, here’s my previous article on Ethereum-related issues (gas costs).

Links:

http://docs.oraclize.it/#security-deep-dive
https://ethereum.stackexchange.com/questions/201/how-does-oraclize-handle-the-tlsnotary-secret
https://tlsnotary.org/
http://www.town-crier.org/