Multi-cloud deployment of blockchain infrastructure

Cloud (or cloud computing) was one of the biggest buzzwords of the last decade. Many people still equate “the cloud” with Amazon Web Services. However, several other cloud computing vendors are worth looking at, especially if you’re building a blockchain solution that serves a large group of participants.

The main ones are Google Cloud Platform, Microsoft Azure, Oracle Cloud, IBM Cloud, Rackspace and Hetzner Cloud. Below, I’ll discuss their advantages and show how we deployed our blockchain infrastructure.

The dominance of AWS

Amazon’s dominance among cloud providers in people’s minds is somewhat justified, because Amazon was the pioneer and main promoter of the concept of cloud computing. Amazon’s services are the best known and have the highest reliability and the best documentation. In short, they’re the role model for the competition.

But a number of other vendors also provide private and public clouds, for example the Chinese Alibaba Cloud or the Polish e24cloud. Many of them are more or less successful AWS clones, with more or less similar APIs. Most often they bring nothing new technologically, but they operate in regions poorly served by competitors (e.g. Alibaba Cloud in China).

Location, location, location

Let’s begin with data center locations. As I’ll show later, this can be an issue for blockchain infrastructure. As the physical distance between the client and the data center grows, network latency increases. In transaction systems, latency may determine the order in which transactions from individual clients are processed, and consequently profitability or other economic parameters.
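To make the distance argument concrete, here is a minimal sketch that estimates a lower bound on round-trip time from great-circle distance. It assumes signals travel through fibre at roughly 200,000 km/s (about two thirds of the speed of light in a vacuum) and ignores routing detours, queuing and processing time, so real latency will be higher. The city coordinates are approximate and serve only as an illustration.

```python
import math

# Assumption: light in optical fibre travels at about two thirds of c.
FIBRE_SPEED_KM_PER_S = 200_000.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fibre, in milliseconds."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Approximate coordinates (latitude, longitude), for illustration only.
WARSAW = (52.23, 21.01)
FRANKFURT = (50.11, 8.68)
SYDNEY = (-33.87, 151.21)

if __name__ == "__main__":
    for name, city in (("Frankfurt", FRANKFURT), ("Sydney", SYDNEY)):
        km = great_circle_km(*WARSAW, *city)
        print(f"Warsaw -> {name}: {km:.0f} km, "
              f"round trip >= {min_round_trip_ms(km):.1f} ms")
```

No amount of tuning on the server side gets below this physical bound, which is why the location of the data center matters for latency-sensitive systems.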

AWS covers most of the world but doesn’t have data centers in Africa, China, and Russia. The data centers in India, Brazil, and Australia don’t offer a full range of services. So if we want to start a service that depends strongly on connection quality (e.g. a blockchain network or high-frequency trading), it may be reasonable to take a multicloud approach and use several different cloud vendors at the same time.

A multicloud strategy brings various advantages. For instance, one of the main strengths of Microsoft Azure is that it has over 50 data centers in various regions of the world. These include the central states of the USA, Eastern Canada, Switzerland, Norway, China, India, Australia, South Korea, South Africa and the United Arab Emirates, regions where AWS has relatively high network latency.
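One way to turn this into a decision is to measure latency from each client location to each candidate region and pick the lowest. The sketch below is only an illustration: `tcp_connect_ms` times a single TCP handshake using the standard library, and the vendor names, region names and latency values in `SAMPLE` are hypothetical placeholders, not real measurements.

```python
import socket
import time


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Time one TCP handshake to host:port in ms (needs network access)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000


def pick_lowest_latency(measurements):
    """Map each client location to its best (vendor, region, latency_ms)."""
    best = {}
    for client, vendor, region, latency_ms in measurements:
        if client not in best or latency_ms < best[client][2]:
            best[client] = (vendor, region, latency_ms)
    return best


# Hypothetical placeholder measurements, not real data.
SAMPLE = [
    ("client-a", "vendor-1", "region-x", 180.0),
    ("client-a", "vendor-2", "region-y", 35.0),
    ("client-b", "vendor-1", "region-x", 20.0),
    ("client-b", "vendor-2", "region-y", 95.0),
]

if __name__ == "__main__":
    for client, choice in pick_lowest_latency(SAMPLE).items():
        print(client, choice)
```

In practice a single handshake is noisy, so you would collect several samples per endpoint and compare medians rather than one-off values.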

More pros and cons of other cloud computing vendors

Google Cloud Platform

In addition to services based on open-source software (Linux, Docker, MySQL, Postgres, MongoDB, HBase, etc.), Google Cloud Platform also provides its own services, for example BigTable and Realtime Database. They handle large amounts of data more efficiently than open-source technology alone, and offer more efficient load balancing than AWS services. The price for this, however, is vendor lock-in, i.e. the difficulty of moving away from this particular vendor.

Microsoft Azure

In addition to its large number of locations, Microsoft Azure is also the best place to run all kinds of Windows-based solutions. This can be important if our blockchain stack uses ready-made .NET libraries that have no Linux implementations.

Hetzner Cloud

Hetzner Cloud is a relatively new service from Hetzner Online, which until now has specialized in web hosting and low-cost dedicated servers. The Cloud offering brought a significant improvement in quality over the company’s previous offering while keeping prices very low. It still can’t compete with AWS in terms of stability, but that seems to be a matter of time. Its unique advantage is a data center in Finland.

Espeo’s solutions

Let’s take a look at the solutions we’ve used at Espeo to manage multi-cloud infrastructure, as well as the blockchain platform that runs on it.

First approach — manual management

Our journey with distributed ledger technologies in the cloud began, of course, with manual management. By this, I mean logging into different cloud consoles from several different browsers. This approach worked quite well while we controlled about 5-6 AWS accounts and one account for each of the other cloud vendors. With so few accounts, it was still possible to manage them efficiently “by hand.” It seemed that an investment in proper tooling would take far too long to pay off, especially since we didn’t know which technologies to stick with and which to avoid.

Second approach — tools. Open source?

The second approach was to analyze the available tools, with a preference for open source. Among others, we were interested in Terraform (from the creators of Vagrant). Very quickly, however, we got the impression that almost none of the existing open-source tools lined up with how we work. They are designed either to manage your own infrastructure (for one company or one group of companies) or, in the best case, to manage large projects in the Infrastructure as Code model. The latter means describing infrastructure elements in a language created specially for this purpose.

Infrastructure as Code is, of course, a very sensible approach, but it has a disadvantage: it doesn’t work well for very small projects, which are often at the MVP stage and run on a single server. In such cases, Infrastructure as Code is like using a cannon to shoot a fly. You’ll hit it, but clients will want to know why they’re paying so much for it.

Third approach — Polynimbus

Ultimately, we decided to use Polynimbus. It supports multicloud environments (eight different cloud vendors) and has the competitive advantage of being a relatively simple (compared to Terraform) pool of cloud resources, which suited our needs perfectly. Polynimbus supports an unlimited number of AWS accounts and requires minimal configuration for each of them. Configuration basically consists of providing the access key, the secret access key, and the default region. Everything else, including e.g. the fast-changing AMI IDs of system images, is detected automatically.
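To show how little per-account data that is, the sketch below renders the three values as named profiles in the standard AWS CLI file format (`~/.aws/credentials` and `~/.aws/config`). This is an assumption made for illustration and not a description of Polynimbus’s own configuration layout, which may differ. The account names and key values are hypothetical placeholders.

```python
import configparser
import io

# Hypothetical accounts; the keys are placeholders, not real credentials.
ACCOUNTS = {
    "client-a": {"access_key": "EXAMPLE_KEY_ID_A",
                 "secret_key": "EXAMPLE_SECRET_A",
                 "region": "eu-west-1"},
    "client-b": {"access_key": "EXAMPLE_KEY_ID_B",
                 "secret_key": "EXAMPLE_SECRET_B",
                 "region": "us-east-1"},
}


def render_aws_profiles(accounts):
    """Return (credentials_text, config_text) in AWS CLI file format."""
    # interpolation=None so a '%' in a secret is written out literally.
    credentials = configparser.ConfigParser(interpolation=None)
    config = configparser.ConfigParser(interpolation=None)
    for name, account in accounts.items():
        credentials[name] = {
            "aws_access_key_id": account["access_key"],
            "aws_secret_access_key": account["secret_key"],
        }
        # In the config file, non-default profiles use "profile <name>".
        section = "default" if name == "default" else f"profile {name}"
        config[section] = {"region": account["region"]}

    cred_buf, conf_buf = io.StringIO(), io.StringIO()
    credentials.write(cred_buf)
    config.write(conf_buf)
    return cred_buf.getvalue(), conf_buf.getvalue()


if __name__ == "__main__":
    creds_text, config_text = render_aws_profiles(ACCOUNTS)
    print(config_text)  # the credentials text is deliberately not printed
```

Keeping one named profile per client account is what makes it practical to handle many accounts from a single management host.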

Let’s take a look at our entire blockchain infrastructure:

As you can see, Polynimbus is one of the elements of a perfectly integrated stack. It covers the full lifecycle of an instance, regardless of whether it runs on AWS (EC2), Azure, Oracle or another vendor. Creating an instance involves the following steps:

  • Polynimbus – the actual creation of a new instance.
  • ZoneManager – adding a DNS record to Amazon Route53, binding the destination hostname to the IP address returned by Polynimbus.
  • Server Farmer – provisioning of the instance. At this stage various aspects of server security are configured, along with central logging of events, backups, and automatic updates. The instance is then plugged into the farm (i.e. the central management system).
  • Ansible – application provisioning, starting with Docker and support tools. Then the Go stack is built (non-standard due to Hyperledger requirements), after which the Hyperledger Fabric and Consul services are installed and configured. Consul runs in either client or server mode. In general, there is no real need to run more than two Consul instances per availability zone.
  • Next, integration is configured with a separate Apache Kafka cluster and with CircleCI.com, which is responsible for the CI/CD processes, i.e. deploying new versions of the application. The final step is for CircleCI.com to start the Fabric node.
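The sequence above can be sketched as an ordered pipeline in which each step receives the state produced by the previous ones. This is only an illustration of the ordering and the data dependency between the first two steps: every function here is a hypothetical stub, and none of them is the real interface of Polynimbus, ZoneManager, Server Farmer, Ansible or CircleCI.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceContext:
    """State passed from step to step while an instance is created."""
    hostname: str
    vendor: str
    ip_address: str = ""
    completed: list = field(default_factory=list)


# Hypothetical stubs; a real step would invoke the corresponding tool.
def create_instance(ctx):
    ctx.ip_address = "203.0.113.10"  # placeholder documentation address


def add_dns_record(ctx):
    if not ctx.ip_address:
        raise RuntimeError("DNS record needs the IP of a created instance")


def provision_server(ctx):
    pass  # security, logging, backups, updates, joining the farm


def provision_application(ctx):
    pass  # Docker, Go stack, Hyperledger Fabric, Consul


def configure_integrations(ctx):
    pass  # Kafka cluster and CI/CD hookup


STEPS = [create_instance, add_dns_record, provision_server,
         provision_application, configure_integrations]


def run_pipeline(ctx, steps=STEPS):
    """Run steps in order; an exception stops the pipeline at that step."""
    for step in steps:
        step(ctx)
        ctx.completed.append(step.__name__)
    return ctx


if __name__ == "__main__":
    result = run_pipeline(InstanceContext("node1.example.com", "aws"))
    print(result.completed)
```

Recording completed steps in the context makes it easy to see where a failed run stopped and to resume from that point.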

Conclusion

Importantly for both us and our clients, Polynimbus gives us full independence from any single cloud vendor. If we get a dedicated, more advantageous price offer, e.g. from Oracle, we don’t have to stay with AWS to provide blockchain services to our clients just because of some technical reasons.

There are real limitations to keep in mind. Not all of the capacity of each new instance can be allocated to the application itself, because part of it goes to the Consul cluster: Hyperledger should connect to Consul in its own availability zone, so each availability zone must contain one or two Consul instances.
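The one-or-two-per-zone rule is easy to check mechanically. The sketch below assumes a simple inventory of nodes, each tagged with vendor, region, zone and role. That inventory format and all names in it are hypothetical, chosen only to illustrate the check.

```python
from collections import Counter


def consul_gaps(nodes, minimum=1, maximum=2):
    """Return zones hosting Fabric nodes whose Consul count is out of range.

    The result maps (vendor, region, zone) to the number of Consul
    instances found there.
    """
    fabric_zones = set()
    consul_per_zone = Counter()
    for node in nodes:
        zone = (node["vendor"], node["region"], node["zone"])
        if node["role"] == "fabric":
            fabric_zones.add(zone)
        elif node["role"] == "consul":
            consul_per_zone[zone] += 1
    return {
        zone: consul_per_zone[zone]
        for zone in sorted(fabric_zones)
        if not minimum <= consul_per_zone[zone] <= maximum
    }


# Hypothetical inventory with placeholder names.
NODES = [
    {"vendor": "vendor-1", "region": "region-x", "zone": "zone-1", "role": "fabric"},
    {"vendor": "vendor-1", "region": "region-x", "zone": "zone-1", "role": "consul"},
    {"vendor": "vendor-2", "region": "region-y", "zone": "zone-1", "role": "fabric"},
]

if __name__ == "__main__":
    for zone, count in consul_gaps(NODES).items():
        print("zone", zone, "has", count, "Consul instance(s)")
```

Running a check like this after every change to the node pool catches a zone that gained a Fabric node but no local Consul instance.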

Thanks to this, we avoid a situation where a global network failure prevents the application from operating correctly. In a correctly configured multi-cloud, multi-region, multi-AZ environment, a global network failure means only that selected nodes stop serving current traffic; the failure has no other consequences. Thanks to an efficient management stack, if we anticipate longer problems, we can add new blockchain nodes in other cloud vendors and regions.