SDI 4 Apps - Uptake of open geographic information through innovative services based on linked data

This blog post describes my user experience with 3 commercial public clouds and 1 community cloud. The public clouds are the three biggest IaaS clouds: Amazon Elastic Compute Cloud, Google Compute Engine and Microsoft Azure. The community cloud is operated by Masaryk University’s CERIT-SC center for the academic community and is targeted at high-performance scientific computing.

In my previous blog post I explained the 3 service models of cloud computing (Software-as-a-Service, Platform-as-a-Service and Infrastructure-as-a-Service) and how they relate to the SDI4Apps platform. The SDI4Apps platform is a set of software components that should be deployable into any IaaS cloud. I want to share the experience I gained when developing a solution for deployment into multiple clouds.

The compared clouds

Web interfaces

Each of the clouds provides a web portal for managing your VM instances and other services.

All these clouds offer similar services, but their interfaces, both web and command line, are wildly different. All of them let you choose from a set of preinstalled disk images with popular operating systems (that is, various Linux distributions and MS Windows), and create a VM instance with a publicly accessible IP address.

In general, the Amazon web interface seems to be the most advanced and offers the easiest ways to set up complicated things like HTTP load balancers. Google seems to think that cloud services should be managed primarily from the command line, so its web interface does not allow all the operations that are available from the command line. I have found the Microsoft Azure web interface to be the most confusing one, even though it is the most colorful one. The OpenNebula interface used by the Masaryk University’s cloud has two modes: a “simple” one with limited choices, and an “advanced” one with more choices that is less user friendly.

How to get an account

For the three commercial clouds, you need only a credit card to pay for their services. In all of them, you use your general account (Amazon account, Google account, Microsoft account) with a registered credit card to access the cloud management web portal.

While creating and using Amazon and Google accounts was easy, I had problems creating a Microsoft account. It distinguishes between several subtypes of Microsoft accounts for persons and for organizations, and it took me a lot of time to figure out the correct one.

The Masaryk University’s cloud is not public. It can be used only for scientific research, so getting an account there is more complicated. Users must apply for an account and either confirm their identity through the eduGAIN federation of universities and other research organizations, or be confirmed as members of a research project.

How much it costs

Using the Masaryk University’s cloud is free, but limited to research only. No commercial services can run there.

The commercial public clouds charge for actual use only. They usually offer a set of predefined computer configurations for VMs, ranging from small configurations with 1 CPU and 1 GB of RAM up to large configurations with tens of CPUs and hundreds of GB of RAM. Prices vary depending on geographical location, so prices for machines located in Germany are different from prices for machines located on the US east coast.

Amazon EC2 pricing distinguishes on-demand instances, which are charged by CPU-hour, from reserved instances, which are prepaid and cheaper. For example, the m4.xlarge instance with 4 CPUs, 16 GB RAM and Linux OS, located in Frankfurt, costs $0.285 per hour, which is $2496 per year, while the same instance in the same location costs $1599 per year if prepaid for 1 year, or even $1095 per year if prepaid for 3 years. Amazon also offers so-called spot instances, which are charged a price that changes every 5 minutes and run only while that price is below a user-defined limit. The price depends on demand; for example, right now the price of an m4.xlarge spot instance is ten times lower than the on-demand price.

Google Compute Engine pricing also offers on-demand instances. For example, its n1-standard-4 instance with 4 CPUs, 15 GB RAM and Linux OS, located in Europe, costs $0.440 per hour, but discounts are automatically applied for sustained running, so it would cost $1,349 per year. Besides the predefined configurations, Google offers custom machine types, which can have any combination of CPU count and RAM size.

Microsoft Azure pricing offers only on-demand instances. Its A3 Standard instance with 4 CPU, 7 GB RAM, Linux OS, located in West Europe costs $0.240 per hour, which is $2102 per year. No discounts or custom machine types are available here.

Amazon entices new customers by giving them 750 free hours per month of its t2.micro instance for the first year, which is basically a free year of continuous running of one such instance.
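The yearly on-demand prices quoted above for Amazon and Azure are simply the hourly price multiplied by the number of hours in a year. A minimal sketch of that arithmetic follows, using only the prices given in this post; the function and variable names are my own, and Google's sustained-use discount is deliberately not modeled because its exact rules are not described here.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def yearly_cost(hourly_usd: float) -> float:
    """Cost of running one instance continuously for a year."""
    return hourly_usd * HOURS_PER_YEAR


# Hourly on-demand prices quoted in this post.
amazon_on_demand = yearly_cost(0.285)  # m4.xlarge, Frankfurt
azure_on_demand = yearly_cost(0.240)   # A3 Standard, West Europe

print(f"Amazon on-demand: ${amazon_on_demand:.2f} per year")
print(f"Azure on-demand:  ${azure_on_demand:.2f} per year")

# Amazon reserved instances: yearly prices quoted in this post.
for label, prepaid_per_year in (("1-year", 1599), ("3-year", 1095)):
    saving = 1 - prepaid_per_year / amazon_on_demand
    print(f"Amazon {label} reservation saves {saving:.0%} versus on-demand")

# Free tier: 750 hours per month covers even the longest month.
longest_month_hours = 31 * 24
print("Free tier covers a full month:", 750 >= longest_month_hours)
```

The post quotes the results as whole dollars, so small differences in the last digit are only a matter of rounding.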

External IP addresses

The commercial public clouds differ in how they provide a public IP address, which you can use for a web server, for example.

Amazon simply assigns a random IP address that is mapped to some ugly name like ec2-54-154-199-120.eu-west-1.compute.amazonaws.com which becomes the name of your VM.

Google similarly assigns a random IP address, like 104.155.82.227, which is reverse-DNS mapped to a name like 227.82.155.104.bc.googleusercontent.com. You can use the name or the IP address.

Microsoft does not assign a public IP to every VM. Instead you have to create a “service” name (in fact a group of VMs) that must be unique among all of the Azure “services”, let’s say myownservice1, and only one VM in a service can have a public IP address. The IP address is then mapped to a name like myownservice1.cloudapp.net. This is why the name must be unique among all users: all service names are used as host names under the domain cloudapp.net.

The Amazon and Google approach seems more reasonable to me, as you can use the IP address directly, use the ugly name, or create a DNS alias for that name or a dynamic DNS record for the IP address.
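The generated names in the Amazon and Google examples above appear to follow a simple pattern built from the IP address. The sketch below reproduces that pattern; it is inferred only from the two examples in this post, so it is an assumption and not a documented rule of either provider. The function names are my own, and the authoritative answer is always a real reverse DNS lookup.

```python
import ipaddress


def ec2_style_name(ip: str, region: str) -> str:
    """Build a name like the Amazon example: octets joined by dashes."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    dashed = str(addr).replace(".", "-")
    return f"ec2-{dashed}.{region}.compute.amazonaws.com"


def gce_style_name(ip: str) -> str:
    """Build a name like the Google example: octets in reverse order."""
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.bc.googleusercontent.com"


# The Amazon IP address is read off the example name; the region is taken from it too.
print(ec2_style_name("54.154.199.120", "eu-west-1"))
print(gce_style_name("104.155.82.227"))
```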

The Masaryk University’s cloud uses the same approach as Amazon and Google: a public IP address is assigned and mapped to a name like cloud58b.cerit-sc.cz.

Command line interfaces

Each of the four clouds offers a command-line interface. Their commands are totally incompatible, and each has a different way of authenticating users.

The Masaryk University’s cloud requires you to install the OpenNebula command-line tools, which are written in the Ruby programming language and use X509 certificates for authentication.

Amazon EC2 also has a command-line interface, but I was unable to get it working.

Google Compute Engine requires you to install the Google Cloud SDK, which is written in Python. It uses OAuth for authentication. A useful guide for setting it up is here.

Microsoft Azure requires the Azure CLI tools, which are written in Node.js and use X509 certificates converted from SSH keys for authentication, which is very strange.

Contextualization

Contextualization is a way to provide a freshly booted VM with data that specify things like its name, which user accounts should be created inside it, which SSH keys are authorized for logging in, and so on.

Part of the data can be specified by the user when launching the VM instance. These data are usually referred to as user-data. The Ubuntu and Debian Linux OSes contain a software package named cloud-init, which accepts user-data as text in YAML format. The user-data can do things such as adding software repositories, updating software packages, adding user accounts with authorized SSH keys, installing any software, and running arbitrary shell scripts.
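To make the idea of user-data more concrete, here is a minimal sketch that generates a small cloud-init document. The keys package_update, packages, users, ssh_authorized_keys and runcmd are standard cloud-init directives, but the account name, the SSH key, the package and the shell command are placeholders I made up for illustration. The sketch relies on the fact that JSON is also valid YAML, so it needs no YAML library; a hand-written YAML file would work just as well.

```python
import json

# Hypothetical contextualization data; every value here is a placeholder.
user_data = {
    "package_update": True,   # refresh the package index on first boot
    "packages": ["nginx"],    # software to install
    "users": [
        "default",            # keep the image's default account
        {
            "name": "deploy",
            "shell": "/bin/bash",
            "ssh_authorized_keys": [
                "ssh-ed25519 AAAA...placeholder deploy@example",
            ],
        },
    ],
    # Arbitrary shell commands run once, at the end of the first boot.
    "runcmd": ["echo contextualized > /var/tmp/first-boot.txt"],
}


def render_cloud_config(config: dict) -> str:
    """Return text that cloud-init recognizes as a cloud-config document."""
    # The first line must be the '#cloud-config' marker.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


print(render_cloud_config(user_data))
```

The printed text is what you would paste into a user-data form field or save into a file passed on the command line.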

Only the Amazon EC2 web interface allows you to specify user-data when launching a VM instance. Google CE and Microsoft Azure allow user-data to be loaded from a file specified on the command line when starting a VM instance. OpenNebula does not have the concept of user-data, so it must be provided as part of a so-called template, which is a text specifying properties for launching VMs.
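As an illustration of the command-line route for Google, the sketch below only assembles the command and prints it, without running anything. It assumes the Google Cloud SDK's gcloud compute instances create command with its --metadata-from-file option, and that the chosen disk image contains cloud-init so that the user-data metadata key is honored. The instance name, zone and file name are hypothetical.

```python
import shlex


def gcloud_create_command(instance: str, zone: str, user_data_file: str) -> list[str]:
    """Assemble (but do not run) a gcloud command that passes user-data from a file."""
    return [
        "gcloud", "compute", "instances", "create", instance,
        "--zone", zone,
        # The file content is stored under the metadata key 'user-data'.
        "--metadata-from-file", f"user-data={user_data_file}",
    ]


# Hypothetical instance name, zone and file name.
cmd = gcloud_create_command("demo-vm", "europe-west1-b", "user-data.yaml")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could later be handed to subprocess.run without any shell quoting problems.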

Summary

The four clouds offer similar services, but their interfaces are very different. For a user who is new to clouds, I would recommend starting with the Amazon EC2 cloud, because it has the best web interface (which even allows contextualization data to be specified in the web form when launching a new VM instance) and gives the first year of running a 1 CPU VM for free.
