SDI 4 Apps - Uptake of open geographic information through innovative services based on linked data

This blog post explains how it is possible to deploy the same platform in a uniform way into multiple clouds, even when the clouds have very different interfaces (as described in my previous blog post about Amazon, Google and Microsoft clouds).

The motivation for uniform deployment into multiple clouds

The SDI4Apps platform is in essence a set of software for working with geospatial data. As explained by Runar in two earlier posts, it is a virtual machine with server-side web services and downloadable client-side JavaScript libraries, plus some common data sets.

Depending on the chosen business model, it can be provided by a service provider as a Platform-as-a-Service to application developers, or it can be deployed by an independent application developer into an Infrastructure-as-a-Service cloud. Either way, we should not limit where it can be deployed.

To test the viability of such cloud-provider independence, I tried to find a way to deploy the same software easily into three of the best-known IaaS clouds – Amazon EC2, Google Compute Engine and Microsoft Azure – as well as into a fourth one: our scientific research community cloud operated by Masaryk University using the OpenNebula software stack.

The first option – pre-installed disk images

The first idea that came to mind was to install a selected operating system, install all the required geospatial software on top of it, and then upload the resulting virtual disk image into each cloud.
However, we quickly realized that this way of deploying the platform has some disadvantages:

  • the disk image size is in gigabytes, so
    • upload takes time
    • the IaaS provider charges money for the transfer
    • the IaaS provider charges money for storing the disk image
  • the installed OS and software need frequent security patches, so the image must be rebuilt and uploaded again
  • clouds have different requirements for uploaded disk images, so a separate image must be maintained for each cloud

These disadvantages led us to consider other options.

The second option – standard disk images modified by Cloud-Init

Another option is to take the disk images offered by each cloud provider and to modify them during startup. These pre-defined disk images are created, stored and maintained by the cloud providers, so there is no upload, no charge for network transfer or storage space, and no need to maintain our own images.

Each of the cloud providers offers a set of disk images with popular operating systems pre-installed:

  • Amazon EC2 provides: Amazon Linux; Red Hat Enterprise Linux 7.2; SUSE Linux Enterprise Server 11, 12; Ubuntu 14.04; MS Windows Server 2012, 2008, 2003
  • Google CE provides: Debian 7, 8; CentOS 6, 7; CoreOS; openSUSE; Ubuntu 12.04, 14.04, 15.04, 15.10; Red Hat Enterprise Linux 6, 7; SUSE Linux Enterprise Server 11, 12; MS Windows Server 2012, 2008
  • Microsoft Azure provides: Ubuntu 14.04; MS Windows Server 2012, 2008
  • Masaryk University’s OpenNebula: any OS for which we have installation media and a licence

Clearly, there is only one free OS that all the providers have in common – Ubuntu 14.04. It is the LTS (Long Term Support – 5 years of support) version of probably the most popular Linux distribution.

We asked the developers of all six pilot applications in SDI4Apps whether they agreed with Ubuntu 14.04 as the OS for the SDI4Apps platform, and all of them did. Thus Ubuntu 14.04 was selected as the OS for the platform.

The Ubuntu disks prepared by the cloud providers are basic server installations of Ubuntu, with disk sizes around 10 GB.

We have followed the advice of the EGI Federated Cloud Task Force about contextualization, and used the Cloud-Init software which comes pre-installed in all the Ubuntu 14.04 disk images provided by the cloud providers. Contextualization is the process of installing, configuring and preparing software upon boot time on a pre-defined virtual machine image (e.g. setting the hostname, IP addresses, SSH authorized keys, starting services, installing applications, etc.).

Cloud-Init

Cloud-Init (wiki, documentation) is software developed by Canonical, the company behind Ubuntu. It runs during each OS boot and makes the modifications specified by three sources:

  • by the file /etc/cloud/cloud.cfg prepared by the creator of the disk image
  • by meta-data provided by the cloud infrastructure during launch – usually server name, instance id, display name and other cloud-specific details, in a cloud-specific format, provided in a cloud-specific way
  • by user-data passed in by the user when launching a new VM – it is a text in YAML format specified by Cloud-Init

The modifications usually include:

  • setting network (host name, /etc/hosts)
  • creating user accounts (by default a user named ubuntu) and adding SSH keys to ~/.ssh/authorized_keys
  • setting main package repositories to a mirror local to the cloud
  • upgrading installed packages to their newest versions
  • adding more package repositories
  • installing new software packages
  • adding files (e.g. motd with ASCII art logo displayed on successful login)
  • running arbitrary commands
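
The modifications listed above map onto user-data roughly as follows. This is a minimal sketch, not the SDI4Apps configuration: the host name, the SSH key, the package and the file content are placeholder assumptions chosen only for illustration.

```yaml
#cloud-config
# Illustrative values only: host name, key, package and file content are assumptions.

# networking: host name and /etc/hosts
hostname: sdi4apps-demo
manage_etc_hosts: true

# SSH public key added to the default user's ~/.ssh/authorized_keys
ssh_authorized_keys:
 - ssh-rsa AAAA...example user@example.org

# upgrade installed packages to their newest versions, then install new ones
package_update: true
package_upgrade: true
packages:
 - apache2

# add a file
write_files:
 - path: /etc/motd
   content: |
     Welcome to the SDI4Apps platform

# run an arbitrary command
runcmd:
 - 'echo "contextualization finished" >> /var/log/contextualization.log'
```

The first line, #cloud-config, tells Cloud-Init to interpret the rest of the text as this YAML format.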

As the /etc/cloud/cloud.cfg file and the meta-data are provided by the cloud, we need to provide only the user-data. The content of the user-data does not depend on the cloud, only on the pre-installed image, which is always Ubuntu 14.04, so we can use a single user-data file for every cloud! That’s the essence of the uniform deployment into any cloud.

For example, the following user-data code installs Oracle Java 7 and PostGIS from special repositories:

#add software repositories
apt_sources:
 - source: "ppa:webupd8team/java" # Oracle Java PPA
 - source: 'deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main' # PostgreSQL
   filename: pgdg.list
# Agreement with Oracle licence before packages are installed
bootcmd:
 - 'echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections'
#install these packages
packages:
 - oracle-java7-installer
 - oracle-java7-set-default
 - oracle-java7-unlimited-jce-policy
 - postgresql
 - postgresql-9.4-postgis-2.1
 - postgresql-9.4-postgis-2.1-scripts

The following user-data code runs shell commands that download a shell script from GitHub and execute it:

runcmd:
 - 'wget --quiet https://raw.githubusercontent.com/SDI4Apps/cloud-platform/master/install_jslibs.sh -O /tmp/install_jslibs.sh'
 - 'bash /tmp/install_jslibs.sh'

In this way we can install the whole SDI4Apps platform. We have established a GitHub repository at https://github.com/SDI4Apps/cloud-platform which contains the user-data and scripts installing the platform from scratch on a plain Ubuntu 14.04 installation.
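
The fragments shown earlier can live in one user-data file. The sketch below is an abridged combination of them with comments on when each section is processed. The ordering is an assumption based on the default module order in the Ubuntu image's /etc/cloud/cloud.cfg and is worth verifying on the actual image. The complete file is the one in the GitHub repository.

```yaml
#cloud-config
# Abridged sketch; section order in the file does not matter,
# Cloud-Init processes the sections in its own module order.

# 1. bootcmd runs early on every boot, before any package is installed,
#    so the licence answer is already stored when the installer asks for it
bootcmd:
 - 'echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections'

# 2. extra package repositories are registered
apt_sources:
 - source: "ppa:webupd8team/java"

# 3. packages are installed from the configured repositories
packages:
 - oracle-java7-installer

# 4. runcmd commands run late in the first boot, after package installation
runcmd:
 - 'wget --quiet https://raw.githubusercontent.com/SDI4Apps/cloud-platform/master/install_jslibs.sh -O /tmp/install_jslibs.sh'
 - 'bash /tmp/install_jslibs.sh'
```

This ordering is why the licence agreement sits in bootcmd rather than in runcmd: a runcmd entry would run only after the package installation had already been attempted.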

The third option – Docker containers

There is also a third option, but we have not investigated it enough yet.
We were enticed by the following quote (source in Czech):
“Docker allows us to commoditize cloud expenses. If we want, we can pick up our traps and move them elsewhere using Docker.” (Patrick Zandl, the marketing director of Energomonitor)
Docker is a tool for deployment of software in so-called containers. A container is an isolated environment with complete system libraries, running inside a hosting OS. A container is in principle a chroot directory with cgroups and namespaces, with exportable directories and TCP ports for linking to other containers.
A container can contain e.g. Ubuntu 14.04 with Apache, but can run on any Linux, Mac OS X or Windows host.
Docker containers are like .deb or .rpm packages, but OS-independent.
There is a versioned repository of containers on Docker Hub, containing official containers for well-known software (e.g. Postgres, Apache, Ubuntu, CentOS, …).
Anybody can create a new container by modifying another container. Containers can be linked (e.g. Apache -> PostGIS -> Postgres). Persistent data can be stored outside of containers using exported directories.
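
Linking and exported directories can be illustrated with a Docker Compose file. This is only a sketch under assumptions: Docker Compose is not part of the setup described in this post, and the image tags, the password and the host directory /srv/pgdata are hypothetical values.

```yaml
# docker-compose.yml (hypothetical; not part of the SDI4Apps repository)
services:
  db:
    image: postgres:9.4              # official Postgres image from Docker Hub
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder password
    volumes:
      # persistent data stored outside the container, in a host directory
      - /srv/pgdata:/var/lib/postgresql/data
  web:
    image: httpd:2.4                 # official Apache image from Docker Hub
    links:
      - db                           # the web container can reach the database as "db"
    ports:
      - "80:80"                      # TCP port 80 published on the host
```

Removing and recreating the db container leaves the data in /srv/pgdata untouched, which is what makes moving the platform to another host or cloud plausible.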

Summary

The most mature way of deploying the SDI4Apps platform into multiple clouds so far is the Cloud-Init approach. The Docker approach seems promising, but needs more investigation.
