Docker + OpenStack = True

OpenStack and Docker

OpenStack has the potential to revolutionize how enterprises do virtualization, so lots of people are currently busy setting up OpenStack private clouds. I’ve been part of such a project too. In this post, I’ll explain why I believe Docker makes a lot of sense to use for deploying the services that together make up an OpenStack cloud, bringing benefits in flexibility, orchestration, clustering and more.

All about microservices

OpenStack is commonly viewed as a platform for virtualization. However, under the hood OpenStack is nothing more than a bunch of microservices which end-users can interact with, either directly or through the Horizon UI. Most of the microservices are REST APIs for controlling different aspects of the platform (block storage, authentication, telemetry and so on). API nodes for an OpenStack cloud end up running dozens of small processes, each with their own distinct purpose. In such a setting, Dockerizing all these microservices makes a lot of sense.

Flexibility and speed

Of the several improvements Dockerizing brings, the most important is flexibility. Running a Docker Swarm for OpenStack API services makes it possible to scale out easily by adding more swarm nodes and launching additional copies of the containers on them. Coupled with HAProxy, the new containers can join the backend pool for an API service in seconds and help alleviate high load. Sure, the same can be accomplished by adding physical API nodes, provisioning them with Chef/Puppet/Salt/Ansible and reconfiguring HAProxy to include them in the backend pool for each service, but that takes considerably longer than just launching more pre-built containers.
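
A minimal sketch of what scaling out could look like with Compose against a Swarm, assuming a hypothetical Compose project where the Keystone API service is named "keystone-api":

# Launch more copies of the (hypothetical) keystone-api service across the swarm
docker-compose scale keystone-api=6

# Check which swarm nodes the new containers were scheduled on
docker ps --filter "name=keystone-api"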

Versioning and ease of debugging

Since Docker images are versioned and pushed to a central registry, it’s trivial to ensure that all instances of a service run with identical configs, packages and libraries. Furthermore, even though a service like Nova typically consists of 4-5 different sub-services, all of them can share the same config and therefore use the same container image. The only difference is which command the container runs when started. Being able to easily check that all backend instances are identical (running the same image version) is important when debugging issues. Docker Compose also has built-in support for viewing logs from several containers sorted chronologically, no matter which physical node they run on, including the option to follow logs in real time from several containers at once.
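
As an illustration, aggregated log viewing could look like this, assuming a Compose project with services named "nova-api" and "nova-scheduler" (the names are just examples):

# Show interleaved logs from all containers of both services, regardless of node
docker-compose logs nova-api nova-scheduler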

Orchestration and clustering

Docker Compose is a natural fit for orchestrating these microservices. Compose provides an interface to scale the number of containers for each service and supports constraints and affinities on where each container should run. For example, if you run a clustered MySQL instance for the internal OpenStack databases, Compose can ensure that each database container runs on a different physical host than the others in the MySQL cluster. When creating container images for each service, an entrypoint shell script can be included to detect whether there is an existing cluster to join, or whether this is the first instance of the service. Clustering the services that the OpenStack APIs rely on (notably MySQL and RabbitMQ) becomes easier with this type of pattern.
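
A minimal sketch of such an entrypoint script, assuming Consul is reachable on the hostname "consul" and the service is registered there as "mysql" (both assumptions, see the next section on service discovery):

#!/bin/bash
# Sketch of a container entrypoint that bootstraps a new cluster or joins an
# existing one. The service name "mysql" and the Consul address are assumptions.
set -e

# Count how many instances of the service Consul already knows about
EXISTING=$(curl -s http://consul:8500/v1/catalog/service/mysql | grep -c ServiceAddress || true)

if [ "$EXISTING" -gt 0 ]; then
    echo "Existing cluster detected, joining as an additional member"
    # ...start the service with join/replication options here...
else
    echo "No existing cluster found, bootstrapping a new one"
    # ...start the service with bootstrap options here...
fi

exec "$@"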

Service Discovery

A solution for Service Discovery is a requirement when operating dozens of microservices that expect to be able to find and talk to each other. In the Docker ecosystem, Consul is a great option for service discovery. Coupled with the Registrator container deployed on each swarm node, all microservices that listen on a TCP/UDP port are automatically added as services in Consul. It’s easy to query Consul for the IP addresses of a particular service using either the HTTP or the DNS interface. With the right Consul setup, each Dockerized service can reference other services by their Consul DNS name in config files and so on. This way, no server names or IP addresses need to be hard-coded in the config of each service, which is a great plus.
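
For example, a lookup of a hypothetical "keystone" service could be done through either interface like this:

# Query Consul's HTTP API for all registered instances of the service
curl http://localhost:8500/v1/catalog/service/keystone

# The same lookup through Consul's DNS interface (default DNS port 8600)
dig @127.0.0.1 -p 8600 keystone.service.consul +short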

There are more desirable effects of Dockerizing OpenStack microservices, but the most important ones in my opinion are flexibility, ease of debugging, orchestration and service discovery. If you wonder why Docker doesn’t just replace OpenStack entirely, I recommend reading this TechRepublic article, where Matt Asay points out that a common enterprise pattern is to utilize OpenStack for its strong multi-tenancy model. Applications can in that case be deployed with Docker on top of VMs provisioned using OpenStack, which I think will be a very useful way of combining OpenStack and Docker for big enterprises with a diverse set of applications, departments and users.

~ Arne ~

Spark cluster on OpenStack with multi-user Jupyter Notebook

Spark on OpenStack with Jupyter

Apache Spark is gaining traction as the de facto analysis suite for big data, especially for those using Python. Spark has a rich API for Python and several very useful built-in libraries like MLlib for machine learning and Spark Streaming for real-time analysis. Jupyter (formerly IPython Notebook) is a convenient interface to perform exploratory data analysis and all kinds of other analytic tasks using Python. In this post I’ll show step-by-step how to set up a Spark cluster on OpenStack and configure Jupyter with multi-user access and an easy-to-use PySpark profile.

Setting up a Spark cluster on OpenStack VMs

To get a Spark standalone cluster up and running, all you need to do is spawn some VMs and start Spark as a master on one of them and as slaves (workers) on the others. They will automatically form a cluster that you can connect to from Python, Java and Scala applications using the IP address of the master node. Sounds easy enough, right? There are some pitfalls though, so read on for tips on how to avoid them.

To create the first VM to be used as the master node, I use the OpenStack command line client. We’ll get that node up and running first. The distribution is Ubuntu 14.04 Trusty. Cloud-init is an easy way to bootstrap nodes and get the necessary software installed and configured. To install and start a Spark master node I use the following Cloud-init script:

#!/bin/bash
#
# Cloud-init script to get Spark Master up and running
#
SPARK_VERSION="1.5.0"
APACHE_MIRROR="apache.uib.no"
LOCALNET="10.20.30.0/24"

# Firewall setup
ufw allow from $LOCALNET
ufw allow 80/tcp
ufw allow 443/tcp
ufw allow 4040:4050/tcp
ufw allow 7077/tcp
ufw allow 8080/tcp

# Dependencies
apt-get -y update
apt-get -y install openjdk-7-jdk

# Download and unpack Spark
curl -o /tmp/spark-$SPARK_VERSION-bin-hadoop1.tgz http://$APACHE_MIRROR/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop1.tgz
tar xvz -C /opt -f /tmp/spark-$SPARK_VERSION-bin-hadoop1.tgz
ln -s /opt/spark-$SPARK_VERSION-bin-hadoop1/ /opt/spark
chown -R root.root /opt/spark-$SPARK_VERSION-bin-hadoop1/*

# Configure Spark master
cp /opt/spark/conf/spark-env.sh.template /opt/spark/conf/spark-env.sh
sed -i 's/# - SPARK_MASTER_OPTS.*/SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4 -Dspark.executor.memory=2G"/' /opt/spark/conf/spark-env.sh

# Make sure our hostname is resolvable by adding it to /etc/hosts
echo $(ip -o addr show dev eth0 | fgrep "inet " | egrep -o '[0-9.]+/[0-9]+' | cut -f1 -d/) $HOSTNAME | sudo tee -a /etc/hosts

# Start Spark Master with IP address of eth0 as the address to use
/opt/spark/sbin/start-master.sh -h $(ip -o addr show dev eth0 | fgrep "inet " | egrep -o '[0-9.]+/[0-9]+' | cut -f1 -d/)

Save this as init-spark-master.sh for use with the OpenStack command line client. The script first adds some firewall rules to allow access to the different components and installs the OpenJDK dependency. Next, a Spark tarball is downloaded, unpacked and made available under /opt/spark on the host. The tarball is prepackaged with Hadoop v1 libraries (note the “hadoop1.tgz” suffix), so adjust this if you need Hadoop v2 instead.

The only configuration of Spark we need at this point is to set the options “spark.deploy.defaultCores” and “spark.executor.memory”. They control how many resources each application gets when it starts. Since the goal is to set up a multi-user environment with Jupyter notebooks, we need to limit the number of CPU cores and the amount of RAM that each notebook will use. Each notebook is an “application” on the cluster for as long as the notebook is active (i.e. until it is shut down by the user). If we don’t limit the resource allocation, the first notebook created will allocate all available CPU cores on each worker, leaving no CPU cores free for the next user. In addition, the default RAM allocation for each app is only 512 MB on each worker node, which might be a bit too small, so we bump that up to 2 GB.

The echo line adds the hostname “spark-master” to /etc/hosts with a reference to the IP address of the VM. Spark tries to resolve the local hostname on startup. Without a resolvable hostname you might encounter “Name or service not known” errors, resulting in Java exceptions and the process exiting.

On the last line the Spark master process is started. The master process is given the IP address of the local host as an argument to make sure it binds to the correct interface. The IP address is extracted from the output of the “ip addr” command.

One way to launch the master VM with the Cloud-init script is like this:

# Install OpenStack client if not present already
sudo apt-get -y install python-openstackclient

# Customize these values to match your OpenStack cluster
OS_IMAGE="f9c9b0dc-6407-4ac2-ad0e-45cb4e47bb01"
OS_NETID="df7cc182-8794-4134-b700-1fb8f1fbf070"
OS_KEYNAME="arnes"

# Create Spark master VM
openstack server create --flavor m1.medium --image $OS_IMAGE --nic net-id=$OS_NETID --key-name $OS_KEYNAME --user-data init-spark-master.sh spark-master

If you prefer using the web UI (Horizon) instead, you can just as easily paste the Cloud-init script into the text box on the Post-Creation tab of the Launch Instance dialog and achieve the same result.

It will take some minutes for the Spark master VM to finish bootstrapping. When it’s done the Spark master UI will be available on port 8080. Remember to associate a floating IP to the VM to be able to access it from outside the OpenStack project:

openstack ip floating add 10.1.1.1 spark-master

Verify that the Spark master UI is reachable and displays metadata about the cluster. If the UI is not reachable, first check that your Security Group rules allow traffic to port 8080 on the Spark master VM. Second, check the Cloud-init logs on the VM to ensure all parts of the initialization succeeded. You’ll find the Cloud-init log file on the VM at /var/log/cloud-init.log and the output from the Cloud-init script in /var/log/cloud-init-output.log. You can also try to re-run parts of the Cloud-init script with sudo to narrow down any issues with the initialization. When initialization succeeds, the Spark master UI will look like this:

Spark master UI with no workers

As expected there are no workers alive yet, so let’s initialize some. To do so we use a slightly modified version of the Cloud-init script above. The main difference is the startup command, which is now /opt/spark/sbin/start-slave.sh with the address to the master as the only argument. Remember to adjust the variables below to your IP range and master IP address.

#!/bin/bash
#
# Cloud-init script to get Spark Worker up and running
#
SPARK_VERSION="1.5.0"
APACHE_MIRROR="apache.uib.no"
LOCALNET="10.20.30.0/24"
SPARK_MASTER_IP="10.20.30.178"

# Firewall setup
ufw allow from $LOCALNET
ufw allow 8081/tcp

# Dependencies
apt-get -y update
apt-get -y install openjdk-7-jdk

# Download and unpack Spark
curl -o /tmp/spark-$SPARK_VERSION-bin-hadoop1.tgz http://$APACHE_MIRROR/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop1.tgz
tar xvz -C /opt -f /tmp/spark-$SPARK_VERSION-bin-hadoop1.tgz
ln -s /opt/spark-$SPARK_VERSION-bin-hadoop1/ /opt/spark
chown -R root.root /opt/spark-$SPARK_VERSION-bin-hadoop1/*

# Make sure our hostname is resolvable by adding it to /etc/hosts
echo $(ip -o addr show dev eth0 | fgrep "inet " | egrep -o '[0-9.]+/[0-9]+' | cut -f1 -d/) $HOSTNAME | sudo tee -a /etc/hosts
# Start Spark worker with address of Spark master to join cluster 
/opt/spark/sbin/start-slave.sh spark://$SPARK_MASTER_IP:7077

Save this as init-spark-worker.sh. Notice how the last line starts up a slave with the address to the cluster master on port 7077 as the only argument. Since the Cloud-init script has no worker-specific details, it is easy to expand the cluster just by creating more worker VMs initialized with the same Cloud-init script. Let’s create the first worker:

# Customize these values to match your OpenStack cluster
OS_IMAGE="f9c9b0dc-6407-4ac2-ad0e-45cb4e47bb01"
OS_NETID="df7cc182-8794-4134-b700-1fb8f1fbf070"
OS_KEYNAME="arnes"

# Create first Spark worker VM, this time with flavor m1.large
openstack server create --flavor m1.large --image $OS_IMAGE --nic net-id=$OS_NETID --key-name $OS_KEYNAME --user-data init-spark-worker.sh spark-worker1

Pause for a moment to let the worker finish bootstrapping and verify that Cloud-init completes without errors. There is no point in initializing more workers until the process is proven to work as expected. Again, it can be useful to check /var/log/cloud-init.log and /var/log/cloud-init-output.log on the new VM to verify that Cloud-init does what it’s supposed to do. On success you’ll see the worker in the Spark master UI:

Spark master UI with one worker registered

Create some more worker nodes to scale the cluster to handle more parallel tasks:

openstack server create --flavor m1.large --image $OS_IMAGE --nic net-id=$OS_NETID --key-name $OS_KEYNAME --user-data init-spark-worker.sh spark-worker2

openstack server create --flavor m1.large --image $OS_IMAGE --nic net-id=$OS_NETID --key-name $OS_KEYNAME --user-data init-spark-worker.sh spark-worker3

openstack server create --flavor m1.large --image $OS_IMAGE --nic net-id=$OS_NETID --key-name $OS_KEYNAME --user-data init-spark-worker.sh spark-worker4

Verify that the new worker nodes show up in the Spark master UI before continuing.

Installing Jupyter and JupyterHub

A shiny new Spark cluster is fine, but we also need interfaces to be able to use it. Spark comes prepackaged with shells for Scala and Python where the connection to a cluster is already set up. The same level of usability can be had with Jupyter (formerly IPython Notebook), so that when you open a new notebook a connection to the Spark cluster (a SparkContext) is established for you. The SparkContext is available through the variable “sc” in the notebook, ready to use by calling sc.textFile() to create an RDD, for instance.

JupyterHub is a multi-user server for Jupyter notebooks. It makes it possible for several users to use Jupyter independently and have their own notebooks and files in their home directories, instead of a shared storage directory for all notebooks. However, this requires that each user has a user account on the VM where JupyterHub is running. Add user accounts for relevant users now if needed. JupyterHub uses Unix authentication, meaning that it relays the username and password to the underlying authentication system on the VM for credential checking.
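
Creating the accounts can be as simple as this (the usernames are just examples; adduser prompts for the password that JupyterHub will authenticate against):

sudo adduser --gecos "" alice
sudo adduser --gecos "" bob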

In this deployment JupyterHub is installed on the Spark master VM and launched there. It could run on a separate VM, but there is normally no need for that since the Spark master process does not require that many resources. The VM where the Jupyter notebooks are executed acts as the “driver” in Spark terms, and the driver will require some processing power and memory, depending on the use case.

SSH into the Spark master VM and run the following set of commands:

# Install pip3 and other dependencies
sudo apt-get -y install python3-pip npm nodejs-legacy
sudo npm install -g configurable-http-proxy

# Install JupyterHub and Jupyter
sudo pip3 install jupyterhub
sudo pip3 install "ipython[notebook]"

pip3 is used instead of pip because JupyterHub depends on Python >= 3.3. After installing all software and dependencies, start the JupyterHub service:

sudo jupyterhub --port 80

The benefit of having JupyterHub listen on port 80 instead of the default port 8000 should be obvious, but it requires that you start the service as root. In addition, you might want to look into securing JupyterHub with an SSL certificate and having it listen on port 443, since it asks for passwords when users log in. When you have the necessary certificate and keys on the VM, the service can be started like this instead:

sudo jupyterhub --port 443 --ssl-key hub.key --ssl-cert hub.pem

Now try to open the JupyterHub login page on the floating IP address of the VM and log in. After login you should be greeted with an empty home directory with no notebooks. A new notebook can be created by clicking “New” on the right above the notebook list.

Jupyter - notebook list empty

If you create a new notebook, you’ll notice that the only supported kernel is Python3 at the moment. We need to add PySpark to that list to be able to use the Spark cluster from Jupyter.

Configuring Jupyter for PySpark

Jupyter relies on kernels to execute code. The default kernel is Python, but many other languages can be added. To use the Spark cluster from Jupyter we add a separate kernel called PySpark. In addition, kernels can run specific commands on startup, which in this case is used to initialize the SparkContext. First, some dependencies need to be installed:

sudo apt-get -y install python-dev python-setuptools
sudo easy_install pip
sudo pip install py4j
sudo pip install "ipython[notebook]"

It might seem odd to install ipython[notebook] as a dependency, but the reason is that IPython/Jupyter contains a number of Python support modules that kernels rely on. Previously when we installed using pip3, we got the Python3 versions of those modules. When installing again with pip, we get Python2 versions. PySpark depends on Python2.

To add PySpark as a kernel, a file containing a kernel definition must be created. Kernel definitions are JSON files in a specific directory. Kernels can either be enabled globally for all users or for one user only, depending on where the definition file is placed. We want the PySpark kernel to be available for all users, so we’ll add it under /usr/local/share/jupyter/kernels/ like this:

sudo mkdir -p /usr/local/share/jupyter/kernels/pyspark/
cat <<EOF | sudo tee /usr/local/share/jupyter/kernels/pyspark/kernel.json
{
 "display_name": "PySpark",
 "language": "python",
 "argv": [
  "/usr/bin/python2",
  "-m",
  "ipykernel",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "SPARK_HOME": "/opt/spark/",
  "PYTHONPATH": "/opt/spark/python/:/opt/spark/python/lib/py4j-0.8.2.1-src.zip",
  "PYTHONSTARTUP": "/opt/spark/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master spark://10.20.30.178:7077 pyspark-shell"
 }
}
EOF

This kernel definition ensures that the Spark built-in “pyspark-shell” is started under the hood as the process where our code will be executed. Notice how the address of the Spark cluster, “spark://10.20.30.178:7077”, is passed as an argument. Remember to customize that address to your specific environment. The address references the Spark master VM (the same host Jupyter runs on), but it could just as easily reference an external host, for instance if you wanted to set up Jupyter on a separate OpenStack VM, or if you already have a Spark cluster running somewhere else that you want to connect to. The Spark master UI shows the right URL to use to connect, right below the Spark logo. Note that the Spark workers depend on being able to establish connections back to the host where the driver process runs (the Jupyter notebook), which may not be possible depending on the firewall setup when connecting to a remote Spark cluster. This is the reason a firewall rule allowing all traffic on the local network (10.20.30.0/24 in my case) is added by Cloud-init on all the Spark VMs.

After adding the kernel definition file for PySpark you’ll have to refresh the Jupyter homepage to see the new kernel in the list. No need to restart JupyterHub.

Debugging PySpark startup errors

If you get an error message about “Dead kernel” when creating new notebooks with the PySpark kernel, there might be several causes. For instance, the VM running Jupyter might not be able to connect to the Spark cluster, or it might lack some dependencies (packages/modules) needed to initialize the SparkContext. To debug kernel startup issues, first check the output from JupyterHub in the terminal where you started it (it might be smart to keep that terminal open until everything works as expected). JupyterHub will, for example, output log lines like this when a Python module is missing:

[I 2015-09-20 20:31:24.276 ubuntu kernelmanager:85] Kernel started: 8a0b760d-357a-4507-a18b-da4bebd09e3f
/usr/bin/python2: No module named ipykernel
[I 2015-09-20 20:31:27.278 ubuntu restarter:103] KernelRestarter: restarting kernel (1/5)
/usr/bin/python2: No module named ipykernel
...
[W 2015-09-20 20:31:39.295 ubuntu restarter:95] KernelRestarter: restart failed
[W 2015-09-20 20:31:39.295 ubuntu kernelmanager:52] Kernel 8a0b760d-357a-4507-a18b-da4bebd09e3f died, removing from map.
ERROR:root:kernel 8a0b760d-357a-4507-a18b-da4bebd09e3f restarted failed!
[W 2015-09-20 20:31:39.406 ubuntu handlers:441] Kernel deleted before session

This error occurred because I hadn’t installed ipython[notebook] using pip yet, so the Python2 modules needed by PySpark were not available. Notice how the error message states that it is /usr/bin/python2 that reports the error. Jupyter tries to restart the kernel a total of five times, but hits the same error every time and eventually gives up. In the notebook UI this is shown as a “Dead kernel” message.

Other errors can pop up in the Spark logs on master or worker nodes. Spark logs to /opt/spark/logs, so have a look there if anything is malfunctioning. The Spark master node logs every new application that is started on the Spark cluster, so if you don’t see output there when opening a new notebook with the PySpark profile, something is not right.
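
To follow the logs, something like this should work (the exact file names contain the user and hostname, hence the wildcards):

# On the master node: follow the Spark master log
tail -f /opt/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out

# On a worker node: follow the Spark worker log
tail -f /opt/spark/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out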

One last debugging tip is to try to start the PySpark shell from Bash on the VM where Jupyter runs. It is useful to inspect what happens when the PySpark shell starts. Here is an example of output when a dependency is missing:

$ python2 /opt/spark/python/pyspark/shell.py
Traceback (most recent call last):
 File "/opt/spark/python/pyspark/shell.py", line 28, in <module>
 import py4j
ImportError: No module named py4j

Remember to use Python2 when starting the shell. The above command mimics what Jupyter does behind the scenes when a new notebook is created.

Ready to use PySpark in Jupyter

If everything went according to plan, you now have a Spark cluster which you can easily use from Jupyter notebooks just by creating them with the PySpark profile 🙂 The variable “sc” is initialized as a SparkContext connected to the Spark cluster and you can start exploring the rich Spark API for data transformation and analysis. Here’s a screenshot from a notebook where I extracted response time numbers from Varnish NCSA logs (web cache server logs) and computed common statistics like mean and standard deviation for the response time of each backend in use by the cache server:

Example use of PySpark in Jupyter

~ Arne ~

Spark – How to fix “WARN TaskSchedulerImpl: Initial job has not accepted any resources”

Apache Spark and Firewalls

When setting up Apache Spark on your own cluster, in my case on OpenStack VMs, a common pitfall is the following error message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

This error can pop up in the log output of the interactive Python Spark shell or Jupyter (formerly IPython Notebook) after starting a PySpark session and trying to perform any kind of Spark action (like .count() or .take() on an RDD), rendering PySpark unusable.

As the error message suggests, I investigated resource shortages first. The Spark Master UI reported that my PySpark shell had allocated all the available CPU cores and a small portion of the available memory. I therefore lowered the number of CPU cores for each Spark application on the cluster, by adding the following line in spark-env.sh on the master node and restarting the master:

SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"

After this change my PySpark shell was limited to 4 of the 16 CPU cores in my cluster at that time, instead of reserving all available cores (the default setting). However, even though the Spark UI now reported that there would be enough free CPU cores and memory to actually run some Spark actions, the error message still popped up and no Spark actions would execute.

While debugging this issue, I came across a Spark-user mailing list post by Marcelo Vanzin of Cloudera where he outlines two possible causes for this particular error:

"...
- You're requesting more resources than the master has available, so
your executors are not starting. Given your explanation this doesn't
seem to be the case.

- The executors are starting, but are having problems connecting 
back to the driver. In this case, you should be able to see 
errors in each executor's log file.
..."

The second of these was causing the error in my case. The firewall on the host where I ran my PySpark shell rejected the connection attempts back from the worker nodes. After allowing all traffic between the nodes involved, the problem was resolved! The driver host was another VM in the same OpenStack project, so allowing all traffic between the VMs in the same project was OK to do security-wise.

The error message is not particularly useful in the case where executors are unable to connect back to the driver. If you encounter the same error message, remember to check firewall logs from all involved firewalls (host and/or network firewalls).

On a side note, this requirement of Spark to connect back from executors to the driver makes it harder to set up a Spark cluster in a secure way. Unless the driver is in the same security zone as the Spark cluster, it may not be possible to allow the Spark cluster workers to establish connections to the driver host on arbitrary ports. Hopefully the Apache Spark project will address this limitation in a future release, by making sure all necessary connections are established by the driver (client host) only.
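
A partial workaround, which I haven’t tested in this setup, is to pin the ports the driver listens on, so the firewall only needs to allow a few known ports back to the driver host instead of the whole ephemeral range. A sketch, assuming Spark 1.x property names:

# Fix the driver ports (the port numbers are arbitrary examples) before starting PySpark,
# then allow only these ports from the workers to the driver host in the firewall
export PYSPARK_SUBMIT_ARGS="--master spark://10.20.30.178:7077 --conf spark.driver.port=51000 --conf spark.blockManager.port=51100 pyspark-shell"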

~ Arne ~

Getting Started with SocketPlane and Docker on OpenStack VMs

With all the well-deserved attention Docker gets these days, the networking aspects of Docker become increasingly important. As many have pointed out already, Docker itself has somewhat limited networking options. Several projects exist to fix this: SocketPlane, Weave, Flannel by CoreOS and Kubernetes (which is an entire container orchestration solution). Docker recently acquired SocketPlane, both to gain better native networking options and to get help building the networking APIs necessary for other networking solutions to plug into Docker.

In this post, I’ll show how to deploy and use SocketPlane on OpenStack VMs. This is based on the technology preview of SocketPlane available on GitHub, which I’ll deploy on Ubuntu 14.04 Trusty VMs.

Launch the first VM and bootstrap the cluster

As SocketPlane is a cluster solution with automatic leader election, all nodes in the cluster are equal and run the same services. However, the first node has to be told to bootstrap the cluster. With at least one node running, new nodes automatically join the cluster when they start up.

To get the first node to download SocketPlane, install the software including all dependencies, and bootstrap the cluster, create a Cloud-init script like this:

cat > socketplane-first.sh <<EOF
#!/bin/bash
curl -sSL http://get.socketplane.io/ | sudo BOOTSTRAP=true sh
sudo socketplane cluster bind eth0
EOF

Start the first node and wait for the SocketPlane bootstrap to complete before starting more nodes (it takes a while, so grab a cup of coffee):

$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data socketplane-first.sh --nic net-id=df7cc182-8794-4134-b700-1fb8f1fbf070 socketplane1
$ nova floating-ip-associate socketplane1 10.0.1.244

You have to customize the flavor, image, key-name, net-id and floating IP to suit your OpenStack environment before running these commands. I attach a floating IP to the node to be able to log into it and interact with SocketPlane. If you want to watch the progress of Cloud-init, you can now tail the output logs via SSH like this:

$ ssh ubuntu@10.0.1.244 "tail -f /var/log/cloud-init*"
Warning: Permanently added '10.0.1.244' (ECDSA) to the list of known hosts.
==> /var/log/cloud-init.log <==
Mar 4 18:20:16 socketplane1 [CLOUDINIT] util.py[DEBUG]: Writing to /var/lib/cloud/instances/4e158f82-c5d8-4629-b7dc-2c1fbbe5f9f2/sem/config_scripts_vendor - wb: [420] 20 bytes
Mar 4 18:20:16 socketplane1 [CLOUDINIT] helpers.py[DEBUG]: Running config-scripts-vendor using lock (<FileLock using file '/var/lib/cloud/instances/4e158f82-c5d8-4629-b7dc-2c1fbbe5f9f2/sem/config_scripts_vendor'>)
...
Mar 4 18:20:16 socketplane1 [CLOUDINIT] util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/part-001'] with allowed return codes [0] (shell=False, capture=False)

==> /var/log/cloud-init-output.log <==
511136ea3c5a: Pulling fs layer
511136ea3c5a: Download complete
8771fbfe935c: Pulling metadata
8771fbfe935c: Pulling fs layer
8771fbfe935c: Download complete
0e30e84e9513: Pulling metadata
...

As you can see from the output above, the SocketPlane setup script is busy fetching the Docker images for the dependencies of SocketPlane and the SocketPlane agent itself. When the bootstrapping is done, the output will look like this:

7c5e9d5231cf: Download complete
7c5e9d5231cf: Download complete
Status: Downloaded newer image for clusterhq/powerstrip:v0.0.1
Done!!!
Requesting SocketPlane to listen on eth0
Cloud-init v. 0.7.5 finished at Wed, 04 Mar 2015 18:25:54 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 348.19 seconds

The “Done!!!” line marks the end of the setup script downloaded from get.socketplane.io. The next line of output is from the “sudo socketplane cluster bind eth0” command I included in the Cloud-init script.

Important note about SocketPlane on OpenStack VMs

If you just follow the deployment instructions for a Non-Vagrant install / deploy in the SocketPlane README, you might run into an issue with the SocketPlane agent. The agent by default tries to autodetect the network interface to bind to, but that does not seem to work as expected when using OpenStack VMs. If you encounter this issue, the agent log will be full of messages like these:

$ sudo socketplane agent logs
INFO[0007] Identifying interface to bind ... Use --iface option for static binding
INFO[0015] Identifying interface to bind ... Use --iface option for static binding
INFO[0023] Identifying interface to bind ... Use --iface option for static binding
INFO[0031] Identifying interface to bind ... Use --iface option for static binding
INFO[0039] Identifying interface to bind ... Use --iface option for static binding
...

To resolve this issue you have to explicitly tell SocketPlane which network interface to use:

sudo socketplane cluster bind eth0

If you don’t, the SocketPlane setup process will be stuck and never complete. This step is required on all nodes in the cluster, since they follow the same setup process.

Check the SocketPlane agent logs

The “socketplane agent logs” CLI command is useful for checking the cluster state and seeing what events have occurred. After the initial setup process has finished, the output will look similar to this:

$ sudo socketplane agent logs
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
2015/03/04 18:25:54 consul.watch: Watch (type: nodes) errored: Get http://127.0.0.1:8500/v1/catalog/nodes: dial tcp 127.0.0.1:8500: connection refused, retry in 5s
==> Starting Consul agent RPC...
==> Consul agent running!
 Node name: 'socketplane1'
 Datacenter: 'dc1'
 Server: true (bootstrap: true)
 Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
 Cluster Addr: 10.20.30.161 (LAN: 8301, WAN: 8302)
 Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

 2015/03/04 18:25:54 [INFO] serf: EventMemberJoin: socketplane1 10.20.30.161
 2015/03/04 18:25:54 [INFO] serf: EventMemberJoin: socketplane1.dc1 10.20.30.161
 2015/03/04 18:25:54 [INFO] raft: Node at 10.20.30.161:8300 [Follower] entering Follower state
 2015/03/04 18:25:54 [INFO] consul: adding server socketplane1 (Addr: 10.20.30.161:8300) (DC: dc1)
 2015/03/04 18:25:54 [INFO] consul: adding server socketplane1.dc1 (Addr: 10.20.30.161:8300) (DC: dc1)
 2015/03/04 18:25:54 [ERR] agent: failed to sync remote state: No cluster leader
INFO[0111] Identifying interface to bind ... Use --iface option for static binding
INFO[0111] Binding to eth0
2015/03/04 18:25:55 watchForExistingRegisteredUpdates : 0
2015/03/04 18:25:55 key :
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start RPC layer: listen tcp 10.20.30.161:8300: bind: address already in use
 2015/03/04 18:25:55 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
2015/03/04 18:25:55 consul.watch: Watch (type: nodes) errored: Unexpected response code: 500 (No cluster leader), retry in 5s
 2015/03/04 18:25:55 [WARN] raft: Heartbeat timeout reached, starting election
 2015/03/04 18:25:55 [INFO] raft: Node at 10.20.30.161:8300 [Candidate] entering Candidate state
 2015/03/04 18:25:55 [INFO] raft: Election won. Tally: 1
 2015/03/04 18:25:55 [INFO] raft: Node at 10.20.30.161:8300 [Leader] entering Leader state
 2015/03/04 18:25:55 [INFO] consul: cluster leadership acquired
 2015/03/04 18:25:55 [INFO] consul: New leader elected: socketplane1
 2015/03/04 18:25:55 [INFO] raft: Disabling EnableSingleNode (bootstrap)
 2015/03/04 18:25:55 [INFO] consul: member 'socketplane1' joined, marking health alive
 2015/03/04 18:25:56 [INFO] agent: Synced service 'consul'
INFO[0114] New Node joined the cluster : 10.20.30.161
2015/03/04 18:25:59 Status of Get 404 Not Found 404 for http://localhost:8500/v1/kv/ipam/10.1.0.0/16
2015/03/04 18:25:59 Updating KV pair for http://localhost:8500/v1/kv/ipam/10.1.0.0/16?cas=0 10.1.0.0/16 0
2015/03/04 18:25:59 Status of Get 404 Not Found 404 for http://localhost:8500/v1/kv/network/default
2015/03/04 18:25:59 Updating KV pair for http://localhost:8500/v1/kv/network/default?cas=0 default {"id":"default","subnet":"10.1.0.0/16","gateway":"10.1.0.1","vlan":1} 0
2015/03/04 18:25:59 Status of Get 404 Not Found 404 for http://localhost:8500/v1/kv/vlan/vlan
2015/03/04 18:25:59 Updating KV pair for http://localhost:8500/v1/kv/vlan/vlan?cas=0 vlan 0

SocketPlane uses Consul as a distributed key-value store for cluster configuration and cluster membership tracking. From the log output we can see that a Consul agent is started, the “socketplane1” host joins, a leader election is performed (which this single Consul agent obviously wins), and key-value pairs for the default subnet and network are created.

A note on the SocketPlane overlay network model

The real power of the SocketPlane solution lies in the overlay networks it creates. The overlay network spans all SocketPlane nodes in the cluster. SocketPlane uses VXLAN tunnels to encapsulate container traffic between nodes, so that several Docker containers running on different nodes can belong to the same virtual network and get IP addresses in the same subnet. This resembles the way OpenStack itself can use VXLAN to encapsulate traffic for a virtual tenant network that spans several physical compute hosts in the same cluster. Using SocketPlane on an OpenStack cluster which uses VXLAN (or GRE) means we use two layers of encapsulation, which is something to keep in mind if MTU and fragmentation issues occur.
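
A quick way to check whether the double encapsulation causes trouble is to send packets with fragmentation disallowed from inside a container and compare with the MTU on its interface (10.1.0.1 is the gateway of the default network seen in the log output above; adjust to your setup):

# Inside a container: send a 1450-byte payload with the "don't fragment" flag set
ping -M do -s 1450 -c 3 10.1.0.1

# Compare with the MTU reported on the container's network interface
ip link show | grep mtu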

Spin up more SocketPlane worker nodes

Of course we need some more nodes as workers in our SocketPlane cluster to make it a real cluster, so create another Cloud-init script for them to use:

cat > socketplane-node.sh <<EOF
#!/bin/bash
curl -sSL http://get.socketplane.io/ | sudo sh
sudo socketplane cluster bind eth0
EOF

This is almost identical to the first Cloud-init script, just without the BOOTSTRAP=true environment variable.

Spin up a couple more nodes:

$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data socketplane-node.sh --nic net-id=df7cc182-8794-4134-b700-1fb8f1fbf070 socketplane2
$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data socketplane-node.sh --nic net-id=df7cc182-8794-4134-b700-1fb8f1fbf070 socketplane3

Watch the agent log from the first node in real time with the -f flag (just like with “tail”) to validate that the nodes join the cluster as they are supposed to:

$ sudo socketplane agent logs -f
2015/03/04 19:10:42 New Bonjour Member : socketplane2, _docker._cluster, local, 10.20.30.162
INFO[6398] New Member Added : 10.20.30.162
 2015/03/04 19:10:42 [INFO] agent.rpc: Accepted client: 127.0.0.1:57766
 2015/03/04 19:10:42 [INFO] agent: (LAN) joining: [10.20.30.162]
 2015/03/04 19:10:42 [INFO] serf: EventMemberJoin: socketplane2 10.20.30.162
 2015/03/04 19:10:42 [INFO] agent: (LAN) joined: 1 Err: <nil>
 2015/03/04 19:10:42 [INFO] consul: member 'socketplane2' joined, marking health alive
Successfully joined cluster by contacting 1 nodes.
INFO[6398] New Node joined the cluster : 10.20.30.162

2015/03/04 19:10:54 New Bonjour Member : socketplane3, _docker._cluster, local, 10.20.30.163
INFO[6409] New Member Added : 10.20.30.163
 2015/03/04 19:10:54 [INFO] agent.rpc: Accepted client: 127.0.0.1:57769
 2015/03/04 19:10:54 [INFO] agent: (LAN) joining: [10.20.30.163]
 2015/03/04 19:10:54 [INFO] serf: EventMemberJoin: socketplane3 10.20.30.163
 2015/03/04 19:10:54 [INFO] agent: (LAN) joined: 1 Err: <nil>
 2015/03/04 19:10:54 [INFO] consul: member 'socketplane3' joined, marking health alive
Successfully joined cluster by contacting 1 nodes.
INFO[6409] New Node joined the cluster : 10.20.30.163

The nodes joined the cluster as expected, with no need to actually SSH into the VMs and run any CLI commands, since Cloud-init took care of the entire setup process. As you may have noted I didn’t allocate any floating IP to the new worker VMs, since I don’t need access to them directly. All the VMs run in the same OpenStack virtual tenant network and are able to communicate internally on that subnet (10.20.30.0/24 in my case).

Create a virtual network and launch the first container

To test the new SocketPlane cluster, first create a new virtual network “net1” with an address range you choose yourself:

$ sudo socketplane network create net1 10.100.0.0/24
{
 "gateway": "10.100.0.1",
 "id": "net1",
 "subnet": "10.100.0.0/24",
 "vlan": 2
}

Now you should have two SocketPlane networks, the default and the new one you just created:

$ sudo socketplane network list
[
 {
 "gateway": "10.1.0.1",
 "id": "default",
 "subnet": "10.1.0.0/16",
 "vlan": 1
 },
 {
 "gateway": "10.100.0.1",
 "id": "net1",
 "subnet": "10.100.0.0/24",
 "vlan": 2
 }
]

Now, launch a container on the virtual “net1” network:

$ sudo socketplane run -n net1 -it ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
fa4fd76b09ce: Pulling fs layer
1c8294cc5160: Pulling fs layer
...
2d24f826cb16: Download complete
Status: Downloaded newer image for ubuntu:latest
root@4e06413f421c:/#

The “-n net1” option tells SocketPlane what virtual network to use. The container is automatically assigned a free IP address from the IP address range you chose. I started an Ubuntu container running Bash as an example. You can start any Docker image you want, as all arguments after “-n net1” are passed directly to the “docker run” command which SocketPlane wraps.
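
Since the remaining arguments pass straight through to “docker run”, other images and options should work the same way. An untested sketch:

# Start a detached nginx container on the same virtual network (container name and image are just examples)
sudo socketplane run -n net1 -d --name web1 nginx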

The beauty of SocketPlane is that you don’t have to do any port mapping or linking for containers to be able to communicate with other containers. They behave just like VMs launched on a virtual OpenStack network and have access to other containers on the same network, in addition to resources outside the cluster:

root@4e06413f421c:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
...
8: ovsaa22ac2: <BROADCAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default
 link/ether 02:42:0a:64:00:02 brd ff:ff:ff:ff:ff:ff
 inet 10.100.0.2/24 scope global ovsaa22ac2
 valid_lft forever preferred_lft forever
 inet6 fe80::ace8:d3ff:fe4a:ecfc/64 scope link
 valid_lft forever preferred_lft forever

root@4e06413f421c:/# ping -c 1 10.100.0.1
PING 10.100.0.1 (10.100.0.1) 56(84) bytes of data.
64 bytes from 10.100.0.1: icmp_seq=1 ttl=64 time=0.043 ms

root@4e06413f421c:/# ping -c 1 arnesund.com
PING arnesund.com (192.0.78.24) 56(84) bytes of data.
64 bytes from 192.0.78.24: icmp_seq=1 ttl=51 time=29.7 ms

Multiple containers on the same virtual network

Keep the previous window open to keep the container running and SSH to the first SocketPlane node again. Then launch another container on the same virtual network and ping the first container to verify connectivity:

$ sudo socketplane run -n net1 -it ubuntu /bin/bash
$ root@7c30071dbab4:/# ip addr | grep 10.100
 inet 10.100.0.3/24 scope global ovs658b61c
$ root@7c30071dbab4:/# ping 10.100.0.2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=0.307 ms
64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=0.057 ms

As expected, both containers see each other on the subnet they share and can communicate. However, both containers run on the first SocketPlane node in the cluster. To prove that this communication also works between different SocketPlane nodes, I’ll SSH from the first to the second node and start a new container there. To SSH between nodes I’ll use the private IP address of the second SocketPlane VM, since I didn’t allocate a floating IP to it:

ubuntu@socketplane1:~$ ssh 10.20.30.162
ubuntu@socketplane2:~$ sudo socketplane run -n net1 -it ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
fa4fd76b09ce: Pulling fs layer
...
2d24f826cb16: Download complete
Status: Downloaded newer image for ubuntu:latest
root@bfde7387e160:/#

root@bfde7387e160:/# ip addr | grep 10.100
 inet 10.100.0.4/24 scope global ovs06e4b44

root@bfde7387e160:/# ping -c 1 10.100.0.2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=1.53 ms

root@bfde7387e160:/# ping -c 1 10.100.0.3
PING 10.100.0.3 (10.100.0.3) 56(84) bytes of data.
64 bytes from 10.100.0.3: icmp_seq=1 ttl=64 time=1.47 ms

root@bfde7387e160:/# ping -c 1 arnesund.com
PING arnesund.com (192.0.78.25) 56(84) bytes of data.
64 bytes from 192.0.78.25: icmp_seq=1 ttl=51 time=30.2 ms

No trouble there, the new container on a different OpenStack VM can reach the other containers and also communicate with the outside world.

This concludes the Getting Started tutorial for SocketPlane on OpenStack VMs. Please bear in mind that this is based on a technology preview of SocketPlane, which is bound to change as SocketPlane and Docker become more integrated in the months to come. I’m sure the addition of SocketPlane to Docker will bring great benefits to the Docker community as a whole!

~ Arne ~

How to configure Knife and Test Kitchen to use OpenStack

When developing Chef cookbooks, Knife and Test Kitchen (hereafter just “Kitchen”) are essential tools in the workflow. Both tools can be set up to use OpenStack to make it easy to create VMs for testing regardless of the capabilities of the workstation used. It’s great for testing some new recipe in a cookbook or making sure changes do not break existing cookbook functionality. This post will go through the configuration of both tools to ensure they use OpenStack instead of the default Vagrant drivers.

Install software and dependencies

First, it is necessary to install the software, plugins and dependencies. Let’s start with some basic packages:

sudo apt-get install ruby1.9 git
sudo apt-get install make autoconf gcc g++ zlib1g-dev bundler

Chef Development Kit

The Chef Development Kit is a collection of very useful tools for any cookbook developer. It includes tools like Knife, Kitchen, Berkshelf, Foodcritic, and more. Fetch download links for the current release from the Chef-DK download page and install it, for example like this for Ubuntu:

wget https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chefdk_0.4.0-1_amd64.deb
sudo dpkg -i chefdk_0.4.0-1_amd64.deb

Kitchen OpenStack driver

By default, Kitchen uses Vagrant as the driver to create virtual machines for running tests. To get OpenStack support, install the Kitchen OpenStack driver. The recommended way of installing it is to add the Ruby gem to the Gemfile in your cookbook and use Bundler to install it:

echo 'gem "kitchen-openstack"' >> Gemfile
sudo bundle

Knife OpenStack plugin

With the OpenStack plugin Knife is able to create new OpenStack VMs and bootstrap them as nodes on your Chef server. It can also list VMs and delete VMs. Install the plugin with:

gem install knife-openstack

OpenStack command line clients

The command line clients for OpenStack are very useful for checking values like image IDs, Neutron networks and so on. In addition, they offer one-line access to actions like creating new VMs, allocating new floating IPs and more. Install the clients with:

sudo apt-get install python-novaclient python-neutronclient python-glanceclient

Configure Knife to use OpenStack

After installing the plugin to get OpenStack support for Knife, you need to append some lines to the Knife config file “~/.chef/knife.rb”:

cat >> ~/.chef/knife.rb <<EOF
# Knife OpenStack plugin setup
knife[:openstack_auth_url] = "#{ENV['OS_AUTH_URL']}/tokens"
knife[:openstack_username] = "#{ENV['OS_USERNAME']}"
knife[:openstack_password] = "#{ENV['OS_PASSWORD']}"
knife[:openstack_tenant] = "#{ENV['OS_TENANT_NAME']}"
EOF

These lines instruct Knife to use the contents of environment variables to authenticate with OpenStack when needed. The environment variables are the ones you get when you source the OpenStack RC file of your project. The RC file can be downloaded from the OpenStack web UI (Horizon) by navigating to Access & Security -> API Access -> Download OpenStack RC file. Sourcing the file makes sure the environment variables are part of the current shell environment, and is done like this (for an RC file called “openstack-rc.sh”):

$ . openstack-rc.sh

With this config in place Knife now has the power to create new OpenStack VMs in your project, list all active VMs and destroy VMs. In addition, it can be used to list available images, flavors and networks in OpenStack. I do however prefer to use the native OpenStack clients (glance, nova, neutron) for that, since they can perform lots of other valuable tasks like creating new networks and so on.
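
For completeness, the corresponding listing commands provided by the knife-openstack plugin look like this:

# List VMs, images and flavors in the OpenStack project Knife is configured for
knife openstack server list
knife openstack image list
knife openstack flavor list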

Below is an example of VM creation with Knife, using some of the required and optional arguments to the command. Issue “knife openstack server create --help” to get all available arguments. As a quick summary, the arguments I give Knife are the requested hostname of the server, the flavor (3 = m1.medium in my cluster), the image ID of a CentOS 7 image, the network ID, the SSH key name and the default user account used by the image (“centos”).

With the “--openstack-floating-ip” argument I tell Knife to allocate a floating IP to the new server. I could have specified a specific floating IP after that argument, which would have been allocated to the new server whether it was in use before or not. The only requirement is that it must be allocated to my OpenStack project before I try to use it.

$ knife openstack server create -N test-server -f 3 -I b206baa3-3a80-41cf-9850-49021b8bb3c1 --network-ids df7cc182-8794-4134-b700-1fb8f1fbf070 --openstack-ssh-key-id arnes --ssh-user centos --openstack-floating-ip --no-host-key-verify

Waiting for server [wait time = 600].........................
Instance ID 13493d82-8dc2-4b1d-87e8-3eeefa8defe2
Name test-server
Flavor 3
Image b206baa3-3a80-41cf-9850-49021b8bb3c1
Keypair arnes
State ACTIVE
Availability Zone nova
Floating IP Address: 10.0.1.242
Bootstrapping the server by using bootstrap_protocol: ssh and image_os_type: linux

Waiting for sshd to host (10.0.1.242)....done
Connecting to 10.0.1.242
10.0.1.242 Installing Chef Client...
10.0.1.242 Downloading Chef 11 for el...
10.0.1.242 Installing Chef 11
10.0.1.242 Thank you for installing Chef!
10.0.1.242 Starting first Chef Client run...
...
10.0.1.242 Running handlers complete
10.0.1.242 Chef Client finished, 0/0 resources updated in 1.328282722 seconds
Instance ID 13493d82-8dc2-4b1d-87e8-3eeefa8defe2
Name test-server
Public IP 10.0.1.242
Flavor 3
Image b206baa3-3a80-41cf-9850-49021b8bb3c1
Keypair arnes
State ACTIVE
Availability Zone nova

As an added benefit of creating VMs this way, they are automatically bootstrapped as Chef nodes with your Chef server!

Configure Kitchen to use OpenStack

Kitchen has a config file “~/.kitchen/config.yml” where all the config required to use OpenStack should be placed. The config file is “global”, meaning it’s not part of any cookbook or Chef repository. The advantage of using the global config file is that the Kitchen config in each cookbook is reduced to just one line, which is good since that Kitchen config is commonly committed to the cookbook repository and shared with other developers. Other developers may not have access to the same OpenStack environment as you, so their Kitchen OpenStack config will differ from yours.

Run the following commands to initialize the necessary config for Kitchen:

mkdir ~/.kitchen
cat >> ~/.kitchen/config.yml <<EOF
---
driver:
 name: openstack
 openstack_username: <%= ENV['OS_USERNAME'] %>
 openstack_api_key: <%= ENV['OS_PASSWORD'] %>
 openstack_auth_url: <%= "#{ENV['OS_AUTH_URL']}/tokens" %>
 openstack_tenant: <%= ENV['OS_TENANT_NAME'] %>
 require_chef_omnibus: true
 image_ref: CentOS 7 GC 2014-09-16
 username: centos
 flavor_ref: m1.medium
 key_name: <%= ENV['OS_USERNAME'] %>
 floating_ip_pool: public
 network_ref:
 - net1
 no_ssh_tcp_check: true
 no_ssh_tcp_check_sleep: 30
EOF

There is quite a bit of config going on here, so I’ll go through some of the most important parts. Many of the configuration options rely on environment variables which are set when you source the OpenStack RC file, just like for Knife. In addition, the following options may need to be customized according to your OpenStack environment:

  • image_ref: The name of a valid image to use when creating VMs
  • username: The username used by the chosen image, in this case “centos”
  • flavor_ref: A valid name of a flavor to use when creating VMs
  • key_name: Must match the name of your SSH key in OpenStack, here it is set to equal your username
  • floating_ip_pool: The name of a valid pool of public IP addresses
  • network_ref: A list of existing networks to connect new VMs to

To determine the correct values for image, flavor and network above, use the command line OpenStack clients. The Glance client can output a list of valid images to choose from:

$ glance image-list
+--------------------------------------+------------------------+-------------+------------------+------------+--------+
| ID                                   | Name                   | Disk Format | Container Format | Size       | Status |
+--------------------------------------+------------------------+-------------+------------------+------------+--------+
| ee2cc71b-3e2e-4b11-b327-f9cbf73a5694 | CentOS 6 GC 14-11-12   | raw         | bare             | 8589934592 | active |
| b206baa3-3a80-41cf-9850-49021b8bb3c1 | CentOS 7 GC 2014-09-16 | raw         | bare             | 8589934592 | active |
...

Set the image_ref in the Kitchen config to either the ID, the name or a regex matching the name.

Correspondingly, find the allowed flavors with the Nova client:

$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
|  2 | m1.small  |    2048   |  20  |     0     |      |   1   |     1.0     |    True   |
|  3 | m1.medium |    4096   |  40  |     0     |      |   2   |     1.0     |    True   |
...

The network names are available using the neutron client. However, if you haven’t created any networks yet, you can create a network, subnet and router like this:

neutron net-create net1
neutron subnet-create --name subnet1 net1 10.0.0.0/24
neutron router-create gw
neutron router-gateway-set gw public
neutron router-interface-add gw subnet1

These commands assume that the external network in your OpenStack cluster is named “public”. Assuming the commands complete successfully you may use the network name “net1” in the Kitchen config file. To get the list of available networks, use the Neutron client with the net-list subcommand:

$ neutron net-list
+--------------------------------------+--------+--------------------------------------------------+
| id                                   | name   | subnets                                          |
+--------------------------------------+--------+--------------------------------------------------+
| 2d2b2336-d7b6-4adc-b7f2-c92f98d4ec58 | public | 5ac43f4f-476f-4513-8f6b-67a758aa56e7             |
| e9dcbda9-cded-4823-a9fe-b03aadf33346 | net1   | 8ba65517-9bf5-46cc-a392-03a0708cd7f3 10.0.0.0/24 |
+--------------------------------------+--------+--------------------------------------------------+

With all that configured, Kitchen is ready to use OpenStack as the driver instead of Vagrant. All you need to do in a cookbook to make Kitchen use the OpenStack driver is to change the “driver” statement in the “.kitchen.yml” config file in the cookbook root directory from “vagrant” to “openstack”:

---
driver:
 name: openstack

So, let’s take it for a spin:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <c08688f6-a754-4f43-a365-898a38fc06f8> created.
.........................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.243>
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fe8926c1320]
net.ssh.transport.algorithms[3fe8926c06b4]
net.ssh.connection.session[3fe89270b420]
net.ssh.connection.channel[3fe89270b2cc]
Finished creating <default-ubuntu-1404> (0m50.68s).
-----> Kitchen is finished. (0m52.22s)

Voilà 🙂
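
From here on, the workflow is the same as with the Vagrant driver; a typical session continues with:

kitchen converge   # apply the cookbook to the new OpenStack VM
kitchen verify     # run the integration tests against it
kitchen destroy    # delete the VM when done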

How to Use Cloud-init to Customize New OpenStack VMs

When creating a new instance (VM) on OpenStack with one of the standard Ubuntu Cloud images, the next step is typically to install packages and configure applications. Instead of doing that manually every time, OpenStack enables automatic setup of new instances using Cloud-init. Cloud-init runs on first boot of every new instance and initializes it according to a provided script or config file. The functionality is part of the Ubuntu image and works the same way regardless of the cloud provider used (Amazon, Rackspace, private OpenStack cloud). Cloud-init is available for other distributions as well.

Creating a customization script

Standard Bash script

Perhaps the easiest way to get started is to create a standard Bash script that Cloud-init runs on first boot. Here is a simple example to get Apache2 up and running:

$ cat > cloudinit.sh <<EOF
> #!/bin/bash
> apt-get update
> apt-get -y install apache2
> a2ensite 000-default
> EOF

This small script installs the Apache2 package and enables the default site. Of course, you’d likely need to do more configuration here before enabling the site, like rsyncing web content to the document root and enabling TLS.

Launch a new web instance

Use the nova CLI command to launch an instance named “web1” and supply the filename of the customization script with the “--user-data” argument:

$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data=cloudinit.sh web1
+----------+---------------+
| Property | Value         |
+----------+---------------+
| name     | web1          |
| flavor   | m1.medium (3) |
...

To access the instance from outside the cloud, allocate a new floating IP and associate it with the new instance:

$ nova floating-ip-create public
+------------+-----------+----------+--------+
| Ip         | Server Id | Fixed Ip | Pool   |
+------------+-----------+----------+--------+
| 10.99.1.71 |           | -        | public |
+------------+-----------+----------+--------+
$ nova floating-ip-associate web1 10.99.1.71

Results

The new web instance has Apache running right from the start, no manual steps needed:

[Screenshot: the default Apache2 welcome page served by the new instance]
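
The same can be checked from the command line. Using the floating IP allocated above, a HEAD request should return an HTTP 200 status line once Apache is up:

$ curl -I http://10.99.1.71/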

More Cloud-init options: Cloud-Config syntax

Cloud-init can do more than just run Bash scripts. Using cloud-config syntax, many different actions are possible. The documentation has many useful examples of cloud-config statements to add user accounts, configure mount points, initialize the instance as a Chef/Puppet client and much more.

For example, the same Apache2 initialization as above can be done with the following cloud-config statements:

#cloud-config
packages:
  - apache2
runcmd:
  - [ a2ensite, "000-default" ]
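
To give a feel for the other modules, here is a small sketch that also creates a user account. The user name and the public key string are placeholders, and the exact option names are best double-checked against the cloud-config documentation:

#cloud-config
users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa <SSH-key-string> deploy@example
packages:
  - apache2
runcmd:
  - [ a2ensite, "000-default" ]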

Including scripts or config files

Including a script or config file from an external source is also possible. This can be useful if the config file is under revision control in Git. Including files is easy: just replace the script contents with an include statement and the URL:

#include
https://gist.githubusercontent.com/arnesund/7332e15c5eb9df8c55aa/raw/0bd63296980bb4d8bf33387cfdb2eb60b964490d/cloudinit.conf

The gist contains the same cloud-config statements as above, so the end result is the same.
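
Booting with such an include file works just like with the Bash script earlier: save the two lines to a file and pass it as user data (the file name here is only an example):

$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data=include.txt web2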

Troubleshooting

Cloud-init logs messages to /var/log/cloud-init.log and, in my tests, even debug-level messages were logged. In addition, Cloud-init records all console output from the changes it performs in /var/log/cloud-init-output.log. That makes it easy to catch errors in the initialization scripts, for instance when I omitted ‘-y’ from apt-get install and package installation failed:

The following NEW packages will be installed:
 apache2 apache2-bin apache2-data libapr1 libaprutil1 libaprutil1-dbd-sqlite3
 libaprutil1-ldap ssl-cert
0 upgraded, 8 newly installed, 0 to remove and 88 not upgraded.
Need to get 1284 kB of archives.
After this operation, 5342 kB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
/var/lib/cloud/instance/scripts/part-001: line 4: a2ensite: command not found
2015-02-05 09:59:56,943 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
2015-02-05 09:59:56,944 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2015-02-05 09:59:56,945 - util.py[WARNING]: Running scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 0.7.5 finished at Thu, 05 Feb 2015 09:59:56 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 22.14 seconds

The line “Do you want to continue? [Y/n] Abort.” is a clear indicator that apt-get install failed because it expected user input. Most CLI tools can be run without user input by passing the correct options, like ‘-y’ to apt-get. After correcting that error, the output is as expected:

The following NEW packages will be installed:
 apache2 apache2-bin apache2-data libapr1 libaprutil1 libaprutil1-dbd-sqlite3
 libaprutil1-ldap ssl-cert
0 upgraded, 8 newly installed, 0 to remove and 88 not upgraded.
Need to get 1284 kB of archives.
After this operation, 5342 kB of additional disk space will be used.
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/main libapr1 amd64 1.5.0-1 [85.1 kB]
Get:2 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/main libaprutil1 amd64 1.5.3-1 [76.4 kB]
...
Cloud-init v. 0.7.5 running 'modules:final' at Thu, 05 Feb 2015 12:35:49 +0000. Up 38.42 seconds.
Site 000-default already enabled
Cloud-init v. 0.7.5 finished at Thu, 05 Feb 2015 12:35:49 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 38.56 seconds

This also reveals that the command “a2ensite 000-default” is not needed, since the default site is already enabled. However, it’s included here as an example of how to run shell commands using cloud-config statements.
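
When testing new scripts, it can also be useful to watch the initialization as it happens. A simple way is to log in to the instance and tail both log files (user and IP as in the example above):

$ ssh ubuntu@10.99.1.71 'sudo tail -f /var/log/cloud-init.log /var/log/cloud-init-output.log'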

Testing vs Production

Using Cloud-init to bring new instances to the desired state is nice when testing and a necessary step when deploying production instances. In a production context, one would probably use Cloud-init to initialize the instance as a Chef or Puppet client. From there, Chef/Puppet takes over the configuration task and makes sure the instance is set up according to the role it should fill. Cloud-init makes the initial bootstrapping of the instance easy.
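
As a sketch, bootstrapping a Chef client with the cloud-config chef module might look like this; the server URL, validator name and run list are placeholders, and the option names should be verified against the cloud-init documentation for the version in use:

#cloud-config
chef:
  install_type: omnibus
  server_url: https://chef.example.com/organizations/myorg
  validation_name: myorg-validator
  validation_cert: |
    -----BEGIN RSA PRIVATE KEY-----
    <contents of the validator key>
    -----END RSA PRIVATE KEY-----
  run_list:
    - role[webserver]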

SSH Key Gotcha with Test Kitchen and OpenStack

When setting up Kitchen to use OpenStack as the driver instead of Vagrant, I encountered a puzzling authentication issue when creating the instance. I had my public and private SSH keys in ~/.ssh/, and they matched the keypair stored in OpenStack and referenced in the Kitchen configuration. Still, the creation of instances failed with the following error:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <88ef6616-04d3-4d0c-a631-8bb0d91a4c63> created.
....................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.216>
 Waiting for 10.0.1.216:22...
 Waiting for 10.0.1.216:22...
 Waiting for 10.0.1.216:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fd08462809c]
net.ssh.transport.algorithms[3fd0846382bc]
net.ssh.authentication.key_manager[3fd08466a064]
net.ssh.authentication.session[3fd08466a8d4]
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: Failed to complete #create action: [Authentication failed for user ubuntu@10.0.1.216]
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

After some debugging and research, my focus turned to the contents of the SSH key files. When generating my keys I had originally used PuTTYgen on Windows and saved them in OpenSSH format in addition to PuTTY format. It was those OpenSSH files I had copied to ~/.ssh/. The format of the public key file was:

---- BEGIN SSH2 PUBLIC KEY ----
Comment: <Comment>
<SSH-key-string>
---- END SSH2 PUBLIC KEY ----

I am most used to the one-line format for public keys used in the authorized_keys file, so I changed the contents of the key file to match the following format:

ssh-rsa <SSH-key-string> <Comment>

Luckily, that was enough for Test Kitchen to work as intended:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <c08688f6-a754-4f43-a365-898a38fc06f8> created.
.........................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.243>
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fe8926c1320]
net.ssh.transport.algorithms[3fe8926c06b4]
net.ssh.connection.session[3fe89270b420]
net.ssh.connection.channel[3fe89270b2cc]
Finished creating <default-ubuntu-1404> (0m50.68s).
-----> Kitchen is finished. (0m52.22s)
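
As a closing tip, ssh-keygen can do this conversion automatically instead of editing the file by hand: with the -i flag it reads an RFC4716/SSH2 formatted public key and prints the OpenSSH one-line format (the input file name is just an example):

$ ssh-keygen -i -f ~/.ssh/id_rsa_ssh2.pub > ~/.ssh/id_rsa.pub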