How to configure Knife and Test Kitchen to use OpenStack

When developing Chef cookbooks, Knife and Test Kitchen (hereafter just "Kitchen") are essential tools in the workflow. Both tools can be set up to use OpenStack, making it easy to create test VMs regardless of the capabilities of the workstation used. That is great for trying out a new recipe or making sure changes do not break existing cookbook functionality. This post goes through the configuration of both tools to make them use OpenStack instead of the default Vagrant driver.

Install software and dependencies

First, it is necessary to install the software, plugins and dependencies. Let’s start with some basic packages:

sudo apt-get install ruby1.9 git
sudo apt-get install make autoconf gcc g++ zlib1g-dev bundler

Chef Development Kit

The Chef Development Kit is a collection of very useful tools for any cookbook developer. It includes tools like Knife, Kitchen, Berkshelf, Foodcritic, and more. Get the download link for the current release from the Chef DK download page and install it, for example like this on Ubuntu:

wget https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chefdk_0.4.0-1_amd64.deb
sudo dpkg -i chefdk_0.4.0-1_amd64.deb
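
If the install succeeds, the bundled tools end up on the PATH. A quick sanity check (the output will of course vary with the Chef DK version):

chef --version
chef verify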

Kitchen OpenStack driver

By default, Kitchen uses Vagrant as the driver to create virtual machines for running tests. To get OpenStack support, install the Kitchen OpenStack driver. The recommended way of installing it is to add the Ruby gem to the Gemfile in your cookbook and use Bundler to install it:

echo 'gem "kitchen-openstack"' >> Gemfile
sudo bundle
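
Note that Bundler needs a gem source to resolve against. If the cookbook does not already have a Gemfile, a minimal one could look like this (created the same way as the other config files in this post):

cat > Gemfile <<EOF
source 'https://rubygems.org'
gem 'kitchen-openstack'
EOF
sudo bundle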

Knife OpenStack plugin

With the OpenStack plugin, Knife is able to create new OpenStack VMs and bootstrap them as nodes on your Chef server. It can also list and delete VMs. Install the plugin with:

gem install knife-openstack

OpenStack command line clients

The command line clients for OpenStack are very useful for checking values like image IDs, Neutron networks and so on. In addition, they offer one-line access to actions like creating new VMs, allocating new floating IPs and more. Install the clients with:

sudo apt-get install python-novaclient python-neutronclient python-glanceclient

Configure Knife to use OpenStack

After installing the plugin to get OpenStack support for Knife, you need to append some lines to the Knife config file “~/.chef/knife.rb”:

cat >> ~/.chef/knife.rb <<EOF
# Knife OpenStack plugin setup
knife[:openstack_auth_url] = "#{ENV['OS_AUTH_URL']}/tokens"
knife[:openstack_username] = "#{ENV['OS_USERNAME']}"
knife[:openstack_password] = "#{ENV['OS_PASSWORD']}"
knife[:openstack_tenant] = "#{ENV['OS_TENANT_NAME']}"
EOF

These lines instruct Knife to use the contents of environment variables to authenticate with OpenStack when needed. The variables are the ones you get when you source the OpenStack RC file of your project. The RC file can be downloaded from the OpenStack web UI (Horizon) by navigating to Access & Security -> API Access -> Download OpenStack RC file. Sourcing the file makes sure the environment variables are part of the current shell environment, and is done like this (for an RC file called "openstack-rc.sh"):

$ . openstack-rc.sh
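
For reference, the RC file essentially just exports the credentials as environment variables. A trimmed sketch of its typical contents (the real file contains your project's values, and usually prompts for the password rather than storing it):

export OS_AUTH_URL=https://keystone.example.com:5000/v2.0
export OS_TENANT_NAME="myproject"
export OS_USERNAME="myuser"
echo "Please enter your OpenStack Password: "
read -sr OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT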

With this config in place, Knife can create new OpenStack VMs in your project, list all active VMs and destroy VMs. It can also list available images, flavors and networks in OpenStack. However, I prefer the native OpenStack clients (glance, nova, neutron) for that, since they can also perform lots of other valuable tasks like creating new networks.

Below is an example of VM creation with Knife, using some of the required and optional arguments to the command. Run "knife openstack server create --help" to see all available arguments. As a quick summary, the arguments I give Knife are the requested hostname of the server, the flavor (3 = m1.medium in my cluster), the image ID of a CentOS 7 image, the network ID, the SSH key name and the default user account used by the image ("centos").

With the "--openstack-floating-ip" argument I tell Knife to attach a floating IP to the new server. I could have specified a particular floating IP after that argument, and it would have been assigned to the new server whether it was in use before or not. The only requirement is that the IP must be allocated to my OpenStack project before I try to use it.

$ knife openstack server create -N test-server -f 3 -I b206baa3-3a80-41cf-9850-49021b8bb3c1 --network-ids df7cc182-8794-4134-b700-1fb8f1fbf070 --openstack-ssh-key-id arnes --ssh-user centos --openstack-floating-ip --no-host-key-verify

Waiting for server [wait time = 600].........................
Instance ID 13493d82-8dc2-4b1d-87e8-3eeefa8defe2
Name test-server
Flavor 3
Image b206baa3-3a80-41cf-9850-49021b8bb3c1
Keypair arnes
State ACTIVE
Availability Zone nova
Floating IP Address: 10.0.1.242
Bootstrapping the server by using bootstrap_protocol: ssh and image_os_type: linux

Waiting for sshd to host (10.0.1.242)....done
Connecting to 10.0.1.242
10.0.1.242 Installing Chef Client...
10.0.1.242 Downloading Chef 11 for el...
10.0.1.242 Installing Chef 11
10.0.1.242 Thank you for installing Chef!
10.0.1.242 Starting first Chef Client run...
...
10.0.1.242 Running handlers complete
10.0.1.242 Chef Client finished, 0/0 resources updated in 1.328282722 seconds
Instance ID 13493d82-8dc2-4b1d-87e8-3eeefa8defe2
Name test-server
Public IP 10.0.1.242
Flavor 3
Image b206baa3-3a80-41cf-9850-49021b8bb3c1
Keypair arnes
State ACTIVE
Availability Zone nova

As an added benefit of creating VMs this way, they are automatically bootstrapped as Chef nodes with your Chef server!
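
For example, listing VMs, flavors and images, and cleaning up a test server again, looks roughly like this (the exact flags may vary slightly between plugin versions; --purge also removes the node and client from the Chef server):

knife openstack server list
knife openstack flavor list
knife openstack image list
knife openstack server delete 13493d82-8dc2-4b1d-87e8-3eeefa8defe2 --purge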

Configure Kitchen to use OpenStack

Kitchen has a config file “~/.kitchen/config.yml” where all the config required to use OpenStack should be placed. The config file is “global”, meaning it’s not part of any cookbook or Chef repository. The advantage of using the global config file is that the Kitchen config in each cookbook is reduced to just one line, which is good since that Kitchen config is commonly committed to the cookbook repository and shared with other developers. Other developers may not have access to the same OpenStack environment as you, so their Kitchen OpenStack config will differ from yours.

Run the following commands to initialize the necessary config for Kitchen:

mkdir ~/.kitchen
cat >> ~/.kitchen/config.yml <<EOF
---
driver:
  name: openstack
  openstack_username: <%= ENV['OS_USERNAME'] %>
  openstack_api_key: <%= ENV['OS_PASSWORD'] %>
  openstack_auth_url: <%= "#{ENV['OS_AUTH_URL']}/tokens" %>
  openstack_tenant: <%= ENV['OS_TENANT_NAME'] %>
  require_chef_omnibus: true
  image_ref: CentOS 7 GC 2014-09-16
  username: centos
  flavor_ref: m1.medium
  key_name: <%= ENV['OS_USERNAME'] %>
  floating_ip_pool: public
  network_ref:
    - net1
  no_ssh_tcp_check: true
  no_ssh_tcp_check_sleep: 30
EOF

There is quite a bit of config going on here, so I’ll go through some of the most important parts. Many of the configuration options rely on environment variables which are set when you source the OpenStack RC file, just like for Knife. In addition, the following options may need to be customized according to your OpenStack environment:

  • image_ref: The name of a valid image to use when creating VMs
  • username: The username used by the chosen image, in this case “centos”
  • flavor_ref: A valid name of a flavor to use when creating VMs
  • key_name: Must match the name of your SSH key in OpenStack, here it is set to equal your username
  • floating_ip_pool: The name of a valid pool of public IP addresses
  • network_ref: A list of existing networks to connect new VMs to

To determine the correct values for image, flavor and network above, use the command line OpenStack clients. The Glance client can output a list of valid images to choose from:

$ glance image-list
+--------------------------------------+------------------------+-------------+------------------+------------+--------+
| ID                                   | Name                   | Disk Format | Container Format | Size       | Status |
+--------------------------------------+------------------------+-------------+------------------+------------+--------+
| ee2cc71b-3e2e-4b11-b327-f9cbf73a5694 | CentOS 6 GC 14-11-12   | raw         | bare             | 8589934592 | active |
| b206baa3-3a80-41cf-9850-49021b8bb3c1 | CentOS 7 GC 2014-09-16 | raw         | bare             | 8589934592 | active |
...

Set the image_ref in the Kitchen config to either the ID, the name or a regex matching the name.

Correspondingly, find the allowed flavors with the Nova client:

$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
|  2 | m1.small  |    2048   |  20  |     0     |      |   1   |     1.0     |    True   |
|  3 | m1.medium |    4096   |  40  |     0     |      |   2   |     1.0     |    True   |
...

The network names are available using the neutron client. However, if you haven’t created any networks yet, you can create a network, subnet and router like this:

neutron net-create net1
neutron subnet-create --name subnet1 net1 10.0.0.0/24
neutron router-create gw
neutron router-gateway-set gw public
neutron router-interface-add gw subnet1

These commands assume that the external network in your OpenStack cluster is named "public". If the commands complete successfully, you can use the network name "net1" in the Kitchen config file. To get the list of available networks, use the Neutron client with the net-list subcommand:

$ neutron net-list
+--------------------------------------+--------+--------------------------------------------------+
| id                                   | name   | subnets                                          |
+--------------------------------------+--------+--------------------------------------------------+
| 2d2b2336-d7b6-4adc-b7f2-c92f98d4ec58 | public | 5ac43f4f-476f-4513-8f6b-67a758aa56e7             |
| e9dcbda9-cded-4823-a9fe-b03aadf33346 | net1   | 8ba65517-9bf5-46cc-a392-03a0708cd7f3 10.0.0.0/24 |
+--------------------------------------+--------+--------------------------------------------------+

With all that configured, Kitchen is ready to use OpenStack as the driver instead of Vagrant. All you need to do in a cookbook to make Kitchen use the OpenStack driver is to change the "driver" statement in the ".kitchen.yml" config file in the cookbook root directory from "vagrant" to "openstack":

---
driver:
  name: openstack

So, let's take it for a spin:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <c08688f6-a754-4f43-a365-898a38fc06f8> created.
.........................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.243>
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fe8926c1320]
net.ssh.transport.algorithms[3fe8926c06b4]
net.ssh.connection.session[3fe89270b420]
net.ssh.connection.channel[3fe89270b2cc]
Finished creating <default-ubuntu-1404> (0m50.68s).
-----> Kitchen is finished. (0m52.22s)

Voilà 🙂
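
From here the usual Kitchen workflow applies, now backed by OpenStack instances:

kitchen list         # show test instances and their last completed action
kitchen converge     # run Chef on the OpenStack instance
kitchen verify       # run the integration tests
kitchen destroy      # delete the OpenStack instance again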

How to Use Cloud-init to Customize New OpenStack VMs

When creating a new instance (VM) on OpenStack with one of the standard Ubuntu Cloud images, the next step is typically to install packages and configure applications. Instead of doing that manually every time, OpenStack enables automatic setup of new instances using Cloud-init. Cloud-init runs on the first boot of every new instance and initializes it according to a provided script or config file. The functionality is part of the Ubuntu image and works the same way regardless of the cloud provider used (Amazon, Rackspace, private OpenStack cloud), and Cloud-init is available for other distributions as well.

Creating a customization script

Standard Bash script

Perhaps the easiest way to get started is to create a standard Bash script that Cloud-init runs on first boot. Here is a simple example to get Apache2 up and running:

$ cat > cloudinit.sh <<EOF
> #!/bin/bash
> apt-get update
> apt-get -y install apache2
> a2ensite 000-default
> EOF

This small script installs the Apache2 package and enables the default site. Of course, you would likely need to do more configuration before enabling the site, like rsyncing web content to the document root and enabling TLS.
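
As an illustration only, a slightly extended script could look like the one below. The rsync source path is a made-up placeholder, and a2enmod ssl merely enables the TLS module; certificates would still have to be installed:

#!/bin/bash
apt-get update
apt-get -y install apache2 rsync
a2enmod ssl                                  # enable the TLS module (certificates not handled here)
rsync -a /srv/staging/web/ /var/www/html/    # placeholder: sync web content to the document root
a2ensite 000-default
service apache2 reload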

Launch a new web instance

Use the nova CLI command to launch an instance named "web1" and supply the filename of the customization script with the "--user-data" argument:

$ nova boot --flavor m1.medium --image "Ubuntu CI trusty 2014-09-22" --key-name arnes --user-data=cloudinit.sh web1
+----------+---------------+
| Property | Value         |
+----------+---------------+
| name     | web1          |
| flavor   | m1.medium (3) |
...

To access the instance from outside the cloud, allocate a new floating IP and associate it with the new instance:

$ nova floating-ip-create public
+------------+-----------+----------+--------+
| Ip         | Server Id | Fixed Ip | Pool   |
+------------+-----------+----------+--------+
| 10.99.1.71 |           | -        | public |
+------------+-----------+----------+--------+
$ nova floating-ip-associate web1 10.99.1.71
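
Assuming the default security group allows inbound HTTP, a quick check from outside the cloud should confirm that Apache answers on the floating IP once Cloud-init has finished:

curl -sI http://10.99.1.71/    # expect an "HTTP/1.1 200 OK" header from Apache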

Results

The new web instance has Apache running right from the start, no manual steps needed:

[Screenshot: the default Apache2 welcome page]

More Cloud-init options: Cloud-Config syntax

Cloud-init can do more than just run Bash scripts. Using cloud-config syntax, many different actions are possible. The documentation has many useful examples of cloud-config syntax to add user accounts, configure mount points, initialize the instance as a Chef/Puppet client and much more.

For example, the same Apache2 initialization as above can be done with the following cloud-config statements:

#cloud-config
packages:
 - apache2
runcmd:
 - [ a2ensite, "000-default" ]
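
The same file-creation approach as before works for cloud-config user data too. As a rough sketch based on the examples in the cloud-config documentation, adding a separate admin user on top of the package installation could look something like this (the user name and key are placeholders; the "default" entry keeps the image's standard user):

cat > cloudinit-users.yaml <<EOF
#cloud-config
users:
  - default
  - name: webadmin
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAA...your-public-key... webadmin
packages:
  - apache2
runcmd:
  - [ a2ensite, "000-default" ]
EOF

Pass the file to "nova boot" with --user-data as before.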

Including scripts or config files

Including a script or config file from an external source is also possible. This can be useful if the config file is under revision control in Git. Including files is easy: just replace the script contents with an include statement and the URL:

#include
https://gist.githubusercontent.com/arnesund/7332e15c5eb9df8c55aa/raw/0bd63296980bb4d8bf33387cfdb2eb60b964490d/cloudinit.conf

The gist contains the same cloud-config statements as above, so the end result is the same.

Troubleshooting

Cloud-init logs messages to /var/log/cloud-init.log, and in my tests even debug-level messages were logged. In addition, Cloud-init records all console output from the changes it performs in /var/log/cloud-init-output.log. That makes it easy to catch errors in the initialization scripts, for instance when I omitted '-y' from apt-get install and the package installation failed:

The following NEW packages will be installed:
 apache2 apache2-bin apache2-data libapr1 libaprutil1 libaprutil1-dbd-sqlite3
 libaprutil1-ldap ssl-cert
0 upgraded, 8 newly installed, 0 to remove and 88 not upgraded.
Need to get 1284 kB of archives.
After this operation, 5342 kB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
/var/lib/cloud/instance/scripts/part-001: line 4: a2ensite: command not found
2015-02-05 09:59:56,943 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
2015-02-05 09:59:56,944 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2015-02-05 09:59:56,945 - util.py[WARNING]: Running scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 0.7.5 finished at Thu, 05 Feb 2015 09:59:56 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 22.14 seconds

The line “Do you want to continue? [Y/n] Abort.” is a clear indicator that apt-get install failed since it expected user input. Most CLI tools can be run without user input by just passing the correct options, like ‘-y’ to apt-get. After correcting that error, the output is as expected:

The following NEW packages will be installed:
 apache2 apache2-bin apache2-data libapr1 libaprutil1 libaprutil1-dbd-sqlite3
 libaprutil1-ldap ssl-cert
0 upgraded, 8 newly installed, 0 to remove and 88 not upgraded.
Need to get 1284 kB of archives.
After this operation, 5342 kB of additional disk space will be used.
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/main libapr1 amd64 1.5.0-1 [85.1 kB]
Get:2 http://nova.clouds.archive.ubuntu.com/ubuntu/ trusty/main libaprutil1 amd64 1.5.3-1 [76.4 kB]
...
Cloud-init v. 0.7.5 running 'modules:final' at Thu, 05 Feb 2015 12:35:49 +0000. Up 38.42 seconds.
Site 000-default already enabled
Cloud-init v. 0.7.5 finished at Thu, 05 Feb 2015 12:35:49 +0000. Datasource DataSourceOpenStack [net,ver=2]. Up 38.56 seconds

This also reveals that the command “a2ensite 000-default” is not needed since the default site is enabled already. However, it’s included here as an example of how to run shell commands using cloud-config statements.
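
When debugging your own scripts, these two commands tend to be a good starting point on the instance:

sudo tail -n 50 /var/log/cloud-init-output.log    # console output from the boot-time scripts
sudo grep WARNING /var/log/cloud-init.log         # warnings and errors from Cloud-init itself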

Testing vs Production

Using Cloud-init to get new instances to the desired state is nice when testing and a necessary step when deploying production instances. In a production context, one would probably use Cloud-init to initialize the instance as a Chef or Puppet client. From there, Chef/Puppet takes over the configuration task and will make sure the instance is set up according to the desired role it should fill. Cloud-init makes the initial bootstrapping of the instance easy.

New Skill Sets Needed for Network Specialists

I share a lot of the views presented by @netmanchris in his plan for technology areas to focus on and the follow-up post IT Generalist or Network Specialist?. I started out in this field as a Networking Specialist focusing on the traditional areas like switching, routing and firewalls. However, over time the need for automation, scripting and data analysis popped up more and more in my day job, both to automate manual tasks and to improve the quality of networking services like firewall rulesets.

I've written about a use case for data analysis in networking in a previous post. Another example I want to highlight is the need to automatically update firewall object-groups with new IP addresses as the DNS entries for remote services change. Not all firewalls support DNS names as the destination in a firewall rule. When a server needs access to a specific remote site, and that site changes IP addresses from time to time, the IP addresses in the firewall config need to change too. To facilitate this I implemented a solution in Python which keeps object-groups in the firewall config in sync with the IP addresses in DNS replies.
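
The implementation itself is in Python, but the core idea is small enough to sketch. A simplified shell illustration (the service and group names are made up) that prints an object-group definition from the current DNS answers could look like this; a real solution also needs change detection and a safe way to push the result to the firewall:

#!/bin/bash
# Sketch: resolve a remote service and emit a Cisco-style object-group
# containing the current A records.
SERVICE=updates.example.com
GROUP=og-updates-example
echo "object-group network $GROUP"
for ip in $(dig +short "$SERVICE" A); do
    echo "  network-object host $ip"
done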

So far, these examples are from the traditional networking field. There is a new kind of Networking in the making with the arrival of SDN solutions and the pervasive virtualization of everything, not just servers. Almost everything in this new Networking has APIs to support automation, and it requires us to learn some programming to implement efficient solutions.

Python has emerged as the language of choice for sysadmins and network admins. To use Python efficiently, basic knowledge of the relevant modules is key: for example Paramiko for using SSH in scripts and IPy for handling IP addresses and subnets. To interface with APIs there may be specific modules available, like python-neutronclient for the OpenStack Neutron API; if not, you can always use HTTP via Requests.

In addition to Python, I see knowledge of Git, Chef/Puppet, hypervisors, Linux networking, OVS, OpenStack and so on as very important in this new infrastructure-as-code movement. The new networking engineer role will be a hybrid sysadmin/netadmin/devops role and brings with it opportunities to make every day at work even more interesting!

~ Arne ~

SSH Key Gotcha with Test Kitchen and OpenStack

When setting up Kitchen to use OpenStack as the driver instead of Vagrant, I encountered a puzzling authentication issue on creation of the instance. I had my public and private SSH keys in ~/.ssh/ and they matched the SSH key pair stored in OpenStack and referenced in the Kitchen configuration. Instance creation failed with the following error:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <88ef6616-04d3-4d0c-a631-8bb0d91a4c63> created.
....................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.216>
 Waiting for 10.0.1.216:22...
 Waiting for 10.0.1.216:22...
 Waiting for 10.0.1.216:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fd08462809c]
net.ssh.transport.algorithms[3fd0846382bc]
net.ssh.authentication.key_manager[3fd08466a064]
net.ssh.authentication.session[3fd08466a8d4]
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: Failed to complete #create action: [Authentication failed for user ubuntu@10.0.1.216]
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

After some debugging and research, my focus turned to the contents of the SSH key files. When generating my keys I had originally used PuTTYgen on Windows and saved them in OpenSSH format in addition to PuTTY format. It was those OpenSSH-format files I had copied to ~/.ssh/. The format of the public key file was:

---- BEGIN SSH2 PUBLIC KEY ----
Comment: <Comment>
<SSH-key-string>
---- END SSH2 PUBLIC KEY ----

I am most used to the one-line format for public keys used in the authorized_keys file, so I changed the contents of the key file to match the following format:

ssh-rsa <SSH-key-string> <Comment>
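
Instead of editing the file by hand, ssh-keygen can also do the conversion from the SSH2/RFC4716 format to the OpenSSH one-line format (the input filename here is just an example):

ssh-keygen -i -f puttygen-export.pub > ~/.ssh/id_rsa.pub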

Luckily, that was enough for Test Kitchen to work as intended:

$ kitchen create
-----> Starting Kitchen (v1.2.1)
-----> Creating <default-ubuntu-1404>...
 OpenStack instance <c08688f6-a754-4f43-a365-898a38fc06f8> created.
.........................
(server ready)
 Attaching floating IP from <public> pool
 Attaching floating IP <10.0.1.243>
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 Waiting for 10.0.1.243:22...
 (ssh ready)
 Using OpenStack keypair <arnes>
 Using public SSH key <~/.ssh/id_rsa.pub>
 Using private SSH key <~/.ssh/id_rsa>
 Adding OpenStack hint for ohai
net.ssh.transport.server_version[3fe8926c1320]
net.ssh.transport.algorithms[3fe8926c06b4]
net.ssh.connection.session[3fe89270b420]
net.ssh.connection.channel[3fe89270b2cc]
Finished creating <default-ubuntu-1404> (0m50.68s).
-----> Kitchen is finished. (0m52.22s)

How to Analyze a Firewall Ruleset with Hadoop

Ruleset Analysis is a tool for analyzing firewall log files to determine which firewall rules are in use and by what kind of traffic. The first release supports the Cisco ASA and FWSM firewalls. The analysis is built as Hadoop Streaming jobs, since the log volume to analyze can easily reach hundreds of gigabytes or even terabytes for very active firewalls. To produce useful results, the analyzed logs must span at least a couple of months, preferably six or twelve. The analysis will tell you exactly what traffic was allowed by each of the firewall rules and when that traffic occurred.

A common use case for Ruleset Analysis is to use the insight produced to reduce the size of large firewall rulesets. Armed with knowledge about when a rule was last in use and by what traffic, it becomes easier to determine if the rule can be removed. Rules with no hits in the analyzed time span are also likely candidates for removal. In addition, Ruleset Analysis can be used to replace a generic rule with more specific rules. Traffic counters are often used to check what rules are in use, but I explained some of their shortcomings in my previous post.

Sample results

Here is an example of the output for each firewall rule:

fw01: access-list inside-in, rule 123: permit tcp 10.1.0.0/24 -> 0.0.0.0/0:[8080]
access-list inside-in extended permit tcp object-group inside-subnets any object-group Web
Total number of hits: 7
 COUNT PROTO  FROM IP       TO IP          PORT  FIRST SEEN           LAST SEEN          
     6  TCP   10.1.0.156    20.30.40.124   8080  2014-06-06 14:47:35  2014-06-06 15:17:01
     1  TCP   10.1.0.98     100.200.31.82  8080  2014-09-27 08:15:34  2014-09-27 08:15:34

This says that outbound access to websites on port 8080 got seven hits during the last year, but only from two distinct sources. An internal machine initiated six of those connections to one external server on port 8080 in half an hour on June 6th. All in all, this tells us that the rule is rarely in use and may be a candidate for removal.

The second line of the output shows the access-list entry in the original Cisco syntax. Note that Ruleset Analysis supports object-groups: for each object in an object-group, the preprocessor creates a distinct rule object, effectively expanding the group into separate rules. Here, for instance, the object-group Web has been expanded to TCP port 8080 (and other ports not shown). The benefit is that Ruleset Analysis can find out which objects in an object-group are in use and which are not, so unused objects can be removed from the object-group (and thereby from the ruleset).

How to run the analysis on Hadoop

To be able to run the analysis you need the firewall config, log files and access to a Hadoop cluster.

Clone the repository from Github:

git clone https://github.com/arnesund/ruleset-analysis.git
cd ruleset-analysis

Preprocess the config file to extract access-lists and generate ACL objects:

./preprosess_access_lists.py -f FW.CONF

Submit the job to the Hadoop cluster with the path to the firewall log files in the Hadoop filesystem HDFS (wildcards allowed):

./runAnalysis.sh /HDFS-PATH/TO/LOG/FILES

The output from Hadoop Streaming is shown on the console:

arnes@hadoop01:~/ruleset-analysis$ ./runAnalysis.sh /data/fw01/*2014*
packageJobJar: [.//config.py, .//firewallrule.py, .//input/accesslists.db, .//name-number-mappings.db, .//mapper.py, .//connlist-reducer.py, /tmp/hadoop-arnes/hadoop-unjar8081511066204186990/] [] /tmp/streamjob7183564462078091113.jar tmpDir=null
15/01/04 11:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/04 11:24:56 WARN snappy.LoadSnappy: Snappy native library not loaded
15/01/04 11:24:57 INFO mapred.FileInputFormat: Total input paths to process : 365
15/01/04 11:24:57 INFO streaming.StreamJob: getLocalDirs(): [/data/1/mapred/local, /data/2/mapred/local, /data/3/mapred/local]
15/01/04 11:24:57 INFO streaming.StreamJob: Running job: job_201411291614_1372
15/01/04 11:24:57 INFO streaming.StreamJob: To kill this job, run:
15/01/04 11:24:57 INFO streaming.StreamJob: /usr/libexec/../bin/hadoop job  -Dmapred.job.tracker=hadoop01:8021 -kill job_201411291614_1372
15/01/04 11:24:57 INFO streaming.StreamJob: Tracking URL: http://hadoop01:50030/jobdetails.jsp?jobid=job_201411291614_1372
15/01/04 11:24:58 INFO streaming.StreamJob:  map 0%  reduce 0%
15/01/04 11:25:07 INFO streaming.StreamJob:  map 1%  reduce 0%
15/01/04 11:25:08 INFO streaming.StreamJob:  map 13%  reduce 0%
15/01/04 11:25:09 INFO streaming.StreamJob:  map 16%  reduce 0%
15/01/04 11:25:11 INFO streaming.StreamJob:  map 24%  reduce 0%
...
15/01/04 11:26:39 INFO streaming.StreamJob:  map 98%  reduce 29%
15/01/04 11:26:41 INFO streaming.StreamJob:  map 99%  reduce 30%
15/01/04 11:26:42 INFO streaming.StreamJob:  map 100%  reduce 30%
15/01/04 11:26:47 INFO streaming.StreamJob:  map 100%  reduce 33%
15/01/04 11:26:49 INFO streaming.StreamJob:  map 100%  reduce 67%
15/01/04 11:26:50 INFO streaming.StreamJob:  map 100%  reduce 100%
15/01/04 11:26:52 INFO streaming.StreamJob: Job complete: job_201411291614_1372
15/01/04 11:26:52 INFO streaming.StreamJob: Output: output-20150104-1124_RulesetAnalysis

Note the name of the output directory on the last line of output, “output-20150104-1124_RulesetAnalysis” in this example. You’ll use that to fetch the results from HDFS. Insert the name of the output directory in the variable below:

mkdir output; outputdir="OUTPUT_PATH_FROM_JOB_OUTPUT"
hadoop dfs -getmerge $outputdir output/$outputdir

With the job results now on disk, the last step is to run postprocessing to generate the final report and view it:

./postprocess_ruleset_analysis.py -f output/$outputdir > output/$outputdir-report.log
less output/$outputdir-report.log

Manually test the analysis on a small log volume

For small log volumes and trial runs, the analysis can be run without a Hadoop cluster (no parallelization), like this:

Clone the repository from Github, if you haven’t already:

git clone https://github.com/arnesund/ruleset-analysis.git
cd ruleset-analysis

Preprocess the config file to extract access-lists and generate ACL objects:

./preprosess_access_lists.py -f FW.CONF

Pipe the firewall log through the Python mapper and reducer manually:

cat FW.LOG | ./mapper.py | sort | ./reducer.py > results

Postprocess the results to generate the final ruleset report and take a look at it:

./postprocess_ruleset_analysis.py -f results > final_report
less final_report

How to get help and answers

If you encounter problems when running Ruleset Analysis, please register an Issue on GitHub. Pull requests are also very welcome.

For instructions on how to install the prerequisites required for the analysis to work (mostly Python modules), see the README on GitHub.

For generic questions about the analysis, leave a comment here or contact me on Twitter: @A_r_n_e.

Reducing the Size of Large Firewall Rulesets

After operating a set of firewalls for some years, the rulesets had grown to thousands of rules, each fulfilling a specific application need or some user demand. Firewalls don't live forever, and the time came to replace the current firewall with a new, more powerful appliance from a different vendor. Changing vendors made migrating the rules more difficult, since the syntax is different. In addition, the conversion tool provided by the new vendor failed to utilize the powerful features of the new syntax. So it was decided to implement all the rules on the new firewall manually.

When faced with a big manual task, my first question is: how can we simplify this? One way to reduce the workload is to reduce the scope of the work, in this case the number of firewall rules that must be re-implemented. However, it is not easy to determine whether a rule can be removed. Traffic counters give some information, but they are typically reset on reboots. Before you go ahead and remove a rule, you want to be sure no one relied on it for, say, the last six or maybe twelve months, and any firmware upgrade in the last year could make the traffic counters less valuable.

Among all the rules with positive counters there are almost certainly also rules that are no longer in use. The counters do not tell you when each hit occurred in the period since the last reset. The firewall logs, however, contain that information. Configured for full audit logging, a firewall records the exact pattern of the traffic that traversed it, so parsing that information can reveal which rules are in use and when.

A common issue with firewall rulesets is the generic rules: the rules that are added when the application need is unclear or a deadline is rapidly approaching. Generic rules allow more than they should, and removing them requires in-depth knowledge and certainty about what the rule should have looked like. One way to get that knowledge is to inspect the audit logs in detail. By parsing all log entries it is possible to say with certainty what traffic was allowed by which rule. Armed with a list of all traffic matching a generic rule, it's easier to replace that rule with specific rules or remove it entirely. How to parse the logs to do that is the topic of my next post.