Restarting network interfaces in Ansible

I’m using Ansible to set up the network interface cards of multiple racks of storage servers running Centos 6.6. Each server has four network interfaces to configure, a public 1GbE interface, a private 1GbE interface, and two 10GbE interfaces that are set up as a bonded 20GbE interface with two VLANs assigned to the bond.

If Ansible changes an interface on a server it calls a handler to restart the network interfaces so the changes go into effect. However, I don’t want the network interfaces of every single server in a cluster to restart at the same time, so at the beginning of my network.yml playbook I set:

  serial: 1

That way Ansible just updates the network config of one server at a time.

Also, if there are any failures I want Ansible to stop immediately, so if I screwed something up I don’t take out the networking to every computer in the cluster. For this reason I also set:

max_fail_percentage: 1

If a change is made to an interface I’ve been using the following handler to restart the interface:

- name: Restart Network
  service: name=network state=restarted

That works, but about half the time Ansible detects a failure and drops out with an error, even though the network restarted just fine. Checking the server immediately after Ansible says that there’s an error shows that the server is running and it’s network interfaces were configured correctly.

This behavior is annoying since you have to restart the entire playbook after one server fails. If you’re configuring many racks of servers and the network setup is just updating one server at a time I’d end up having to restart the playbook a half dozen times to get through it, even though nothing was actually wrong.

At first I thought that maybe the ssh connection was dropping (I was restarting the network after all) but you can log in via ssh and restart the network and never lose the connection, so that wasn’t the problem.

The connection does pause as the interface that you’re ssh-ing in over resets, but the connection comes right back.

I wrote a short script to repeatedly restart the network interfaces and check the exit code returned, but the exit code was always 0, “no errors”, so network restart wasn’t reporting an error, but for some reason Ansible thought there was a failure.

There’s obviously some sort of timing issue causing a problem, where Ansible is checking to see if all is well, but since the network is being reset the check times out.

I initially came up with this workaround:

- name: Restart Network
  shell: service network restart; sleep 3

That fixes the problem, however, since “sleep 3” will always exit with a 0 exit code (success), Ansible will always think this worked even when the network restart failed. (Ansible takes the last exit code returned as the success/failure of the entire shell operation.) If “service network restart” actually does fail, I want Ansible to stop processing.

In order to preserve the exit code, I wrote a one-line Perl script that restarts the network, sleeps 3 seconds, then exits with the same exit code returned by “service network restart”.

- name: Restart Network
  # Restart the network, sleep 3 seconds, return the
  # exit code returned by "service network restart".
  # This is to work-around a glitch in Ansible where
  # it detects a successful network restart as a failure.
  command: perl -e 'my $exit_code = system("service network restart"); sleep 3; $exit_code = $exit_code >> 8; exit($exit_code);'

Now Ansible grinds through the network configurations of all of the hosts in my racks without stopping.

Hope you find this useful.

Stop mounting ISO files in Linux with “-t iso9660”

Google “How do I mount an ISO image in Linux” and most of the links still say to use “-t iso9660”. For example:

mount -t iso9660 -o loop,ro diskimage.iso /mnt/iso

That worked fine 10 years ago, but these days not all ISOs use ISO9660 file systems. Many use the UDF (Universal Disk Format) file system, and if you specify ISO9660 when mounting a UDF ISO file, subtle problems can occur. For instance, file names that contain upper case letters on a UDF file system will appear in lower case when that ISO is mounted using ISO9660.

On any modern Linux distro mount is smart enough to figure out what type of file system to use when mounting an ISO file, so it’s perfectly fine to let mount infer the type, e.g.:

mount -o loop,ro diskimage.iso /mnt/iso

Here’s an example of what happens when you try to mount a type UDF ISO as type ISO9660. Note that the case of the file names changes to all lower case when mounting as iso9660, which in this case causes subtle errors to occur within the software.

[~]$ blkid /srv/isos/specsfs/SPECsfs2014-1.0.iso
/srv/isos/specsfs/SPECsfs2014-1.0.iso: UUID="2014-10-22-15-52-41-00" LABEL="SPEC_SFS2014" TYPE="udf"

[~]$ mount -t iso9660 -o loop,ro /srv/isos/specsfs/SPECsfs2014-1.0.iso /mnt/iso
[~]$ cd /mnt/iso
[/mnt/iso]$ ls
benchmarks.xml    netmist_modify     redistributable_sources
binaries          netmist_modify.c   sfs2014result.css
copyright.txt     netmist_monitor    sfs_ext_mon
docs              netmist_monitor.c  sfsmanager
import.c          netmist_pro.in     sfs_rc
license.txt       netmist_proj       spec_license.txt
makefile          netmist.sln        specreport
map_share_script  notice             submission_template.xml
mempool.c         pdsm               token_config_file
mix_table.c       pdsmlib.c          win32lib
netmist.c         rcschangelog.txt   workload.c
netmist.h         readme.txt

[/mnt/iso]$ cd
[~]$ umount /mnt/iso
[~]$ mount -o loop,ro /srv/isos/specsfs/SPECsfs2014-1.0.iso /mnt/iso
[~]$ cd /mnt/iso
[/mnt/iso]$ ls
benchmarks.xml    netmist_modify     redistributable_sources
binaries          netmist_modify.c   sfs2014result.css
copyright.txt     netmist_monitor    sfs_ext_mon
docs              netmist_monitor.c  SfsManager
import.c          netmist_pro.in     sfs_rc
license.txt       netmist_proj       SPEC_LICENSE.txt
makefile          netmist.sln        SpecReport
Map_share_script  NOTICE             submission_template.xml
mempool.c         pdsm               token_config_file
mix_table.c       pdsmlib.c          win32lib
netmist.c         rcschangelog.txt   workload.c
netmist.h         README.txt

Get Ansible’s “pip” method to install the right version of Django

I was using Ansible to set up a bunch of Scientific Linux 6.6 servers running Django and I wanted to use a specific version of Django, version 1.6.5, on all servers.

Ansible makes this easy with the “pip” module:

  - name: Install pip package from yum
    yum: name={{ item }} state=present
    with_items:
    - python-pip
    - python-setuptools

  - name: Install Django 1.6.5
    pip: name=django version=1.6.5 state=present

This works great if you’re installing on a clean, empty server, but if you’re upgrading a server that had an older version of Django on it (1.6.4 in my case) Ansible will act as if it’s installing 1.6.5, but when it’s done I still had version 1.6.4.

If I try using straight PIP commands I get this:

$ pip install django==1.6.5
Downloading/unpacking django==1.6.5
  Running setup.py egg_info for package django
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
  Requested django==1.6.5, but installing version 1.6.4
Installing collected packages: django
  Found existing installation: Django 1.6.4
    Uninstalling Django:
      Successfully uninstalled Django
  Running setup.py install for django
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
    changing mode of /usr/bin/django-admin.py to 755
Successfully installed django
Cleaning up...

Note the line “Requested django==1.6.5, but installing version 1.6.4”. Thanks PIP!

It turned out to be a bug in PIP versions earlier than PIP 1.4, not Ansible. A little Googling turned up a page on Stackoverflow that pointed the finger at an old cached copy of 1.6.4 in the build directory, which I found in /tmp/pip-build-root.

I updated my Ansible YAML file to get rid of the temporary directory and now it works fine:

  - name: Install pip package from yum
    yum: name={{ item }} state=present
    with_items:
    - python-pip
    - python-setuptools

  - name: Remove PIP temp directory
    file: path=/tmp/pip-build-root state=absent

  - name: Install Django 1.6.5
    pip: name=django version=1.6.5 state=present

Hope you find this useful.

Getting rid of the “redirecting to systemctl” message in OpenSUSE

On OpenSUSE systems running systemd all rcX scripts now redirect start, stop, reload, restart, etc. service commands to systemctl. The messages that  used to appear on STDOUT telling you that a command is successful (or not) are now logged, but are no longer displayed on STDOUT.

That I can deal with, but every call to an rcX script now generates the message “redirecting to systemctl” to STDERR. I have a lot of scripts that call rcX scripts, and they interpret STDERR messages as “something just broke”.

The culprit is the new /etc/rc.status script that ships with OpenSUSE. It spews out the “redirecting to systemctl” message to STDERR for every operation that you do. The following command will modify the script and remove this stupid message:

if ( grep -q 'redirecting to systemctl' /etc/rc.status ) ; then
    # Save a copy of the original file
    cp -p /etc/rc.status /etc/rc.status.orig;

    # OpenSUSE 12.1:
    perl -i.bak -pe 's,echo "redirecting to systemctl" >/dev/stderr,,;' /etc/rc.status;

    # OpenSUSE 12.3:
    perl -i.bak -pe 's,echo "redirecting to systemctl \${SYSTEMCTL_OPTIONS} \$1 \${_rc_base}" 1>&2,,;' /etc/rc.status;
fi

This works for OpenSUSE 12.1 and 12.3. I did not have a 12.2 system available to test with.

Hope you find this useful.

Bring Pidgin’s window into front focus when there’s an inbound IM

I was talking to a co-worker about Pidgin not coming into focus when there’s a new, inbound IM. The Pidgin window used to come into focus, front and center, when I was running Ubuntu/Gnome and when running OpenSUSE/KDE, but when I upgraded my office desktop to Ubuntu/Unity it stopped behaving this way. My co-worker noticed the same behavior with Fedora17/Gnome. A new IM would come in, but the Pidgin IM window would remain in the background, hidden, unseen and unread.

I thought “There has to be a setting that controls this,” and there is…

  • Bring up Pidgin’s Buddy List
  • Click Tools > Plugins
  • Locate the Message Notification plugin and highlight it
  • At the bottom of the Plugins window is a Configure Plugin button. Click it
  • Under Notification Methods check both Raise conversation window and Present conversation window
  • Click Close

That’s it. The next time someone IM’s you, your Pidgin Conversation will pop up in the center of your screen, in front of all of your other windows.

Hope you find this useful.