Quickly get IP addresses of new VMs

I spin up a lot of VMs using VMware Fusion. I generally keep “clean” generic copies of a few different Linux server distros and versions ready to go, already set up with my login, an sshd server, ssh keys, and the basic settings I use. When I need to quickly test something manually (usually some new multi-VM distributed container orchestration or database system) I just make as many copies of the server’s *.vmwarevm file as I need, fire up the VM copies on my laptop, test whatever I need to test, then shut them down. Eventually I delete the copies and recover the disk space.

Depending on where my laptop is running I’ll get a completely random IP address for each VM from the local DHCP server. I would log into the consoles, get the IPs, then log into the various VMs from a terminal. (Cut and paste just works a whole lot better in a terminal than on the VMware console.)

However, the console screens are up anyway, and I repeat this pattern several times a week, so I figured why not save a step and have the ephemeral VMs show their IP addresses on their consoles without my having to log in. To do that I added an “on reboot” cron file called /etc/cron.d/welcome to the master image, which updates the /etc/issue file.

/etc/cron.d/welcome looks like this:

@reboot root (/bin/hostname; /bin/uname -a; echo; if [ -x /sbin/ip ]; then /sbin/ip addr; else /sbin/ifconfig; fi) > /etc/issue

When a new VM boots, it writes the hostname, kernel info, and the ethernet config to the /etc/issue file. /etc/issue is displayed on the screen before the login prompt, so now I can just glance at the console, see the IP address, and ssh to the new VM.
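If you want to set this up on your own master image, something like the following should do it (a sketch, assuming a root shell and a cron that honors @reboot entries in /etc/cron.d):

cat > /etc/cron.d/welcome <<'EOF'
@reboot root (/bin/hostname; /bin/uname -a; echo; if [ -x /sbin/ip ]; then /sbin/ip addr; else /sbin/ifconfig; fi) > /etc/issue
EOF
# cron.d files must be owned by root and not world-writable
chmod 644 /etc/cron.d/welcome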

[Screenshot: an ephemeral VM’s console showing its IP address before the login prompt]

Although you’d never want to do this on a production system, it works great for ephemeral, throw-away test VMs.

Hope you find this useful.


Policy-based Cloud Storage

This is a talk I gave last week at the SF Microservices Meetup titled Policy-based Cloud Storage, Persisting Data in a Multi-Site, Multi-Cloud World. In it I cover Apcera’s approach to storage for containers and how to use policy to manage very large scale application deployments.


Why adding a .conf or .cfg file to /etc/sudoers.d doesn’t work

I needed to add some sudo access rights for support personnel on about a hundred CentOS 6.6 servers. None of these hosts had custom sudo rights yet, so each had the stock /etc/sudoers file. I’m using Ansible to maintain these hosts, but rather than modify the default /etc/sudoers file with Ansible’s lineinfile module, I decided to create a support.conf file and use Ansible’s copy module to copy it into /etc/sudoers.d/. That way I leave /etc/sudoers untouched, and my changes should keep working even if a future CentOS release changes the default file.

  - name: Add custom sudoers
    copy: src=files/support.conf dest=/etc/sudoers.d/support.conf owner=root group=root mode=0440 validate='visudo -cf %s'

The support.conf file I created copied over just fine, and the validation step, which runs “visudo -cf” on the file before moving it into place, reported that the file was error-free and valid as a sudoers file.

I logged in as the support user and it didn’t work:

[support@c1n1 ~]$ sudo /bin/ls /var/log/*
support is not in the sudoers file.  This incident will be reported.

Not only did it not work, sudo claimed that the support user wasn’t even in the file, which it clearly was.

After Googling around a bit and not finding much, I spotted this in the Sudoers Manual:

sudo will read each file in /etc/sudoers.d, skipping file names that end in ‘~’ or contain a ‘.’ character to avoid causing problems with package manager or editor temporary/backup files.

sudo was skipping the file because the file name contained a period!

I changed the name of the file from support.conf to support and it worked.

  - name: Add custom sudoers
    copy: src=files/support dest=/etc/sudoers.d/support owner=root group=root mode=0440 validate='visudo -cf %s'
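If you want to check whether sudo will skip any files already sitting in /etc/sudoers.d, a quick test (a sketch; any output means those files are being ignored):

# List files sudo will skip: names containing a '.' or ending in '~'
ls /etc/sudoers.d | grep -e '\.' -e '~$'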

Hope you find this useful.

Here’s a snippet from /etc/sudoers.d/support if you’re interested. The “support” user has already been created by a separate Ansible command.

# Networking
Cmnd_Alias NETWORKING = /sbin/route, /sbin/ifconfig, /bin/ping, /sbin/dhclient, /usr/bin/net, /sbin/iptables, /usr/bin/rfcomm, /usr/bin/wvdial, /sbin/iwconfig, /sbin/mii-tool

# Installation and management of software
Cmnd_Alias SOFTWARE = /bin/rpm, /usr/bin/up2date, /usr/bin/yum

# Services
Cmnd_Alias SERVICES = /sbin/service, /sbin/chkconfig

# Reading logs
Cmnd_Alias READ_LOGS = /usr/bin/less /var/log/*, /bin/more /var/log/*, /bin/ls /var/log/*, /bin/ls /var/log

support  ALL = NETWORKING, SOFTWARE, SERVICES, READ_LOGS

Restarting network interfaces in Ansible

I’m using Ansible to set up the network interface cards of multiple racks of storage servers running CentOS 6.6. Each server has four network interfaces to configure: a public 1GbE interface, a private 1GbE interface, and two 10GbE interfaces set up as a bonded 20GbE interface with two VLANs assigned to the bond.

If Ansible changes an interface on a server it calls a handler to restart the network interfaces so the changes go into effect. However, I don’t want the network interfaces of every single server in a cluster to restart at the same time, so at the beginning of my network.yml playbook I set:

  serial: 1

That way Ansible just updates the network config of one server at a time.

Also, if there are any failures I want Ansible to stop immediately, so if I screwed something up I don’t take out the networking to every computer in the cluster. For this reason I also set:

max_fail_percentage: 1

If a change is made to an interface I’ve been using the following handler to restart the interface:

- name: Restart Network
  service: name=network state=restarted

That works, but about half the time Ansible detects a failure and drops out with an error, even though the network restarted just fine. Checking the server immediately after Ansible reports the error shows that it’s running and its network interfaces are configured correctly.

This behavior is annoying because you have to restart the entire playbook after one server fails. With many racks of servers and the playbook updating the network of one server at a time, I’d end up restarting the playbook a half dozen times to get through it, even though nothing was actually wrong.

At first I thought that maybe the ssh connection was dropping (I was restarting the network after all) but you can log in via ssh and restart the network and never lose the connection, so that wasn’t the problem.

The connection does pause as the interface that you’re ssh-ing in over resets, but the connection comes right back.

I wrote a short script to repeatedly restart the network interfaces and check the exit code returned. The exit code was always 0, “no errors”, so the network restart wasn’t reporting a failure, yet for some reason Ansible thought there was one.
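The script was a simple loop along these lines (a sketch of what I ran):

# Restart the network over and over, reporting each exit code
for i in 1 2 3 4 5 6 7 8 9 10; do
    service network restart
    echo "attempt $i exit code: $?"
done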

There’s evidently some sort of timing issue: Ansible checks to see if all is well while the network is still being reset, so the check times out.

I initially came up with this workaround:

- name: Restart Network
  shell: service network restart; sleep 3

That fixes the problem. However, since “sleep 3” will always exit with a 0 exit code (success), Ansible will think this worked even when the network restart failed. (Ansible takes the last exit code returned as the success/failure of the entire shell operation.) If “service network restart” actually does fail, I want Ansible to stop processing.

In order to preserve the exit code, I wrote a one-line Perl script that restarts the network, sleeps 3 seconds, then exits with the same exit code returned by “service network restart”.

- name: Restart Network
  # Restart the network, sleep 3 seconds, return the
  # exit code returned by "service network restart".
  # This is to work-around a glitch in Ansible where
  # it detects a successful network restart as a failure.
  command: perl -e 'my $exit_code = system("service network restart"); sleep 3; $exit_code = $exit_code >> 8; exit($exit_code);'
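If you’d rather avoid Perl, the same trick should work in plain shell (an untested sketch that could go in a shell: task instead):

# Save the restart's exit code, sleep 3 seconds, then exit with that code
service network restart; RC=$?; sleep 3; exit $RC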

Now Ansible grinds through the network configurations of all of the hosts in my racks without stopping.

Hope you find this useful.


Stop mounting ISO files in Linux with “-t iso9660”

Google “How do I mount an ISO image in Linux” and most of the links still say to use “-t iso9660”. For example:

mount -t iso9660 -o loop,ro diskimage.iso /mnt/iso

That worked fine 10 years ago, but these days not all ISOs use ISO9660 file systems. Many use the UDF (Universal Disk Format) file system, and if you specify ISO9660 when mounting a UDF ISO file, subtle problems can occur. For instance, file names that contain upper case letters on a UDF file system will appear in lower case when that ISO is mounted using ISO9660.

On any modern Linux distro mount is smart enough to figure out what type of file system to use when mounting an ISO file, so it’s perfectly fine to let mount infer the type, e.g.:

mount -o loop,ro diskimage.iso /mnt/iso
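If you do want to be explicit about the file system type, you can ask blkid first (a sketch using util-linux’s blkid):

# Detect the ISO's file system type, then mount it explicitly
FSTYPE=$(blkid -o value -s TYPE diskimage.iso)
mount -t "$FSTYPE" -o loop,ro diskimage.iso /mnt/iso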

Here’s an example of what happens when you try to mount a type UDF ISO as type ISO9660. Note that the case of the file names changes to all lower case when mounting as iso9660, which in this case causes subtle errors to occur within the software.

[~]$ blkid /srv/isos/specsfs/SPECsfs2014-1.0.iso
/srv/isos/specsfs/SPECsfs2014-1.0.iso: UUID="2014-10-22-15-52-41-00" LABEL="SPEC_SFS2014" TYPE="udf"

[~]$ mount -t iso9660 -o loop,ro /srv/isos/specsfs/SPECsfs2014-1.0.iso /mnt/iso
[~]$ cd /mnt/iso
[/mnt/iso]$ ls
benchmarks.xml    netmist_modify     redistributable_sources
binaries          netmist_modify.c   sfs2014result.css
copyright.txt     netmist_monitor    sfs_ext_mon
docs              netmist_monitor.c  sfsmanager
import.c          netmist_pro.in     sfs_rc
license.txt       netmist_proj       spec_license.txt
makefile          netmist.sln        specreport
map_share_script  notice             submission_template.xml
mempool.c         pdsm               token_config_file
mix_table.c       pdsmlib.c          win32lib
netmist.c         rcschangelog.txt   workload.c
netmist.h         readme.txt

[/mnt/iso]$ cd
[~]$ umount /mnt/iso
[~]$ mount -o loop,ro /srv/isos/specsfs/SPECsfs2014-1.0.iso /mnt/iso
[~]$ cd /mnt/iso
[/mnt/iso]$ ls
benchmarks.xml    netmist_modify     redistributable_sources
binaries          netmist_modify.c   sfs2014result.css
copyright.txt     netmist_monitor    sfs_ext_mon
docs              netmist_monitor.c  SfsManager
import.c          netmist_pro.in     sfs_rc
license.txt       netmist_proj       SPEC_LICENSE.txt
makefile          netmist.sln        SpecReport
Map_share_script  NOTICE             submission_template.xml
mempool.c         pdsm               token_config_file
mix_table.c       pdsmlib.c          win32lib
netmist.c         rcschangelog.txt   workload.c
netmist.h         README.txt

Get Ansible’s “pip” method to install the right version of Django

I was using Ansible to set up a bunch of Scientific Linux 6.6 servers running Django, and I wanted a specific version of Django, 1.6.5, on all of them.

Ansible makes this easy with the “pip” module:

  - name: Install pip package from yum
    yum: name={{ item }} state=present
    with_items:
    - python-pip
    - python-setuptools

  - name: Install Django 1.6.5
    pip: name=django version=1.6.5 state=present

This works great if you’re installing on a clean, empty server, but on a server being upgraded from an older version of Django (1.6.4 in my case) Ansible acts as if it’s installing 1.6.5, yet when it’s done the server still has version 1.6.4.

If I try using straight PIP commands I get this:

$ pip install django==1.6.5
Downloading/unpacking django==1.6.5
  Running setup.py egg_info for package django
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
  Requested django==1.6.5, but installing version 1.6.4
Installing collected packages: django
  Found existing installation: Django 1.6.4
    Uninstalling Django:
      Successfully uninstalled Django
  Running setup.py install for django
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
    changing mode of /usr/bin/django-admin.py to 755
Successfully installed django
Cleaning up...

Note the line “Requested django==1.6.5, but installing version 1.6.4”. Thanks PIP!

It turned out to be a bug in PIP versions earlier than 1.4, not Ansible. A little Googling turned up a page on Stack Overflow that pointed the finger at an old cached copy of 1.6.4 in the build directory, which I found in /tmp/pip-build-root.

I updated my Ansible YAML file to get rid of the temporary directory and now it works fine:

  - name: Install pip package from yum
    yum: name={{ item }} state=present
    with_items:
    - python-pip
    - python-setuptools

  - name: Remove PIP temp directory
    file: path=/tmp/pip-build-root state=absent

  - name: Install Django 1.6.5
    pip: name=django version=1.6.5 state=present
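To confirm that the right version actually landed, a quick check on the host (a sketch; django.get_version() is part of Django’s public API):

# Print the installed Django version; should now say 1.6.5
python -c 'import django; print(django.get_version())'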

Hope you find this useful.


Getting rid of the “redirecting to systemctl” message in OpenSUSE

On OpenSUSE systems running systemd, all rcX scripts now redirect service commands (start, stop, reload, restart, etc.) to systemctl. The messages that used to appear on STDOUT telling you whether a command succeeded are now logged, but are no longer displayed on STDOUT.

That I can deal with, but every call to an rcX script now generates the message “redirecting to systemctl” to STDERR. I have a lot of scripts that call rcX scripts, and they interpret STDERR messages as “something just broke”.

The culprit is the new /etc/rc.status script that ships with OpenSUSE. It spews out the “redirecting to systemctl” message to STDERR for every operation that you do. The following command will modify the script and remove this stupid message:

if ( grep -q 'redirecting to systemctl' /etc/rc.status ) ; then
    # Save a copy of the original file
    cp -p /etc/rc.status /etc/rc.status.orig;

    # OpenSUSE 12.1:
    perl -i.bak -pe 's,echo "redirecting to systemctl" >/dev/stderr,,;' /etc/rc.status;

    # OpenSUSE 12.3:
    perl -i.bak -pe 's,echo "redirecting to systemctl \${SYSTEMCTL_OPTIONS} \$1 \${_rc_base}" 1>&2,,;' /etc/rc.status;
fi
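You can confirm that the edit took by grepping the modified file; after the fix this should print nothing:

# Should produce no output once the message has been removed
grep 'redirecting to systemctl' /etc/rc.status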

This works for OpenSUSE 12.1 and 12.3. I did not have a 12.2 system available to test with.

Hope you find this useful.


Bring Pidgin’s window into front focus when there’s an inbound IM

I was talking to a co-worker about Pidgin not coming into focus when there’s a new, inbound IM. The Pidgin window used to come into focus, front and center, when I was running Ubuntu/Gnome and when running OpenSUSE/KDE, but when I upgraded my office desktop to Ubuntu/Unity it stopped behaving this way. My co-worker noticed the same behavior with Fedora 17/Gnome. A new IM would come in, but the Pidgin IM window would remain in the background, hidden, unseen and unread.

I thought “There has to be a setting that controls this,” and there is…

  • Bring up Pidgin’s Buddy List
  • Click Tools > Plugins
  • Locate the Message Notification plugin and highlight it
  • At the bottom of the Plugins window is a Configure Plugin button. Click it
  • Under Notification Methods check both Raise conversation window and Present conversation window
  • Click Close

That’s it. The next time someone IMs you, your Pidgin conversation window will pop up in the center of your screen, in front of all of your other windows.

Hope you find this useful.


Creating differential backups with hard links and rsync

You can use a hard link in Linux to create two file names that both point to the same physical location on a hard disk. For instance, if I type:

> echo xxxx > a
> cp -l a b
> cat a
xxxx
> cat b
xxxx

I create a file named “a” that contains the string “xxxx”. Then I create a hard link “b” that also points to the same spot on the disk. Now if I write to the file “a” whatever I write also appears in file “b” and vice versa:

> echo yyyy > b
> cat b
yyyy
> cat a
yyyy
> echo zzzz > a
> cat a
zzzz
> cat b
zzzz

Copying to a hard link updates the data on the disk that each hard link points to:

> rm a b c
> echo xxxx > a
> echo yyyy > c
> cp -l a b
> cat a b c
xxxx
xxxx
yyyy

“a” and “b” point to the same file on disk, “c” is a separate file. If I copy a file “c” to “b” that also updates “a”:

> cp c b 
> cat a b c
yyyy
yyyy
yyyy
> echo zzzz > c
> cat a b c
yyyy
yyyy
zzzz 

What most people don’t know is that rsync is an exception to this rule. If you use rsync to sync two files and the target file is a hard link, rsync will replace the target with a new, separate file, but only if the contents of the two files differ:

> rm a
> rm b
> echo xxxx > a
> cp -l a b
> cat a
xxxx
> cat b
xxxx
> echo yyyy > c
> cat c
yyyy
> rsync -av c b
sending incremental file list
c
sent 87 bytes  received 31 bytes  236.00 bytes/sec
total size is 5  speedup is 0.04
> cat b
yyyy
> cat c
yyyy
> cat a
xxxx

File “b” is no longer a hard link of “a”, it’s a new file. If I update “a” it no longer updates “b”:

> echo zzzz > a
> cat a b c
zzzz
yyyy
yyyy

However, if the file that I’m rsync-ing has the same contents as “b”, then rsync does NOT break the hard link; it leaves the file alone:

> rm a
> rm b
> rm c
> echo xxxx > a
> cp -al a b
> cp -p a c
> cat a b c
xxxx
xxxx
xxxx

At this point “a” and “b” both point to the same file on the disk, which contains the string “xxxx”. “c” is a separate file that also contains the string “xxxx” and has the same permissions and timestamp as “a”.

> rsync -av c b
sending incremental file list
sent 39 bytes  received 12 bytes  102.00 bytes/sec
total size is 5  speedup is 0.10
> cat a b c
xxxx
xxxx
xxxx

At this point I’ve rsynced file “c” to “b”, but since c has the same contents and timestamp as “a” and “b” rsync does nothing at all. It doesn’t break the hard link. If I change “b” it still updates “a”:

> echo yyyy > b
> cat a b c
yyyy
yyyy
xxxx

This is how many modern file system backup programs work. On day 1 you make an rsync copy of your entire file system:

backup@backup_server> DAY1=`date +%Y%m%d%H%M%S`
backup@backup_server> rsync -av -e ssh earl@192.168.1.20:/home/earl/ /var/backups/$DAY1/

On day 2 you make a hard link copy of the backup, then a fresh rsync:

backup@backup_server> DAY2=`date +%Y%m%d%H%M%S`
backup@backup_server> cp -al /var/backups/$DAY1 /var/backups/$DAY2
backup@backup_server> rsync -av -e ssh --delete earl@192.168.1.20:/home/earl/ /var/backups/$DAY2/

“cp -al” makes a hard link copy of the entire /home/earl/ directory structure from the previous day, then rsync runs against the copy of the tree. If a file remains unchanged then rsync does nothing — the file remains a hard link. However, if the file’s contents changed, then rsync will create a new copy of the file in the target directory. If a file was deleted from /home/earl then rsync deletes the hard link from that day’s copy.

In this way, the $DAY1 directory has a snapshot of the /home/earl tree as it existed on day 1, and the $DAY2 directory has a snapshot of the /home/earl tree as it existed on day 2, but only the files that changed take up additional disk space. If you need to find a file as it existed at some point in time you can look at that day’s tree. If you need to restore yesterday’s backup you can rsync the tree from yesterday, but you don’t have to store a copy of all of the data from each day, you only use additional disk space for files that changed or were added.

I use this technique to keep 90 daily backups of a 500GB file system on a 1TB drive.
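Expiring old snapshots is just a matter of deleting the oldest directories. Since the timestamped names sort chronologically, something like this works (a sketch assuming a GNU userland):

# Keep the 90 newest snapshots, delete everything older
ls -1d /var/backups/*/ | sort | head -n -90 | xargs -r rm -rf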

One caveat: the snapshots do use up inodes. A hard link itself is just an extra directory entry pointing at an existing inode, but every directory in each day’s tree gets a new inode, as does every changed file. If you’re using a file system with a fixed number of inodes, such as ext3 or ext4, you should allocate extra inodes on the backup volume when you create it. If you’re using a file system that allocates inodes dynamically, such as zfs or btrfs, then you don’t need to worry about this.


Workaround to fix the problem of KDE “forgetting” your multi-monitor setup

I originally reported this KDE4 bug as https://bugs.kde.org/show_bug.cgi?id=312190. It’s also reported as bugs #311641, #309356, and #307589.

In my case I have 3 monitors on one video card. The card and all three monitors are detected correctly, but after I reboot the “Position” settings have all reverted to what they were when I first installed KDE.

I can change the position settings back to the correct settings, click “Save as Default”, log out, log in, and the position settings have once again reverted to what they were when I first installed KDE.

After trying several things I still haven’t fixed the problem (I suspect a timing issue in the KDE startup) but I did figure out a work-around that anyone can use to “fix” their system so their monitors come up correctly. I posted the work-around on bugs.kde.org and I’m also posting it here.

Here’s how you do it:

Get your monitors set up the way you want them using Configure Desktop > Display and Monitor and click “Save as Default.” This updates the file ~/.kde4/share/config/krandrrc.

Open ~/.kde4/share/config/krandrrc, copy everything on the line after “StartupCommands=”. In my case the line looks like this:

xrandr --output DVI-I-1 --pos 1680x0 --mode 1680x1050 --refresh 60\nxrandr --output DP-0 
  --pos 3360x0 --mode 1680x1050 --refresh 60\nxrandr --output DVI-D-0 --pos 0x0 --mode 1680x1050
  --refresh 60\nxrandr --output DVI-I-1 --primary

Create a new script called ~/bin/workaround-for-kde-bug-312190.sh:

mkdir -p ~/bin
vim ~/bin/workaround-for-kde-bug-312190.sh

(If you don’t like vim, use whatever editor you like.)

Paste the line into the script file.

Change the “\n” characters into actual newlines so you end up with each “xrandr” command on a separate line. In my case I ended up with:

xrandr --output DVI-I-1 --pos 1680x0 --mode 1680x1050 --refresh 60
xrandr --output DP-0 --pos 3360x0 --mode 1680x1050 --refresh 60
xrandr --output DVI-D-0 --pos 0x0 --mode 1680x1050 --refresh 60
xrandr --output DVI-I-1 --primary

These are the settings for MY desktop. Yours will look different!
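If you’d rather not hand-edit the “\n” characters, something like this should generate the script in one shot (a sketch assuming GNU sed):

# Extract the StartupCommands line and turn the \n sequences into real newlines
grep '^StartupCommands=' ~/.kde4/share/config/krandrrc \
    | sed -e 's/^StartupCommands=//' -e 's/\\n/\n/g' \
    > ~/bin/workaround-for-kde-bug-312190.sh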

Make it executable:

chmod +x ~/bin/workaround-for-kde-bug-312190.sh

Run the script ~/bin/workaround-for-kde-bug-312190.sh. Your monitors should still be set up correctly. If they’re messed up, you probably didn’t cut and paste the line correctly; repeat the steps above.

Pick Autostart from the KDE menu. (Use the Search function if you can’t figure out where it’s buried.)

Click “Add Script” and paste the line “~/bin/workaround-for-kde-bug-312190.sh” into the “Shell script path” text box.

Click OK, click OK.

The next time you restart KDE it will still start up with the wrong configuration, then Autostart will execute ~/bin/workaround-for-kde-bug-312190.sh and fix the problem.
