post

Use .iso and Kickstart files to automatically create Ubuntu VMs

I was looking for a way to automate the creation of VMs for testing various distributed system / cluster software packages. I’ve used Vagrant in the past but I wanted something that would:

  • Allow me to use raw ISO files as the basis for guest VMs.
  • Guest VMs should be set up with bridged IPs that are routable from the host.
  • Guest VMs should be able to reach the Internet.
  • Other hosts on the local network should be able to reach guest VMs. (Setting up additional routes is OK).
  • VM creation should work with any distro that supports Kickstart files.
  • Scripts should be able to create and delete VMs in a scripted, fully-automatic manner.
  • Guest VMs should be set up to allow passwordless ssh access from the “ansible” user.

I’ve previously used virsh’s virt-install tool to create VMs and I like how easy it is to set up things like extra network interfaces and attach existing disk images. The scripts in this repo fully automate the virsh VM creation process.

Scripts

I put all of my code into a Github repo containing these scripts:

  • create-vm – Use .iso and kickstart files to auto-generate a VM.
  • delete-vm – Delete a virtual machine created with create-vm.
  • get-vm-ip – Get the IP address of a VM managed by virsh.
  • encrypt-pw – Returns a SHA512 encrypted password suitable for pasting into Kickstart files.

I’ve also included a sample ubuntu.ks Kickstart file for creating an Ubuntu host.

Host setup

I’m running the scripts from a host with Ubuntu Linux 18.10 installed. I added the following to the host’s Ansible playbook to install the necessary virtualization packages:

  - name: Install virtualization packages
apt:
name: "{{item}}"
state: latest
with_items:
- qemu-kvm
- libvirt-bin
- libvirt-clients
- libvirt-daemon
- libvirt-daemon-driver-storage-zfs
- python-libvirt
- python3-libvirt
- system-config-kickstart
- vagrant-libvirt
- vagrant-sshfs
- virt-manager
- virtinst

If you’re not using Ansible just apt-get install the above packages.

Sample Kickstart file

There are plenty of documents on the Internet on how to set up Kickstart files.

A couple of things that are special about the included Kickstart file

The Ansible user: Although I’d prefer to create the “ansible” user as a locked account,with no password just an ssh public key, Kickstart on Ubuntu does not allow this, so I do set up an encrypted password.

To set up your own password, use the encrypt-pw script to create a SHA512-hashed password that you can copy and paste into the Kickstart file. After a VM is created you can use this password if you need to log into the VM via the console.

To use your own ssh key, replace the ssh key in the %post section with your own public key.

The %post section at the bottom of the Kickstart file does a couple of things:

  • It updates all packages with the latest versions.
  • To configure a VM with Ansible, you just need ssh access to a VM and Python installed. on the VM. So I use %post to install an ssh-server and Python.
  • I start the serial console, so that virsh console $vmname works.
  • I add a public key for Ansible, so I can configure the servers with Ansible without entering a password.

Despite the name, the commands in the %post section are not the last commands executed by Kickstart on an Ubuntu 18.10 server. The “ansible” user is added after the %post commands are executed. This means that the Ansible ssh public key gets added before the ansible user is created.

To make key-based logins work I set the UID:GID of authorized_keys to 1000:1000. The user is later created with UID=1000, GID=1000, which means that the authorized_keys file ends up being owned by the ansible user by the time the VM creation is complete.

Create an Ubuntu 18.10 server

This creates a VM using Ubuntu’s text-based installer. Since the `-d` parameter is used,progress of the install is shown on screen.

create-vm -n node1 \
-i ~/isos/ubuntu-18.10-server-amd64.iso \
-k ~/conf/ubuntu.ks \
-d

Create 8 Ubuntu 18.10 servers

This starts the VM creation process and exits. Creation of the VMs continues in the background.

for n in `seq 1 8`; do
create-vm -n node$n \
-i ~/isos/ubuntu-18.10-server-amd64.iso \
-k ~/conf/ubuntu.ks
done

Delete 8 virtual machines

for n in `seq 1 8`; do
delete-vm node$n
done

Connect to a VM via the console

virsh console node1

Connect to a VM via ssh

ssh ansible@`get-vm-ip node1`

Generate an Ansible hosts file

(
echo '[hosts]'
for n in `seq 1 8`; do
ip=`get-vm-ip node$n`
echo "node$n ansible_ip=$ip ansible_user=ansible"
done
) > hosts

Handy virsh commands

  • virsh list – List all running VMs.
  • virsh domifaddr node1 – Get a node’s IP address. Does not work with all network setups,which is why I wrote the get-vm-ip script.
  • virsh net-list – Show what networks were created by virsh.
  • virsh net-dhcp-leases $network – Shows current DHCP leases when virsh is acting as the DHCP server. Leases may be shown for machines that no longer exist.

Known Issues

  • VMs created without the -d (debug mode) parameter may be created in “stopped” mode. To start them up, run the command virsh start $vmname
  • Depending on how your host is set up, you may need to run these scripts as root.
  • Ubuntu text mode install messes up terminal screens. Run reset from the command line to restore a terminal’s functionality.
  • I use Ansible to set a guest’s hostname, not Kickstart, so all Ubuntu guests created have the host name “ubuntu”.

Hope you find this useful.

create-vm script

Download create-vm from Github

#!/bin/bash

# create-vm - Use .iso and kickstart files to auto-generate a VM.

# Copyright 2018 Earl C. Ruby III
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

HOSTNAME=
ISO_FQN=
KS_FQN=
RAM=1024
VCPUS=2
STORAGE=20
BRIDGE=virbr0
MAC="RANDOM"
VERBOSE=
DEBUG=
VM_IMAGE_DIR=/var/lib/libvirt

usage()
{
cat << EOF
usage: $0 options

This script will take an .iso file created by revisor and generate a VM from it.

OPTIONS:
-h Show this message
-n Host name (required)
-i Full path and name of the .iso file to use (required)
-k Full path and name of the Kickstart file to use (required)
-r RAM in MB (defaults to ${RAM})
-c Number of VCPUs (defaults to ${VCPUS})
-s Amount of storage to allocate in GB (defaults to ${STORAGE})
-b Bridge interface to use (defaults to ${BRIDGE})
-m MAC address to use (default is to use a randomly-generated MAC)
-v Verbose
-d Debug mode
EOF
}

while getopts "h:n:i:k:r:c:s:b:m:v:d" option; do
case "${option}"
in
h)
usage
exit 0
;;
n) HOSTNAME=${OPTARG};;
i) ISO_FQN=${OPTARG};;
k) KS_FQN=${OPTARG};;
r) RAM=${OPTARG};;
c) VCPUS=${OPTARG};;
s) STORAGE=${OPTARG};;
b) BRIDGE=${OPTARG};;
m) MAC=${OPTARG};;
v) VERBOSE=1;;
d) DEBUG=1;;
esac
done

if [[ -z $HOSTNAME ]]; then
echo "ERROR: Host name is required"
usage
exit 1
fi

if [[ -z $ISO_FQN ]]; then
echo "ERROR: ISO file name or http url is required"
usage
exit 1
fi

if [[ -z $KS_FQN ]]; then
echo "ERROR: Kickstart file name or http url is required"
usage
exit 1
fi

if ! [[ -f $ISO_FQN ]]; then
echo "ERROR: $ISO_FQN file not found"
usage
exit 1
fi

if ! [[ -f $KS_FQN ]]; then
echo "ERROR: $KS_FQN file not found"
usage
exit 1
fi
KS_FILE=$(basename "$KS_FQN")

if [[ ! -z $VERBOSE ]]; then
echo "Building ${HOSTNAME} using MAC ${MAC} on ${BRIDGE}"
echo "======================= $KS_FQN ======================="
cat "$KS_FQN"
echo "=============================================="
set -xv
fi

mkdir -p $VM_IMAGE_DIR/{images,xml}

virt-install \
--connect=qemu:///system \
--name="${HOSTNAME}" \
--bridge="${BRIDGE}" \
--mac="${MAC}" \
--disk="${VM_IMAGE_DIR}/images/${HOSTNAME}.img,bus=virtio,size=${STORAGE}" \
--ram="${RAM}" \
--vcpus="${VCPUS}" \
--autostart \
--hvm \
--arch x86_64 \
--accelerate \
--check-cpu \
--os-type=linux \
--force \
--watchdog=default \
--extra-args="ks=file:/${KS_FILE} console=tty0 console=ttyS0,115200n8 serial" \
--initrd-inject="${KS_FQN}" \
--graphics=none \
--noautoconsole \
--debug \
--location="${ISO_FQN}"

if [[ ! -z $DEBUG ]]; then
# Connect to the console and watch the install
virsh console "${HOSTNAME}"
virsh start "${HOSTNAME}"
fi

# Make a backup of the VM's XML definition file
virsh dumpxml "${HOSTNAME}" > "${VM_IMAGE_DIR}/xml/${HOSTNAME}.xml"

if [ ! -z $VERBOSE ]; then
set +xv
fi

ubuntu.ks Kickstart file

Download ubuntu.ks on Github.

# System language
lang en_US
# Language modules to install
langsupport en_US
# System keyboard
keyboard us
# System mouse
mouse
# System timezone
timezone --utc Etc/UTC
# Root password
rootpw --disabled
# Initial user
user ansible --fullname "ansible" --iscrypted --password $6$CfjrLvwGbzSPGq49$t./6zxk9D16P6J/nq2eBVWQ74aGgzKDrQ9LdbTfVA0IrHTQ7rQ8iq61JTE66cUjdIPWY3fN7lGyR4LzrGwnNP.
# Reboot after installation
reboot
# Use text mode install
text
# Install OS instead of upgrade
install
# Use CDROM installation media
cdrom
# System bootloader configuration
bootloader --location=mbr 
# Clear the Master Boot Record
zerombr yes
# Partition clearing information
clearpart --all 
# Disk partitioning information
part / --fstype ext4 --size 3700 --grow
part swap --size 200 
# System authorization infomation
auth  --useshadow  --enablemd5 
# Firewall configuration
firewall --enabled --ssh 
# Do not configure the X Window System
skipx
%post --interpreter=/bin/bash
echo ### Redirect output to console
exec < /dev/tty6 > /dev/tty6
chvt 6
echo ### Update all packages
apt-get update
apt-get -y upgrade
# Install packages
apt-get install -y openssh-server vim python
echo ### Enable serial console so virsh can connect to the console
systemctl enable serial-getty@ttyS0.service
systemctl start serial-getty@ttyS0.service
echo ### Add public ssh key for Ansible
mkdir -m0700 -p /home/ansible/.ssh
cat <<EOF >/home/ansible/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC14kgOfzuOA4+hD16JFTAtXWc0UkzvXw3TrPxivh+/t86FH1N1qlXLLZn60voysLHnyiwTXrC8Zy0H8rckRopepozMBRegeklnkLHaiFDBdbAkHm8DMOd5QuGedBZ+s80+05btzOVxZcN5M6m4A03vLGGoOqnJvewv1u/yP6et0hP2Bs6D9ycczOpXeOgKnUXt1rciVYTk9xwOXFWcZ5phnXSCGA1w0BACK2CZaCKmsAT5YR7PA4N+7xVMvhOgo3gt8dNxNtmEtkZSlYYqkJehuldt1IPpfQ5/QYngYKX1ZCKS2LHc9Ys3F8QX3djhOhqFL+kcDMrdTaT5GlAFcgqebIao5mTYpiqc72YbvbCMWRVomhE0TWgliIftR/65NzOf8N4b2fE/hakLkIsGyR7TQiNmAHgagqBX/qdBJ7QJdNN7GH2aGP/Ni7b9xX2jsWXRj8QcSee+IDgfm2k/uKGvI6+RotRVx/EjwOGVUeGp8txP8l4T0AmYsgiL1Phe5swAPaMj4R+m38dwzr2WF/PViI1upF/Weczoiu0dDODLsijdHBAIju9BEDBzcFbDPoLLKHOSMusy86CVGNSEaDZUYwZ2GHY6anfEaRzTJtqRKNvsGiSRJAeKIQrZ16e9QPihcQQQNM+Z9QW7Ppaum8f2QlFMit03UYMplw0EpNPAWQ== ansible@host
EOF
echo ### Set permissions for Ansible directory and key. Since the "Initial user"
echo ### is added *after* %post commands are executed, I use the UID:GID
echo ### as a hack since I know that the first user added will be 1000:1000.
chown -R 1000:1000 /home/ansible
chmod 0600 /home/ansible/.ssh/authorized_keys
echo ### Change back to terminal 1
chvt 1
Share Button

Quickly get IP addresses of new VMs

I spin up a lot of VMs using VMware Fusion. I generally keep “clean” generic copies of a few different distros and versions of Linux servers ready to go with my login, an sshd server, ssh keys, and basic settings that I use already set up. When I need to quickly test something manually — usually some new, multi-VM distributed container orchestration or database system — I just make as many copies of the server’s *.vmwarevm file as I need, fire up the VM copies on my laptop, test whatever I need to test, then shut them down. Eventually I delete the copies and recover the disk space.

Depending on where my laptop is running I’ll get a completely random IP address for the VM from the local DHCP server. I would log into the consoles, get the IPs, then log into the various VMs from a terminal. (Cut and paste just works a whole lot better on a terminal than on the VMware console.)

However, since the console screens are up, and I repeat this pattern several times a week, I figured why not save a step and make the ephemeral VMs just show me their IP address on their consoles without having to login, so I added an “on reboot” file called /etc/cron.d/welcome on the master image which updates the /etc/issue file.

/etc/cron.d/welcome looks like this:

@reboot root (/bin/hostname; /bin/uname -a; echo; if [ -x /sbin/ip ]; then /sbin/ip addr; else /sbin/ifconfig; fi) > /etc/issue

When a new VM boots, it writes the hostname, kernel info, and the ethernet config to the /etc/issue file. /etc/issue is displayed on the screen before the login prompt, so now I can just glance at the console, see the IP address, and ssh to the new VM.

Ephemeral VM

Although you’d never want to do this on a production system, it works great for ephemeral, throw-away test VMs.

Hope you find this useful.

Share Button


Policy-based Cloud Storage

This is a talk I gave last week at the SF Microservices Meetup titled Policy-based Cloud Storage, Persisting Data in a Multi-Site, Multi-Cloud World. In it I cover Apcera‘s approach to storage for containers and how to use policy to manage very large scale application deployments.

Share Button


Why adding a .conf or .cfg file to /etc/sudoers.d doesn’t work

I needed to add some sudo access rights for support personnel on about a hundred Centos 6.6 servers. Currently no one one these hosts had sudo rights, so the /etc/sudoers file was the default file. I’m using Ansible to maintain these hosts, but rather than modify the default /etc/sudoers file using Ansible’s lineinfile: command, I decided to create a support.conf file and use Ansible’s copy: command to copy that file into /etc/sudoers.d/. That way if a future version of Centos changes the /etc/sudoers file I’m leaving that file untouched, so my changes should always work.

  - name: Add custom sudoers
    copy: src=files/support.conf dest=/etc/sudoers.d/support.conf owner=root group=root mode=0440 validate='visudo -cf %s'

The support.conf file I created copied over just fine, and the validation step of running “visudo -cf” on the file before moving it into place claimed that the file was error-free and should work just fine as a sudoers file.

I logged in as the support user and it didn’t work:

[support@c1n1 ~]$ sudo /bin/ls /var/log/*
support is not in the sudoers file.  This incident will be reported.

Not only did it not work, it was telling me that the support user wasn’t even in the file, which they clearly were.

After Googling around a bit and not finding much I saw this in the Sudoers Manual:

sudo will read each file in /etc/sudoers.d, skipping file names that end in ‘~’ or contain a ‘.’ character to avoid causing problems with package manager or editor temporary/backup files.

sudo was skipping the file because the file name contained a period!

I changed the name of the file from support.conf to support and it worked.

  - name: Add custom sudoers
    copy: src=files/support dest=/etc/sudoers.d/support owner=root group=root mode=0440 validate='visudo -cf %s'

Hope you find this useful.

Here’s a snippet from /etc/sudoers.d/support if you’re interested. The “support” user has already been created by a separate Ansible command.

# Networking
Cmnd_Alias NETWORKING = /sbin/route, /sbin/ifconfig, /bin/ping, /sbin/dhclient, /usr/bin/net, /sbin/iptables, /usr/bin/rfcomm, /usr/bin/wvdial, /sbin/iwconfig, /sbin/mii-tool

# Installation and management of software
Cmnd_Alias SOFTWARE = /bin/rpm, /usr/bin/up2date, /usr/bin/yum

# Services
Cmnd_Alias SERVICES = /sbin/service, /sbin/chkconfig

# Reading logs
Cmnd_Alias READ_LOGS = /usr/bin/less /var/log/*, /bin/more /var/log/*, /bin/ls /var/log/*, /bin/ls /var/log

support  ALL = NETWORKING, SOFTWARE, SERVICES, READ_LOGS
Share Button


Restarting network interfaces in Ansible

I’m using Ansible to set up the network interface cards of multiple racks of storage servers running Centos 6.6. Each server has four network interfaces to configure, a public 1GbE interface, a private 1GbE interface, and two 10GbE interfaces that are set up as a bonded 20GbE interface with two VLANs assigned to the bond.

If Ansible changes an interface on a server it calls a handler to restart the network interfaces so the changes go into effect. However, I don’t want the network interfaces of every single server in a cluster to restart at the same time, so at the beginning of my network.yml playbook I set:

  serial: 1

That way Ansible just updates the network config of one server at a time.

Also, if there are any failures I want Ansible to stop immediately, so if I screwed something up I don’t take out the networking to every computer in the cluster. For this reason I also set:

max_fail_percentage: 1

If a change is made to an interface I’ve been using the following handler to restart the interface:

- name: Restart Network
  service: name=network state=restarted

That works, but about half the time Ansible detects a failure and drops out with an error, even though the network restarted just fine. Checking the server immediately after Ansible says that there’s an error shows that the server is running and it’s network interfaces were configured correctly.

This behavior is annoying since you have to restart the entire playbook after one server fails. If you’re configuring many racks of servers and the network setup is just updating one server at a time I’d end up having to restart the playbook a half dozen times to get through it, even though nothing was actually wrong.

At first I thought that maybe the ssh connection was dropping (I was restarting the network after all) but you can log in via ssh and restart the network and never lose the connection, so that wasn’t the problem.

The connection does pause as the interface that you’re ssh-ing in over resets, but the connection comes right back.

I wrote a short script to repeatedly restart the network interfaces and check the exit code returned, but the exit code was always 0, “no errors”, so network restart wasn’t reporting an error, but for some reason Ansible thought there was a failure.

There’s obviously some sort of timing issue causing a problem, where Ansible is checking to see if all is well, but since the network is being reset the check times out.

I initially came up with this workaround:

- name: Restart Network
  shell: service network restart; sleep 3

That fixes the problem, however, since “sleep 3” will always exit with a 0 exit code (success), Ansible will always think this worked even when the network restart failed. (Ansible takes the last exit code returned as the success/failure of the entire shell operation.) If “service network restart” actually does fail, I want Ansible to stop processing.

In order to preserve the exit code, I wrote a one-line Perl script that restarts the network, sleeps 3 seconds, then exits with the same exit code returned by “service network restart”.

- name: Restart Network
  # Restart the network, sleep 3 seconds, return the
  # exit code returned by "service network restart".
  # This is to work-around a glitch in Ansible where
  # it detects a successful network restart as a failure.
  command: perl -e 'my $exit_code = system("service network restart"); sleep 3; $exit_code = $exit_code >> 8; exit($exit_code);'

Now Ansible grinds through the network configurations of all of the hosts in my racks without stopping.

Hope you find this useful.

Share Button