post

Quickly create guest VMs using virsh, cloud image files, and cloud-init

After the latest updates to the code these scripts now create VMs from full Linux distros in a few seconds.

I was looking for a way to automate the creation of VMs for testing various distributed system / cluster software packages. I’ve used Vagrant in the past but I wanted something that would:

  • Allow me to use raw image files as the basis for guest VMs.
  • Guest VMs should be set up with bridged IPs that are routable from the host.
  • Guest VMs should be able to reach the Internet.
  • Other hosts on the local network should be able to reach guest VMs. (Setting up additional routes is OK).
  • VM creation should work with any distro that supports cloud-init.
  • Scripts should be able to create and delete VMs in a scripted, fully-automatic manner.
  • Guest VMs should be set up to allow passwordless ssh access from the “ansible” user, so that once a VM is running Ansible can be used for additional configuration and customization of the VM.

I’ve previously used virsh’s virt-install tool to create VMs and I like how easy it is to set up things like extra network interfaces and attach existing disk images. The scripts in this repo fully automate the virsh VM creation process.

cloud-init

The current version of the create-vm script uses cloud images, cloud-init, and virsh tooling to quickly create VMs from the command line. Using a single Linux host system you can create multiple guest VMs running on that host. Each guest VM has its own file system, memory, virtualized CPUs, IP address, etc.

Cloud Images

create-vm creates a QCOW2 file for your VM’s file system. The QCOW2 image uses the cloud image as a base filesystem, so instead of copying all of the files that come with a Linux distribution and installing them, QCOW will just use files directly from the base image as long as those files remain unchanged. QCOW stands for “QEMU Copy On Write”, so once you make a change to a file the changes are written to your VM’s QCOW2 file.

Cloud images have the extension .img or .qcow and are compiled for different system architectures.

Cloud images are available for the following distros:

Pick the base image for the distro and release that you want to install and download it onto your host system. Make sure that the base image uses the same hardware architecture as your host system, e.g. “x86_64” or “amd64” for Intel and AMD -based host systems, “arm64” for 64 bit ARM-based host systems.

cloud-init configuration

cloud-init reads in two configuration files, user-data and meta-data, to initialize a VM’s settings. One of the places it looks for these files is any attached disk volume labeled cidata.

The create-vm script creates an ISO disk called cidata with these two files and passes that in as a volume to virsh when it creates the VM. This is referred to as the “no cloud” method, so if you see a cloud image for “nocloud” that’s the one you want to use.

If you’re interested in other ways of doing this check out the Datasources documentation on for cloud-init.

Files

create-vm stores files as follows:

  • ${HOME}/vms/base/ – Place to store your base Linux cloud images.
  • ${HOME}/vms/images/ – your-vm-name.img and your-vm-name-cidata.img files.
  • ${HOME}/vms/init/ – user-data and meta-data.
  • ${HOME}/vms/xml/ – Backup copies of your VMs’ XML definition files.

QCOW2 filesystems allocate space as needed, so if you create a VM with 100GB of storage, the initial size of the your-vm-name.img and your-vm-name-cidata.img files is only about 700K total. The your-vm-name.img file will grow as you install packages and update files, but will never grow beyond the disk size that you set when you create the VM.

Scripts

The create-vm repo contains these scripts:

  • create-vm – Use .img and cloud-init files to auto-generate a VM.
  • delete-vm – Delete a virtual machine created with create-vm.
  • get-vm-ip – Get the IP address of a VM managed by virsh.

Host setup

I’m running the scripts from a host with Ubuntu Linux 22.04 installed. I added the following to the host’s Ansible playbook to install the necessary virtualization packages:

  - name: Install virtualization packages
    apt:
      name: "{{item}}"
      state: latest
    with_items:
    - libvirt-bin
    - libvirt-clients
    - libvirt-daemon
    - libvirt-daemon-system
    - libvirt-daemon-driver-storage-zfs
    - python-libvirt
    - python3-libvirt
    - virt-manager
    - virtinst

If you’re not using Ansible just apt-get install the above packages.

Permissions

The libvirtd daemon runs under the libvirt-qemu user service account. The libvirt-qemu user must be able to read the files in ${HOME}/vms/. If your ${HOME} directory has permissions set to 0x750 then libvirt-qemu won’t be able to read the ${HOME}/vms/ directory.

You could open up your home directory, e.g.:

chmod 755 ${HOME}

… but that allows anyone logged into your Linux host to read everything in your home directory. A better approach is just to add libvirt-qemu to your home directory’s group. For instance, on my host my home directory is /home/earl owned by user earl and group earl, permissions 0x750:

$ chmod 750 /home/earl
$ ls -al /home
total 24
drwxr-xr-x   6 root      root      4096 Aug 28 21:26 .
drwxr-xr-x  21 root      root      4096 Aug 28 21:01 ..
drwxr-x--- 142 earl      earl      4096 Feb 16 09:27 earl

To make sure that only the libvirt-qemu user can read my files I can add the user to the earl group:

$ sudo usermod --append --groups earl libvirt-qemu
$ sudo systemctl restart libvirtd
$ grep libvirt-qemu /etc/group
earl:x:1000:libvirt-qemu
libvirt-qemu:x:64055:libvirt-qemu

That shows that the group earl, group ID 1000, has a member libvirt-qemu. Since the group earl has read and execute permissions on my home directory, libvirt-qemu has read and execute permissions on my home directory.

Note: The libvirtd daemon will chown some of the files in the directory, including the files in the ~/vms/images directory, to be owned by libvirt-qemu group kvm. In order to delete these files without sudo, add yourself to the kvm group, e.g.:

$ sudo usermod --append --groups kvm earl

You’ll need to log out and log in again before the additional group is active.

create-vm options

create-vm supports the following options:

OPTIONS:
   -h      Show this message
   -n      Host name (required)
   -i      Full path and name of the base .img file to use (required)
   -k      Full path and name of the ansible user's public key file (required)
   -r      RAM in MB (defaults to 2048)
   -c      Number of VCPUs (defaults to 2)
   -s      Amount of storage to allocate in GB (defaults to 80)
   -b      Bridge interface to use (defaults to virbr0)
   -m      MAC address to use (default is to use a randomly-generated MAC)
   -v      Verbose

Create an Ubuntu 22.04 server VM

This creates an Ubuntu 22.04 “Jammy Jellyfish” VM with a 40G hard drive.

First download a copy of the Ubuntu 22.04 “Jammy Jellyfish” cloud image:

mkdir -p ~/vms/base
cd ~/vms/base
wget http://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

Then create the VM:

create-vm -n node1 \
    -i ~/vms/base/jammy-server-cloudimg-amd64.img \
    -k ~/.ssh/id_rsa_ansible.pub \
    -s 40

Once created I can get the IP address and ssh to the VM as the user “ansible”:

$ get-vm-ip node1
192.168.122.219
$ ssh -i ~/.ssh/id_rsa_ansible ansible@192.168.122.219
The authenticity of host '192.168.122.219 (192.168.122.219)' can't be established.
ED25519 key fingerprint is SHA256:L88LPO9iDCGbowuPucV5Lt7Yf+9kKelMzhfWaNlRDxk.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.219' (ED25519) to the list of known hosts.
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-60-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Feb 15 20:05:45 UTC 2023

  System load:  0.47216796875     Processes:             105
  Usage of /:   3.7% of 38.58GB   Users logged in:       0
  Memory usage: 9%                IPv4 address for ens3: 192.168.122.219
  Swap usage:   0%

Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status



The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ansible@node1:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           198M 1008K  197M   1% /run
/dev/sda1        39G  1.5G   38G   4% /
tmpfs           988M     0  988M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      105M  6.1M   99M   6% /boot/efi
tmpfs           198M  4.0K  198M   1% /run/user/1000
ansible@node1:~$

Note that this VM was created with a 40GB hard disk, and the total disk space shown is 40GB, but the actual hard drive space initially used by this VM was about 700K. The VM can consume up to 40GB, but will only use the space it actually needs.

Create 8 Ubuntu 22.04 servers

This starts the VM creation process and exits. Creation of the VMs continues in the background.

for n in `seq 1 8`; do
    create-vm -n node$n -i ~/vms/base/jammy-server-cloudimg-amd64.img -k ~/.ssh/id_rsa_ansible.pub
done

Delete 8 virtual machines

for n in `seq 1 8`; do
    delete-vm node$n
done

Connect to a VM via the console

virsh console node1

Connect to a VM via ssh

ssh ansible@$(get-vm-ip node1)

Generate an Ansible hosts file

(
    echo '[hosts]'
    for n in `seq 1 8`; do
        ip=$(get-vm-ip node$n)
        echo "node$n ansible_host=$ip ip=$ip ansible_user=ansible"
    done
) > hosts.ini

Handy virsh commands

virsh list – List all running VMs.

virsh domifaddr node1 – Get a node’s IP address. Does not work with all network setups, which is why I wrote the get-vm-ip script.

virsh net-list – Show what networks were created by virsh.

virsh net-dhcp-leases $network – Shows current DHCP leases when virsh is acting as the DHCP server. Leases may be shown for machines that no longer exist.

Hope you find this useful.

post

Gitlab: The source branch does not exist [SOLVED]

Got an interesting error on Gitlab today. On an MR that had passed tests and had been approved, Gitlab would not allow the branch to be merged because “The source branch [Branch Name] does not exist. Please restore it or use a different source branch.”

Checking the “Changes” tab in Gitlab showed all of the changes that I expected to see. So Gitlab could see the changes and knew the branch name.

I checked the commit SHA connected to the branch, and Gitlab showed all of the changes that I expected to see. Gitlab knew the correct SHA and was associating the SHA with the branch name, but the “Overview” tab still showed the error message “The source branch [Branch Name] does not exist. Please restore it or use a different source branch.”

I Googled to see if anyone else had had the issue. Many people had. This issue report on Gitlab.com shows that the problem goes back at least 3 years. The official response from Gitlab is that “This is no longer an active issue.”

I beg to differ. I’m using GitLab Enterprise Edition 14.8.6-ee and it still has this issue.

I cloned a fresh copy of the repo from Gitlab on my laptop and checked the branches with:

git branch -a

The branch was listed. I checked it out. It checked out just fine. I diffed it against the main branch. All of the changes were present.

So the branch exists on the repository on the server, it has all of the changes in it that I expect, there is nothing wrong with the Git repo itself, it’s just Gitlab that has some sort of disconnect.

To work around the problem I added an empty commit to the branch and pushed it to the Gitlab repo:

$ git commit --allow-empty -m "Empty commit"
$ git push

Once I did that the error message cleared. Gitlab had to re-run the tests, but once the tests passed I could merge the branch.

Hope you find this useful.

Update: I spoke to one of the people who manages GitLab Enterprise for my team and apparently Gitlab works by scraping the Git repo for updates and changes using a “scraper service”. If the service is off-line or reset at the time your commit is pushed it’s supposed to pick up any missed pushes when the service restarts, except sometimes it doesn’t work.

As long as the service is on-line the next time you push a commit then Gitlab will pick up the new commit and any missed commits, and everything will be back in sync.

post

Calculating the value for 64bitMMIOSizeGB

When adding a GPU to a vSphere VM using PCI passthrough there are a couple of additional settings that you need to make or your VM won’t boot.

When creating the VM you’ll need to set the Actions > Edit > VM Options > Boot Options > Firmware and select “EFI”. You need to do this before you install the operating system on the VM. If you don’t do this the GPUs won’t work and the VM won’t boot.

To add a GPU, in vCenter go to the VM, select Actions > Edit > Add New Device. Any GPUs set up as PCI passthrough devices should appear in a pick list. Add one or more GPUs to your VM.

Note that after adding one device, when you add additional GPUs the first GPU you selected still appears in the pick list. If you add the same GPU more than once your VM will not boot. If you add a GPU that’s being used by another running VM your VM will not boot. Pay attention to the PCI bus addresses displayed and make sure that the GPUs you pick are unique and not in use on another VM.

Finally you have to set up memory-mapped I/O (MMIO) to map system memory to the GPU’s framebuffer memory so that the CPU can pass data to the GPU. In vCenter go to the VM, select Actions > Edit > VM Options > Advanced > Edit configuration.

Once you’re on the Configuration parameters screen, add two more parameters:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = ????
Actions > Edit > VM Options > Advanced > Edit configuration

The 64bitMMIOSizeGB value is calculated by adding up the total GB of framebuffer memory on all GPUs attached to the VM.  If the total GPU framebuffer memory falls on a power-of-2, setting pciPassthru.64bitMMIOSizeGB to the next power of 2 works.

If the total GPU framebuffer memory falls between two powers-of-2, round up to the next power of 2, then round up again, to get a working setting.

Powers of 2 are 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 …

For example, two NVIDIA A100 cards with 40GB each = 80GB (in between 64GB and 128GB), so round up to the next power of 2 (128GB), then round up again to the next power of 2 after that (256GB) to get the correct setting. If you set it too low the VM won’t boot, but it won’t give you an error message telling you what the issue is either.

Here are some configurations that I’ve tested and verified:

  • 2 x 16GB NVIDIA V100 = 32GB, 32 is a power of 2, so round up to the next power of 2 which is 64, set pciPassthru.64bitMMIOSizeGB = 64 to boot.
  • 2 x 24GB NVIDIA P40 = 48GB, which is in-between 32 and 64, round up to 64 and again to 128, requires pciPassthru.64bitMMIOSizeGB = 128 to boot.
  • 8 x 16GB NVIDIA V100 = 128GB, 128 is a power of 2, so round up to the next power of 2 which is 256, set pciPassthru.64bitMMIOSizeGB = 256 to boot.
  • 10 x 16GB NVIDIA V100 = 160GB, which is in-between 128 and 256, round up to 256 and again to 512, set pciPassthru.64bitMMIOSizeGB = 512 to boot.

Hope you find this useful.

post

Setting up NFS FSID for multiple networks

The official documentation for creating an NFS /etc/exports file for multiple networks and FSIDs is unclear and confusing. Here’s what you need to know.

If you need to specify multiple subnets that are allowed to mount a volume, you can either use separate lines in /etc/exports, like so:

/opt/dir1 192.168.1.0/24(rw,sync)
/opt/dir1 10.10.0.0/22(rw,sync)

Or you can list each subnet on a single line, repeating all of the mount options for each subnet, like so:

/opt/dir1 192.168.1.0/24(rw,sync) 10.10.0.0/22(rw,sync)

These are both equivalent. They will allow clients in the 192.168.1.0/24 and 10.10.0.0/22 subnets to mount the /opt/dir1 directory via NFS. A client in a different subnet will not be able to mount the filesystem.

When I’m setting up NFS servers I like to assign each exported volume with a unique FSID. If you don’t use FSID, there is a chance that when you reboot your NFS server the way that the server identifies the volume will change between reboots, and your NFS clients will hang with “stale file handle” errors. I say “a chance” because it depends on how your server stores volumes, what version of NFS it’s running, and if it’s a fault tolerant / high availability server, how that HA ability was implemented. Using a unique FSID ensures that the volume that the server presents is always identified the same way, and it makes it easier for NFS clients to reconnect and resume operations after an NFS server gets rebooted.

If you are using FSID to define a unique filesystem ID for each mount point you must include the same FSID in the export options for a single volume. It’s the “file system identifier”, so it needs to be the same each time a single filesystem is exported. If I want to identify /opt/dir1 as fsid=1 I have to include that declaration in the options every time that filesystem is exported. So for the examples above:

/opt/dir1 192.168.1.0/24(rw,sync,fsid=1)
/opt/dir1 10.10.0.0/22(rw,sync,fsid=1)

Or:

/opt/dir1 192.168.1.0/24(rw,sync,fsid=1) 10.10.0.0/22(rw,sync,fsid=1)

If you use a different FSID for one of these entries, or if you declare the FSID for one subnet and not the other, your NFS server will slowly and mysteriously grind to a halt, sometimes over hours and sometimes over days.

For NFSv4, there is the concept of a “distinguished filesystem” which is the root of all exported filesystems. This is specified with fsid=root or fsid=0, which both mean exactly the same thing. Unless you have a good reason to create a distinguished filesystem don’t use fsid=0, it will just add unnecessary complexity to your NFS setup.

Hope you find this useful.

post

Updating ESXi root passwords and authorized ssh keys with Ansible

I manage a number of vCenter instances and a lot of ESXi hosts. Some of the hosts are production, some for test and development. Sometimes an ESXi host needs to be used by a different group or temporarily moved to a new cluster and then back again afterwards.

To automate the configuration of these systems and the VMs running on them I use Ansible. For a freshly-imaged, new installation of ESXi one of the first things I do it to run an Ansible playbook that sets up the ESXi host, and the first thing it does is to install the ssh keys of the people who need to log in as root, then it updates the root password.

I have ssh public keys for every user that needs root access. A short bash script combines those keys and my Ansible management public key into authorized_keys files for the ESXi hosts in each vCenter instance. In my Ansible group_vars/ directory is a file for each group of ESXi hosts, so all of the ESXi hosts in a group get the same root password and ssh keys. This also makes it easy to change root passwords and add and remove ssh keys of users as they are added to or leave different groups.

Here’s a portion of a group_vars/esxi_hosts_cicd/credentials.yml file for a production CICD cluster:

# ESXI Hosts (only Ops can ssh in)
esxi_root_authorized_keys_file: authkeys-ops

esxi_username: 'root'
esxi_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          3061

The password is encrypted using Ansible Vault.

In my main.yml file I call the esxi_host role for all of the hosts in the esxi_hosts inventory group. Since I use a different user to manage non-ESXi hosts, the play that calls the role tells Ansible to use the root user only when logging into ESXi hosts.

- name: Setup esxi_hosts
  gather_facts: False
  user: root
  hosts: esxi_hosts
  roles:
    - esxi_host

The esxi_host role has an esxi_host/tasks/main.yml playbook. The two plays that update the authorized_keys file and root password look like this:

- name: Set the authorized ssh keys for the root user
  copy:
    src: "{{ esxi_root_authorized_keys_file }}"
    dest: /etc/ssh/keys-root/authorized_keys
    owner: root
    group: root
    mode: '0600'

- name: Set the root password for ESXI Hosts
  shell: "echo '{{ esxi_password }}' | passwd -s"
  no_log: True

The first time I run this the password is set to some other value, so I start Ansible with:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of new esxi hosts] \
    --ask-pass \
    --ask-become-pass

This will prompt me for the current root ssh password. Once I enter that it logs into each ESXi host, installs the new authorized_keys file, uses the vault private key to decrypt the password, then updates the root password.

After I’ve done this once, since the Ansible ssh key is also part of the authorized_keys file, subsequent Ansible updates just use the ssh key to login, and I don’t have to use --ask-pass or --ask-become-pass parameters.

This is also handy when switching a host from one cluster to another. As long as the ssh keys are installed I no longer need the current root password to update the root password.

Hope you find this useful.