Fixing Docker pull timeout errors in Jenkins

Jenkins has a Docker plugin that allows you to authenticate with a docker image registry, pull the container image that you want, and then run tests inside the container. The plugin works well, and you’re guaranteed a “known starting state” for your test since you’re running in a freshly-instantiated container.

The issue I ran into is that my Docker registry is in a different data center than my test environment, there are tests running 24×7, and there are occasional temporary outages on the Internet between the two data centers. When that happens, the plugin’s docker.image().pull() command fails. It fails even if the image is stored locally, because the pull checks the locally-stored image against the registry’s copy.

Since the Jenkinsfile is written in Groovy, my solution was to add a Groovy function to the top of the Jenkinsfile that wraps the docker.image().pull() command in a retry loop:

// Since docker.image().pull() will fail during
// intermittent or temporary network outages, wrap
// it in a retry loop with a growing delay between attempts.
def docker_image_pull(docker, String imageName) {
    int attempt = 0
    int max_attempts = 10
    boolean success = false
    while (!success && attempt < max_attempts) {
        try {
            docker.image(imageName).pull()
            success = true
        } catch (Exception err) {
            attempt += 1
            if (attempt < max_attempts) {
                def sleep_sec = attempt * 30
                echo "Attempt #${attempt}: Failed to pull ${imageName}. Sleeping ${sleep_sec}s"
                // Pipeline sleep step, with the unit given explicitly
                sleep(time: sleep_sec, unit: 'SECONDS')
            }
        }
    }
    if (!success) {
        throw new Exception("Failed to pull ${imageName}")
    }
}

The function attempts to pull the image up to 10 times, with a growing back-off delay between attempts: it first sleeps 30s and retries, then 60s, then 90s, and so on. For temporary or intermittent network outages this eventually succeeds. For longer network outages, or if the problem is something else such as “image doesn’t exist”, the function eventually gives up and exits with an error.
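To reason about how long a pipeline can sit in this loop, the same retry-with-linear-back-off pattern can be sketched outside Jenkins. This is an illustration, not the Jenkins code: `pull_with_retry` and its `pull` and `sleep` parameters are hypothetical, and `sleep` is injectable so the schedule can be checked without actually waiting. It assumes no sleep after the final failed attempt.

```python
import time
from typing import Callable, List


def pull_with_retry(
    pull: Callable[[], None],
    image_name: str,
    max_attempts: int = 10,
    step_sec: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call pull() until it succeeds, sleeping 30s, 60s, 90s, ... between failures.

    Raises RuntimeError after max_attempts failures. There is no sleep after
    the final failure, since nothing is left to retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            pull()
            return
        except Exception as err:
            if attempt == max_attempts:
                raise RuntimeError(f"Failed to pull {image_name}") from err
            sleep_sec = attempt * step_sec  # linear back-off
            print(f"Attempt #{attempt}: Failed to pull {image_name}. Sleeping {sleep_sec}s")
            sleep(sleep_sec)


# Example: a fake pull that fails twice (registry unreachable), then succeeds.
failures = iter([True, True, False])


def flaky_pull() -> None:
    if next(failures):
        raise ConnectionError("registry unreachable")


slept: List[float] = []
pull_with_retry(flaky_pull, "myImageName:latest", sleep=slept.append)
print(slept)  # [30, 60]
```

With the defaults, a pull that never succeeds sleeps 30 + 60 + … + 270 = 1,350 seconds (22.5 minutes) in total before the error is raised.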

After that I just had to replace every instance of docker.image('myImageName:latest').pull() inside the Jenkinsfile with:

docker_image_pull(docker, 'myImageName:latest')

Hope you find this useful.

How to make the best drip-brewed coffee every time

My sister was visiting and I made a pot of coffee. My sister had a cup and said “That’s really good coffee. How did you make it?”

I’ve gotten that same response from many, many people who drink my coffee and I finally realized that there’s a lot of people who just don’t know how to make a pot of coffee. There’s no magic to it: start with good coffee and measure what you put in the coffee pot.

Pretentious coffee snobs are annoying. No one cares about your perfect cup of espresso. If you’re one of those people just go away and die somewhere. Really.

When I bought my first drip coffee maker years ago — a Cuisinart 12-Cup Brew Central Programmable Coffee Maker that’s still being manufactured and sold today — I wanted to make coffee like Starbucks, so I went to the Starbucks web site and read their instructions.

Step Zero – Buy good coffee. They don’t actually say that on the Starbucks site, but they’re assuming you’re buying Starbucks ground coffee. You can’t make good food from bad ingredients, and you can’t make good coffee from inferior beans.

Buy some good quality coffee. It doesn’t have to be from Starbucks, but it can’t be dry brown dust with bits of bean husk in it. The grounds should smell like the best cup of coffee you ever had and look like rich black loam. If you’re looking for suggestions try Starbucks Casi Cielo or Peet’s Sulawesi Kalosi.

Step One – Choose the right grind
For a flat bottom filter, use a medium grind that resembles sea salt. Cone filters use a finer grind that resembles granulated sugar.

Starbucks – How to brew coffee at home

That seems clear enough. My Cuisinart has a cone filter, so get ground coffee that looks like granulated sugar. Easy. Done.

Trigger alert for pretentious coffee snobs: I tend to buy Peet’s coffee and grind the whole pound at once. A pound lasts about a week at my house, so it’s not going to lose any of its vital essence. Get over yourself.

Step Two – Measure
Use 2 tablespoons of freshly ground coffee for every 6 ounces of water.

— Starbucks – How to brew coffee at home

I think that this is where they lose people. 6 ounces of water? My coffee pot says it makes 12 cups. How many tablespoons for a pot of coffee? What if I want to make a half a pot? Or 4 cups?

First off, if you try to make a pot of coffee and you’re measuring grounds by the tablespoon your measurements will be off every time. You will add too much coffee or not enough. Don’t use a tablespoon.

Second, the cups on a drip coffee pot are not the same as a 1 cup measure. A one cup measure is 8 fluid ounces. My “12 cup” Cuisinart pot holds 64 fluid ounces of coffee, so “1 Cuisinart cup” = 5-1/3 fluid ounces. WTF?

Using the Starbucks measuring method I’d need 10-2/3 tablespoons per pot. Try it and you’ll get weak, underwhelming coffee. That’s not how they make coffee at Starbucks.

Make a perfect pot of coffee

My method is simple: Use the lines on the pot to measure water, use measuring cups to measure coffee grounds.

Full pot of coffee – fill the pot to 12 “cups” of water, use 1 cup of ground coffee.

Half pot – 6 “cups” water, 1/2 cup ground coffee.

Third pot – 4 “cups” water, 1/3 cup ground coffee.

Quarter pot – 3 “cups” water, 1/4 cup ground coffee.

Easy, right? Full pot, one cup of ground coffee. Half pot, 1/2 cup of ground coffee. Quarter pot, 1/4 cup of ground coffee.

For advanced coffee preparers:

Two-thirds of a pot – 8 cups of water, add 1/3 cup of ground coffee TWICE!

10 cups of coffee – 10 cups of water, 1/2 and 1/3 cups of ground coffee.
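Every ratio in the lists above follows one rule: one cup of grounds per 12 “cups” of water on the pot’s markings, so any fill line needs lines/12 of a cup. The Python sketch below is a hypothetical helper, assuming your measuring set has 1, 1/2, 1/3 and 1/4 cup measures. It works out that fraction and finds the fewest scoops that add up to it exactly.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from typing import List

# Measuring cups assumed to be on hand, largest first.
MEASURES = (Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))


def grounds_for(water_lines: int) -> Fraction:
    """Cups of ground coffee for a fill line: 1 cup per 12 'cups' of water."""
    return Fraction(water_lines, 12)


def scoops(water_lines: int, max_scoops: int = 4) -> List[Fraction]:
    """Fewest measuring-cup scoops that add up exactly to the grounds needed."""
    target = grounds_for(water_lines)
    for count in range(1, max_scoops + 1):
        for combo in combinations_with_replacement(MEASURES, count):
            if sum(combo) == target:
                return list(combo)
    raise ValueError(f"no exact combination for {water_lines} cups of water")


for lines in (12, 10, 8, 6, 4, 3):
    print(lines, [str(s) for s in scoops(lines)])
```

The search reproduces the lists above, including the “1/3 cup TWICE” trick for two-thirds of a pot.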

Try this method and I GUARANTEE you’ll never go back to whatever you were doing to make coffee before or NO MONEY BACK. You too will hear the words “That’s really good coffee. How did you make it?”

Hope you find this useful.

Automatically decrypt multiple LUKS-encrypted volumes

I’ve written in the past on Adding an external encrypted drive with LVM to Ubuntu Linux and Adding a LUKS-encrypted iSCSI volume to Synology DS414 NAS but I neglected to mention how to automatically decrypt additional volumes.

When installing a fresh copy of Ubuntu one of the options is to install with a LUKS-encrypted Logical Volume Manager Volume Group (LVM VG). This puts your root volume on the encrypted LVM VG. When you power up your machine Ubuntu prompts you to enter the decryption passphrase in order to decrypt the VG and start your computer. Without the passphrase the contents of your hard drive are unreadable.

If you add encrypted external drives and/or additional VGs you will end up with multiple encrypted volumes. Ubuntu will prompt you for the passphrase of each additional encrypted volume when you boot up the machine.

If you don’t want to enter multiple, different passphrases each time you boot, you can store the passphrases for the additional volumes in key files on the encrypted root filesystem of your first drive, and point to those files from /etc/crypttab. You’ll only be prompted for one passphrase, for the first VG. Unlocking it makes the root filesystem available, and the key files stored there are then used to decrypt the additional volumes.

Here’s how it works.

The /etc/crypttab file contains 4 fields per line: the name of the encrypted volume, a UUID identifying the storage device, the name of a file with the decryption passphrase, and encryption options.

nvme0n1p5   UUID=405d8c73-1cf9-4b2c-9b8e-c76b90d27c67 none                        luks,discard
datastorage UUID=f2d73ac8-1ef1-4735-9dd4-9e778fc9e781 /root/.luks-datastorage     luks,discard
external1   UUID=0140476b-dd0b-4aab-b7d4-2f5fa14d1a0c /root/.luks-backupexternal1 luks
external2   UUID=610a67d4-c4f6-4b73-a824-a437971e8d24 /root/.luks-backupexternal2 luks
iscsi       UUID=b106b749-f4ab-44be-8962-6ff867dc074e /root/.luks-backupiscsi     luks

The first volume, nvme0n1p5, is the encrypted boot volume. It contains the root filesystem and the /root home directory. The third field is “none” which means that Ubuntu will prompt you for a decryption passphrase in order to unlock and decrypt the drive.

The remaining volumes have files defined that contain the decryption passphrase for each volume. Those files are hidden files in the /root home directory. Once the nvme0n1p5 volume is decrypted and mounted, the remaining volumes are automatically decrypted using the passphrases stored in the hidden files.
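To check which volumes will prompt at boot and which will be unlocked from a key file, you can read /etc/crypttab by the rule described above: one volume per line, whitespace-separated fields, and “none” in the third field meaning “ask for a passphrase”. The Python sketch below is a hypothetical helper, not part of Ubuntu or cryptsetup. It assumes blank lines and lines starting with # are ignored, and that missing trailing fields default to “none” and no options.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrypttabEntry:
    name: str      # name of the decrypted device under /dev/mapper
    device: str    # underlying encrypted device, e.g. UUID=...
    keyfile: str   # path to a key file, or "none" to prompt for a passphrase
    options: List[str] = field(default_factory=list)

    @property
    def prompts_at_boot(self) -> bool:
        return self.keyfile == "none"


def parse_crypttab(text: str) -> List[CrypttabEntry]:
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        name, device = fields[0], fields[1]
        keyfile = fields[2] if len(fields) > 2 else "none"
        options = fields[3].split(",") if len(fields) > 3 else []
        entries.append(CrypttabEntry(name, device, keyfile, options))
    return entries


EXAMPLE = """\
nvme0n1p5   UUID=405d8c73-1cf9-4b2c-9b8e-c76b90d27c67 none                        luks,discard
datastorage UUID=f2d73ac8-1ef1-4735-9dd4-9e778fc9e781 /root/.luks-datastorage     luks,discard
external1   UUID=0140476b-dd0b-4aab-b7d4-2f5fa14d1a0c /root/.luks-backupexternal1 luks
"""

for entry in parse_crypttab(EXAMPLE):
    how = "prompt for passphrase" if entry.prompts_at_boot else f"key file {entry.keyfile}"
    print(f"{entry.name}: {how}")
```

Only the first entry prompts. The others depend on key files that live on the root filesystem, which is why they can only be opened after that first volume is unlocked.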

The end result is that all of your drives are encrypted, but you only have to enter one passphrase to unlock all of your drives.

Hope you find this useful.

Too many authentication failures

I was working with a new Linux distro and after creating a brand-new VM with a single login I attempted to ssh into the VM only to be greeted with:

Received disconnect from port 22:2: Too many authentication failures
Disconnected from port 22

It was a new VM, and I hadn’t loaded an ssh key (there was no option to do so in the install). I’d set up a user and password, so I expected to get a password prompt. I didn’t get to a password prompt, just an immediate disconnect.

I used ssh -vvv to connect and found that my ssh client was attempting to use my ssh keys, as ssh is supposed to, and on the third key the VM spat back the error:

Received disconnect from port 22:2: Too many authentication failures
Disconnected from port 22

Well, I wanted to connect with a password anyhow, so I tried:

ssh -o PubkeyAuthentication=no username@

I was greeted with a password: prompt.

I checked the /etc/ssh/sshd_config and found that someone who’d built the install image had changed the default setting for MaxAuthTries from 6 to 2.

The MaxAuthTries setting tells the ssh daemon how many authentication attempts a client can make on a single connection before it disconnects. Each ssh key loaded into ssh-agent counts as one authentication attempt. The default is 6 because many users (like me) have multiple ssh keys loaded into ssh-agent so that we can automatically log into different hosts that use different ssh keys. Trying more than one ssh key isn’t the same as thumb-fingering a password; ssh is designed to allow for multiple key attempts. If the client works through all of your ssh keys without running out of attempts, and password authentication is enabled, you’ll eventually get a password prompt.
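As a rough way to reason about this, here is a simplified model in Python. It assumes every key the client offers is rejected and counts as one failure, and that sshd drops the connection as soon as failures reach MaxAuthTries. Real sshd accounting has extra edge cases (for example, the initial “none” method is not penalized), so treat the hypothetical `reaches_password_prompt` as an illustration, not sshd’s actual logic.

```python
def reaches_password_prompt(keys_offered: int, max_auth_tries: int,
                            password_auth: bool = True) -> bool:
    """Simplified model: can the client fall back to a password prompt?

    Assumes every offered key is rejected and counts as one failure, and the
    server disconnects as soon as failures reach max_auth_tries.
    """
    if not password_auth:
        return False
    return keys_offered < max_auth_tries


for max_tries in (2, 6):
    for keys in (1, 3, 5):
        result = "password prompt" if reaches_password_prompt(keys, max_tries) else "disconnect"
        print(f"MaxAuthTries {max_tries}, {keys} key(s): {result}")
```

In this model, the client-side workaround of -o PubkeyAuthentication=no corresponds to offering zero keys, so the password prompt is always reachable.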

Setting MaxAuthTries back to the more reasonable default of 6 and reloading the sshd daemon fixed the issue. Apparently whoever tested the setup only has one ssh key and wasn’t aware of what changing the MaxAuthTries setting does when people with more than one key attempt to log in.

If you’re concerned about ssh security, sshd_config allows you to control which versions of the ssh protocol are supported and which ciphers you trust (or don’t trust), and to tune other settings that lock down what you will or won’t allow ssh to do in your environment. It may be that for some applications in some environments setting MaxAuthTries 2 makes sense, but using it for an out-of-the-box installation just breaks ssh for no good reason.

Hope you find this useful.

Install a local .deb file and its dependencies

To install a local deb file and its dependencies use apt, not dpkg:

sudo apt install ./foo-1.2.3.deb

You’ll automatically get all of the dependencies installed with the package. (dpkg checks dependencies but can’t fetch missing ones from your repos; apt can.)

The leading ./, or a full or relative path to the deb file, is required. The path is what tells apt that it’s a local file.
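If you script this, the easy mistake is passing a bare file name such as foo-1.2.3.deb, which by the rule above apt won’t treat as a local file. A minimal Python sketch with a hypothetical `apt_install_local_cmd` helper that makes sure the argument always contains a path:

```python
import shlex
from typing import List


def apt_install_local_cmd(deb_path: str) -> List[str]:
    """Build an `apt install` command line for a local .deb file.

    apt only treats the argument as a local file when it contains a path,
    so a bare file name gets a leading "./".
    """
    if "/" not in deb_path:
        deb_path = "./" + deb_path  # "foo.deb" -> "./foo.deb"
    return ["sudo", "apt", "install", deb_path]


print(shlex.join(apt_install_local_cmd("foo-1.2.3.deb")))
```

Passing the returned list to subprocess.run() would execute the command; full and relative paths are left unchanged.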

Hope you find this useful.

Determine maximum MTU

I first started paying attention to network MTU settings when I was building petabyte-scale object storage systems. Tuning the network that backs your storage requires maximizing the size of the data packets and verifying that packets aren’t being fragmented. Currently I’m working on performance tuning the processing of image data using racks of GPU servers and verifying the network MTU came up again. I dug up a script I’d used before and thought I’d share it in case other people run into the same problem.

You can set the host network interface’s MTU setting to 9000 on all of the hosts in your network to enable jumbo frames, but how can you verify that the settings are working? If you’ve set up servers in a cloud environment using multiple availability zones or multiple regions, how can you verify that there isn’t a switch somewhere in the middle of your connection that doesn’t support MTU 9000 and fragments your packets?

Use this shell script:

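target_host=$1   # host to probe, passed as the first argument
size=1000        # starting payload in bytes (assumed; must be below the path's limit)
while ping -s $size -M do -c1 $target_host >&/dev/null; do size=$((size+4)); done
echo "Max MTU size: $((size-4+28))"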

-s $size sets the size of the packet being sent.

-M do prohibits fragmentation, so ping fails if the packet is too big to get through without being fragmented.

-c1 sends 1 packet only.

size-4+28 = subtract the last 4 bytes added (that caused the fragmentation), add 28 bytes for the IP and ICMP headers.
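Stepping up 4 bytes at a time can take many pings when the starting size is far from the answer. As an alternative technique, here is a Python sketch that binary-searches for the largest payload that gets through with fragmentation prohibited, then adds the same 28 bytes of IP and ICMP headers. It shells out to the same Linux ping command (-M do, -s, -c 1). `payload_fits` and `max_mtu` are hypothetical helper names, and the search assumes that if a payload fits, every smaller payload fits too.

```python
import subprocess
from typing import Callable

IP_ICMP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP header


def payload_fits(host: str, size: int) -> bool:
    """True if one unfragmentable ping with `size` payload bytes succeeds."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(size), "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def max_mtu(fits: Callable[[int], bool], low: int = 0, high: int = 9000) -> int:
    """Binary-search the largest payload in [low, high] for which fits() is True.

    Assumes fits(low) is True and fits() is monotonic (if a size fits, all
    smaller sizes fit). Returns that payload plus the IP/ICMP headers.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low + IP_ICMP_HEADERS


# Offline check with a simulated path whose largest unfragmented packet is 9000 bytes.
print(max_mtu(lambda size: size + IP_ICMP_HEADERS <= 9000, high=10000))  # 9000
```

Against a real host you would pass something like `lambda size: payload_fits(host, size)`. The search needs about 14 pings to cover 0 to 9000 bytes.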

If minimizing packet fragmentation is important to you, set MTU to 9000 on all hosts and then run this test between every pair of hosts in the network. If you get an unexpectedly low value, troubleshoot your switch and host settings and fix the issue.

Assuming all of your hosts and switches are configured at their maximum MTU values, the minimum value returned from the script is the actual maximum MTU you can support without fragmentation. Use that minimum value as your new host interface MTU setting.
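Once you have a result for every pair of hosts, the value to configure is the smallest one, and any pair well below the rest points at the switch or host to troubleshoot. A small Python sketch of that bookkeeping, assuming you have already collected the script’s results into a dictionary keyed by (source, destination). The host names and values in the example are made up for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def summarize_mtus(results: Dict[Pair, int], expected: int = 9000) -> Tuple[int, List[Pair]]:
    """Return the MTU to configure (the minimum seen) and the pairs below `expected`."""
    usable = min(results.values())
    low_pairs = sorted(pair for pair, mtu in results.items() if mtu < expected)
    return usable, low_pairs


# Hypothetical measurements, one per pair of hosts.
measured = {
    ("gpu1", "gpu2"): 9000,
    ("gpu1", "gpu3"): 9000,
    ("gpu2", "gpu3"): 1500,
}
mtu, suspects = summarize_mtus(measured)
print(mtu, suspects)  # 1500 [('gpu2', 'gpu3')]
```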

If you’re operating in a cloud environment you may need to repeat this exercise from time to time as switches are changed and upgraded at your cloud provider.

Hope you find this useful.

The Right Way to reboot a host with Ansible

For a long time rebooting a host with Ansible has been tricky. The steps are:

  • ssh to the host
  • Reboot the host
  • Disconnect before the host closes your ssh connection
  • Wait some number of seconds to ensure the host has really shut down
  • Attempt to ssh to the host and execute a command
  • Repeat ssh attempt until it works or you give up

Seems clear enough, but if you Google for an answer you may end up at this StackExchange page that gives lots of not-quite-correct answers from 2015 (and one correct answer). Some people suggest checking port 22, but just because ssh is listening doesn’t mean that it’s in a state where it’s accepting connections.

The correct answer is to use Ansible version 2.7 or later. 2.7 introduced the reboot module, and now all you have to do is add this to your list of handlers:

- name: Reboot host and wait for it to restart
  reboot:
    msg: "Reboot initiated by Ansible"
    connect_timeout: 5
    reboot_timeout: 600
    pre_reboot_delay: 0
    post_reboot_delay: 30
    test_command: whoami

This handler will:

  • Reboot the host
  • Wait 30 seconds
  • Attempt to connect via ssh and run whoami
  • Wait up to 5 seconds for each ssh connection attempt before trying again
  • Keep attempting to connect for 10 minutes (600 seconds)
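To make the timing concrete, here is a Python sketch of the wait-and-retry loop those parameters describe. It illustrates the behavior listed above and is not the reboot module’s source. `wait_for_reboot`, `try_command`, the `retry_interval` parameter, and the injectable `clock`/`sleep` functions are all hypothetical. Each call to `try_command` stands in for one ssh connection attempt, bounded by connect_timeout, that runs test_command.

```python
import time
from typing import Callable


def wait_for_reboot(
    try_command: Callable[[float], bool],
    reboot_timeout: float = 600,
    post_reboot_delay: float = 30,
    connect_timeout: float = 5,
    retry_interval: float = 1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait post_reboot_delay, then retry until try_command succeeds or time runs out.

    try_command(connect_timeout) should make one ssh connection attempt, give up
    after connect_timeout seconds, and return True if test_command succeeded.
    In this sketch the reboot_timeout deadline starts after the post-reboot delay.
    """
    sleep(post_reboot_delay)
    deadline = clock() + reboot_timeout
    while clock() < deadline:
        if try_command(connect_timeout):
            return True
        sleep(retry_interval)
    return False
```

Passing the clock and sleep functions in as parameters only serves to let the timing be tested with a fake clock instead of a real host.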

Add the directive:

  notify: Reboot host and wait for it to restart

… to any Ansible task that requires a reboot after a change. The host will be rebooted at the end of the play, then Ansible will wait until the host is back up and ssh is working before continuing on to the next play.

If you need to reboot halfway through a playbook, you can force any notified handlers to run immediately with this task:

- name: Reboot if necessary
  meta: flush_handlers

I sometimes do that to change something, force a reboot, then verify that the change worked, all within the same playbook.

Hope you found this useful.

Creating AWS Elastic Filesystems (EFS) with Terraform

The AWS Elastic Filesystem (EFS) gives you an NFSv4-mountable file system with almost unlimited storage capacity. The filesystem I just created to write this article reports 9,007,199,254,739,968 1K blocks free (df -k counts in 1 KiB blocks). In human-readable format df -kh reports 8.0E (Exabytes) of available disk space. In the year 2019, that’s a lot of storage space.
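The arithmetic behind that 8.0E: df -k counts free space in 1 KiB blocks, so the byte count is 1,024 times the figure quoted above, just under 2^63 bytes, which df -h rounds to 8.0E. Read as plain bytes, the same number would only be about 8 PiB. A quick Python check:

```python
# Free space reported for the EFS mount, in df -k's 1 KiB blocks.
free_kib = 9_007_199_254_739_968

EIB = 2 ** 60  # exbibyte, the "E" in df -h output
PIB = 2 ** 50  # pebibyte

free_bytes = free_kib * 1024
print(f"{free_bytes:,} bytes")                        # 9,223,372,036,853,727,232 bytes
print(f"{free_bytes / EIB:.1f} EiB")                  # 8.0 EiB
print(f"{free_kib / PIB:.1f} PiB if read as bytes")   # 8.0 PiB if read as bytes
```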

In past articles I’ve shown how to create EFS resources manually, but this week I wanted to programmatically create EFS resources with Terraform so that I could easily create, test, and tear-down EFS and VM resources on AWS.

I also wanted to make sure that my EFS resources are secure: only VMs within my Virtual Private Cloud (VPC) should be able to access the EFS data, so that no one outside of my VPC can mount or otherwise access it.

Creating an EFS resource is easy. The Terraform code looks like this:

resource "aws_efs_file_system" "efs-example" {
  creation_token   = "efs-example"
  performance_mode = "generalPurpose"
  throughput_mode  = "bursting"
  encrypted        = true

  tags = {
    Name = "EfsExample"
  }
}

This creates the EFS filesystem on AWS. EFS also requires a mount target, which gives your VMs a way to mount the EFS volume using NFS. The Terraform code to create a mount target looks like this:

// (continued)
resource "aws_efs_mount_target" "efs-mt-example" {
  file_system_id  = "${aws_efs_file_system.efs-example.id}"
  subnet_id       = "${aws_subnet.subnet-efs.id}"
  security_groups = ["${aws_security_group.ingress-efs-test.id}"]
}

The file_system_id is automatically set to the efs-example resource’s ID, which ties the mount target to the EFS file system.

The subnet_id for subnet-efs is a separate /24 subnet I created from my VPC just for EFS. The ingress-efs-test security group is a separate security group I created for EFS. Let’s cover each one of these separately.

A separate EFS subnet

First off I’ve allocated a /16 subnet for my VPC and I carve out individual /24 subnets from that VPC for each cluster of VMs and/or EFS resources that I add to an AWS availability zone. Here’s how I’ve defined my test environment VPC and EFS subnet:

resource "aws_vpc" "test-env" {
  cidr_block           = ""
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "test-env"
  }
}

resource "aws_subnet" "subnet-efs" {
  cidr_block        = "${cidrsubnet(aws_vpc.test-env.cidr_block, 8, 8)}"
  vpc_id            = "${aws_vpc.test-env.id}"
  availability_zone = "us-east-1a"
}

That will give me the subnet for my EFS subnet.

If you want to understand how to use Terraform’s cidrsubnet command to carve out separate subnets, see the article Terraform `cidrsubnet` Deconstructed by Lisa Hagemann. Her article gives excellent examples on how to do just that.
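Terraform’s cidrsubnet(prefix, newbits, netnum) lengthens the prefix by newbits and picks the netnum-th block of that size, so with a /16 VPC, newbits 8 gives /24 subnets and netnum 8 picks the ninth one. A Python sketch of the same calculation using the standard ipaddress module. The 10.0.0.0/16 range is a stand-in assumption, since the post’s actual VPC CIDR isn’t shown.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mimic Terraform's cidrsubnet(): the netnum-th /(len+newbits) block of prefix."""
    net = ipaddress.ip_network(prefix)
    new_prefixlen = net.prefixlen + newbits
    if new_prefixlen > net.max_prefixlen:
        raise ValueError("newbits makes the prefix longer than the address")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    block_size = 2 ** (net.max_prefixlen - new_prefixlen)
    first = net.network_address + netnum * block_size
    return str(ipaddress.ip_network(f"{first}/{new_prefixlen}"))


# Hypothetical VPC range; the post's real cidr_block value is not shown.
vpc_cidr = "10.0.0.0/16"
print(cidrsubnet(vpc_cidr, 8, 8))  # 10.0.8.0/24
```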

The EFS security group

Finally, I need a security group that only allows traffic between my test environment VMs and my test environment EFS volume. I already have a security group called ingress-test-env that is used to control security for my VMs. For EFS I create another security group that allows inbound traffic on port 2049 (the NFSv4 port) and allows egress traffic on any port.

Setting the ingress-efs-test resource’s security_groups attribute to ingress-test-env means that only VMs in the ingress-test-env security group can send traffic to, and receive traffic from, the EFS volume. If you use security_groups like this, you really lock down the EFS volume and you don’t need to set the cidr_blocks attribute at all.

resource "aws_security_group" "ingress-efs-test" {
  name   = "ingress-efs-test-sg"
  vpc_id = "${aws_vpc.test-env.id}"

  // NFS
  ingress {
    security_groups = ["${aws_security_group.ingress-test-env.id}"]
    from_port       = 2049
    to_port         = 2049
    protocol        = "tcp"
  }

  // Terraform removes AWS's default allow-all egress rule, so define egress explicitly
  egress {
    security_groups = ["${aws_security_group.ingress-test-env.id}"]
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
  }
}

After adding these Terraform files to my cluster configuration and running terraform apply, I end up with a new EFS filesystem that I can mount from any VM running in my VPC.

# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport /mnt/efs
# df -kh
Filesystem Size Used Avail Use% Mounted on
udev 481M 0 481M 0% /dev
tmpfs 99M 744K 98M 1% /run
/dev/xvda1 7.7G 3.0G 4.7G 40% /
tmpfs 492M 0 492M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 492M 0 492M 0% /sys/fs/cgroup
/dev/loop0 13M 13M 0 100% /snap/amazon-ssm-agent/150
/dev/loop1 87M 87M 0 100% /snap/core/4650
/dev/loop2 90M 90M 0 100% /snap/core/6130
/dev/loop3 18M 18M 0 100% /snap/amazon-ssm-agent/930
tmpfs 99M 0 99M 0% /run/user/1000
8.0E 0 8.0E 0% /mnt/efs

Hope you found this useful.

Using Rook+Ceph for persistent storage on Kubernetes

I wanted to install Prometheus and Grafana on my new Kubernetes cluster, but in order for these packages to work they need someplace to store persistent data. I had run performance and scale tests on Ceph when I was working as a Cloud Architect at Seagate, and I’ve played with Rook during the past year, so I decided to install Rook+Ceph and use that for the Kubernetes cluster’s data storage.

Ceph is a distributed storage system that provides object, file, and block storage. On each storage node you’ll find a file system where Ceph stores objects and a Ceph OSD (Object storage daemon) process. On a Ceph cluster you’ll also find Ceph MON (monitoring) daemons, which ensure that the Ceph cluster remains highly available.

Rook acts as a Kubernetes orchestration layer for Ceph, deploying the OSD and MON processes as POD replica sets. From the Rook README file:

Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.

When I created the cluster I built VMs with 40GB hard drives, so with 5 Kubernetes nodes that gives me ~200GB of storage on my cluster, most of which I’ll use for Ceph.

Installing Rook+Ceph

Installing Rook+Ceph is pretty straightforward. On my personal cluster I installed Rook+Ceph v0.9.0 by following these steps:

git clone
cd rook
git checkout v0.9.0
cd cluster/examples/kubernetes/ceph
kubectl create -f operator.yaml
kubectl create -f cluster.yaml

Rook deploys the PODs in two namespaces, rook-ceph-system and rook-ceph. On my cluster it took about 2 minutes for the PODs to deploy, initialize, and get to a running state. While I was waiting for everything to finish I checked the POD status with:

$ kubectl -n rook-ceph-system get pod
rook-ceph-agent-8tsq7 1/1 Running 0 2d20h
rook-ceph-agent-b6mgs 1/1 Running 0 2d20h
rook-ceph-agent-nff8n 1/1 Running 0 2d20h
rook-ceph-agent-vl4zf 1/1 Running 0 2d20h
rook-ceph-agent-vtpbj 1/1 Running 0 2d20h
rook-ceph-agent-xq5dv 1/1 Running 0 2d20h
rook-ceph-operator-85d64cfb99-hrnbs 1/1 Running 0 2d20h
rook-discover-9nqrp 1/1 Running 0 2d20h
rook-discover-b62ds 1/1 Running 0 2d20h
rook-discover-k77gw 1/1 Running 0 2d20h
rook-discover-kqknr 1/1 Running 0 2d20h
rook-discover-v2hhb 1/1 Running 0 2d20h
rook-discover-wbkkq 1/1 Running 0 2d20h
$ kubectl -n rook-ceph get pod
rook-ceph-mgr-a-7d884ddc8b-kfxt9 1/1 Running 0 2d20h
rook-ceph-mon-a-77cbd865b8-ncg67 1/1 Running 0 2d20h
rook-ceph-mon-b-7cd4b9774f-js8n9 1/1 Running 0 2d20h
rook-ceph-mon-c-86778859c7-x2qg9 1/1 Running 0 2d20h
rook-ceph-osd-0-67fff79666-fcrss 1/1 Running 0 35h
rook-ceph-osd-1-58bd4ccbbf-lsxj9 1/1 Running 1 2d20h
rook-ceph-osd-2-bf99864b5-n4q7v 1/1 Running 0 2d20h
rook-ceph-osd-3-577466c968-j8gjr 1/1 Running 0 2d20h
rook-ceph-osd-4-6856c5c6c9-92tb6 1/1 Running 0 2d20h
rook-ceph-osd-5-8669577f6b-zqrq9 1/1 Running 0 2d20h
rook-ceph-osd-prepare-node1-xfbs7 0/2 Completed 0 2d20h
rook-ceph-osd-prepare-node2-c9f55 0/2 Completed 0 2d20h
rook-ceph-osd-prepare-node3-5g4nc 0/2 Completed 0 2d20h
rook-ceph-osd-prepare-node4-wj475 0/2 Completed 0 2d20h
rook-ceph-osd-prepare-node5-tf5bt 0/2 Completed 0 2d20h
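Rather than eyeballing the list while you wait, you can flag anything that isn’t finished yet. A small Python sketch, assuming lines in the NAME READY STATUS RESTARTS AGE column order shown above with the header removed (kubectl get pod --no-headers prints exactly that). `not_ready` is a hypothetical helper that treats a pod as done when its status is Completed, or when it is Running with all containers ready. The second sample line is made up to show a pod that is still starting.

```python
from typing import List


def not_ready(pod_listing: str) -> List[str]:
    """Return names of pods that are neither Completed nor fully Running."""
    pending = []
    for line in pod_listing.strip().splitlines():
        name, ready, status = line.split()[:3]
        ready_count, total = (int(n) for n in ready.split("/"))
        if status == "Completed":
            continue
        if status != "Running" or ready_count < total:
            pending.append(name)
    return pending


sample = """\
rook-ceph-mon-a-77cbd865b8-ncg67 1/1 Running 0 2d20h
rook-ceph-osd-6-example 0/1 Init:0/1 0 10s
rook-ceph-osd-prepare-node1-xfbs7 0/2 Completed 0 2d20h
"""
print(not_ready(sample))  # ['rook-ceph-osd-6-example']
```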

Final tasks

Now I need to do two more things before I can install Prometheus and Grafana:

  • I need to make Rook the default storage provider for my cluster.
  • Since the Prometheus Helm chart requests volumes formatted with the XFS filesystem, I need to install XFS tools on all of my Ubuntu Kubernetes nodes. (The XFS tools aren’t yet installed by Kubespray by default, although there’s currently a PR up that addresses that issue.)

Make Rook the default storage provider

To make Rook the default storage provider I just run a kubectl command:

kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"":"true"}}}'

That updates the rook-ceph-block storage class and makes it the default for storage on the cluster. Any applications that I install will use Rook+Ceph for their data storage if they don’t specify a specific storage class.

Install XFS tools

Normally I would not recommend running one-off commands on a cluster. If you want to make a change to a cluster, you should encode the change in a playbook so it’s applied every time you update the cluster or add a new node. That’s why I submitted a PR to Kubespray to address this problem.

However, since my Kubespray PR has not yet merged, and I built the cluster using Kubespray, and Kubespray uses Ansible, one of the easiest ways to install XFS tools on all hosts is by using the Ansible “run a single command on all hosts” feature:

cd kubespray
export ANSIBLE_REMOTE_USER=ansible
ansible kube-node -i inventory/mycluster/hosts.ini \
--become --become-user root \
-a 'apt-get install -y xfsprogs'

Deploy Prometheus and Grafana

Now that XFS is installed I can successfully deploy Prometheus and Grafana using Helm:

helm install --name prometheus stable/prometheus
helm install --name grafana stable/grafana

The Helm charts install Prometheus and Grafana and create persistent storage volumes on Rook+Ceph for Prometheus Server and Prometheus Alert Manager (formatted with XFS).

Prometheus dashboard

Grafana dashboard

Rook persistent volume for Prometheus Server

Want to learn more?

If you’re interested in learning more about Rook, watch these videos from KubeCon 2018:

Introduction to Rook

Rook Deep Dive

Hope you find this useful.

Setting up a personal, production-quality Kubernetes cluster with Kubespray

I’ve been setting up and tearing down Kubernetes clusters for testing various things for the past year, mostly using Vagrant/Virtualbox but also some VMware vSphere and OpenStack deployments.

I wanted to set something a little more permanent up at my home lab — a cluster where I could add and remove nodes, run nodes on multiple physical machines, and use different types of compute hardware.

Set up the virtual machines

To get started I used a desktop System76 Wild Dog Pro Linux box (4.5 GHz i7-7700K, 64GB DDR4) and my create-vm script to create six Ubuntu 18.04 “Bionic Beaver” VMs for the cluster:

for n in $(seq 1 6); do
  create-vm -n node$n \
    -i ./ubuntu-18.04-server-amd64.iso \
    -k ./ubuntu.ks \
    -r 4096 \
    -c 2 \
    -s 40
done

With these parameters each VM will have 4GB RAM, 2 VCPUs, and a 40GB hard drive.
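A quick sanity check on how much of the host the six VMs claim, using the numbers above: 4GB RAM, 2 VCPUs and a 40GB disk each, on a 64GB i7-7700K, which has 4 cores and 8 hardware threads. The `VmSpec` type is just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    ram_gb: int
    vcpus: int
    disk_gb: int


NODE = VmSpec(ram_gb=4, vcpus=2, disk_gb=40)  # create-vm -r 4096 -c 2 -s 40
COUNT = 6
HOST_RAM_GB = 64
HOST_THREADS = 8  # i7-7700K: 4 cores, 8 hardware threads

total_ram = NODE.ram_gb * COUNT
total_vcpus = NODE.vcpus * COUNT
total_disk = NODE.disk_gb * COUNT

print(f"RAM:   {total_ram} GB of {HOST_RAM_GB} GB ({total_ram / HOST_RAM_GB:.1%})")
print(f"vCPUs: {total_vcpus} on {HOST_THREADS} host threads")
print(f"Disk:  {total_disk} GB")
```

The VCPUs are oversubscribed (12 on 8 threads), which is normally fine for a lab cluster that isn’t CPU-bound, while RAM leaves plenty of headroom on the host.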

Install and configure Kubespray

I cloned Kubespray into a directory and created an Ansible inventory file following the instructions from the README.

git clone
cd kubespray
pip install -r requirements.txt
rm -Rf inventory/mycluster/
cp -rfp inventory/sample inventory/mycluster
declare -a IPS=($(for n in $(seq 1 6); do get-vm-ip node$n; done))
CONFIG_FILE=inventory/mycluster/hosts.ini \
python3 contrib/inventory_builder/ ${IPS[@]}

The get-vm-ip script is in the same repo as the create-vm script, and both are described in my Use .iso and Kickstart files to automatically create Ubuntu VMs article.

The script generates an Ansible hosts inventory file in inventory/mycluster/hosts.ini with all of your VM IP addresses.

I like to add one variable override to the bottom of hosts.ini which copies the kubectl credentials over to my host machine. That way I can run kubectl commands directly from my desktop. The extra lines to add to the bottom of hosts.ini are:


Install Kubernetes

To install Kubernetes on the VMs I run the Kubespray cluster.yml playbook:

export ANSIBLE_REMOTE_USER=ansible
ansible-playbook -i inventory/mycluster/hosts.ini \
--become --become-user=root cluster.yml

Once the playbooks have finished, you should have a fully-operational Kubernetes cluster running on your desktop.

At this point you should be able to query the cluster from your desktop using kubectl. For example:

$ kubectl cluster-info
Kubernetes master is running at
coredns is running at
kubernetes-dashboard is running at
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get nodes
node1 Ready master,node 3d6h v1.13.0
node2 Ready master,node 3d6h v1.13.0
node3 Ready node 3d6h v1.13.0
node4 Ready node 3d6h v1.13.0
node5 Ready node 3d6h v1.13.0
node6 Ready node 3d6h v1.13.0
$ kubectl get pods --all-namespaces
kube-system calico-kube-controllers-67f89845f-6zbvx 1/1 Running 1 3d6h
kube-system calico-node-jh7ng 1/1 Running 2 3d6h
kube-system calico-node-l9vfb 1/1 Running 2 3d6h
kube-system calico-node-mqxjx 1/1 Running 2 3d6h

Set up the Kubernetes Dashboard

One of the first things I like to do is set up access to the Kubernetes dashboard. First I set up a service account for the admin user:

$ cat ~/Projects/k8s-cluster/dashboard-adminuser.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
$ kubectl apply -f ~/Projects/k8s-cluster/dashboard-adminuser.yaml

Next I get the bearer token for the user account:

$ kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

Finally I plug the dashboard URL that I got from kubectl cluster-info into my browser, select “Token” authentication, and cut and paste in the bearer token to log into the system.

Once logged in, an overview of my cluster pops up:

With a minimal amount of working compute infrastructure, it’s easy to set up your own production-quality Kubernetes cluster using Kubespray.

Hope you find this useful.
