Updating ESXi root passwords and authorized ssh keys with Ansible

I manage a number of vCenter instances and a lot of ESXi hosts. Some of the hosts are production, some for test and development. Sometimes an ESXi host needs to be used by a different group or temporarily moved to a new cluster and then back again afterwards.

To automate the configuration of these systems and the VMs running on them I use Ansible. For a freshly-imaged, new installation of ESXi, one of the first things I do is run an Ansible playbook that sets up the ESXi host, and the first thing it does is install the ssh keys of the people who need to log in as root, then update the root password.

I have ssh public keys for every user that needs root access. A short bash script combines those keys and my Ansible management public key into authorized_keys files for the ESXi hosts in each vCenter instance. In my Ansible group_vars/ directory there is a file for each group of ESXi hosts, so all of the ESXi hosts in a group get the same root password and ssh keys. This also makes it easy to change root passwords and to add or remove users’ ssh keys as they join or leave different groups.

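The script itself is trivial. A minimal sketch, assuming the users’ public keys are collected in a pubkeys/ops/ directory, the Ansible management key is ansible-mgmt.pub, and the combined file lands in the esxi_host role’s files/ directory so the copy task shown later can find it (all of these paths are hypothetical):

#!/bin/bash
# Combine the Ops users' public keys and the Ansible management key
# into one authorized_keys-style file for the ESXi hosts.
cat pubkeys/ops/*.pub ansible-mgmt.pub > roles/esxi_host/files/authkeys-ops
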
Here’s a portion of a group_vars/esxi_hosts_cicd/credentials.yml file for a production CICD cluster:

# ESXI Hosts (only Ops can ssh in)
esxi_root_authorized_keys_file: authkeys-ops

esxi_username: 'root'
esxi_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          3061

The password is encrypted using Ansible Vault.

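To produce an encrypted value like the one above, you can use ansible-vault’s encrypt_string command and paste the output into credentials.yml. Something like this, where the password is just a placeholder:

ansible-vault encrypt_string \
    --vault-id ~/path/to/vault/private/key/file \
    'new-root-password-here' --name 'esxi_password'
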
In my main.yml file I call the esxi_host role for all of the hosts in the esxi_hosts inventory group. Since I use a different user to manage non-ESXi hosts, the play that calls the role tells Ansible to use the root user only when logging into ESXi hosts.

- name: Setup esxi_hosts
  gather_facts: False
  user: root
  hosts: esxi_hosts
  roles:
    - esxi_host

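For context, here’s a minimal sketch of how the inventory might group these hosts so that the play above and the group_vars/esxi_hosts_cicd/ variables both apply (host names are hypothetical):

[esxi_hosts_cicd]
esxi-cicd-01.example.com
esxi-cicd-02.example.com

[esxi_hosts:children]
esxi_hosts_cicd
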
The esxi_host role has an esxi_host/tasks/main.yml task file. The two tasks that update the authorized_keys file and the root password look like this:

- name: Set the authorized ssh keys for the root user
  copy:
    src: "{{ esxi_root_authorized_keys_file }}"
    dest: /etc/ssh/keys-root/authorized_keys
    owner: root
    group: root
    mode: '0600'

- name: Set the root password for ESXI Hosts
  shell: "echo '{{ esxi_password }}' | passwd -s"
  no_log: True

The first time I run this the root password is still set to some other value, so I start Ansible with:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of new esxi hosts] \
    --ask-pass \
    --ask-become-pass

This will prompt me for the current root ssh password. Once I enter that, Ansible logs into each ESXi host, installs the new authorized_keys file, decrypts the new root password using the vault password file, and updates the root password.

After I’ve done this once, since the Ansible ssh key is also part of the authorized_keys file, subsequent Ansible runs just use the ssh key to log in, and I don’t need the --ask-pass or --ask-become-pass parameters.

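A routine run against those hosts is then just the same command minus the two --ask flags:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of esxi hosts]
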
This is also handy when switching a host from one cluster to another. As long as the ssh keys are installed I no longer need the current root password to update the root password.

Hope you find this useful.

Too many authentication failures

I was working with a new Linux distro and, after creating a brand-new VM with a single user account, I attempted to ssh into the VM only to be greeted with:

Received disconnect from 10.0.0.180 port 22:2: Too many authentication failures
Disconnected from 10.0.0.180 port 22

It was a new VM, and I hadn’t loaded an ssh key (there was no option to do so in the installer). I’d set up a user and password, so I expected a password prompt; instead I got an immediate disconnect.

I used ssh -vvv to connect and found that my ssh client was attempting to use my ssh keys, as ssh is supposed to, and on the third key the VM spat back the error:

Received disconnect from 10.0.0.180 port 22:2: Too many authentication failures
Disconnected from 10.0.0.180 port 22

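For the record, the verbose run was just the normal connection attempt with debugging turned up:

ssh -vvv username@10.0.0.180
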
Well, I wanted to connect with a password anyhow, so I tried:

ssh -o PubkeyAuthentication=no username@10.0.0.180

I was greeted with a password: prompt.

I checked /etc/ssh/sshd_config and found that whoever built the install image had changed the default MaxAuthTries setting from 6 to 2.

The MaxAuthTries setting tells the ssh daemon how many authentication attempts a user can make before it disconnects them. Each ssh key loaded into ssh-agent counts as one authentication attempt. The default is 6 because many users (like me) have multiple ssh keys loaded into ssh-agent so that we can automatically log into different hosts that use different ssh keys. Trying more than one ssh key isn’t the same as thumb-fingering a password; ssh is designed to allow for multiple key attempts. If ssh offers all of your keys without exhausting the limit and password authentication is enabled, you’ll eventually get a password prompt.

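You can check how many keys your client will offer by listing what ssh-agent currently has loaded:

ssh-add -l
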
Setting MaxAuthTries back to the more reasonable default of 6 and reloading the sshd daemon fixed the issue. Apparently whoever tested the setup only had one ssh key and wasn’t aware of what changing the MaxAuthTries setting does to people who attempt to log in with more than one key loaded.
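
The fix, then, was a one-line change in sshd_config followed by a reload of the ssh daemon. A sketch, assuming a systemd-based distro (the unit is called ssh on Debian/Ubuntu and sshd on most others):

# /etc/ssh/sshd_config
MaxAuthTries 6

sudo systemctl reload sshd    # or "sudo systemctl reload ssh" on Debian/Ubuntu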

Alternatively, if it’s someone else’s server and you can’t change the /etc/ssh/sshd_config file, you can also add these lines to your local ~/.ssh/config file:

Host 10.0.0.180
    PubkeyAuthentication no

If you’re concerned about ssh security, sshd_config allows you to control which versions of the ssh protocol are supported, which ciphers you trust (or don’t trust), and other settings that lock down what you will or won’t allow ssh to do in your environment. Setting MaxAuthTries 2 may make sense for some applications in some environments, but using it in an out-of-the-box installation just breaks ssh for no good reason.

Hope you find this useful.

Recovering from a lost connection when upgrading Ubuntu via ssh

I wanted to upgrade my desktop machine at work to the latest version of Ubuntu, but since it takes several hours to upgrade an Ubuntu host, and I have work to do during the day, I figured I could log into my workstation from home using ssh and start the upgrade remotely.

So I logged into my workstation from home and ran:

> sudo apt-get install update-manager-core
> sudo do-release-upgrade

The upgrade script warned me that I was using ssh and asked if I was sure I wanted to continue. I said “Y”, and a little while later the upgrade manager was busy downloading upgrade packages.

I planned to check it a couple of times that night, answer any package upgrade questions that popped up, and then in the morning when I got to work the upgrade would be complete.

Of course what actually happened was that I got side-tracked onto some other problem that night, forgot about the upgrade in progress, and when I got to work the next day my workstation was in a state of limbo, with the upgrade halfway complete, waiting for me to answer some question on the screen — at my house.

Luckily the Ubuntu developers who created the ssh upgrade process run that upgrade inside of a screen session. As the screen man page states, “Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).”

So at work all I had to do was get the list of current screen sessions:

> sudo screen -list
There are screens on:
        9129.ubuntu-release-upgrade-screen-window       (05/17/2011 08:50:08 PM)        (Attached)
2 Sockets in /var/run/screen/S-root.

Invoke screen using the “-d -r sessionowner/[pid.tty.host]” flags:

> sudo screen -d -r root/9129.ubuntu-release-upgrade-screen-window

… and I could pull up, at work, the same screen session that had been displayed at home. Once I answered the remaining questions about whether to keep my custom configuration files or use the new packaged ones, my workstation rebooted and the latest version of Ubuntu booted right up.