post

Updating ESXi root passwords and authorized ssh keys with Ansible

I manage a number of vCenter instances and a lot of ESXi hosts. Some of the hosts are for production, some for test and development. Sometimes an ESXi host needs to be used by a different group, or temporarily moved to a new cluster and then back again afterwards.

To automate the configuration of these systems and the VMs running on them I use Ansible. For a freshly-imaged, new installation of ESXi one of the first things I do is run an Ansible playbook that sets up the ESXi host. The first thing that playbook does is install the ssh keys of the people who need to log in as root; then it updates the root password.

I have ssh public keys for every user that needs root access. A short bash script combines those keys and my Ansible management public key into authorized_keys files for the ESXi hosts in each vCenter instance. In my Ansible group_vars/ directory is a file for each group of ESXi hosts, so all of the ESXi hosts in a group get the same root password and ssh keys. This also makes it easy to change root passwords and to add or remove users’ ssh keys as they join or leave different groups.
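
The key-combining script is nothing more than concatenation. Here’s a minimal sketch, with hypothetical directory and file names:

#!/bin/bash
# Build the authorized_keys file for one group of ESXi hosts by
# concatenating the Ansible management key with each ops user's key.
# Directory layout and file names are examples only.
set -euo pipefail

MGMT_KEY="keys/ansible-mgmt.pub"   # Ansible management public key
KEY_DIR="keys/ops"                 # one .pub file per ops user
OUT_FILE="files/authkeys-ops"      # matches esxi_root_authorized_keys_file

cat "$MGMT_KEY" "$KEY_DIR"/*.pub > "$OUT_FILE"
chmod 0600 "$OUT_FILE"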

Here’s a portion of a group_vars/esxi_hosts_cicd/credentials.yml file for a production CICD cluster:

# ESXI Hosts (only Ops can ssh in)
esxi_root_authorized_keys_file: authkeys-ops

esxi_username: 'root'
esxi_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          3061

The password is encrypted using Ansible Vault.
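
A vaulted value like that can be generated with ansible-vault’s encrypt_string subcommand, along these lines (using the same vault password file that gets passed to ansible-playbook below):

ansible-vault encrypt_string \
    --vault-id ~/path/to/vault/private/key/file \
    --stdin-name 'esxi_password'

Type the plaintext password, press Ctrl-D to end input, and paste the resulting block into the group_vars file.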

In my main.yml file I call the esxi_host role for all of the hosts in the esxi_hosts inventory group. Since I use a different user to manage non-ESXi hosts, the play that calls the role tells Ansible to use the root user only when logging into ESXi hosts.

- name: Setup esxi_hosts
  gather_facts: False
  remote_user: root
  hosts: esxi_hosts
  roles:
    - esxi_host

The esxi_host role has an esxi_host/tasks/main.yml task file. The two tasks that update the authorized_keys file and the root password look like this:

- name: Set the authorized ssh keys for the root user
  copy:
    src: "{{ esxi_root_authorized_keys_file }}"
    dest: /etc/ssh/keys-root/authorized_keys
    owner: root
    group: root
    mode: '0600'

- name: Set the root password for ESXI Hosts
  shell: "echo '{{ esxi_password }}' | passwd -s"
  no_log: True

The first time I run this the root password is still set to some other value (whatever it was given when the host was imaged), so I start Ansible with:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of new esxi hosts] \
    --ask-pass \
    --ask-become-pass

This will prompt me for the current root password (the password Ansible uses to ssh in). Once I enter that it logs into each ESXi host, installs the new authorized_keys file, uses the vault key file to decrypt the new password, then updates the root password.

After I’ve done this once, since the Ansible ssh key is also part of the authorized_keys file, subsequent Ansible updates just use the ssh key to login, and I don’t have to use --ask-pass or --ask-become-pass parameters.
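
In other words, after the first successful run the same playbook invocation shrinks to:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of esxi hosts]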

This is also handy when switching a host from one cluster to another. As long as the ssh keys are installed I no longer need the current root password to update the root password.

Hope you find this useful.

post

Allow ping from specific subnets to AWS EC2 instances using Terraform

If you’re using Terraform to set up EC2 instances on AWS you may be a little confused about how to allow ping through the AWS VPC firewall, especially if you want to limit ping so that it only works from specific IPs or subnets.

To do this just add a Terraform ingress security group rule to the aws_security_group:

ingress {
  cidr_blocks = ["1.2.3.4/32"]
  from_port   = 8
  to_port     = 0
  protocol    = "icmp"
  description = "Allow ping from 1.2.3.4"
}

The above rule will only allow ping from the single IPv4 address “1.2.3.4”. You can use the cidr_blocks setting to allow ping from any set of IPv4 addresses and subnets that you wish. If you want to allow IPv6 addresses use the ipv6_cidr_blocks setting:

ingress {
  cidr_blocks       = ["1.2.3.4/32"]
  ipv6_cidr_blocks  = [aws_vpc.example.ipv6_cidr_block]
  from_port         = 8
  to_port           = 0
  protocol          = "icmp"
  description       = "Allow ping from 1.2.3.4 and the example.ipv6_cidr_block"
}

Right about now you should be scratching your head and asking why a port range is specified from port 8 to port 0. Isn’t that backwards? Also, this is ICMP, so why are we specifying port ranges at all?

Well, for ICMP security group rules Terraform uses the from_port field to define the ICMP message type, and “ping” is an ICMP “echo request” type 8 message.

So why is to_port = 0? Since ICMP is a network-layer protocol there is no TCP or UDP port number associated with ICMP packets; those numbers belong to the transport layer, which sits above the network layer. So you might think it’s set to 0 because it’s a “don’t care” setting, but that is not the case.

It’s actually set to 0 because Terraform (and AWS) use the to_port field to define the ICMP code of the ICMP packet being allowed through the firewall, and “ping” is defined as a type 8, code 0 ICMP message.

I have no idea why Terraform chose to obscure the usage this way, but I suspect it’s because the AWS API reuses the from_port field for storing the ICMP message type and reuses the to_port field for storing the ICMP code, and Terraform just copied their bad design. A more user-friendly implementation of Terraform would have created icmp_message_type and icmp_message_code fields (or aliases) mapped to the AWS from_port and to_port fields, to make it obvious what you’re setting and why it works.
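
Terraform won’t give you friendlier names, but you can create them yourself with locals so the intent is obvious to the next reader. A sketch (icmp_echo_request_type and icmp_echo_request_code are names I made up, not Terraform or AWS attributes; the locals block sits at the module level and the ingress block inside your aws_security_group):

locals {
  # "ping" is an ICMP echo request: type 8, code 0
  icmp_echo_request_type = 8
  icmp_echo_request_code = 0
}

ingress {
  cidr_blocks = ["1.2.3.4/32"]
  from_port   = local.icmp_echo_request_type
  to_port     = local.icmp_echo_request_code
  protocol    = "icmp"
  description = "Allow ping from 1.2.3.4"
}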

Hope you find this useful.

post

Generate a crypted password for Ansible

The Ansible user: module allows you to add a user to a Linux system with a password. The password must be passed to Ansible already hashed, using one of the hash formats supported by /etc/shadow.

Some Ansible docs suggest storing your passwords in plain text and using Ansible’s password_hash('sha512') filter to hash the plaintext passwords before passing them to the user module (that pattern is sketched after the list below). This is a bad practice for a number of reasons.

Storing your passwords in plain text is a bad idea

  • Storing your passwords in plain text is a bad idea, since anyone who can read your Ansible playbook now knows your password.
  • The play is not idempotent, since the password_hash filter re-hashes the password with a new random salt every time you run the play, so Ansible sees a changed password and resets it each time the play is run.
  • Attempting to make the play idempotent, by using update_password: on_create, means that you can no longer update the password using Ansible. This might be OK if you’re just updating one machine. It’s a giant pain in the ass if you need to update the password on many machines.
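
For contrast, the discouraged pattern mentioned above looks roughly like this, with the plaintext password sitting right in the play and getting re-hashed with a new random salt on every run:

- name: Set the user's password (discouraged - plaintext in the play)
  user:
    name: earl
    password: "{{ 'PlainTextPassword' | password_hash('sha512') }}"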

A better way is to hash the password once using openssl and store the hashed version of the password in your Ansible playbooks:

- name: Set the user's password
  user:
    name: earl
    password: "$6$wLZ77bHhLVJsHaMz$WqJhNW2VefjhnupK0FBj5LDPaONaAMRoiaWle4rU5DkXz7hxhl3Gxcwshuy.KQWRFt6YPWXNbdKq9B/Rk9q7A."

To generate the hashed password use the openssl passwd command on any Linux host:

openssl passwd -6 -stdin

This opens an interactive shell to openssl. Just enter the password that you want to use, hit enter, and openssl will respond with the hashed version. Copy and paste the hashed version of the password into Ansible, and the next time you run Ansible on a host the user’s password will be updated.

Type Ctrl-D to exit the interactive openssl shell.
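
A session looks roughly like this (the password and hash shown here are placeholders, not real values):

$ openssl passwd -6 -stdin
MySecretPassword
$6$<16-character-salt>$<86-character-hash>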

Since you used the interactive shell the plaintext password that you entered is not saved into the Linux host’s history file and is not visible to anyone running ps.

The crypted password is a SHA-512 one-way hash with a random 16-character salt, so you can check the playbook into a Git repository without revealing your password.

Hope you find this useful.

post

The Right Way to reboot a host with Ansible

For a long time rebooting a host with Ansible has been tricky. The steps are:

  • ssh to the host
  • Reboot the host
  • Disconnect before the host closes your ssh connection
  • Wait some number of seconds to ensure the host has really shut down
  • Attempt to ssh to the host and execute a command
  • Repeat ssh attempt until it works or you give up

Seems clear enough, but if you Google for an answer you may end up at this StackExchange page that gives lots of not-quite-correct answers from 2015 (and one correct answer). Some people suggest checking port 22, but just because ssh is listening doesn’t mean that it’s in a state where it’s accepting connections.

The correct answer is to use Ansible version 2.7 or greater. Ansible 2.7 introduced the reboot module, and now all you have to do is add this to your list of handlers:

- name: Reboot host and wait for it to restart
  reboot:
    msg: "Reboot initiated by Ansible"
    connect_timeout: 5
    reboot_timeout: 600
    pre_reboot_delay: 0
    post_reboot_delay: 30
    test_command: whoami

This handler will:

  • Reboot the host
  • Wait 30 seconds
  • Attempt to connect via ssh and run whoami
  • Time out each connection attempt after 5 seconds if ssh isn’t working
  • Keep attempting to connect for 10 minutes (600 seconds)

Add the directive:

  notify: Reboot host and wait for it to restart

… to any Ansible task that requires a reboot after a change. The host will be rebooted when the play’s tasks finish, then Ansible will wait until the host is back up and ssh is working before continuing on to the next play.
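
For example, a task that makes a change requiring a reboot might look something like this (the SELinux change here is just an illustration):

- name: Set SELinux to enforcing at boot
  lineinfile:
    path: /etc/selinux/config
    regexp: '^SELINUX='
    line: 'SELINUX=enforcing'
  notify: Reboot host and wait for it to restart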

If you need to reboot halfway through a playbook you can force all pending handlers to execute with this task:

- name: Reboot if necessary
  meta: flush_handlers

I sometimes do that to change something, force a reboot, then verify that the change worked, all within the same playbook.
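
That pattern looks roughly like this (the kernel update is just a placeholder for whatever change you actually need to make and verify):

- name: Update the kernel
  yum:
    name: kernel
    state: latest
  notify: Reboot host and wait for it to restart

- name: Reboot now rather than at the end of the play
  meta: flush_handlers

- name: Check which kernel the host came back up on
  command: uname -r
  register: running_kernel
  changed_when: false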

Hope you found this useful.

post

Why adding a .conf or .cfg file to /etc/sudoers.d doesn’t work

I needed to add some sudo access rights for support personnel on about a hundred CentOS 6.6 servers. At the time no one on these hosts had sudo rights, so the /etc/sudoers file was the default file. I’m using Ansible to maintain these hosts, but rather than modify the default /etc/sudoers file using Ansible’s lineinfile: module, I decided to create a support.conf file and use Ansible’s copy: module to copy that file into /etc/sudoers.d/. That way if a future version of CentOS changes the /etc/sudoers file I’m leaving that file untouched, so my changes should always work.

  - name: Add custom sudoers
    copy: src=files/support.conf dest=/etc/sudoers.d/support.conf owner=root group=root mode=0440 validate='visudo -cf %s'

The support.conf file I created copied over without a problem, and the validation step of running “visudo -cf” on the file before moving it into place reported that the file was error-free and should work just fine as a sudoers file.

I logged in as the support user and it didn’t work:

[support@c1n1 ~]$ sudo /bin/ls /var/log/*
support is not in the sudoers file.  This incident will be reported.

Not only did it not work, it was telling me that the support user wasn’t even in the sudoers file, even though it clearly was.

After Googling around a bit and not finding much I saw this in the Sudoers Manual:

sudo will read each file in /etc/sudoers.d, skipping file names that end in ‘~’ or contain a ‘.’ character to avoid causing problems with package manager or editor temporary/backup files.

sudo was skipping the file because the file name contained a period!

I changed the name of the file from support.conf to support and it worked.

  - name: Add custom sudoers
    copy: src=files/support dest=/etc/sudoers.d/support owner=root group=root mode=0440 validate='visudo -cf %s'
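
To double-check that sudo is now picking up the rules, you can list the support user’s privileges from an account that already has root access:

sudo -l -U support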

Hope you find this useful.

Here’s a snippet from /etc/sudoers.d/support if you’re interested. The “support” user has already been created by a separate Ansible command.

# Networking
Cmnd_Alias NETWORKING = /sbin/route, /sbin/ifconfig, /bin/ping, /sbin/dhclient, /usr/bin/net, /sbin/iptables, /usr/bin/rfcomm, /usr/bin/wvdial, /sbin/iwconfig, /sbin/mii-tool

# Installation and management of software
Cmnd_Alias SOFTWARE = /bin/rpm, /usr/bin/up2date, /usr/bin/yum

# Services
Cmnd_Alias SERVICES = /sbin/service, /sbin/chkconfig

# Reading logs
Cmnd_Alias READ_LOGS = /usr/bin/less /var/log/*, /bin/more /var/log/*, /bin/ls /var/log/*, /bin/ls /var/log

support  ALL = NETWORKING, SOFTWARE, SERVICES, READ_LOGS