Configure BIOS settings with Ansible and Redfish

If you’re using Ansible and trying to configure the BIOS settings of a bunch of hosts in a data center, take a look at Ansible’s community.general.redfish_config module.

The Redfish standard is a suite of specifications that deliver an industry standard protocol providing a RESTful interface for the management of servers, storage, networking, and converged infrastructure. In practice this means that if you have a host with iLO/iDRAC capabilities that also supports the Redfish standard (which includes most datacenter-class servers from Dell, Supermicro, Lenovo, HPE, Fujitsu, IBM, Cisco, etc.), then in addition to a UI where you can log in and configure the hardware, that host also has a Redfish API that accepts JSON payloads to configure the hardware.

The basic format of the Ansible play to change a BIOS setting is this:

- name: Make sure that SR-IOV is enabled
  community.general.redfish_config:
    category: Systems
    command: SetBiosAttributes
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
    bios_attributes:
      SriovGlobalEnable: "Enabled"
  register: update_sriov

- name: Schedule BIOS setting updates
  community.general.idrac_redfish_command:
    category: Systems
    command: CreateBiosConfigJob
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  when: update_sriov.changed

In this case I’m changing the BIOS SriovGlobalEnable setting to “Enabled”. The baseuri is the DNS name or IP address of the iLO/iDRAC interface, and the username and password are the same credentials you use to log into iLO/iDRAC.

Once this play is applied to a host, if the host’s “SR-IOV Global Enable” setting wasn’t enabled before, the setting is now Enabled (pending reboot). The “Schedule BIOS setting updates” play ensures that the new BIOS setting will be applied the next time the host is rebooted. If you want to reboot immediately, the community.general.redfish_command module will let you do that too.

BIOS updated pending reboot
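
If you want to trigger that reboot from Ansible as well, something along these lines should work (a sketch, not tested against every BMC; PowerGracefulRestart is one of redfish_command’s standard reset commands):

- name: Reboot the host to apply the new BIOS settings
  community.general.redfish_command:
    category: Systems
    command: PowerGracefulRestart
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  when: update_sriov.changed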

The hardest part about setting this up is figuring out what Ansible redfish_config expects the setting to be called. I could see in iDRAC > Configuration > BIOS Settings that there was a “SR-IOV Global Enable” setting, but I had no idea what attribute name redfish_config used for that setting. Luckily, there’s a Redfish API endpoint that lists the current BIOS setting keys and values used by Redfish on your host. Just navigate to https://[your iLO/iDRAC IP or DNS name]/redfish/v1/Systems/System.Embedded.1/Bios and you’ll get a list of all of the BIOS setting keys and values.

Redfish API showing current BIOS settings on a host
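
If you prefer the command line, you can pull the same list with curl (the hostname and credentials here are placeholders; -k skips certificate verification for the BMC’s self-signed certificate, and jq is optional, just for readability):

# List the current BIOS attribute names and values over the Redfish API
curl -sk -u 'username:password' \
    https://idrac-hostname/redfish/v1/Systems/System.Embedded.1/Bios \
    | jq '.Attributes'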

I hope you find this useful.

Run a Linux systemd service during shutdown

I recently needed to add a cleanup service that runs at shutdown to a hundred AWS servers. My requirements were:

  • Run the script /usr/local/sbin/ec2-cleanup.sh when a VM shuts down (poweroff or reboot).
  • Send the output from the script to the syslog service.

So I needed to create a systemd service file that would call the script when the VM shuts down. This is the ec2-cleanup.service file I created:

# ec2-cleanup.service

[Unit]
Description=Run cleanup at shutdown
After=syslog.service network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/usr/local/sbin/ec2-cleanup.sh
Restart=on-failure
RestartSec=1s

[Install]
WantedBy=multi-user.target

Type=oneshot means that the command runs once. Normally a oneshot service exits as soon as its ExecStart command completes, but since I don’t want to do anything when the service starts, there is no ExecStart command at all. That’s why I use RemainAfterExit=yes, which keeps the service in the active state even though there’s no ExecStart command.

Finally, I use ExecStop to run the script at shutdown time.
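
The contents of ec2-cleanup.sh don’t matter for this article, but as a hedged sketch, a script shaped like this meets the syslog requirement (systemd already sends a unit command’s stdout and stderr to the journal, and logger adds an explicit tag):

#!/bin/bash
# ec2-cleanup.sh -- hypothetical sketch; the real cleanup steps are site-specific
# Tag all output for syslog so it's easy to find with journalctl or grep
exec 1> >(logger -t ec2-cleanup) 2>&1
echo "Starting cleanup at $(date -u)"
# ... site-specific cleanup steps go here ...
echo "Cleanup complete"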

After=syslog.service network.target ensures that ec2-cleanup.service doesn’t start until after the syslog service is running and the network has started. More importantly, since systemd stops services in the reverse order that they’re started, this also ensures that syslog and the network service are still running when systemd runs ec2-cleanup.service’s ExecStop command.

Although there are many different syslog services available, most use “syslog” as a service alias, so After=syslog.service should work regardless of which syslog service you actually use. (If you use rsyslog, for example, this still works because rsyslog declares syslog as an alias.)
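
If you want to check which syslog implementation provides the alias on a given host, systemctl will tell you; on an Ubuntu host running rsyslog you’ll see something like:

$ systemctl show -p Id,Names syslog.service
Id=rsyslog.service
Names=rsyslog.service syslog.service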

Finally, I needed to install the script and the service on my AWS VMs, so I added this to an Ansible playbook that runs on them:

  - name: Install the ec2-cleanup.sh script
    copy:
      src: ec2-cleanup.sh
      dest: /usr/local/sbin/ec2-cleanup.sh
      owner: root
      group: root
      mode: 0755

  - name: Install a service to run ec2-cleanup.sh at shutdown
    copy:
      src: ec2-cleanup.service
      dest: /lib/systemd/system/ec2-cleanup.service
      owner: root
      group: root
      mode: 0644
    register: ec2_cleanup_service

  - name: Restart ec2-cleanup service if the service file changed
    systemd:
      name: ec2-cleanup
      daemon_reload: True
      state: restarted
    when: ec2_cleanup_service.changed

  - name: Enable ec2-cleanup service so it starts on boot
    systemd:
      name: ec2-cleanup
      enabled: True
      state: started

To verify that all of this works I ran the Ansible playbook on a VM, then logged in and checked the status of the service:

eruby@i-056ac231adeb1f930:~$ systemctl status ec2-cleanup
● ec2-cleanup.service - Run cleanup at shutdown
     Loaded: loaded (/lib/systemd/system/ec2-cleanup.service; enabled; vendor preset: enabled)
     Active: active (exited) since Tue 2023-03-14 17:04:37 UTC; 44s ago

Mar 14 17:04:37 i-056ac221aceb1f830 systemd[1]: Finished Run cleanup at shutdown.

The service is active (exited), which is what I expected (“exited” because there’s no ExecStart process running, “active” because RemainAfterExit=yes keeps the service active until shutdown).

If I reboot the VM and log back in I can check syslog with:

journalctl -u ec2-cleanup.service -n 20

… and see the last 20 lines of output from the script. The log output shows that the script ran when I rebooted.

Hope you find this useful.

Calculating the value for 64bitMMIOSizeGB

When adding a GPU to a vSphere VM using PCI passthrough, there are a couple of additional settings you need to change or your VM won’t boot.

When creating the VM, go to Actions > Edit > VM Options > Boot Options > Firmware and select “EFI”. You need to do this before you install the operating system on the VM. If you don’t, the GPUs won’t work and the VM won’t boot.

To add a GPU, in vCenter go to the VM, select Actions > Edit > Add New Device. Any GPUs set up as PCI passthrough devices should appear in a pick list. Add one or more GPUs to your VM.

Note that after adding one device, when you add additional GPUs the first GPU you selected still appears in the pick list. If you add the same GPU more than once your VM will not boot. If you add a GPU that’s being used by another running VM your VM will not boot. Pay attention to the PCI bus addresses displayed and make sure that the GPUs you pick are unique and not in use on another VM.

Finally you have to set up memory-mapped I/O (MMIO) to map system memory to the GPU’s framebuffer memory so that the CPU can pass data to the GPU. In vCenter go to the VM, select Actions > Edit > VM Options > Advanced > Edit configuration.

Once you’re on the Configuration parameters screen, add two more parameters:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = ????

Actions > Edit > VM Options > Advanced > Edit configuration

The 64bitMMIOSizeGB value is calculated by adding up the total GB of framebuffer memory on all GPUs attached to the VM. If the total GPU framebuffer memory falls exactly on a power of 2, setting pciPassthru.64bitMMIOSizeGB to the next power of 2 works.

If the total GPU framebuffer memory falls between two powers of 2, round up to the next power of 2, then round up again, to get a working setting.

Powers of 2 are 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 …

For example, two NVIDIA A100 cards with 40GB each = 80GB (in between 64GB and 128GB), so round up to the next power of 2 (128GB), then round up again to the next power of 2 after that (256GB) to get the correct setting. If you set it too low the VM won’t boot, but it won’t give you an error message telling you what the issue is either.
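
Stated as a formula: round the total framebuffer GB up to a power of 2 if it isn’t one already, then double it. Here’s that rule as a quick Python sketch (my own convenience calculator, not VMware tooling):

import math

def mmio_size_gb(total_framebuffer_gb):
    """Round up to a power of 2 (if not one already), then double."""
    return 2 ** (math.ceil(math.log2(total_framebuffer_gb)) + 1)

for total_gb in (32, 48, 80, 128, 160):
    print(total_gb, "->", mmio_size_gb(total_gb))  # 64, 128, 256, 256, 512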

Here are some configurations that I’ve tested and verified:

  • 2 x 16GB NVIDIA V100 = 32GB, 32 is a power of 2, so round up to the next power of 2 which is 64, set pciPassthru.64bitMMIOSizeGB = 64 to boot.
  • 2 x 24GB NVIDIA P40 = 48GB, which is in-between 32 and 64, round up to 64 and again to 128, requires pciPassthru.64bitMMIOSizeGB = 128 to boot.
  • 8 x 16GB NVIDIA V100 = 128GB, 128 is a power of 2, so round up to the next power of 2 which is 256, set pciPassthru.64bitMMIOSizeGB = 256 to boot.
  • 10 x 16GB NVIDIA V100 = 160GB, which is in-between 128 and 256, round up to 256 and again to 512, set pciPassthru.64bitMMIOSizeGB = 512 to boot.

Hope you find this useful.

Updating ESXi root passwords and authorized ssh keys with Ansible

I manage a number of vCenter instances and a lot of ESXi hosts. Some of the hosts are for production, some for test and development. Sometimes an ESXi host needs to be used by a different group, or temporarily moved to a new cluster and then back again afterwards.

To automate the configuration of these systems and the VMs running on them I use Ansible. For a freshly-imaged, new installation of ESXi, one of the first things I do is run an Ansible playbook that sets up the ESXi host. The first thing that playbook does is install the ssh keys of the people who need to log in as root; then it updates the root password.

I have ssh public keys for every user that needs root access. A short bash script combines those keys and my Ansible management public key into authorized_keys files for the ESXi hosts in each vCenter instance. In my Ansible group_vars/ directory is a file for each group of ESXi hosts, so all of the ESXi hosts in a group get the same root password and ssh keys. This also makes it easy to change root passwords and add and remove ssh keys of users as they are added to or leave different groups.
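
The combining script itself is trivial; a sketch (the key file names and paths here are hypothetical):

#!/bin/bash
# Build the authorized_keys file for the CICD ESXi hosts from the ops
# team's public keys plus the Ansible management public key.
cat pubkeys/ops/*.pub pubkeys/ansible-mgmt.pub \
    > roles/esxi_host/files/authkeys-ops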

Here’s a portion of a group_vars/esxi_hosts_cicd/credentials.yml file for a production CICD cluster:

# ESXI Hosts (only Ops can ssh in)
esxi_root_authorized_keys_file: authkeys-ops

esxi_username: 'root'
esxi_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          34633832366431383630653735663739636466316262
          39363165663566323864373930386239380085373464
          32383863366463653365383533646437656664376365
          31623564336165626162616263613166643462356462
          3061

The password is encrypted using Ansible Vault.

In my main.yml file I call the esxi_host role for all of the hosts in the esxi_hosts inventory group. Since I use a different user to manage non-ESXi hosts, the play that calls the role tells Ansible to use the root user only when logging into ESXi hosts.

- name: Setup esxi_hosts
  gather_facts: False
  user: root
  hosts: esxi_hosts
  roles:
    - esxi_host

The esxi_host role has an esxi_host/tasks/main.yml tasks file. The two tasks that update the authorized_keys file and the root password look like this:

- name: Set the authorized ssh keys for the root user
  copy:
    src: "{{ esxi_root_authorized_keys_file }}"
    dest: /etc/ssh/keys-root/authorized_keys
    owner: root
    group: root
    mode: '0600'

- name: Set the root password for ESXI Hosts
  shell: "echo '{{ esxi_password }}' | passwd -s"
  no_log: True

The first time I run this the host still has its original root password, so I start Ansible with:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/ \
    --limit [comma-separated list of new esxi hosts] \
    --ask-pass \
    --ask-become-pass

This will prompt me for the current root ssh password. Once I enter that it logs into each ESXi host, installs the new authorized_keys file, uses the vault private key to decrypt the password, then updates the root password.

After I’ve done this once, since the Ansible ssh key is also part of the authorized_keys file, subsequent Ansible updates just use the ssh key to log in, and I don’t have to use the --ask-pass or --ask-become-pass parameters.
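
In other words, once the keys are in place a normal run reduces to:

ansible-playbook main.yml \
    --vault-id ~/path/to/vault/private/key/file \
    -i inventory/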

This is also handy when switching a host from one cluster to another. As long as the ssh keys are installed I no longer need the current root password to update the root password.

Hope you find this useful.

Allow ping from specific subnets to AWS EC2 instances using Terraform

If you’re using Terraform to set up EC2 instances on AWS you may be a little confused about how to allow ping through the AWS VPC firewall, especially if you want to limit ping so that it only works from specific IPs or subnets.

To do this just add a Terraform ingress security group rule to the aws_security_group:

ingress {
  cidr_blocks = ["1.2.3.4/32"]
  from_port   = 8
  to_port     = 0
  protocol    = "icmp"
  description = "Allow ping from 1.2.3.4"
}

The above rule will only allow ping from the single IPv4 address “1.2.3.4”. You can use the cidr_blocks setting to allow ping from any set of IPv4 addresses and subnets that you wish. If you want to allow IPv6 addresses, use the ipv6_cidr_blocks setting:

ingress {
  cidr_blocks       = ["1.2.3.4/32"]
  ipv6_cidr_blocks  = [aws_vpc.example.ipv6_cidr_block]
  from_port         = 8
  to_port           = 0
  protocol          = "icmp"
  description       = "Allow ping from 1.2.3.4 and the example.ipv6_cidr_block"
}
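
For context, these ingress blocks sit inside an aws_security_group resource; a minimal sketch (the resource name and VPC reference are hypothetical):

resource "aws_security_group" "allow_ping" {
  name        = "allow-ping"
  description = "Allow ICMP echo requests from trusted sources"
  vpc_id      = aws_vpc.example.id

  ingress {
    cidr_blocks = ["1.2.3.4/32"]
    from_port   = 8
    to_port     = 0
    protocol    = "icmp"
    description = "Allow ping from 1.2.3.4"
  }
}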

Right about now you should be scratching your head and asking why a port range is specified from port 8 to port 0. Isn’t that backwards? Also, this is ICMP, so why are we specifying port ranges at all?

Well, for ICMP security group rules Terraform uses the from_port field to define the ICMP message type, and “ping” is an ICMP “echo request” type 8 message.

So why is to_port = 0? Since ICMP is a network-layer protocol there is no TCP or UDP port number associated with ICMP packets as these numbers are associated with the transport layer, which is above the network layer. So you might think it’s set to 0 because it’s a “don’t care” setting, but that is not the case.

It’s actually set to 0 because Terraform (and AWS) use the to_port field to define the ICMP code of the ICMP packet being allowed through the firewall, and “ping” is defined as a type 8, code 0 ICMP message.
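
The same pattern extends to other ICMP messages, and if you want to allow all ICMP types and codes from a source, AWS accepts -1 as a wildcard for both fields:

ingress {
  cidr_blocks = ["1.2.3.4/32"]
  from_port   = -1
  to_port     = -1
  protocol    = "icmp"
  description = "Allow all ICMP message types from 1.2.3.4"
}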

I have no idea why Terraform chose to obscure the usage this way, but I suspect it’s because the AWS API reuses the from_port field for storing the ICMP message type and reuses the to_port field for storing the ICMP code, and Terraform just copied their bad design. A more user-friendly implementation of Terraform would have provided icmp_message_type and icmp_message_code fields (or aliases) that map to the AWS from_port and to_port fields, making it obvious what you’re setting and why it works.

Hope you find this useful.