
Update the firmware on Dell hosts using Ansible

Dell provides an HTTPS site with firmware updates. If your host can reach https://downloads.dell.com you can update Dell firmware using Ansible and the dellemc.openmanage collection.

First install the dellemc.openmanage collection:

ansible-galaxy collection install \
    --force-with-deps dellemc.openmanage
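
The idrac_firmware module in this collection also relies on Dell's OpenManage Python SDK being present on the Ansible control node. If you don't already have it, installing it with pip should cover that; check the collection's documentation for the current requirements:

pip install omsdk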

Then add the following tasks file to your roles:

---
# ansible/roles/firmware_update/tasks/main.yml

- name: Update Dell firmware
  delegate_to: localhost
  become: False
  block:

  - name: Update firmware from downloads.dell.com
    dellemc.openmanage.idrac_firmware:
      share_name: "https://downloads.dell.com"
      idrac_ip: "{{ inventory_hostname }}.{{ subdomain }}"
      idrac_user: "{{ ilo_admin_user_name }}"
      idrac_password: "{{ ilo_admin_password }}"
      validate_certs: False
      reboot: True
      job_wait: True
      apply_update: True
    register: firmware

- name: Pause for 5 minutes before trying the next host
  ansible.builtin.pause:
    minutes: 5
  when: firmware.changed

These tasks check whether any firmware updates are available and, if any are found, apply them immediately, possibly rebooting the host.

You do not want to apply this to an entire cluster of hosts running a distributed application, such as Kubernetes or vSAN, because it could reboot all of the hosts at the same time, trashing your environment. To ensure that Ansible only updates one host at a time, use the serial keyword in the playbook that calls this role:

# main.yml

- name: Update Dell firmware
  gather_facts: False
  hosts: firmware_update_hosts
  serial: 1
  roles:
    - firmware_update
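
To run it, point ansible-playbook at your inventory as usual. The inventory path here is just a placeholder for whatever you use; if your iDRAC credentials are vault-encrypted, add --ask-vault-pass or a --vault-id option:

ansible-playbook -i inventory/production main.yml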

Hope you find this useful.

Want to learn Ansible? Start on the Ansible Community Documentation page and just start automating your environment. Want to level-up your Ansible skills? I highly recommend the O’Reilly book Ansible: Up and Running.


Updating local iDRAC passwords on Dell hosts using Ansible

Dell hosts have the ability to add multiple local users to the Integrated Dell Remote Access Controller (iDRAC) with different levels of access.

Each iDRAC user has an Account ID, numbered 1 through 16. If you have a datacenter operations team that needs admin access, developers that need to check BIOS settings, and automation scripts that need to update firmware, you need to agree on which Account ID you want to use for each task and then assign a user name and password to that Account ID.

You also want a root account with admin access that you use to update the other accounts. If you set up the root account and don’t give that login and password out to anyone else, you can use it to manage all of the other accounts. Having two accounts with admin access means you can use root to update the admin account’s password and the admin account to update the root password.

We’ll use the Ansible community.general.redfish_command module to set passwords. This module talks to the iDRAC over its Redfish API. The Ansible tasks to update passwords look like this:

---
# ansible/roles/idrac_host/tasks/main.yml

- name: Update IDRAC Passwords
  delegate_to: localhost
  become: False
  block:

  # Create the ILO users and update their passwords
  - name: Use the root account to add and enable the devops user
    community.general.redfish_command:
      category: Accounts
      command: AddUser,EnableUser
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_root_user_name }}"
      password: "{{ ilo_root_initial_password }}"
      account_id: "{{ ilo_devops_id }}"
      account_username: "{{ ilo_devops_user_name }}"
      account_password: "{{ ilo_devops_password }}"
      roleid: "Administrator"

  - name: Use the root account to set the password of the devops user
    community.general.redfish_command:
      category: Accounts
      command: UpdateUserPassword
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_root_user_name }}"
      password: "{{ ilo_root_initial_password }}"
      account_username: "{{ ilo_devops_user_name }}"
      account_password: "{{ ilo_devops_password }}"

  - name: Update root user password (if needed)
    community.general.redfish_command:
      category: Accounts
      command: UpdateUserPassword
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_root_user_name }}"
      password: "{{ ilo_root_initial_password }}"
      account_username: "{{ ilo_root_user_name }}"
      account_password: "{{ ilo_root_password }}"
    when: ilo_root_initial_password != ilo_root_password

  - name: Use the devops account to add and enable the os_deploy user
    community.general.redfish_command:
      category: Accounts
      command: AddUser,EnableUser
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_devops_user_name }}"
      password: "{{ ilo_devops_password }}"
      account_id: "{{ ilo_os_deploy_id }}"
      account_username: "{{ ilo_os_deploy_user_name }}"
      account_password: "{{ ilo_os_deploy_password }}"
      roleid: "Administrator"

  - name: Use the devops account to set the password of the os_deploy user
    community.general.redfish_command:
      category: Accounts
      command: UpdateUserPassword
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_devops_user_name }}"
      password: "{{ ilo_devops_password }}"
      account_username: "{{ ilo_os_deploy_user_name }}"
      account_password: "{{ ilo_os_deploy_password }}"

These tasks use the root account to create a devops account (if it does not exist), then update the password of the devops account (if it needs to be updated), update the root password (if it changed), and create an os_deploy user for automation tasks.

This sets up all three as administrator accounts, but other security roles are available.
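
For example, if the developers mentioned earlier only need to look at BIOS settings, you could give their account the ReadOnly role instead of Administrator. Here's a sketch of what that task might look like (the ilo_developer_* variables are hypothetical, named to match the pattern above):

  - name: Use the devops account to add and enable a read-only developer user
    community.general.redfish_command:
      category: Accounts
      command: AddUser,EnableUser
      baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
      username: "{{ ilo_devops_user_name }}"
      password: "{{ ilo_devops_password }}"
      account_id: "{{ ilo_developer_id }}"
      account_username: "{{ ilo_developer_user_name }}"
      account_password: "{{ ilo_developer_password }}"
      roleid: "ReadOnly"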

All of the variables are stored in the ansible/roles/idrac_host/vars/main.yml file and are encrypted using Ansible vault, so you can store your playbooks in Git without worrying about password leakage.

---
# ansible/roles/idrac_host/vars/main.yml

subdomain: dc.example.com

ilo_root_id: '1'
ilo_root_user_name: 'root'
ilo_root_initial_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          23437326137343536343735666633656266326666353366643633
          31326331613538303739623164313338396365666362623166613
          38336262306666646531663034333338396233363261323039430
          3261
ilo_root_password: "{{ ilo_root_initial_password }}"

ilo_devops_id: '3'
ilo_devops_user_name: 'devops'
ilo_devops_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          3163633564663962303270a303335353435396337393931643032
          3064303161616239390a333865376133346539333233365313566
          30383466656665643661564306330393461326438303332636633
          3255


ilo_os_deploy_id: '4'
ilo_os_deploy_user_name: 'os_deploy'
ilo_os_deploy_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          30383466656665643661373923164626564306338303332636633
          3163633563346466396230a303335353435396337393931643032
          30643039390a33386332343437632333535393136363565313566
          3262

If you want to update the root password, change ilo_root_password, run the playbook on all hosts to update the root password, then set ilo_root_initial_password to the new (encrypted) root password and set ilo_root_password back to "{{ ilo_root_initial_password }}".
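
To generate the new encrypted value, ansible-vault can emit a ready-to-paste variable for you; the variable name and password below are placeholders:

ansible-vault encrypt_string 'NewRootPassword' --name 'ilo_root_password'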

Hope you find this useful.

Want to learn Ansible? Start on the Ansible Community Documentation page and just start automating your environment. Want to level-up your Ansible skills? I highly recommend the O’Reilly book Ansible: Up and Running.


Configure BIOS settings with Ansible and Redfish

If you’re using Ansible and trying to configure the BIOS settings of a bunch of hosts in a data center, take a look at Ansible’s community.general.redfish_config module.

The Redfish standard is a suite of specifications that deliver an industry standard protocol providing a RESTful interface for the management of servers, storage, networking, and converged infrastructure. In practice this means that if you have a host with iLO/iDRAC capabilities that also supports the Redfish standard (which includes most datacenter-class servers from Dell, Supermicro, Lenovo, HPE, Fujitsu, IBM, Cisco, etc.), then in addition to a UI where you can log in and configure the hardware, that host also has a Redfish API that accepts JSON payloads to configure the hardware.

The basic format of the Ansible play to change a BIOS setting is this:

- name: Make sure that SR-IOV is enabled
  community.general.redfish_config:
    category: Systems
    command: SetBiosAttributes
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
    bios_attributes:
      SriovGlobalEnable: "Enabled"
  register: update_sriov

- name: Schedule BIOS setting updates
  community.general.idrac_redfish_command:
    category: Systems
    command: CreateBiosConfigJob
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  when: update_sriov.changed

In this case I’m changing the BIOS SriovGlobalEnable setting to “Enabled”. The baseuri is the DNS name or IP address of the iLO / iDRAC interface, and the username and password are the same user name and password that you use to log into iLO / iDRAC.

Once this play is applied to a host, if the host’s “SR-IOV Global Enable” setting wasn’t enabled before, the setting is now Enabled (pending reboot). The “Schedule BIOS setting updates” play ensures that the next time the host is rebooted the new BIOS setting will be applied. If you want to reboot immediately, the community.general.redfish_command module will let you do that too.

[Screenshot: BIOS update pending reboot]
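
If you do want to apply the pending BIOS job right away instead of waiting for the next scheduled reboot, a sketch of the immediate-reboot version looks like this (same credentials and baseuri as the plays above):

- name: Reboot the host now to apply the new BIOS settings
  community.general.redfish_command:
    category: Systems
    command: PowerReboot
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  when: update_sriov.changed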

The hardest part about setting this up is figuring out what Ansible redfish_config expects the setting to be called. I could see in iDRAC > Configuration > BIOS Settings that there was a “SR-IOV Global Enable” setting, but I had no idea what attribute name redfish_config used for that setting. Luckily, there’s a Redfish API endpoint that lists the current BIOS setting keys and values on your host. Just navigate to https://[your iLO/iDRAC IP or DNS name]/redfish/v1/Systems/System.Embedded.1/Bios and you’ll get a list of all of the BIOS setting keys and values.

[Screenshot: Redfish API showing the current BIOS settings on a host]
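
If you prefer the command line to a browser, the same information is available with curl; the Attributes object in the response holds the key/value pairs that bios_attributes expects. The address is a placeholder, -k skips certificate verification for self-signed iDRAC certificates, and jq is optional but makes the output readable:

curl -sk -u "$ILO_USER:$ILO_PASSWORD" \
    https://your-idrac-address/redfish/v1/Systems/System.Embedded.1/Bios | jq '.Attributes'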

I hope you find this useful.


Run a Linux systemd service during shutdown

I recently needed to add a cleanup service that runs at shutdown to a hundred AWS servers. My requirements were:

  • Run the script /usr/local/sbin/ec2-cleanup.sh when a VM shuts down (poweroff or reboot).
  • Send the output from the script to the syslog service.

So I needed to create a systemd service file that would call the script when the VM shuts down. This is the ec2-cleanup.service file I created:

# ec2-cleanup.service

[Unit]
Description=Run cleanup at shutdown
After=syslog.service network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/usr/local/sbin/ec2-cleanup.sh
Restart=on-failure
RestartSec=1s

[Install]
WantedBy=multi-user.target

Type=oneshot means that the command runs once. Normally, since this is a oneshot service, the service would exit after the ExecStart command runs, but since I don’t want to do anything when the service starts, there is no ExecStart command. That’s why I use RemainAfterExit=yes, which keeps the service active even though there’s no ExecStart command.

Finally I use ExecStop to run the command at shutdown time.

After=syslog.service network.target ensures that ec2-cleanup.service doesn’t start until after the syslog service is running and the network has started. More importantly, since systemd stops services in the reverse order that they’re started, this also ensures that syslog and the network service are still running when systemd runs ec2-cleanup.service’s ExecStop command.

Although there are many different available syslog services, most use “syslog” as a service alias, so After=syslog.service should work regardless of which syslog service you actually use. (e.g. If you use rsyslog this still works, because rsyslog declares syslog as an alias.)
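
The cleanup script itself isn't the interesting part of this post, but for completeness, a minimal ec2-cleanup.sh that satisfies the "send output to syslog" requirement could look like this sketch; logger is what routes the output to syslog, and the actual cleanup work is whatever your environment needs:

#!/bin/bash
# ec2-cleanup.sh -- sketch only; put real cleanup commands where indicated.
# Route all stdout/stderr from this script to syslog, tagged "ec2-cleanup".
exec 1> >(logger -t ec2-cleanup) 2>&1

echo "Starting cleanup on $(hostname)"
# ... real cleanup work goes here ...
echo "Cleanup finished"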

Finally, I just needed to install the service on my AWS VMs, so I added this to an Ansible playbook that runs on my AWS VMs:

  - name: Install the ec2-cleanup.sh script
    ansible.builtin.copy:
      src: ec2-cleanup.sh
      dest: /usr/local/sbin/ec2-cleanup.sh
      owner: root
      group: root
      mode: '0755'

  - name: Install a service to run ec2-cleanup.sh at shutdown
    ansible.builtin.copy:
      src: ec2-cleanup.service
      dest: /lib/systemd/system/ec2-cleanup.service
      owner: root
      group: root
      mode: '0644'
    register: ec2_cleanup_service

  - name: Restart ec2-cleanup service if the service file changed
    ansible.builtin.systemd:
      name: ec2-cleanup
      daemon_reload: True
      state: restarted
    when: ec2_cleanup_service.changed

  - name: Enable ec2-cleanup service so it starts on boot
    ansible.builtin.systemd:
      name: ec2-cleanup
      enabled: True
      state: started

To verify that all of this works I ran the Ansible playbook on a VM, then logged in and checked the status of the service:

eruby@i-056ac231adeb1f930:~$ systemctl status ec2-cleanup
● ec2-cleanup.service - Run cleanup at shutdown
     Loaded: loaded (/lib/systemd/system/ec2-cleanup.service; enabled; vendor preset: enabled)
     Active: active (exited) since Tue 2023-03-14 17:04:37 UTC; 44s ago

Mar 14 17:04:37 i-056ac221aceb1f830 systemd[1]: Finished Run cleanup at shutdown.

The service is active (exited), which is what I expected: exited because there’s no ExecStart process left running, and active because RemainAfterExit=yes keeps the service active until shutdown.

If I reboot the VM and log back in I can check syslog with:

journalctl -u ec2-cleanup.service -n 20

… and see the last 20 lines of output from the script. The log output shows that the script ran when I rebooted.

Hope you find this useful.


Calculating the value for 64bitMMIOSizeGB

When adding a GPU to a vSphere VM using PCI passthrough there are a couple of additional settings that you need to make or your VM won’t boot.

When creating the VM, go to Actions > Edit > VM Options > Boot Options > Firmware and select “EFI”. You need to do this before you install the operating system on the VM. If you don’t do this the GPUs won’t work and the VM won’t boot.

To add a GPU, in vCenter go to the VM, select Actions > Edit > Add New Device. Any GPUs set up as PCI passthrough devices should appear in a pick list. Add one or more GPUs to your VM.

Note that after adding one device, when you add additional GPUs the first GPU you selected still appears in the pick list. If you add the same GPU more than once your VM will not boot. If you add a GPU that’s being used by another running VM your VM will not boot. Pay attention to the PCI bus addresses displayed and make sure that the GPUs you pick are unique and not in use on another VM.

Finally you have to set up memory-mapped I/O (MMIO) to map system memory to the GPU’s framebuffer memory so that the CPU can pass data to the GPU. In vCenter go to the VM, select Actions > Edit > VM Options > Advanced > Edit configuration.

Once you’re on the Configuration parameters screen, add two more parameters:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = ????

[Screenshot: Actions > Edit > VM Options > Advanced > Edit configuration]

The 64bitMMIOSizeGB value is calculated by adding up the total GB of framebuffer memory on all GPUs attached to the VM. If the total GPU framebuffer memory falls exactly on a power of 2, setting pciPassthru.64bitMMIOSizeGB to the next power of 2 works.

If the total GPU framebuffer memory falls between two powers of 2, round up to the next power of 2, then round up again, to get a working setting.

Powers of 2 are 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 …

For example, two NVIDIA A100 cards with 40GB each = 80GB (in between 64GB and 128GB), so round up to the next power of 2 (128GB), then round up again to the next power of 2 after that (256GB) to get the correct setting. If you set it too low the VM won’t boot, but it won’t give you an error message telling you what the issue is either.

Here are some configurations that I’ve tested and verified:

  • 2 x 16GB NVIDIA V100 = 32GB, 32 is a power of 2, so round up to the next power of 2 which is 64, set pciPassthru.64bitMMIOSizeGB = 64 to boot.
  • 2 x 24GB NVIDIA P40 = 48GB, which is in-between 32 and 64, round up to 64 and again to 128, requires pciPassthru.64bitMMIOSizeGB = 128 to boot.
  • 8 x 16GB NVIDIA V100 = 128GB, 128 is a power of 2, so round up to the next power of 2 which is 256, set pciPassthru.64bitMMIOSizeGB = 256 to boot.
  • 10 x 16GB NVIDIA V100 = 160GB, which is in-between 128 and 256, round up to 256 and again to 512, set pciPassthru.64bitMMIOSizeGB = 512 to boot.
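
If you'd rather compute the value than walk through the rounding by hand, the pattern in all of the configurations above boils down to "the smallest power of 2 that is at least twice the total framebuffer GB". That's my own generalization of the rule, not an official VMware formula, but a quick shell sketch of it reproduces every entry in the list:

# Sketch: smallest power of 2 that is >= 2 x the total framebuffer GB
mmio_size_gb() {
    local total=$1 size=2
    while [ "$size" -lt $((total * 2)) ]; do size=$((size * 2)); done
    echo "$size"
}
mmio_size_gb 32    # 2 x 16GB V100  -> 64
mmio_size_gb 80    # 2 x 40GB A100  -> 256
mmio_size_gb 160   # 10 x 16GB V100 -> 512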

Hope you find this useful.