
Configure BIOS settings with Ansible and Redfish

If you’re using Ansible and trying to configure the BIOS settings of a bunch of hosts in a data center, take a look at Ansible’s community.general.redfish_config module.

The Redfish standard is a suite of specifications that deliver an industry standard protocol providing a RESTful interface for the management of servers, storage, networking, and converged infrastructure. In practice this means that if you have a host with iLO/iDRAC capabilities that also supports the Redfish standard (which includes most datacenter-class servers from Dell, Supermicro, Lenovo, HPE, Fujitsu, IBM, Cisco, etc.), then in addition to a UI where you can log in and configure the hardware, that host also has a Redfish API that accepts JSON payloads to configure the hardware.

The basic format of the Ansible play to change a BIOS setting is this:

- name: Make sure that SR-IOV is enabled
  community.general.redfish_config:
    category: Systems
    command: SetBiosAttributes
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
    bios_attributes:
      SriovGlobalEnable: "Enabled"
  register: update_sriov

- name: Schedule BIOS setting updates
  community.general.idrac_redfish_command:
    category: Systems
    command: CreateBiosConfigJob
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  when: update_sriov.changed

In this case I’m changing the BIOS SriovGlobalEnable setting to “Enabled”. The baseuri is the DNS name or IP address of the iLO / iDRAC interface, and the username and password are the same username and password that you use to log in to iLO / iDRAC.

Once this play is applied to a host, if the host’s “SR-IOV Global Enable” setting wasn’t enabled before, the setting is now Enabled (pending reboot). The “Schedule BIOS setting updates” play ensures that the next time the host is rebooted the new BIOS setting will be applied. If you want to reboot immediately, the community.general.redfish_command module will let you do that too.

BIOS updated pending reboot
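
If you’d rather trigger that reboot from the same playbook, a minimal sketch using community.general.redfish_command could look like this. It reuses the variables from the plays above, and the when condition is just my assumption that you only want to reboot when a setting actually changed:

- name: Reboot the host so the new BIOS settings take effect
  community.general.redfish_command:
    category: Systems
    command: PowerGracefulRestart
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  # only reboot if the BIOS setting actually changed
  when: update_sriov.changed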

The hardest part about setting this up is figuring out what Ansible redfish_config expects the setting to be called. I could see in the iDRAC > Configuration > BIOS Settings that there was a “SR-IOV Global Enable” setting, but I had no idea what attribute name redfish_config used for that setting. Luckily, there’s a Redfish API that lists the current BIOS setting keys and values that Redfish uses on your host. Just navigate to https://[your iLO/iDRAC IP or DNS name]/redfish/v1/Systems/System.Embedded.1/Bios and you’ll get a list of all of the BIOS setting keys and values.

Redfish API showing current BIOS settings on a host
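
If you’d rather pull that same list with Ansible instead of a browser, community.general.redfish_info has a GetBiosAttributes command. This is just a sketch (same variables as above); I dump the whole redfish_facts result because the exact layout of the returned data can vary between collection versions:

- name: Fetch the current BIOS attributes via Redfish
  community.general.redfish_info:
    category: Systems
    command: GetBiosAttributes
    baseuri: "{{ inventory_hostname }}.{{ subdomain }}"
    username: "{{ ilo_username }}"
    password: "{{ ilo_password }}"
  register: bios_info

- name: Show the BIOS attribute names and values
  debug:
    var: bios_info.redfish_facts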

I hope you find this useful.


Run a Linux systemd service during shutdown

I recently needed to add a cleanup service that runs at shutdown to a hundred AWS servers. My requirements were:

  • Run the script /usr/local/sbin/ec2-cleanup.sh when a VM shuts down (poweroff or reboot).
  • Send the output from the script to the syslog service.

So I needed to create a systemd service file that would call the script when the VM shuts down. This is the ec2-cleanup.service file I created:

# ec2-cleanup.service

[Unit]
Description=Run cleanup at shutdown
After=syslog.service network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/usr/local/sbin/ec2-cleanup.sh
Restart=on-failure
RestartSec=1s

[Install]
WantedBy=multi-user.target

Type=oneshot means that the command runs once. Normally a oneshot service would exit after its ExecStart command runs, but since I don’t want to do anything when the service starts, there is no ExecStart command. That’s why I use RemainAfterExit=yes, which keeps the service marked active even though there’s no ExecStart command.

Finally I use ExecStop to run the command at shutdown time.

After=syslog.service network.target ensures that the ec2-cleanup.service doesn’t start until after the syslog service is running and the network has started. More importantly, since systemd stops services in the reverse order that they’re started, this also ensures that syslog and the network service are still running when systemd runs the ec2-cleanup.service's ExecStop command.

Although there are many different available syslog services, most use “syslog” as a service alias, so After=syslog.service should work regardless of which syslog service you actually use. (For example, if you use rsyslog this still works, because rsyslog declares syslog as an alias.)

Finally, I just needed to install the service on my AWS VMs, so I added this to an Ansible playbook that runs on my AWS VMs:

  - name: Install the ec2-cleanup.sh script
    copy:
      src: ec2-cleanup.sh
      dest: /usr/local/sbin/ec2-cleanup.sh
      owner: root
      group: root
      mode: 0755

  - name: Install a service to run ec2-cleanup.sh at shutdown
    copy:
      src: ec2-cleanup.service
      dest: /lib/systemd/system/ec2-cleanup.service
      owner: root
      group: root
      mode: 0644
    register: ec2_cleanup_service

  - name: Restart ec2-cleanup service if the service file changed
    systemd:
      name: ec2-cleanup
      daemon_reload: True
      state: restarted
    when: ec2_cleanup_service.changed

  - name: Enable ec2-cleanup service so it starts on boot
    systemd:
      name: ec2-cleanup
      enabled: True
      state: started

To verify that all of this works I ran the Ansible playbook on a VM, then logged in and checked the status of the service:

eruby@i-056ac231adeb1f930:~$ systemctl status ec2-cleanup
● ec2-cleanup.service - Run cleanup at shutdown
     Loaded: loaded (/lib/systemd/system/ec2-cleanup.service; enabled; vendor preset: enabled)
     Active: active (exited) since Tue 2023-03-14 17:04:37 UTC; 44s ago

Mar 14 17:04:37 i-056ac221aceb1f830 systemd[1]: Finished Run cleanup at shutdown.

The service is active (exited), which is what I expected: exited because there’s no long-running process (the service has no ExecStart command), and active because RemainAfterExit=yes keeps the service active until it’s stopped at shutdown.

If I reboot the VM and log back in I can check syslog with:

journalctl -u ec2-cleanup.service -n 20

… and see the last 20 lines of output from the script. The log output shows that the script ran when I rebooted.

Hope you find this useful.


The Right Way to reboot a host with Ansible

For a long time rebooting a host with Ansible has been tricky. The steps are:

  • ssh to the host
  • Reboot the host
  • Disconnect before the host closes your ssh connection
  • Wait some number of seconds to ensure the host has really shut down
  • Attempt to ssh to the host and execute a command
  • Repeat ssh attempt until it works or you give up

Seems clear enough, but if you Google for an answer you may end up at this StackExchange page that gives lots of not-quite-correct answers from 2015 (and one correct answer). Some people suggest checking port 22, but just because ssh is listening doesn’t mean that it’s in a state where it’s accepting connections.

The correct answer is to use Ansible version 2.7 or greater, which introduced the reboot module. Now all you have to do is add this to your list of handlers:

- name: Reboot host and wait for it to restart
  reboot:
    msg: "Reboot initiated by Ansible"
    connect_timeout: 5
    reboot_timeout: 600
    pre_reboot_delay: 0
    post_reboot_delay: 30
    test_command: whoami

This handler will:

  • Reboot the host
  • Wait 30 seconds
  • Attempt to connect via ssh and run whoami
  • Give up on each connection attempt after 5 seconds if ssh isn’t working
  • Keep attempting to connect for 10 minutes (600 seconds)

Add the directive:

  notify: Reboot host and wait for it to restart

… to any Ansible task that requires a reboot after a change. The host will be rebooted when the current play finishes (handlers run at the end of each play), then Ansible will wait until the host is back up and ssh is working before continuing with the rest of the run.
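
For example, a hypothetical task that installs a newer kernel (and therefore needs a reboot when it changes something) could notify the handler like this:

- name: Install the latest kernel (hypothetical example)
  apt:
    name: linux-image-generic
    state: latest
  notify: Reboot host and wait for it to restart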

If you need to reboot halfway through a playbook you can force all pending handlers to run immediately with a flush_handlers meta task:

- name: Reboot if necessary
  meta: flush_handlers

I sometimes do that to change something, force a reboot, then verify that the change worked, all within the same playbook.
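
Here’s a sketch of that pattern, with a hypothetical kernel command line change and an equally hypothetical check afterwards:

- name: Enable IOMMU on the kernel command line (hypothetical change)
  lineinfile:
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX='
    line: 'GRUB_CMDLINE_LINUX="intel_iommu=on"'
  notify: Reboot host and wait for it to restart
  # a real playbook would also need to regenerate the grub config here

- name: Reboot now rather than at the end of the play
  meta: flush_handlers

- name: Verify the new kernel command line took effect (hypothetical check)
  command: grep -q 'intel_iommu=on' /proc/cmdline
  changed_when: false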

Hope you found this useful.