For a long time, rebooting a host with Ansible was tricky. The steps are:
- ssh to the host
- Reboot the host
- Disconnect before the host closes your ssh connection
- Wait some number of seconds to ensure the host has really shut down
- Attempt to ssh to the host and execute a command
- Repeat ssh attempt until it works or you give up
Seems clear enough, but if you Google for an answer you may end up at this StackExchange page, which gives lots of not-quite-correct answers from 2015 (and one correct answer). Some people suggest checking port 22, but just because sshd is listening doesn't mean that it's in a state where it's accepting connections.
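Before 2.7, people cobbled those steps together by hand. A rough sketch of the common pattern (the delay and timeout values are illustrative, and wait_for_connection itself only arrived in Ansible 2.3):

- name: Reboot the host
  shell: sleep 2 && /sbin/shutdown -r now "Reboot initiated by Ansible"
  async: 1
  poll: 0

- name: Wait for the host to come back up
  wait_for_connection:
    delay: 30
    timeout: 600

The async: 1 / poll: 0 combination fires the shutdown and disconnects immediately, so Ansible doesn't die when the host closes the ssh connection out from under it.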
The correct answer is to use Ansible version 2.7 or greater. Version 2.7 introduced the reboot module, and now all you have to do is add this to your list of handlers:
- name: Reboot host and wait for it to restart
  reboot:
    msg: "Reboot initiated by Ansible"
    connect_timeout: 5
    reboot_timeout: 600
    pre_reboot_delay: 0
    post_reboot_delay: 30
    test_command: whoami
This handler will:
- Reboot the host
- Wait 30 seconds
- Attempt to connect via ssh and run whoami
- Disconnect after 5 seconds if ssh isn't responding
- Keep attempting to connect for 10 minutes (600 seconds)
Add the directive:
notify: Reboot host and wait for it to restart
… to any Ansible task that requires a reboot after a change. The host will be rebooted when the play finishes (handlers run after all of a play's tasks complete), then Ansible will wait until the host is back up and ssh is working before continuing on to the next play.
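For example, a task like this would queue the reboot handler (the kernel upgrade is just a hypothetical trigger; any task that reports "changed" will do):

- name: Upgrade the kernel package
  apt:
    name: linux-image-generic
    state: latest
  notify: Reboot host and wait for it to restart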
If you need to reboot halfway through a playbook you can force all handlers to execute with the command:
- name: Reboot if necessary
  meta: flush_handlers
I sometimes do that to change something, force a reboot, then verify that the change worked, all within the same playbook.
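A minimal sketch of that pattern, using a hypothetical SELinux change as the illustration (the file path and values are just examples):

- name: Apply a setting that only takes effect after a reboot
  lineinfile:
    path: /etc/selinux/config
    regexp: '^SELINUX='
    line: 'SELINUX=enforcing'
  notify: Reboot host and wait for it to restart

- name: Reboot if necessary
  meta: flush_handlers

- name: Verify the change took effect after the reboot
  command: getenforce
  register: selinux_state
  changed_when: false
  failed_when: selinux_state.stdout != 'Enforcing'

Note that flush_handlers only runs handlers that have actually been notified, so if the lineinfile task didn't change anything, the host won't reboot.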
Hope you found this useful.
Thank you for the reboot section.
My environment:
Platform: Vagrant
Instance: VM running on VirtualBox
Initiated the Ansible playbook from localhost.
Followed your steps and noticed issue as below:
fatal: [localhost]: FAILED! => {
    "changed": false,
    "elapsed": 0,
    "msg": "Running reboot with local connection would reboot the control node.",
    "rebooted": false
}
Is there an issue with my config? Why do all the other commands work on the remote machine, but this reboot command is considered to be executing on localhost?
Because an Ansible session cannot survive a reboot to check the result when the control node is the same as the managed node. As a result, Ansible will not allow you to use the reboot module over a local connection.
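The fix is to run the play against the VM over ssh rather than with a local connection. A minimal sketch, with illustrative values based on the defaults that "vagrant ssh-config" reports (your port and key path may differ):

# inventory.ini
vagrant_vm ansible_host=127.0.0.1 ansible_port=2222 ansible_user=vagrant ansible_ssh_private_key_file=.vagrant/machines/default/virtualbox/private_key

# playbook.yml
- hosts: vagrant_vm
  become: yes
  tasks:
    - name: Reboot the managed node over ssh
      reboot:
        reboot_timeout: 600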
Can you share your reboot task code?
Thanks a lot… very useful to me.
Excellent snippet! One question: does this support running asynchronously? I want to reboot my servers automatically one by one, without taking down my Kubernetes cluster.
For a Kubernetes cluster, set "serial: 1" on the play that calls the handler, so it only runs on one node at a time, and add a health check to verify that all nodes are back online before advancing to the next node.
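Something like this, assuming illustrative group names (k8s_nodes, k8s_control_plane) and that kubectl is usable on the first control-plane host:

- hosts: k8s_nodes
  serial: 1
  become: yes
  tasks:
    - name: Reboot this node and wait for it to return
      reboot:
        reboot_timeout: 600

    - name: Wait until every node reports Ready before moving on
      command: kubectl get nodes --no-headers
      delegate_to: "{{ groups['k8s_control_plane'][0] }}"
      register: node_status
      until: "'NotReady' not in node_status.stdout"
      retries: 30
      delay: 10
      changed_when: false

Because serial: 1 finishes the whole task list on one host before starting the next, no two nodes are ever down at the same time.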