Why adding a .conf or .cfg file to /etc/sudoers.d doesn’t work

I needed to add some sudo access rights for support personnel on about a hundred Centos 6.6 servers. Currently no one one these hosts had sudo rights, so the /etc/sudoers file was the default file. I’m using Ansible to maintain these hosts, but rather than modify the default /etc/sudoers file using Ansible’s lineinfile: command, I decided to create a support.conf file and use Ansible’s copy: command to copy that file into /etc/sudoers.d/. That way if a future version of Centos changes the /etc/sudoers file I’m leaving that file untouched, so my changes should always work.

  - name: Add custom sudoers
    copy: src=files/support.conf dest=/etc/sudoers.d/support.conf owner=root group=root mode=0440 validate='visudo -cf %s'

The support.conf file I created copied over just fine, and the validation step of running “visudo -cf” on the file before moving it into place claimed that the file was error-free and should work just fine as a sudoers file.

I logged in as the support user and it didn’t work:

[support@c1n1 ~]$ sudo /bin/ls /var/log/*
support is not in the sudoers file.  This incident will be reported.

Not only did it not work, it was telling me that the support user wasn’t even in the file, which they clearly were.

After Googling around a bit and not finding much I saw this in the Sudoers Manual:

sudo will read each file in /etc/sudoers.d, skipping file names that end in ‘~’ or contain a ‘.’ character to avoid causing problems with package manager or editor temporary/backup files.

sudo was skipping the file because the file name contained a period!

I changed the name of the file from support.conf to support and it worked.

  - name: Add custom sudoers
    copy: src=files/support dest=/etc/sudoers.d/support owner=root group=root mode=0440 validate='visudo -cf %s'

Hope you find this useful.

Here’s a snippet from /etc/sudoers.d/support if you’re interested. The “support” user has already been created by a separate Ansible command.

# Networking
Cmnd_Alias NETWORKING = /sbin/route, /sbin/ifconfig, /bin/ping, /sbin/dhclient, /usr/bin/net, /sbin/iptables, /usr/bin/rfcomm, /usr/bin/wvdial, /sbin/iwconfig, /sbin/mii-tool

# Installation and management of software
Cmnd_Alias SOFTWARE = /bin/rpm, /usr/bin/up2date, /usr/bin/yum

# Services
Cmnd_Alias SERVICES = /sbin/service, /sbin/chkconfig

# Reading logs
Cmnd_Alias READ_LOGS = /usr/bin/less /var/log/*, /bin/more /var/log/*, /bin/ls /var/log/*, /bin/ls /var/log

support  ALL = NETWORKING, SOFTWARE, SERVICES, READ_LOGS
Share Button

Restarting network interfaces in Ansible

I’m using Ansible to set up the network interface cards of multiple racks of storage servers running Centos 6.6. Each server has four network interfaces to configure, a public 1GbE interface, a private 1GbE interface, and two 10GbE interfaces that are set up as a bonded 20GbE interface with two VLANs assigned to the bond.

If Ansible changes an interface on a server it calls a handler to restart the network interfaces so the changes go into effect. However, I don’t want the network interfaces of every single server in a cluster to restart at the same time, so at the beginning of my network.yml playbook I set:

  serial: 1

That way Ansible just updates the network config of one server at a time.

Also, if there are any failures I want Ansible to stop immediately, so if I screwed something up I don’t take out the networking to every computer in the cluster. For this reason I also set:

max_fail_percentage: 1

If a change is made to an interface I’ve been using the following handler to restart the interface:

- name: Restart Network
  service: name=network state=restarted

That works, but about half the time Ansible detects a failure and drops out with an error, even though the network restarted just fine. Checking the server immediately after Ansible says that there’s an error shows that the server is running and it’s network interfaces were configured correctly.

This behavior is annoying since you have to restart the entire playbook after one server fails. If you’re configuring many racks of servers and the network setup is just updating one server at a time I’d end up having to restart the playbook a half dozen times to get through it, even though nothing was actually wrong.

At first I thought that maybe the ssh connection was dropping (I was restarting the network after all) but you can log in via ssh and restart the network and never lose the connection, so that wasn’t the problem.

The connection does pause as the interface that you’re ssh-ing in over resets, but the connection comes right back.

I wrote a short script to repeatedly restart the network interfaces and check the exit code returned, but the exit code was always 0, “no errors”, so network restart wasn’t reporting an error, but for some reason Ansible thought there was a failure.

There’s obviously some sort of timing issue causing a problem, where Ansible is checking to see if all is well, but since the network is being reset the check times out.

I initially came up with this workaround:

- name: Restart Network
  shell: service network restart; sleep 3

That fixes the problem, however, since “sleep 3” will always exit with a 0 exit code (success), Ansible will always think this worked even when the network restart failed. (Ansible takes the last exit code returned as the success/failure of the entire shell operation.) If “service network restart” actually does fail, I want Ansible to stop processing.

In order to preserve the exit code, I wrote a one-line Perl script that restarts the network, sleeps 3 seconds, then exits with the same exit code returned by “service network restart”.

- name: Restart Network
  # Restart the network, sleep 3 seconds, return the
  # exit code returned by "service network restart".
  # This is to work-around a glitch in Ansible where
  # it detects a successful network restart as a failure.
  command: perl -e 'my $exit_code = system("service network restart"); sleep 3; $exit_code = $exit_code >> 8; exit($exit_code);'

Now Ansible grinds through the network configurations of all of the hosts in my racks without stopping.

Hope you find this useful.

Share Button