Running a 6-node CockroachDB cluster with AWS EFS storage

CockroachDB is a new distributed database which, like its namesake, is really hard to kill.

CockroachDB implements SQL DDL commands for creating schemas, tables, and indexes using the same syntax as PostgreSQL, and it supports the PostgreSQL wire protocol, which means any PostgreSQL database driver or client can be used to connect to a CockroachDB database. If you’re currently using PostgreSQL and you want an easier way to deploy a scale-out, highly available database, you should take a look at CockroachDB. In many cases you can just repoint your application at a CockroachDB server and it will run the same as it did using PostgreSQL.

The first day I tried CockroachDB I got a six-node cluster up and running in less than an hour, using CockroachDB’s Docker image on my Apcera cluster with AWS EFS as the backing store. This is what I did to get it working.

Set up an NFS provider for EFS

I already had an Apcera cluster for deploying Docker images running on AWS. This is the same cluster I used for my article on Mounting AWS EFS volumes inside Docker Containers. In fact, I set up the EFS provider using the same steps:

Set up the EFS volume using the AWS console.

Create an NFS provider that targets the EFS volume.

apc provider register apcfs-ha --type nfs \
    --url "nfs://10.0.0.112/" \
    --description 'Amazon EFS' \
    --batch \
    -- --version 4.1

Create a namespace and a private network

Create a namespace and a private network named “roachnet”.

apc namespace /sandbox/cockroach
apc network create roachnet

“roachnet” is a private VXLAN created by the Apcera platform; only containers that I’ve joined to the network can see it.

Create the first CockroachDB node

Next I create a container instance called “roach1” from the cockroachdb/cockroach v1.1.2 Docker image, open ports 8080 and 26257, tell it to use the EFS provider for storage, and have it advertise itself to other CockroachDB nodes so they can find it and join the DB cluster.

apc docker run roach1 --image cockroachdb/cockroach:v1.1.2 \
    --port 8080 --port 26257 \
    --provider /apcera/providers::apcfs-ha \
    --start-cmd "/cockroach/cockroach.sh start --insecure --advertise-host roach1.apcera.local"
apc network join roachnet --job roach1 --discovery-address roach1
apc app start roach1
apc route add http://cockroach.earlruby.apcera-platform.io \
    --https-only --app roach1 --port 8080 --batch

Create 5 more nodes

Create 5 more nodes and add them to roachnet:

for x in `seq 2 6`; do
    apc docker run roach$x --image cockroachdb/cockroach:v1.1.2 \
        --port 8080 --port 26257 \
        --provider /apcera/providers::apcfs-ha \
        --start-cmd "/cockroach/cockroach.sh start --insecure --join roach1.apcera.local:26257"
    apc network join roachnet --job roach$x --discovery-address roach$x
    apc app start roach$x
    sleep 3
done

I added the “sleep 3” command because when I originally tested this (on CockroachDB 1.1.0) the platform started the containers so fast that the DB got confused: all of the nodes started, but only some of them joined the cluster. After I added the delay, all of the nodes joined the cluster.

Verify that the containers are all running:
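
If everything worked, apc app list should show roach1 through roach6 up. To exercise the database I used CockroachDB’s built-in SQL shell. Here’s a rough sketch of the kind of session I ran from inside one of the roach containers (any PostgreSQL client pointed at port 26257 would work just as well; the host name is the one advertised above):

# Open a SQL shell against the first node (insecure mode, as configured above)
/cockroach/cockroach sql --insecure --host=roach1.apcera.local --port=26257

-- At the SQL prompt: create a database and a table, insert a row, read it back
CREATE DATABASE test;
CREATE TABLE test.kv (k STRING PRIMARY KEY, v STRING);
INSERT INTO test.kv VALUES ('hello', 'world');
SELECT * FROM test.kv;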

After that the cluster was up and running. I could connect to the database, create schemas, create tables, and add, update, and delete records. I’m pretty happy with the initial results. The next step is to automatically generate secure certificates so I’m not operating in insecure mode; after that I’ll run actual applications against the cluster.

Hope you found this useful.

CockroachDB overview screen

CockroachDB Storage Screen

CockroachDB Queues


Quickly get IP addresses of new VMs

I spin up a lot of VMs using VMware Fusion. I generally keep “clean” generic copies of a few different distros and versions of Linux servers ready to go with my login, an sshd server, ssh keys, and basic settings that I use already set up. When I need to quickly test something manually — usually some new, multi-VM distributed container orchestration or database system — I just make as many copies of the server’s *.vmwarevm file as I need, fire up the VM copies on my laptop, test whatever I need to test, then shut them down. Eventually I delete the copies and recover the disk space.

Depending on where my laptop is running I’ll get a completely random IP address for the VM from the local DHCP server. I would log into the consoles, get the IPs, then log into the various VMs from a terminal. (Cut and paste just works a whole lot better on a terminal than on the VMware console.)

However, since the console screens are up anyway, and I repeat this pattern several times a week, I figured why not save a step and have the ephemeral VMs show their IP addresses on their consoles without my having to log in. So I added an “on reboot” cron file called /etc/cron.d/welcome to the master image which updates the /etc/issue file.

/etc/cron.d/welcome looks like this:

@reboot root (/bin/hostname; /bin/uname -a; echo; if [ -x /sbin/ip ]; then /sbin/ip addr; else /sbin/ifconfig; fi) > /etc/issue

When a new VM boots, it writes the hostname, kernel info, and the ethernet config to the /etc/issue file. /etc/issue is displayed on the screen before the login prompt, so now I can just glance at the console, see the IP address, and ssh to the new VM.

Ephemeral VM

Although you’d never want to do this on a production system, it works great for ephemeral, throw-away test VMs.

Hope you find this useful.


Mounting AWS EFS volumes inside Docker Containers

Amazon announced the development of the Amazon Elastic File System (AWS EFS) in 2015. EFS was designed to provide multiple EC2 instances with shared, low-latency access to a fully-managed file system. On June 28, 2016 Amazon announced that EFS is now available for production use in the US East (Northern Virginia), US West (Oregon), and Europe (Ireland) Regions.

Apcera’s NFS Service Gateway can be used to access AWS EFS storage volumes within containers. You can use EFS to provide persistent storage to your containers running on AWS-hosted clouds in regions where EFS is available.

Gathering information

Before you begin you will need to know:

  • The name of the AWS Region where your Apcera Platform is running
  • The name/ID of the AWS VPC where your Apcera Platform is running
  • The name/ID of the AWS security group for your Apcera Platform

Setting up an EFS volume

  1. Log into your AWS console.
  2. Select the name of the AWS Region where your Apcera Platform is running on the upper right side of the screen.
  3. Select Elastic File System.
  4. Click Create File System.
  5. Configure the file system access:
    • Select the name of the VPC.
    • The availability zone and subnet should be selected for you automatically.
    • If your VPC has more than one subnet (unusual) then select the subnet containing the Instance Managers that will be connecting to the EFS volume.
    • Leave IP address set to Automatic.
    • The first EFS volume you create will create a new security group. Use that security group for this and all future EFS volumes. Write down the name of the new EFS security group – we’ll configure it in the next few steps.
    • Click Next Step
  6. Configure optional settings:
    • Set the name of the EFS volume.
    • Choose the performance mode.
    • Click Next Step
  7. Review and create:
  8. If everything looks OK, click Create File System.
  9. You should see a “Success!” message and a new EFS volume with “Life Cycle State” = “Creating”.
  10. Write down the IP address of the EFS volume.

Update the EFS security group

  • Go back to the main console menu and select EC2.
  • Click Security Groups in the left hand nav menu.
  • Type the name of the new EFS security group into the search filter list.
  • On the bottom half of the screen delete the default inbound and outbound rules.
  • Add one inbound rule to allow all TCP traffic on port 2049 from the source “name/ID of the AWS security group for your Apcera Platform”
  • Add one outbound rule to allow all TCP traffic on port 2049 to the destination “name/ID of the AWS security group for your Apcera Platform”
  • This allows all VMs within your Apcera Platform security group to connect to your EFS volume on port 2049 (NFS).
  • No other traffic from any other source or to any other destination is allowed.
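
If you prefer scripting this over clicking through the console, the inbound rule can also be added with the AWS CLI. This is a sketch with placeholder security group IDs (substitute the new EFS group and your Apcera Platform group); there is a matching authorize-security-group-egress command for the outbound rule:

# Allow NFS (TCP 2049) into the EFS security group from the Apcera Platform security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-11111111 \
    --protocol tcp --port 2049 \
    --source-group sg-22222222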

Create an NFS Provider for the EFS volume

We’re going to create a single provider for the EFS volume. Each time you have a container or set of containers that need a persistent file system, just create a new service from the same provider. Each new service will carve out a new namespace on the EFS volume, keeping the files associated with that service separate from the files in all other services that use the same provider.

According to the EFS FAQ: “When you create a file system, you create endpoints in your VPC called ‘mount targets.’ Each mount target provides an IP address and a DNS name, and you use this IP address or DNS name in your mount command. Only resources that can access a mount target can access your file system.” Since the Apcera Platform isn’t using Amazon DNS services internally, we’ll use the IP address to connect to the EFS volume.

To create the provider, you need to construct a URL describing the volume. In this case, we’ll use the internal IP address of the EFS volume as the hostname and / as the exported volume name. All EFS volumes use the NFS v4.1 protocol. If the IP address of the EFS volume is 10.0.0.112 we’d construct a provider using:

apc provider register awsefs --type nfs \
    --url "nfs://10.0.0.112/" \
    --description 'Amazon EFS' \
    --batch \
    -- --version 4.1

Create a service from the provider:

apc service create efs-service-1 \
    --provider awsefs \
    --description 'Amazon EFS Service' \
    --batch

Create a capsule, bind the service to the capsule, and connect to the capsule:

apc capsule create efs-capsule1 --image linux -ae --batch
apc service bind efs-service-1 --job efs-capsule1 \
    --batch -- --mountpath /an/unlimited/supply
apc capsule connect efs-capsule1

Once connected, type df -k to see the mounted file system.

You can bind this service to any container that needs a shared, persistent file system. Each time you need a new shared, persistent file system for a container or group of containers just create a new service using the same provider and bind the service to your job or jobs.

Persistence for Docker

Now that we have a provider that can carve out EFS storage for containers, let’s try spinning up some Docker images.

On the Apcera Platform, if the specification for a Docker image (its Dockerfile) declares that the app requires persistent volumes, you must do one of the following when creating the job:

  • Include the --provider flag when you create or run the Docker job. You must include this flag if you include the --volume flag when creating or running the Docker job.
  • Include the --ignore-volumes flag when you create or run the Docker job.

Here is an example of running NGINX inside a Docker container on the Apcera platform, where the content for the site is stored on an EFS volume:

I’m using the Apcera “apc” command-line tool to build the container, pulling the nginx image directly off hub.docker.com, telling it to use the awsefs EFS volume provider I created earlier for persistence, and to mount the EFS volume at the mount point “/usr/share/nginx/html”.
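
Reconstructed from that description, the command looks roughly like this. The job name nginx-web is my own placeholder, and I’m assuming the --volume flag mentioned above takes the container mount path when paired with --provider:

apc docker run nginx-web --image nginx --port 80 --port 443 \
    --provider awsefs --volume /usr/share/nginx/html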

Now connect to the container:
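
I believe the subcommand for this is apc app connect (it parallels the apc capsule connect command used earlier; check apc help app if your version differs):

apc app connect nginx-web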

/proc/mounts contains a list of all of the container’s mount points. I can verify that the container does indeed have an EFS volume by grepping /proc/mounts for the mount point:
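
From the shell inside the container, something like:

grep /usr/share/nginx/html /proc/mounts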

Grepping for “/usr/share/nginx/html” shows the IP address 10.0.0.112, which is the IP of the EFS volume, followed by the long directory name that is the unique namespace for the service, the mount point “/usr/share/nginx/html”, and the mount type “nfs4”.

There is no content in the directory, so I add some by echoing some HTML code to an index.html file. My container will proclaim to the world “NGINX in a Docker container on Apcera with content stored on EFS” in an H3 typeface!
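
The index.html is just a single line, along these lines:

echo '<h3>NGINX in a Docker container on Apcera with content stored on EFS</h3>' > /usr/share/nginx/html/index.html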

Now that I have some content I need to add a route to the content. Right now the NGINX container is running, and listening on ports 80 and 443, but it’s completely isolated from the outside world — no one can connect to those ports unless there’s a route (a URL) set up.

My cluster is running on the domain earlruby.apcera-platform.io, so I add a route like so:
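
The route command follows the same pattern as the CockroachDB route earlier; a sketch, again assuming the placeholder job name nginx-web:

apc route add http://nginx.earlruby.apcera-platform.io --app nginx-web --port 80 --batch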

I have successfully added the http route http://nginx.earlruby.apcera-platform.io/ to my NGINX container. This is a real public DNS entry. To verify that it works I point my browser at the route I just added:

Success!

Such an amazing app is bound to go viral, and a single NGINX container may not be able to keep up with the load. I want to ensure that my app can keep up and remain highly-available, and that it keeps running even if one or more VMs in my cluster get killed off, so I add more NGINX containers:
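
Scaling is a single apc call. I don’t have the exact command in front of me, so treat this as a sketch that assumes apc app update accepts an --instances flag (check apc help app update):

apc app update nginx-web --instances 20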

Now I’ve got 20 containers running my NGINX app, all serving up the same content, running on multiple VMs across my cluster, all load-balanced under the single URL http://nginx.earlruby.apcera-platform.io/. If any container gets killed off, the Apcera platform will spin up a new one. If any VM in the cluster dies, any containers running on it will automatically be migrated to new hosts. If I want to scale up the app to 100 or 1000 containers, or back down to 1, it’s a one-line command to make the change.

In terms of resources, I’m using slightly less than 45 MiB to run those 20 containers. That’s not a typo — 45 MiB! Containers are much more efficient users of RAM than VMs.

I hope you find this useful.

This article originally appeared as an Apcera blog post on July 21, 2016.


5 User Engagement Problems Twitter Should Fix this Week

Twitter’s mobile app needs help.

Of all of the social networks, engagement on Twitter is dismally low. Even the people who like the app don’t spend nearly as much time on Twitter as they do on other social media. There are some obvious problems with the app that Twitter could fix, but they don’t.

iPhone with badged icons

Which one will you click?

Until a few months ago, Twitter’s iPhone app didn’t support badge notifications. A badge is the small red number that appears on the app’s icon letting you know that you have Notifications. Twitter’s iPhone app didn’t have them. You could look at your phone’s screen and see that Facebook, LinkedIn, MeetUp, and NextDoor had messages waiting for you, but not Twitter. A glance at your screen and the small red numbers taunt you – Check Facebook! Check your email! Check Messages! Badges are a simple way to get you to start up that app and engage.

1. Fix Notifications

With a recent update, Twitter finally added badge notifications. Only problem is, they don’t actually work. The badge will appear with a “2” on it, I’ll start the Twitter app, and the “Notifications” icon indicates that there’s something new. I click it, and there are no updates. I can check the “Me” link and see that I have 2 more followers, but they’re not listed on the Notifications screen. If I log into the TweetDeck web app I can see who the new followers are, but the iPhone mobile app pretends they don’t exist.

2. Make it easy to engage with friends

Ever have a conversation with a friend on Twitter? It’s next to impossible to follow replies, comments, or have any sort of conversation using the tools they provide on their mobile app. If Twitter wants to increase user engagement, they should get rid of the Messages tab and make it a “Mentions” tab that shows private messages and allows for threaded conversations. Alternatively they could add a “swipe left” feature or even a “view replies” icon to view the replies to a message, in threaded order. Make it possible for people to have a conversation about the things that they’re posting, and they’ll stay engaged for longer.

3. Show me what posts are trending

The Home button shows me the latest messages from everyone I’m following in order by time posted. What if I want to see the posts with the most retweets? Or the most hearts? Or the most replies? You know, the messages from the people I’m following that are the most interesting/funny/relevant? Get rid of the “Moments” section and give me a “Trending” section that shows the items from the people I follow with the most retweets, likes, and replies. I guarantee I’ll spend more time in that section than I do looking at “Moments”.

4. Load more items in the “Home” feed

I use an iPhone with “service” from AT&T. I also ride BART, which means that I spend about half my commute with no data service. (Newsflash to AT&T: People on trains spend most of their time on their phones. If you cared about your customers you’d send a tech to ride a train with a signal strength meter a couple of times a year and fix the dead zones.)

Since mobile data service is spotty, you’d think the Twitter app would start downloading items for my “Home” feed whenever I have a signal, so I’d never run out of items to read. Unfortunately that’s not the case, and I routinely hit the end of the list of things on “Home” to read just as BART enters another AT&T dead zone. I sit there watching the spinner for a few seconds, then quit Twitter and load another app – one that was smart enough to download content in the background so it’s ready for me to view.

5. Cache some damn profile pictures

I only follow 361 people on Twitter. Each one of them has a small profile picture that rarely gets updated, so why does Twitter download and render the same, identical profile photos every time I open the app? I’ll be scrolling along, and I can see it download and render the photos one by one. If I’m in an AT&T dead zone, I’ll just see a bunch of empty boxes instead of profile pictures in my feed. How hard could it be to cache a copy of the photos on my phone? The app can always check for new photos and update them if one is available, so why is it downloading them every time I open the app? If I don’t have a connection at the moment it’s OK to show me someone’s 24-hour-old profile picture – it’s better than showing me an empty box.

That’s it. 5 simple things that Twitter engineers could fix this week to increase the amount of time people spend using their mobile app.


Full disclosure: I own shares of Twitter stock. If someone at Twitter fixed these problems I might be making less of a loss on those shares. In addition, the Twitter stockholders meeting is this week. I won’t be there, but if you are, feel free to share this article with the people in attendance.


Policy-based Cloud Storage

This is a talk I gave last week at the SF Microservices Meetup titled “Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World.” In it I cover Apcera’s approach to storage for containers and how to use policy to manage very large scale application deployments.


Adding a LUKS-encrypted iSCSI volume to Synology DS414 NAS and Ubuntu 15.04

I have an Ubuntu 15.04 “Vivid” workstation already set up with LUKS full disk encryption, and I have a Synology DS414 NAS with 12TB raw storage on my home network. I wanted to add a disk volume on the Synology DS414 that I could mount on the Ubuntu server, but NFS doesn’t support “at rest” encrypted file systems, and using EncFS over NFS seemed like the wrong way to go about it, so I decided to try setting up an iSCSI volume and encrypting it with LUKS. Using this type of setup, all data is encrypted both “on the wire” and “at rest”.

Log into the Synology Admin Panel and select Main Menu > Storage Manager:

  • Add an iSCSI LUN
    • Set Thin Provisioning = No
    • Advanced LUN Features = No
    • Make the volume as big as you need
  • Add an iSCSI Target
    • Use CHAP authentication
    • Write down the login name and password you choose

On your Ubuntu box switch over to a root prompt:

sudo /bin/bash

Install the open-iscsi drivers. (Since I’m already running LUKS on my Ubuntu box I don’t need to install LUKS.)

apt-get install open-iscsi

Edit the conf file:

vi /etc/iscsi/iscsid.conf

Edit these lines:

node.startup = automatic
node.session.auth.username = [CHAP user name on Synology box]
node.session.auth.password = [CHAP password on Synology box]

Restart the open-iscsi service:

service open-iscsi restart
service open-iscsi status

Start open-iscsi at boot time:

systemctl enable open-iscsi

Now find the name of the iSCSI target on the Synology box:

iscsiadm -m discovery -t st -p $SYNOLOGY_IP
iscsiadm -m node

The target name should look something like “iqn.2000-01.com.synology:boxname.target-1.62332311”

Still on the Ubuntu workstation, log into the iSCSI target:

iscsiadm -m node --targetname "$TARGET_NAME" --portal "$SYNOLOGY_IP:3260" --login

Look for new devices:

fdisk -l

At this point fdisk should show you a new block device which is the iSCSI disk volume on the Synology box. In my case it was /dev/sdd.

Partition the device. I made one big /dev/sdd1 partition, type 8e (Linux LVM):

fdisk /dev/sdd

Set up the device as a LUKS-encrypted device:

cryptsetup --verbose --verify-passphrase luksFormat /dev/sdd1

Open the LUKS volume:

cryptsetup luksOpen /dev/sdd1 backupiscsi

Create a physical volume from the LUKS volume:

pvcreate /dev/mapper/backupiscsi

Add that to a new volume group:

vgcreate ibackup /dev/mapper/backupiscsi

Create a logical volume within the volume group:

lvcreate -L 1800GB -n backupvol /dev/ibackup

Put a file system on the logical volume:

mkfs.ext4 /dev/ibackup/backupvol

Add the logical volume to /etc/fstab to mount it on startup:

# Synology iSCSI target LUN-1
/dev/ibackup/backupvol /mnt/backup ext4 defaults,nofail,nobootwait 0 6
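
Create the mount point if it doesn’t already exist, then mount the new volume:

mkdir -p /mnt/backup
mount /mnt/backup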

Get the UUID of the iSCSI drive:

ls -l /dev/disk/by-uuid | grep sdd1

Add the UUID to /etc/crypttab to be automatically prompted for the decrypt passphrase when you boot up Ubuntu:

backupiscsi UUID=693568ca-9334-4c19-8b01-881f2247ae0d none luks

If you found this interesting, you might want to check out my article Adding an external encrypted drive with LVM to Ubuntu Linux.

Hope you found this useful.


Why adding a .conf or .cfg file to /etc/sudoers.d doesn’t work

I needed to add some sudo access rights for support personnel on about a hundred CentOS 6.6 servers. No one on these hosts had sudo rights yet, so each /etc/sudoers file was still the default file. I’m using Ansible to maintain these hosts, but rather than modify the default /etc/sudoers file using Ansible’s lineinfile: command, I decided to create a support.conf file and use Ansible’s copy: command to copy that file into /etc/sudoers.d/. That way, if a future version of CentOS changes the default /etc/sudoers file, my changes live in a separate file and should continue to work.

  - name: Add custom sudoers
    copy: src=files/support.conf dest=/etc/sudoers.d/support.conf owner=root group=root mode=0440 validate='visudo -cf %s'

The support.conf file I created copied over just fine, and the validation step of running “visudo -cf” on the file before moving it into place claimed that the file was error-free and should work just fine as a sudoers file.

I logged in as the support user and it didn’t work:

[support@c1n1 ~]$ sudo /bin/ls /var/log/*
support is not in the sudoers file.  This incident will be reported.

Not only did it not work, it was telling me that the support user wasn’t even in the file, even though it clearly was.

After Googling around a bit and not finding much I saw this in the Sudoers Manual:

sudo will read each file in /etc/sudoers.d, skipping file names that end in ‘~’ or contain a ‘.’ character to avoid causing problems with package manager or editor temporary/backup files.

sudo was skipping the file because the file name contained a period!

I changed the name of the file from support.conf to support and it worked.

  - name: Add custom sudoers
    copy: src=files/support dest=/etc/sudoers.d/support owner=root group=root mode=0440 validate='visudo -cf %s'

Hope you find this useful.

Here’s a snippet from /etc/sudoers.d/support if you’re interested. The “support” user has already been created by a separate Ansible command.

# Networking
Cmnd_Alias NETWORKING = /sbin/route, /sbin/ifconfig, /bin/ping, /sbin/dhclient, /usr/bin/net, /sbin/iptables, /usr/bin/rfcomm, /usr/bin/wvdial, /sbin/iwconfig, /sbin/mii-tool

# Installation and management of software
Cmnd_Alias SOFTWARE = /bin/rpm, /usr/bin/up2date, /usr/bin/yum

# Services
Cmnd_Alias SERVICES = /sbin/service, /sbin/chkconfig

# Reading logs
Cmnd_Alias READ_LOGS = /usr/bin/less /var/log/*, /bin/more /var/log/*, /bin/ls /var/log/*, /bin/ls /var/log

support  ALL = NETWORKING, SOFTWARE, SERVICES, READ_LOGS

Use Web of Trust (WOT) to thwart scammy web sites

My friend Shannon Phillips recently updated her Facebook status with:

Word to travelers: do not book hotel rooms through TripAdvisor. They will funnel you through sketchy third-party sites (“Amoma” is the one who burned me) who advertise made-up rates, take your money, and then get back in touch two weeks later to tell you oopsie, they can’t make a reservation at that hotel after all.

I guess it’s a nice scam while it lasts, but in this age of networked, instant word-of-mouth reviews, that kind of business model won’t hold up long.

I suggested Shannon try installing the Web of Trust (WOT) plug-in for her browser. I use it in all of mine, and it’s stopped scam sites from being loaded into my browser.

WOT works for the web like Waze works for driving. Here’s the explanation from the Web of Trust home page:

WOT displays a colored traffic light next to website links to show you which sites people trust for safe searching, surfing and shopping online: green for good, red for bad, and yellow as a warning to be cautious. The icons are shown in popular search engine results, social media, online email, shortened URL’s, and lots of other sites.

The cool part is, the rating is based on the aggregate ratings of all of the people who use a plug-in. Get burned by a site? Click the WOT icon and rate the site as untrustworthy. Have an excellent experience? Click the WOT icon and rate the site as trustworthy. The more that people use it, the more accurate and reliable the ratings become.

If a site is really untrustworthy, WOT will stop your browser from loading the site unless you tell it that you really want to go to that site. You can still go anywhere you want, but you’ll be warned about sites that others have had problems with.


Restarting network interfaces in Ansible

I’m using Ansible to set up the network interface cards of multiple racks of storage servers running CentOS 6.6. Each server has four network interfaces to configure: a public 1GbE interface, a private 1GbE interface, and two 10GbE interfaces that are set up as a bonded 20GbE interface with two VLANs assigned to the bond.

If Ansible changes an interface on a server it calls a handler to restart the network interfaces so the changes go into effect. However, I don’t want the network interfaces of every single server in a cluster to restart at the same time, so at the beginning of my network.yml playbook I set:

  serial: 1

That way Ansible just updates the network config of one server at a time.

Also, if there are any failures I want Ansible to stop immediately, so if I screwed something up I don’t take out the networking to every computer in the cluster. For this reason I also set:

  max_fail_percentage: 1

If a change is made to an interface I’ve been using the following handler to restart the interface:

- name: Restart Network
  service: name=network state=restarted

That works, but about half the time Ansible detects a failure and drops out with an error, even though the network restarted just fine. Checking a server immediately after Ansible reports an error shows that the server is running and its network interfaces were configured correctly.

This behavior is annoying since you have to restart the entire playbook after one server fails. When you’re configuring many racks of servers and the network setup only updates one server at a time, you can end up restarting the playbook a half dozen times to get through it, even though nothing was actually wrong.

At first I thought that maybe the ssh connection was dropping (I was restarting the network after all) but you can log in via ssh and restart the network and never lose the connection, so that wasn’t the problem.

The connection does pause as the interface that you’re ssh-ing in over resets, but the connection comes right back.

I wrote a short script to repeatedly restart the network interfaces and check the exit code returned, but the exit code was always 0, “no errors”, so network restart wasn’t reporting an error, but for some reason Ansible thought there was a failure.

There’s obviously some sort of timing issue causing a problem, where Ansible is checking to see if all is well, but since the network is being reset the check times out.

I initially came up with this workaround:

- name: Restart Network
  shell: service network restart; sleep 3

That fixes the problem, however, since “sleep 3” will always exit with a 0 exit code (success), Ansible will always think this worked even when the network restart failed. (Ansible takes the last exit code returned as the success/failure of the entire shell operation.) If “service network restart” actually does fail, I want Ansible to stop processing.

In order to preserve the exit code, I wrote a one-line Perl script that restarts the network, sleeps 3 seconds, then exits with the same exit code returned by “service network restart”.

- name: Restart Network
  # Restart the network, sleep 3 seconds, return the
  # exit code returned by "service network restart".
  # This is to work-around a glitch in Ansible where
  # it detects a successful network restart as a failure.
  command: perl -e 'my $exit_code = system("service network restart"); sleep 3; $exit_code = $exit_code >> 8; exit($exit_code);'
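
A pure-shell version of the same workaround should behave identically, as long as it saves and re-raises the exit code itself (untested sketch):

- name: Restart Network
  shell: service network restart; rc=$?; sleep 3; exit $rc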

Now Ansible grinds through the network configurations of all of the hosts in my racks without stopping.

Hope you find this useful.


Peerio promises privacy for everyone

A new company called Peerio is promising secure, easy messaging and file sharing for everyone. They’re building apps that encrypt everything you send or share, making the code for these apps open source, and paying for security audits to peer-review the source code, looking for security weaknesses.

They’ve put together a short video to explain the basics of what they offer. I thought I’d give it a try and see how it works.

I went to Peerio.com using the Chrome browser, so the home page automatically offered to install Peerio on Chrome.

I clicked the install button and Peerio popped up as a new Chrome app.


Clicking the app brought up the new account screen, with the word “beta” displayed in small type just under the company logo, so they’re letting me know up front that this is going to be a little rough.


I clicked Sign Up, added a user name and email address, and was prompted for a pass phrase.

I have a couple of pass phrases I use. I typed one in, but apparently it wasn’t long enough. I tried another and another. Not long enough. The words “ALMOST THERE. JUST A FEW MORE LETTERS…” appeared on screen. One phrase I typed in had 40+ letters in it, but still the words “ALMOST THERE. JUST A FEW MORE LETTERS…” persisted. Tried again, this time putting spaces between the words. Phrase accepted! Maybe the check is trying to verify the number of space-separated words, not the total number of characters? Anyhow, got past that hurdle.

Next it sends you an email with a confirmation code and gives you 10 minutes (with a second by second countdown) to enter the confirmation code. I guess if you don’t enter it within 10 minutes your account is toast?

Once past that step I was prompted to create a shorter PIN code that can be used to login to the site. The long pass phrase is only needed to log in the first time you use a new device, after that your PIN can be used. I tried entering a few short number sequences. All were rejected as “too weak” so I used a strong, unique password with a mix of upper and lowercase letters, numbers, and special characters. The screen hid what I was typing and only asked for the PIN once, so if I thumb-fingered it, my account was going to be rendered useless pretty quickly. Hopefully I typed what I thought I typed.


Of course to use the service to send messages to people you have to load your contacts in. I added a friend’s email and Peerio sent him an invite. Tried adding another email address and the “Add Contact” form cut me off at the “.c” in “.com” — looks like the folks at Peerio only let you have friends with email addresses that are less than 16 characters long. My friends at monkeybots.com, you’re out of luck.


The Contacts tab has sub-tabs for “All Contacts”, “Confirmed Contacts”, and “Pending Contacts”, but the one email address I entered that was less than 16 characters long didn’t show up anywhere (I expected to see it under “Pending Contacts”). With my entries disappearing or truncated, I stopped trying to use the system.

It’s an interesting idea for a service. The source code for the clients is supposed to be available on GitHub, but the Peerio.com site directed me to https://github.com/TeamPeerio for the source, and that link is a 404. Searching GitHub for “Peerio” shows https://github.com/PeerioTechnologies/peerio-client and https://github.com/PeerioTechnologies/peerio-website, so it looks like this is just a case of a BETA web site with a broken link.

Before the developers pay for another security audit, they really ought to try doing some basic usability testing — set up a new user in front of a laptop, and make two videos — one of the keyboard and screen and one of the user’s face, and then watch them try to log in and set up an account. I think they’d find the experience invaluable.

Anyhow, if you’re interested and feel like trying out their very BETA (feels like ALPHA) release, head over to Peerio.com and sign up. If you want to send me a message, you can reach me on Peerio as “earl”.
