post

Fixing Docker pull timeout errors in Jenkins

Jenkins has a Docker plugin that allows you to authenticate with a docker image registry, pull the container image that you want, and then run tests inside the container. The plugin works well, and you’re guaranteed a “known starting state” for your test since you’re running in a freshly-instantiated container.

The issue that I ran into is that my docker registry is in a different data center than my test environment, there are tests running 24×7, and occasionally there are temporary outages on the Internet in-between the two data centers. When that happens the plugin’s docker.image().pull() command will fail. Even if the image is stored locally it will fail because it attempts to verify the locally-stored image’s signature against the registry image’s signature.

Since the Jenkinsfile is written in Groovy my solution was to add a Groovy function to the top of the Jenkinsfile that wrapped the docker.image().pull() command inside of a retry loop:

// Since docker.image().pull() will fail during
// intermittent or temporary network outages wrap
// it in a retry loop.
def public static docker_image_pull(Object docker, String imageName) {
    def int attempt = 0
    def int max_attempts = 10
    def boolean success = false
    while ((! success) && (attempt < 10)) {
        try {
            docker.image(imageName).pull()
            success = true
        } catch (Exception err) {
            attempt += 1
            def sleep_sec = attempt * 30
            println("Attempt #${attempt}: Failed to pull ${imageName}. Sleeping ${sleep_sec}s")
            sleep(sleep_sec)
        }
    }
    if (! success) {
        throw new Exception("Failed to pull ${imageName}")
    }
}

The function will attempt to pull the image up to 10 times. It has a back-off delay so it first sleeps 30s and retries, then 60s, then 90s, etc. For temporary / intermittent network outages this eventually succeeds. For longer network outages, or of the problem is something else such as “image doesn’t exist” the function will eventually time out and exit with an error.

After that I just had to replace every instance of docker.image('myImageName:latest').pull() inside the Jenkinsfile with:

docker_image_pull(docker, 'myImageName:latest')

Hope you find this useful.