'container did not start before the specified timeout' ERRORS during docker-puppet config generation

Bug #1713188 reported by Dan Prince on 2017-08-26
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Dan Prince

Bug Description

I've been getting errors like this when pulling images from a local Undercloud registry (heavily loaded during provisioning):

            "2017-08-25 17:48:01,380 INFO: 22634 -- Pulling image: 172.19.0.3:8787/tripleo/openstack-mariadb-docker:latest",
            "2017-08-25 17:52:41,003 ERROR: 22634 -- Failed running docker-puppet.py for mysql",
            "2017-08-25 17:52:41,003 ERROR: 22634 -- /usr/bin/docker-current: Error response from daemon: containerd: container did not start before the specified timeout.",

----

I tried setting the PROCESS_COUNT back to 3 (which matches docker daemon's pull default as well) and all issues went away.
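
For reference, a minimal sketch of how a script could pick up such a setting. It assumes the value is read from an environment variable named PROCESS_COUNT; the helper name get_process_count and the fallback handling are hypothetical and are not taken from docker-puppet.py itself.

```python
import os

# Assumed default, mirroring the value that made the errors go away above.
DEFAULT_PROCESS_COUNT = 3


def get_process_count(environ=None):
    """Return the worker count from PROCESS_COUNT, or the default if unusable."""
    if environ is None:
        environ = os.environ
    raw = environ.get("PROCESS_COUNT", "")
    try:
        count = int(raw)
    except ValueError:
        # Variable is unset, empty, or not a number.
        return DEFAULT_PROCESS_COUNT
    # Never go below one worker.
    return max(1, count)


print(get_process_count({"PROCESS_COUNT": "3"}))
```

Passing the environment in as an argument keeps the helper easy to check without touching the real process environment.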

Dan Prince (dan-prince) on 2017-08-26
Changed in tripleo:
assignee: nobody → Dan Prince (dan-prince)
importance: Undecided → High
status: New → In Progress
milestone: none → pike-rc2

Reviewed: https://review.openstack.org/498139
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=949d367ddeb42eff913cdbed733ccf6239b4864b
Submitter: Jenkins
Branch: master

commit 949d367ddeb42eff913cdbed733ccf6239b4864b
Author: Dan Prince <email address hidden>
Date: Fri Aug 25 23:01:24 2017 -0400

    Add DockerPuppetProcessCount defaults to 3

    docker-puppet.py is very aggressive about running concurrently.
    It uses python multiprocessing to run multiple config generating
    containers at once. This seems to work well in general, but
    in some cases, perhaps when the registry is slow or under
    heavy load, it can cause timeouts to occur. Lately I'm seeing
    several 'container did not start before the specified timeout'
    errors that always seem to occur when config files are generated
    (when docker-puppet.py is initially executed).

    A couple of things:

     -when config files are generated this is the first time
      most of the containers are pulled to each host machine
      during deployment

     -docker-puppet.py runs many of these processes at once. Some
      of them run faster, others do not.

     -docker daemon's pull limit defaults to 3. This would throttle
      the above a bit, perhaps contributing to the likelihood of a timeout.

    One solution that seems to work for me is to set the PROCESS_COUNT
    in docker-puppet.py to 3. As this matches docker daemon's default
    it is probably safer at the cost of being slightly slower in some
    cases.

    Change-Id: I17feb3abd9d36fe7c95865a064502ce9902a074e
    Closes-bug: #1713188

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/498867
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3ebb05d9877e1961e5df53e05eae4f2b7a96a836
Submitter: Jenkins
Branch: stable/pike

commit 3ebb05d9877e1961e5df53e05eae4f2b7a96a836
Author: Dan Prince <email address hidden>
Date: Fri Aug 25 23:01:24 2017 -0400

    Add DockerPuppetProcessCount defaults to 3

    Change-Id: I17feb3abd9d36fe7c95865a064502ce9902a074e
    Closes-bug: #1713188
    (cherry picked from commit 949d367ddeb42eff913cdbed733ccf6239b4864b)

tags: added: in-stable-pike

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0rc2 release candidate.

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b1 development milestone.
