podman / docker-puppet.py : concurrency is unstable

Bug #1811383 reported by Emilien Macchi on 2019-01-11
This bug affects 1 person
Affects: tripleo
Importance: Medium
Assigned to: Emilien Macchi

Bug Description

The whole context is described here: https://github.com/containers/libpod/issues/1844

When Podman and docker-puppet run with multiple processes, we randomly hit a bug where the container fails to execute its bind-mounted entrypoint.
The bug is extremely hard to reproduce, and therefore to debug; however, disabling concurrency has been shown to help avoid it.

tags: added: tech-debt
Changed in tripleo:
status: Triaged → In Progress
Bogdan Dobrelya (bogdando) wrote :

Just to note that the affected surface may be wider than the docker-puppet tooling: paunch and the pacemaker bundles also run the podman CLI from multiple processes.

Changed in tripleo:
milestone: stein-2 → stein-3

Reviewed: https://review.openstack.org/614639
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=fda5b5ab3cc4f2eb92330e05751730c6c6a11825
Submitter: Zuul
Branch: master

commit fda5b5ab3cc4f2eb92330e05751730c6c6a11825
Author: Emilien Macchi <email address hidden>
Date: Wed Oct 31 17:16:11 2018 -0400

    docker-puppet: retry container run command

    Context: https://github.com/containers/libpod/issues/1844
    We have a concurrency issue when podman is enabled, where
    the bind-mounted entrypoint can't be found.

    This patch will retry the podman run commands 3 times before declaring
    a failure.
    Also, every time it fails we'll log the number of attempts to configure
    the container, so we can track these numbers in CI.

    It'll allow us to keep doing concurrent calls, but with less chance
    of failing with issue #1844.

    Note: we hate this patch and we hope to revert it soon. But now it's how
    we'll reduce issues in CI.

    Change-Id: I6af89bf54e562e7c6bbcdb82041a7274789dcf28
    Related-Bug: #1811383
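
For readers who want to see the shape of the workaround described in this commit message, here is a minimal sketch of a "retry up to 3 times and log each failed attempt" wrapper. It is not the code from the review: the function name `run_with_retry`, the `runner` parameter, and the log wording are all hypothetical, chosen only for illustration.

```python
import logging
import subprocess

log = logging.getLogger("retry-sketch")


def run_with_retry(cmd, max_attempts=3, runner=subprocess.call):
    """Run cmd until it exits 0 or max_attempts is reached.

    `runner` is any callable taking the command list and returning an
    exit code (hypothetical hook, so the sketch can be tried without podman).
    Returns a tuple (exit_code, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rc = -1
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        rc = runner(cmd)
        if rc == 0:
            break
        # Log the attempt number on every failure so it can be tracked in CI.
        log.warning("%s failed with exit code %s (attempt %d of %d)",
                    cmd[0], rc, attempt, max_attempts)
    return rc, attempt
```

With the default `runner`, a call such as `run_with_retry(["podman", "run", ...])` would execute the real command; a retry like this only lowers the odds of a visible failure and does not address the underlying race.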

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/629546
Reason: We finally went with the retry option.

Reviewed: https://review.openstack.org/631215
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=704b6870ba339345dc1de5d698df38f5467e4dd4
Submitter: Zuul
Branch: master

commit 704b6870ba339345dc1de5d698df38f5467e4dd4
Author: Cédric Jeanneret <email address hidden>
Date: Wed Jan 16 14:03:16 2019 +0100

    Reuse the container in case we have a temporary podman failure

    The "retry" patch[1] didn't take care of the existing container. This patch
    allows the container to be reused if it has failed, in order to avoid an
    error when the container already exists.

    [1] https://review.openstack.org/#/c/614639/

    Change-Id: I5c7258c8687582f56b59ed410c0cc8f6ba4c2d4f
    Context: https://github.com/containers/libpod/issues/1844
    Related-Bug: #1811383
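
The idea in this commit message can be sketched as follows: the first attempt creates the container with the original `run` command, and later attempts start the container that already exists rather than trying to create it again under the same name. This is a sketch under assumptions, not the committed code: it assumes the container was created with an explicit name, it uses `podman start -a <name>` for the later attempts, and the helper names `build_attempt_cmd` and `run_reusing_container` are hypothetical.

```python
def build_attempt_cmd(run_cmd, container_name, attempt, cli="podman"):
    """Return the command to execute for a given 1-based attempt number.

    Attempt 1 uses the original `run` command, which creates the container.
    Later attempts use `start -a <name>` to restart the existing container
    attached, instead of creating a second one with the same name.
    """
    if attempt == 1:
        return list(run_cmd)
    return [cli, "start", "-a", container_name]


def run_reusing_container(run_cmd, container_name, runner, max_attempts=3):
    """Retry a container run, reusing the container after the first attempt.

    `runner` is any callable that takes a command list and returns an exit
    code, for example subprocess.call. Returns the last exit code.
    """
    rc = -1
    for attempt in range(1, max_attempts + 1):
        rc = runner(build_attempt_cmd(run_cmd, container_name, attempt))
        if rc == 0:
            break
    return rc
```

Passing `subprocess.call` as `runner` would run the real commands; passing a stub makes it easy to check which command each attempt would issue.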

Changed in tripleo:
status: In Progress → Fix Released
status: Fix Released → Triaged
milestone: stein-3 → train-1
status: Triaged → In Progress
importance: High → Medium