paunch: podman exec shouldn't be run if the container isn't active

Bug #1839559 reported by Emilien Macchi
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   High         Emilien Macchi

Bug Description

Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1739224

Description of problem:

Running a stack update (either a no-op or a scale-out) fails while running "podman exec --user=root keystone /usr/bin/bootstrap_host_exec keystone keystone-manage bootstrap --bootstrap-password", with stderr "cannot exec into container that is not running: container state improper":

        "Running container: keystone_bootstrap",
        "$ podman ps -a --filter label=container_name=keystone --filter label=config_id=tripleo_step3 --format {{.Names}}",
        "b'keystone\\n'",
        "$ podman exec --user=root keystone /usr/bin/bootstrap_host_exec keystone keystone-manage bootstrap --bootstrap-password hSWHWHHeD2HgCVqprTITYMZCB",
        "b'cannot exec into container that is not running: container state improper\\n'",
        "Error running ['podman', 'exec', '--user=root', 'keystone', '/usr/bin/bootstrap_host_exec', 'keystone', 'keystone-manage', 'bootstrap', '--bootstrap-password', 'hSWHWHHeD2HgCVqprTITYMZCB']. [126]",
        "stderr: cannot exec into container that is not running: container state improper",
        "Running container: nova_db_sync",
        "Skipping existing container: nova_db_sync",
        "Running container: keystone_cron",
        "Skipping existing container: keystone_cron"

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190806190500.bdcffcd.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud
2. Re-run overcloud deploy

Actual results:
Fails while running podman exec --user=root keystone /usr/bin/bootstrap_host_exec keystone keystone-manage bootstrap --bootstrap-password

Expected results:
No failure.

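As you can see, we only use "podman ps -a" to find the container name and then run "podman exec". Because "-a" also lists containers that are not running, finding the name says nothing about the container's state. We should make sure the container is actually running first.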
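To illustrate the check being asked for, here is a minimal sketch, not paunch's actual code. It relies on the fact that "podman ps" without "-a" lists only running containers. The function names running_containers and guarded_exec, and the injectable run parameter, are hypothetical and exist only for this example.

```python
import subprocess


def running_containers(run=subprocess.run):
    """Return the set of names reported by `podman ps`.

    Without -a, `podman ps` lists only running containers.
    """
    result = run(
        ["podman", "ps", "--format", "{{.Names}}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Treat a failing `podman ps` as "nothing is known to be running".
        return set()
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def guarded_exec(container, command, run=subprocess.run):
    """Run `podman exec` only if the container is currently running."""
    if container not in running_containers(run):
        raise RuntimeError(
            "container %s is not running; refusing to exec" % container
        )
    return run(["podman", "exec", "--user=root", container] + list(command))
```

With a guard like this, the deployment would stop with an explicit error instead of failing inside "podman exec" with exit code 126. There is still a window between the check and the exec, so this narrows the problem rather than eliminating it.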

Changed in tripleo:
milestone: none → train-3
assignee: nobody → Emilien Macchi (emilienm)
importance: Undecided → High
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/675494

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/675637

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/675692

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Luke Short (ekultails)
Changed in tripleo:
assignee: Luke Short (ekultails) → Bogdan Dobrelya (bogdando)
Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Luke Short (ekultails)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (stable/stein)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/675692

OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/675637

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/675494

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to paunch (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/677756

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/677757

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/677758

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677860

Changed in tripleo:
assignee: Luke Short (ekultails) → Emilien Macchi (emilienm)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/677757
Reason: yeah it's wrong, I thought it might break something but I found the reason of our bugs and this patch isn't part of it so far.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/678071

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Cédric Jeanneret (cjeanner)
Changed in tripleo:
assignee: Cédric Jeanneret (cjeanner) → Emilien Macchi (emilienm)
Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Cédric Jeanneret (cjeanner)
Changed in tripleo:
assignee: Cédric Jeanneret (cjeanner) → Emilien Macchi (emilienm)
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (master)

Reviewed: https://review.opendev.org/677860
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=983ab98f61f530eec7cac31e3a0556bf6722e2a0
Submitter: Zuul
Branch: master

commit 983ab98f61f530eec7cac31e3a0556bf6722e2a0
Author: Emilien Macchi <email address hidden>
Date: Wed Aug 21 22:06:34 2019 -0400

    Check if container is running before doing an exec

    container_running is a new method that returns True if a container is
    detected as running and False otherwise.
    There is a retry mechanism which, if "podman ps" is used, adds "--sync"
    to the command so that we synchronize the state of the OCI runtime.
    Before doing a "podman ps", we check whether the service is running in
    systemd.
    There is a very short sleep between the retries, to give podman a chance
    to find the container if it takes a bit of time to start or to be seen
    as started.

    It will be used by the builder when a container is configured to
    run "podman exec"; we'll first verify that the container exists, and
    otherwise return an error and stop the deployment.

    This patch is mainly a workaround for a race condition where, in
    heavily loaded environments, an exec can be run too early in a step,
    while the container is still starting.

    It also consolidates the discover_container_name method to improve
    the chances of actually getting a name.

    Co-Authored-By: Cédric Jeanneret <email address hidden>
    Closes-Bug: #1839559
    Change-Id: If4d8c268218bf83abed877a699fc583fb55ccbed
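The retry logic described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged paunch code: the "tripleo_<name>.service" unit naming, the default retry count and delay, the choice to add "--sync" only from the second attempt on, and the injectable run and sleep parameters are all assumptions made for this example.

```python
import subprocess
import time


def container_running(name, retries=3, delay=0.2,
                      run=subprocess.run, sleep=time.sleep):
    """Return True if the container looks running, False after all retries."""
    # Assumption: the container is managed by a unit named tripleo_<name>.
    unit = "tripleo_%s.service" % name
    for attempt in range(retries):
        # 1. Cheap check first: is the systemd unit active?
        #    `systemctl is-active --quiet` exits 0 only when it is.
        if run(["systemctl", "is-active", "--quiet", unit]).returncode == 0:
            return True

        # 2. Ask podman. Without -a, `podman ps` lists running containers
        #    only. On retries, --sync refreshes podman's view of the
        #    container state from the OCI runtime.
        cmd = ["podman", "ps", "--format", "{{.Names}}"]
        if attempt > 0:
            cmd.insert(2, "--sync")
        result = run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
        if result.returncode == 0 and name in result.stdout.split():
            return True

        # 3. Short pause before the next attempt, in case the container
        #    is still starting.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

A caller such as the builder would invoke container_running("keystone") before "podman exec" and abort the step if it returns False. Passing run and sleep as parameters is only there to make the sketch testable without podman or systemd installed.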

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/stein)

Reviewed: https://review.opendev.org/678071
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=17a7432947e2c2f3e95a3b2b878021d437a3cfcf
Submitter: Zuul
Branch: stable/stein

commit 17a7432947e2c2f3e95a3b2b878021d437a3cfcf
Author: Emilien Macchi <email address hidden>
Date: Wed Aug 21 22:06:34 2019 -0400

    Check if container is running before doing an exec

    container_running is a new method that returns True if a container is
    detected as running and False otherwise.
    There is a retry mechanism which, if "podman ps" is used, adds "--sync"
    to the command so that we synchronize the state of the OCI runtime.
    Before doing a "podman ps", we check whether the service is running in
    systemd.
    There is a very short sleep between the retries, to give podman a chance
    to find the container if it takes a bit of time to start or to be seen
    as started.

    It will be used by the builder when a container is configured to
    run "podman exec"; we'll first verify that the container exists, and
    otherwise return an error and stop the deployment.

    This patch is mainly a workaround for a race condition where, in
    heavily loaded environments, an exec can be run too early in a step,
    while the container is still starting.

    It also consolidates the discover_container_name method to improve
    the chances of actually getting a name.

    Co-Authored-By: Cédric Jeanneret <email address hidden>
    Closes-Bug: #1839559
    Change-Id: If4d8c268218bf83abed877a699fc583fb55ccbed
    (cherry picked from commit 983ab98f61f530eec7cac31e3a0556bf6722e2a0)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/677758

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/677756

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 4.5.1

This issue was fixed in the openstack/paunch 4.5.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 5.2.0

This issue was fixed in the openstack/paunch 5.2.0 release.

Bogdan Dobrelya (bogdando) wrote :

This is probably also needed for "docker exec" in Queens.

tags: added: queens-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/702456

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/queens)

Reviewed: https://review.opendev.org/702456
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=d66ba71100d81b88f9700b3e053a4014848e5d49
Submitter: Zuul
Branch: stable/queens

commit d66ba71100d81b88f9700b3e053a4014848e5d49
Author: Emilien Macchi <email address hidden>
Date: Wed Aug 21 22:06:34 2019 -0400

    Check if container is running before doing an exec

    (a partial backport limited only to discover_container_name)

    It only consolidates the discover_container_name method to improve
    the chances of actually getting a name.

    Co-Authored-By: Cédric Jeanneret <email address hidden>
    Closes-Bug: #1839559
    Change-Id: If4d8c268218bf83abed877a699fc583fb55ccbed
    (cherry picked from commit 983ab98f61f530eec7cac31e3a0556bf6722e2a0)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch queens-eol

This issue was fixed in the openstack/paunch queens-eol release.
