Injecting certificate with "podman cp" can break cluster monitoring and operation

Bug #1917868 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

When internal or public certificates are regenerated, we notify the
impacted services by triggering a "config reload" where possible, which
avoids a full restart and the temporary service disruption it causes.

Since the certificates are regenerated on the host, they must be
injected into the running containers before the config reload takes
place. We currently inject the file with "podman cp".

"podman cp" internally uses "podman pause", which freezes the
container's execution. While the container is frozen, any concurrent
"podman exec" or "podman stop" fails with a generic error:

Feb 24 02:05:58 controller-0 podman(haproxy-bundle-podman-0)[352861]: ERROR: Error: can only stop created or running containers. 75ee35ce1b0912645d42d46967a372cf0a3edf49a5d9d5d121ce60fe3acc6144 is in state paused: container state improper
Feb 24 02:05:58 controller-0 podman(haproxy-bundle-podman-0)[352861]: ERROR: Failed to stop container, haproxy-bundle-podman-0, based on image, cluster.common.tag/rhosp16-openstack-haproxy:pcmklatest.
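
To illustrate the failure mode, a caller that needs to run "podman exec" or "podman stop" could first check whether the container is paused and wait briefly. This is only a minimal sketch and not what the eventual fix does. The helper names podman_state and wait_until_not_paused are hypothetical, and the sketch assumes that "podman inspect --format '{{.State.Status}}'" prints the state string (for example "running" or "paused").

```python
import subprocess
import time


def podman_state(container):
    """Return the container state reported by podman, e.g. "running" or "paused"."""
    out = subprocess.run(
        ["podman", "inspect", "--format", "{{.State.Status}}", container],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def wait_until_not_paused(container, get_state=podman_state,
                          retries=10, delay=0.5, sleep=time.sleep):
    """Poll until the container leaves the "paused" state.

    Returns the last observed state; the caller decides what to do
    if it is still "paused" after all retries.
    """
    state = get_state(container)
    for _ in range(retries):
        if state != "paused":
            break
        sleep(delay)
        state = get_state(container)
    return state
```

Such a check only narrows the window: the container can still be paused between the check and the following command, which is why avoiding the pause altogether is the more robust option.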

During this time window, pacemaker may be unable to monitor the state
of the container and may decide to stop it. Worse, if "podman stop"
cannot run either, pacemaker treats this as a stop failure and ends up
fencing the node that hosts the container.
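
One way to copy a file into a running container without "podman cp" is to stream a tar archive into "podman exec -i" and unpack it inside the container. The sketch below shows that idea only; this report does not describe the mechanism used by the actual fix, so treat it as an assumption. The names build_tar and inject_file are hypothetical, and the approach assumes a "tar" binary is available inside the container.

```python
import io
import subprocess
import tarfile


def build_tar(path_in_container, data, mode=0o600):
    """Pack one file into an in-memory tar archive, with a path relative to "/"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=path_in_container.lstrip("/"))
        info.size = len(data)
        info.mode = mode
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def inject_file(container, path_in_container, data, run=subprocess.run):
    """Stream the archive into the container and unpack it at "/".

    "podman exec" is used instead of "podman cp"; the archive travels
    over stdin.
    """
    cmd = ["podman", "exec", "-i", container, "tar", "-C", "/", "-x"]
    return run(cmd, input=build_tar(path_in_container, data), check=True)
```

For example, inject_file("haproxy-bundle-podman-0", "/etc/pki/example.pem", pem_bytes) would write the file at that (hypothetical) path inside the container; the service still needs its config reload afterwards.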

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/783942
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/63001263ad011d5a1dcca42fc7c79599fe6c78c8
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 63001263ad011d5a1dcca42fc7c79599fe6c78c8
Author: Damien Ciabrini <email address hidden>
Date: Mon Mar 22 18:04:08 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    (cherry-picked from 93e53b74293cb4478ea415255fee96e7fddda004)
    (squashed with Ic6e4264c5ad46bd2589cc907c365af2d42fde63d)
    (removed a part that should stay in puppet-tripleo before wallaby)

    Closes-Bug: #1917868

    Change-Id: Id7308f028f33716be5e3df6699c3f2c12e33e344

tags: added: in-stable-victoria
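
The commit message above states that the new injection only works with podman 1.9 or later. A deployment script could guard on that before choosing the injection method. The following is a minimal sketch, assuming "podman --version" prints a line such as "podman version 1.9.3"; the function names are hypothetical.

```python
import re
import subprocess


def parse_podman_version(text):
    """Extract (major, minor) from output such as "podman version 1.9.3"."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_new_injection(version_text=None, minimum=(1, 9)):
    """Return True if the podman version is at least `minimum`."""
    if version_text is None:
        # Ask the local podman binary when no text is supplied.
        version_text = subprocess.run(
            ["podman", "--version"],
            check=True, capture_output=True, text=True,
        ).stdout
    return parse_podman_version(version_text) >= minimum
```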
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/puppet-tripleo/+/783901
Committed: https://opendev.org/openstack/puppet-tripleo/commit/f6c88d0146a0ac35944433b11673f0bc036051ff
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit f6c88d0146a0ac35944433b11673f0bc036051ff
Author: Damien Ciabrini <email address hidden>
Date: Fri Mar 5 13:26:45 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    Adapted from Id7308f028f33716be5e3df6699c3f2c12e33e344, as the
    same behaviour is implemented in puppet-tripleo before wallaby.

    Change-Id: I14be16052677bf3426a88ec4b5299f9502007472
    Related-Bug: #1917868

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 14.1.0

This issue was fixed in the openstack/tripleo-heat-templates 14.1.0 release.

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Revision history for this message
Luigi Toscano (ltoscano) wrote :

See the previous comment: this is released as part of openstack/tripleo-heat-templates 14.1.0. Retargeting and fixing the status.

Changed in tripleo:
status: Confirmed → Fix Released
Changed in tripleo:
milestone: xena-1 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/789875

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/c/openstack/puppet-tripleo/+/783913
Committed: https://opendev.org/openstack/puppet-tripleo/commit/49728d24087ef5b3a07eba288bccbbc16f6717af
Submitter: "Zuul (22348)"
Branch: stable/train

commit 49728d24087ef5b3a07eba288bccbbc16f6717af
Author: Damien Ciabrini <email address hidden>
Date: Fri Mar 5 13:26:45 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    Adapted from Id7308f028f33716be5e3df6699c3f2c12e33e344, as the
    same behaviour is implemented in puppet-tripleo before wallaby.

    Change-Id: I14be16052677bf3426a88ec4b5299f9502007472
    Related-Bug: #1917868

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/783949
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/ab5d866cbc8bd61e04010611b028f9d20292bbe5
Submitter: "Zuul (22348)"
Branch: stable/train

commit ab5d866cbc8bd61e04010611b028f9d20292bbe5
Author: Damien Ciabrini <email address hidden>
Date: Mon Mar 22 18:04:08 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    (cherry-picked from 93e53b74293cb4478ea415255fee96e7fddda004)
    (squashed with Ic6e4264c5ad46bd2589cc907c365af2d42fde63d)
    (removed a part that should stay in puppet-tripleo before wallaby)

    Closes-Bug: #1917868

    Change-Id: Id7308f028f33716be5e3df6699c3f2c12e33e344

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/puppet-tripleo/+/789875
Committed: https://opendev.org/openstack/puppet-tripleo/commit/e09d2a192c888aa84f288a4742e81bab74067dff
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit e09d2a192c888aa84f288a4742e81bab74067dff
Author: Damien Ciabrini <email address hidden>
Date: Fri Mar 5 13:26:45 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    Adapted from Id7308f028f33716be5e3df6699c3f2c12e33e344, as the
    same behaviour is implemented in puppet-tripleo before wallaby.

    Change-Id: I14be16052677bf3426a88ec4b5299f9502007472
    Related-Bug: #1917868
    (cherry picked from commit f6c88d0146a0ac35944433b11673f0bc036051ff)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 13.3.0

This issue was fixed in the openstack/tripleo-heat-templates 13.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.6.0

This issue was fixed in the openstack/tripleo-heat-templates 11.6.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/790174
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/f43f11f7dcbaa077b1b1934473c9143f0e89f90d
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit f43f11f7dcbaa077b1b1934473c9143f0e89f90d
Author: Damien Ciabrini <email address hidden>
Date: Mon Mar 22 18:04:08 2021 +0100

    HA: inject public certificates without blocking container

    Do not inject public certificates in pacemaker bundles by means
    of "podman cp", as this pauses the container for a short amount
    of time and can make pacemaker operation fail during that time
    window and impact cluster for no reason.

    Keep "podman cp" for non-HA containers, as the freeze is short
    and doesn't seem to impact podman monitoring anyway.

    The new certificate injection only works for podman 1.9+, lower
    version won't overwrite the existing certificate.

    (cherry-picked from 93e53b74293cb4478ea415255fee96e7fddda004)
    (squashed with Ic6e4264c5ad46bd2589cc907c365af2d42fde63d)
    (removed a part that should stay in puppet-tripleo before wallaby)

    Closes-Bug: #1917868

    Change-Id: Id7308f028f33716be5e3df6699c3f2c12e33e344
    (cherry picked from commit 63001263ad011d5a1dcca42fc7c79599fe6c78c8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.4.5

This issue was fixed in the openstack/tripleo-heat-templates 12.4.5 release.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

please disregard, upstream revert would flow to 16.2, not to 16.1. So we cannot revert at this time and place.

Revision history for this message
Luigi Toscano (ltoscano) wrote :

As far as RDO/train/centos8* is concerned (where IIRC podman 3.x is available) everything should be fine and awesome as it is right now.
