Injecting certificate with "podman cp" can break cluster monitoring and operation
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | Medium | Unassigned | | |
Bug Description
When internal or public certificates are re-generated, we notify the
impacted services by triggering a "config reload" when possible, to
avoid restarting them entirely and incurring a temporary service
disruption.
Since the certificates are re-generated on the host, they must be
injected in running containers before the config reload takes place.
We inject the file into the running container with "podman cp".
"podman cp" internally uses "podman pause", which freezes the
container execution. While the container is frozen, any attempt at
running "podman exec" or "podman stop" concurrently will fail with
a generic error:
Feb 24 02:05:58 controller-0 podman(
Feb 24 02:05:58 controller-0 podman(
During this time window, pacemaker may be unable to monitor the state
of a container and may decide to stop it. Worse, if the "podman stop"
cannot be run, pacemaker will consider it a stop failure and will end
up fencing the node hosting the running container.
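As an illustration only, here is a minimal sketch of one way to place a file in a running container without "podman cp": stream the file content over stdin to "podman exec -i". It assumes that "podman exec" does not pause the container the way "podman cp" is described to above, and that the container image provides "sh" and "cat". The helper names build_inject_command and inject_file are hypothetical, and this is not necessarily what the linked fix does.

```python
import subprocess
from pathlib import Path


def build_inject_command(container, dest_path):
    """Return an argv that writes stdin to dest_path inside the container.

    Hypothetical helper. Assumes the container image ships "sh" and "cat".
    """
    # "$1" is filled from the argument that follows the "sh" placeholder,
    # so the destination path is never interpolated into the shell script.
    return [
        "podman", "exec", "-i", container,
        "sh", "-c", 'cat > "$1"', "sh", dest_path,
    ]


def inject_file(container, src_path, dest_path, runner=subprocess.run):
    """Stream the host file src_path into the container at dest_path.

    The runner parameter exists only so the call can be replaced in tests.
    """
    data = Path(src_path).read_bytes()
    # check=True raises CalledProcessError if the podman command fails.
    return runner(build_inject_command(container, dest_path),
                  input=data, check=True)
```

A caller would run something like inject_file("some-container", "/path/on/host.pem", "/path/in/container.pem") before triggering the config reload; the container name and both paths are placeholders.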
Changed in tripleo: | |
milestone: | wallaby-rc1 → xena-1 |
Changed in tripleo: | |
milestone: | xena-1 → none |
Fix on master: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/782539