HA: reorder init_bundle and restart_bundle for improved updates
A pacemaker bundle can be restarted either because:
. a tripleo config has been updated (from /var/lib/config-data)
. the bundle config has been updated (container image, bundle parameter, ...)
In HA services, the special container "*_restart_bundle" is in charge
of restarting the HA service on tripleo config change. The special
container "*_init_bundle" handles restart on bundle config change.
When both types of change occur at the same time, the bundle must
be restarted first, so that the container has a chance to be
recreated with all bind-mounts updated before it tries to reload
the updated config.
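
The ordering rule above can be sketched as follows. This is only an
illustration of the intended sequence, not the actual
tripleo-heat-templates logic; the function name and the step names are
hypothetical.

```python
def plan_update_steps(bundle_config_changed, tripleo_config_changed):
    """Return the ordered steps to run for one HA service (hypothetical helper)."""
    steps = []
    if bundle_config_changed:
        # Update the bundle first, so the container is recreated with
        # all bind-mounts updated.
        steps.append("init_bundle")
    if tripleo_config_changed:
        # Only then restart the service so it reloads the updated config.
        steps.append("restart_bundle")
    return steps
```

When nothing changed, the returned list is empty, so neither step runs.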
Implement the improvement with two changes:
1. Make the "*_restart_bundle" start after the "*_init_bundle", and
make sure "*_restart_bundle" is only enabled after the initial
deployment.
2. During minor update, make sure that the "*_restart_bundle" not
only restarts the container, but also waits until the service
is operational (e.g. galera fully promoted to Master). This forces
the rolling restart to happen sequentially, and avoids service
disruption in quorum-based clustered services like galera and
rabbitmq.
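
The "restart, then wait until operational" behaviour in change 2 could
look roughly like the polling loop below. It is a minimal sketch under
stated assumptions: get_state is a caller-supplied callable, and the
state strings ("Master", "Failed") are placeholders, not the output of
any real pacemaker command.

```python
import time


def wait_until_operational(get_state, target="Master", timeout=600,
                           interval=5, sleep=time.sleep,
                           clock=time.monotonic):
    """Poll get_state() until it returns `target`.

    Returns True once the target state is seen, and False if the
    resource reports "Failed" or the timeout expires first.
    All state names here are hypothetical placeholders.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        state = get_state()
        if state == target:
            return True
        if state == "Failed":
            # Bail out early instead of waiting for the timeout.
            return False
        sleep(interval)
    return False
```

Because each node only proceeds once this returns, restarts across
nodes happen one after another, which is what keeps quorum in the
cluster.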
Tested the following update use cases:
* minor update: ensure that *_restart_bundle restarts all types of
resources (OCF, bundles, A/P, A/P Master/Slave).
* minor update: ensure *_restart_bundle is not executed when no
config or image update happened for a service.
* restart_bundle: when a resource (OCF or container) fails to
restart, bail out early instead of waiting until the timeout is
reached.
* restart_bundle: make sure a resource is restarted even when it
is in failed state when *_restart_bundle is called.
* restart_bundle: A/P can be restarted on any node, so watch
restart globally. When the resource restarts as Slave, continue
watching for a Master elsewhere in the cluster.
* restart_bundle: if an A/P is not running locally, make sure it
doesn't get restarted anywhere else in the cluster.
* restart_bundle: do not try to restart a stopped (disabled) or
unmanaged resource. Bail out early instead of waiting until the
timeout is reached.
* stack update: make sure that running a stack update with no
change does not trigger any *_restart_bundle, and does not
restart any HA container either.
* stack update: when both bundle and config change, ensure the bundle
is updated before HA containers are restarted (e.g. HAProxy
migration to TLS everywhere).
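
Several of the restart_bundle cases listed above amount to a decision
on whether a restart should be attempted at all. The sketch below
restates them as a single predicate; the ResourceState fields and the
should_restart name are hypothetical and only mirror the wording of
the test cases, not the real script.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical snapshot of one pacemaker resource as seen locally."""
    managed: bool = True
    enabled: bool = True           # False means stopped (disabled)
    running_locally: bool = True
    active_passive: bool = False   # True for A/P resources
    failed: bool = False


def should_restart(res, config_changed):
    """Decide whether restart_bundle should restart this resource."""
    if not config_changed:
        # No config or image update: nothing to do.
        return False
    if not res.managed or not res.enabled:
        # Stopped or unmanaged: bail out early rather than wait for a timeout.
        return False
    if res.active_passive and not res.running_locally:
        # An A/P resource running elsewhere must not be restarted from here.
        return False
    # A failed resource is still restarted, so res.failed is not a blocker.
    return True
```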
Change-Id: Ic41d4597e9033f9d7847bb6c10c25f443fbd5b0e
Closes-Bug: #1839858
(cherry picked from commit 3230f005c1d51863a2c2484fe4c05471f5dc25dc)
Reviewed: https://review.opendev.org/707907
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=2bd4cdeb2f7887208b863e9c19ab136e3fbf4958
Submitter: Zuul
Branch: stable/train
commit 2bd4cdeb2f7887208b863e9c19ab136e3fbf4958
Author: Damien Ciabrini <email address hidden>
Date: Fri Nov 15 17:41:42 2019 +0100