Introduce restart_bundle containers to detect config changes and restart pacemaker resources
During the containerization work we regressed on the restart of
pacemaker resources when a config change for the service was detected.
In baremetal we used to do the following:
1) If a puppet config change was detected, we'd touch a file with the
service name under /var/lib/tripleo/pacemaker-restarts/<service>
2) A post deployment bash script (extraconfig/tasks/pacemaker_resource_restart.sh)
would test for the service file's existence and restart the pcs service via
'pcs resource restart --wait=600 service' on the bootstrap node.
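Roughly, the old post-deployment flow amounted to something like the sketch below. Only the flag directory and the pcs invocation come from the description above; the function wrapper and loop are assumptions about what pacemaker_resource_restart.sh did, not its actual code:

```shell
# Hedged sketch of the pre-containerization restart flow: walk the flag
# files touched during the puppet run and restart each matching pcmk
# resource. The directory and pcs command are from the commit message;
# the rest is illustrative.
restart_flagged_services() {
    local dir=${1:-/var/lib/tripleo/pacemaker-restarts}
    local flag service
    for flag in "$dir"/*; do
        [ -e "$flag" ] || continue       # no flag files, nothing to do
        service=$(basename "$flag")      # file name == pacemaker resource
        pcs resource restart --wait=600 "$service"
        rm -f "$flag"                    # consume the flag once handled
    done
}
```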
With this patchset we make use of paunch's ability to detect that a config
hash change happened and respawn a temporary container (called
<service>_restart_bundle) which will simply always restart the pacemaker
service from the bootstrap node whenever invoked, but only if the pcmk
resource already exists. For this reason we add config_volume and bind
mount it inside the container, so that the TRIPLEO_CONFIG_HASH env
variable gets generated for these *_restart_bundle containers.
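Schematically, each *_restart_bundle container does something along these lines when paunch respawns it on a TRIPLEO_CONFIG_HASH change. This is a hedged sketch: the existence guard and the exact pcs commands are assumptions based on the description above, not the template code itself:

```shell
# Illustrative sketch of a <service>_restart_bundle container's job:
# always restart the given pacemaker resource from the bootstrap node,
# but only if the resource has already been created.
restart_bundle() {
    local service="$1"
    # Guard: skip silently if the pcmk resource does not exist yet
    if pcs resource show "$service" >/dev/null 2>&1; then
        echo "$service restart invoked"
        pcs resource restart --wait=600 "$service"
    fi
}
```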
We tested this change as follows:
A) Deployed an HA overcloud with this change and observed that pcmk resources
were not restarted needlessly during initial deploy
B) Reran the exact same overcloud deploy with no changes, observed that
no spurious restarts would take place
C) Added an env file to trigger a config change for haproxy [1], redeployed and observed that it restarted
haproxy only:
Jun 06 16:22:37 overcloud-controller-0 dockerd-current[15272]: haproxy-bundle restart invoked
D) Added a trigger [2] for mysql config change, redeployed and observed restart:
Jun 06 16:40:52 overcloud-controller-0 dockerd-current[15272]: galera-bundle restart invoked
E) Added a trigger [3] for a rabbitmq config change, redeployed and observed restart:
Jun 06 17:03:41 overcloud-controller-0 dockerd-current[15272]: rabbitmq-bundle restart invoked
F) Added a trigger [4] for a redis config change, redeployed and observed restart:
Jun 07 08:42:54 overcloud-controller-0 dockerd-current[15272]: redis-bundle restart invoked
G) Reran a deploy with no changes and observed that no spurious restarts
were triggered
Reviewed: https://review.openstack.org/572840
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a6389da22d43fa61d4db16b676b23c3a4c468dfd
Submitter: Zuul
Branch: master
commit a6389da22d43fa61d4db16b676b23c3a4c468dfd
Author: Michele Baldessari <email address hidden>
Date: Tue Jun 5 14:19:24 2018 +0000
[1] haproxy config change trigger:
    parameter_defaults:
      ExtraConfig:
        tripleo::haproxy::haproxy_globals_override:
          'maxconn': 1111
[2] mysql config change trigger:
    parameter_defaults:
      ExtraConfig:
        mysql_max_connections: 1111
[3] rabbitmq config change trigger (default partition handling is 'ignore'):
    parameter_defaults:
      ExtraConfig:
        rabbitmq_config_variables:
          cluster_partition_handling: 'pause_minority'
          queue_master_locator: '<<"min-masters">>'
          loopback_users: '[]'
[4] redis config change trigger:
    parameter_defaults:
      ExtraConfig:
        redis::tcp_backlog: 666
        redis::params::tcp_backlog: 666
Change-Id: I62870c055097569ceab2ff67cf0fe63122277c5b
Co-Authored-By: Damien Ciabrini <email address hidden>
Closes-Bug: #1775196