Fix up a race when deploying pacemaker_remote nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | Michele Baldessari |
Bug Description
We currently create remote resources without waiting for their creation.
This leads to the following potential race (spotted by Marian Mkrcmari):
- On Step1 pacemaker bootstrap node creates the resource but the remote
resource is not yet created
- Step1 completes and Step2 starts
- On Step2 the remote node sets a property (or calls pcs cib) but the
remote is not yet set up so 'pcs cluster cib' will fail there with:
(err): Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster
cib /var/lib/
with code: 1 ->
I am not entirely sure why we started seeing this only now. I suspect it has
the same cause as https:/ which we also started to see
only lately: some puppet dependencies likely changed the ordering of execution
and broke some assumptions.
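The race described above comes from issuing the create command and moving on without confirming that the remote is actually up. A minimal sketch of the general "create, then poll until ready" pattern follows. It is not the puppet-pacemaker implementation: the function name create_and_verify, the callables create and is_ready, and the tries/try_sleep parameters are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def create_and_verify(create: Callable[[], None],
                      is_ready: Callable[[], bool],
                      tries: int = 5,
                      try_sleep: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run create(), then poll is_ready() up to `tries` times.

    All names here are hypothetical. In a real deployment, create() would
    issue the resource creation and is_ready() would check that the remote
    resource is really up before the step is allowed to finish.
    """
    create()
    for attempt in range(tries):
        if is_ready():
            return True
        # Only wait if another attempt is still coming.
        if attempt < tries - 1:
            sleep(try_sleep)
    return False


if __name__ == "__main__":
    # Fake remote that only becomes ready on the third poll.
    state = {"created": False, "polls": 0}

    def fake_create() -> None:
        state["created"] = True

    def fake_ready() -> bool:
        state["polls"] += 1
        return state["created"] and state["polls"] >= 3

    ok = create_and_verify(fake_create, fake_ready, tries=5, try_sleep=0.0)
    print("remote ready:", ok, "after", state["polls"], "polls")
```

With a check like this, Step1 would not complete until the remote is confirmed ready (or the retries are exhausted), so a Step2 action on the remote node would no longer run against a half-created remote.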
Changed in tripleo:
status: Triaged → In Progress
Reviewed: https://review.openstack.org/463103
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=b6d02fd5001153b53b3061d63d2cb686b0646f18
Submitter: Jenkins
Branch: master
commit b6d02fd5001153b53b3061d63d2cb686b0646f18
Author: Michele Baldessari <email address hidden>
Date: Sat May 6 17:40:24 2017 +0200
Use verify_on_create when creating pacemaker remote resources
We currently create remote resources without waiting for their creation.
This leads to the following potential race (spotted by Marian Mkrcmari):
- On Step1 pacemaker bootstrap node creates the resource but the remote
resource is not yet created
- Step1 completes and Step2 starts
- On Step2 the remote node sets a property (or calls pcs cib) but the
remote is not yet set up so 'pcs cluster cib' will fail there with:
(err): Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster
cib /var/lib/pacemaker/cib/puppet-cib-backup20170506-15994-1swnk1i failed
with code: 1 ->
Note that when verify_on_create is set to true we are not using the cib
dump/push mechanism. That is fine because we create the remotes on
step1 and the dump/push mechanism is only needed starting from step2
when multiple nodes set cluster properties at the same time.
Tested by Marian Mkrcmari successfully as well.
Closes-Bug: #1689028
Change-Id: I764526b3f3c06591d477cc92779d83a19802368e
Depends-On: I1db31dcc92b8695ab0522bba91df729b37f34e0f