galera resource replicas don't start properly during scale up
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Medium | Damien Ciabrini |
Bug Description
When scaling up the overcloud, each time a node is added, two pacemaker
resources are reconfigured to update the number of galera replicas and
an internal galera<->pacemaker name mapping.
Currently, the number of replicas is updated before the name mapping,
so there's a time window where pacemaker is allowed to start new galera
replicas but the resource agent cannot do so because the name mapping is
still missing. This results in errors in the cluster that shouldn't
be there, even if the cluster eventually recovers:
Aug 20 18:25:58 controller-0 pacemaker-
Aug 20 18:25:58 controller-0 pacemaker-
Aug 20 18:25:58 controller-0 pacemaker-
Aug 20 18:25:58 controller-0 pacemaker-
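The time window described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not pacemaker, puppet, or resource agent code: the `Cluster` class, its `replicas` and `name_map` fields, and the `scale_up` helper are all hypothetical names made up for illustration. It assumes the cluster may react after each individual update, which is what opens the window.

```python
# Toy model of the ordering problem; all names here are hypothetical.

class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.replicas = len(nodes)              # replicas the cluster may start
        self.name_map = {n: n for n in nodes}   # cluster node name -> galera name
        self.errors = []

    def reconcile(self):
        """Try to start a replica on every node allowed by the replica count."""
        for node in self.nodes[: self.replicas]:
            if node not in self.name_map:
                self.errors.append(f"cannot start galera on {node}: no name mapping")


def scale_up(cluster, new_node, mapping_first):
    """Add a node, applying the two updates in the requested order."""
    cluster.nodes.append(new_node)
    steps = [
        lambda: cluster.name_map.__setitem__(new_node, new_node),
        lambda: setattr(cluster, "replicas", cluster.replicas + 1),
    ]
    if not mapping_first:
        steps.reverse()  # replica count first: the order reported in this bug
    for step in steps:
        step()
        cluster.reconcile()  # the cluster may act between the two updates
    return cluster.errors


controllers = ["controller-0", "controller-1", "controller-2"]
buggy = scale_up(Cluster(controllers), "controller-3", mapping_first=False)
fixed = scale_up(Cluster(controllers), "controller-3", mapping_first=True)
print(len(buggy), len(fixed))
```

In this model, updating the replica count first leaves one pass where a replica is allowed on the new node but cannot be started, while updating the mapping first leaves no such pass.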
summary:
- galera resource don't start properly during scale up
+ galera resource replicas don't start properly during scale up
Reviewed: https://review.opendev.org/747150
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=16a6ba465d420b23da77bab2f64286037d1ced37
Submitter: Zuul
Branch: master
commit 16a6ba465d420b23da77bab2f64286037d1ced37
Author: Damien Ciabrini <email address hidden>
Date: Thu Aug 20 14:06:13 2020 +0200
HA: ensure scaling up galera does not cause promotion errors
During scale up, two galera resources are being updated in the
pacemaker cluster. Force a specific ordering in puppet to make
sure the galera resource agent always picks up the up-to-date
config when it starts new replicas.
Closes-Bug: #1892530
Change-Id: Id40ac8c10fd0348ce4fd99ce319dab933312acfa