A Galera prim node can't be started by the OCF RA because another running node thinks the prim is OK
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Confirmed | Medium | Fuel Sustaining |
Bug Description
If no prim Galera node is ready, other resource instances that are waiting for a new prim must fail if they have managed to start by chance. Otherwise, by design of the Galera OCF RA, the prim fails to start: the reelection checker reports false when other resource instances are found running in the same Pacemaker partition that has quorum.
In this bug, a node waiting for a prim *was* kept started without a prim node ready, and the OCF RA mistakenly kept reporting that the prim was OK.
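To make the stuck state easier to follow, here is a minimal shell sketch of the two conditions described above. It is only a model: the function names `reelection_blocked` and `waiting_instance_must_fail` and their arguments are hypothetical and are not taken from the actual Galera OCF RA code.

```shell
#!/bin/sh
# Illustrative model only: function names and arguments are hypothetical
# and are not taken from the real Galera OCF RA.

# reelection_blocked OTHERS_RUNNING
# OTHERS_RUNNING: number of other resource instances found running in the
# same quorate Pacemaker partition.
# Returns 0 (true) when a new prim must not be elected, mirroring the
# "reelection checker reports false" condition from the description.
reelection_blocked() {
    [ "$1" -gt 0 ]
}

# waiting_instance_must_fail PRIM_READY
# PRIM_READY: "yes" or "no".
# Returns 0 (true) when a started instance that is waiting for a prim
# should fail, which is the expected behavior when no prim is ready.
waiting_instance_must_fail() {
    [ "$1" != "yes" ]
}

# State described in this bug: no prim is ready, yet one waiting instance
# stays started, so the reelection of a prim is blocked.
if waiting_instance_must_fail no && reelection_blocked 1; then
    echo "stuck: a waiting instance keeps running and blocks the prim"
fi
```

In this model, once the waiting instance fails as expected, the count passed to `reelection_blocked` drops to 0 and a new prim can be elected.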
Steps to reproduce were given in the Galera reliability testing https:/
It may also be reproduced on a Fuel env, although it is a rare corner case that is hard to catch, given that node-1, node-2 and node-3 are deployed as controller nodes:
1) https:/
2) PURGE=true ./vagrant_
3) docker exec -it jepsen bash -c "TESTPROC=mysqld lein test :only jepsen.
tags: added: area-library
Changed in fuel:
status: New → Confirmed
I hope this is a rare corner case that occurs only when multiple network partitions happen in a row. Hence, a medium bug.