After a network split, a node can make write progress and end up with a diverged local seqno
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL patches by Codership | Invalid | Undecided | Unassigned |
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Invalid | Undecided | Unassigned |
percona-xtradb-cluster-5.6 (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
How to reproduce:
Given:
* A cluster of 5 Galera nodes (Percona packages)
* running in a multi-master write/read layout,
* auto-recovery decisions made by a custom Pacemaker OCF RA, which can only monitor/stop/start mysqld_safe; it does not otherwise interfere and removes nothing in the data path.
With that, do multiple runs of the custom Jepsen tests with the nemesis in random network-partition mode (all links and details about the test cases are described here https:/
Results:
Expected: nodes always recover after partitions with a merged state; isolated nodes can't make write progress without quorum.
Actual: a node (n5 here) made write progress, diverged, and refuses to start with errors like:
[ERROR] WSREP: Local state seqno (189675) is greater than group seq no (188050): states diverged.
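To show how far n5 ran ahead of the group, the error line above can be parsed for its two seqno values. This is a minimal sketch, not part of the attached logs or tooling: the function name `parse_divergence` and the regular expression are illustrative assumptions based only on the message text quoted in this report.

```python
import re

# Assumed pattern, derived only from the error text quoted in this report.
DIVERGED_RE = re.compile(
    r"Local state seqno \((\d+)\) is greater than group seq ?no \((\d+)\)"
)


def parse_divergence(line):
    """Return (local_seqno, group_seqno, gap), or None if the line does not match."""
    match = DIVERGED_RE.search(line)
    if match is None:
        return None
    local_seqno = int(match.group(1))
    group_seqno = int(match.group(2))
    return local_seqno, group_seqno, local_seqno - group_seqno


line = (
    "[ERROR] WSREP: Local state seqno (189675) is greater than "
    "group seq no (188050): states diverged."
)
print(parse_divergence(line))  # (189675, 188050, 1625)
```

For the line reported here, the local seqno is 1625 ahead of the group seqno.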
Logs, package versions, configs and wsrep status/vars are attached.
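The expectation stated under Results, that an isolated node cannot make write progress without quorum, can be sketched as a simple majority check. This is a simplified illustration that assumes equal node weights and a fixed cluster size of 5; Galera's real quorum calculation is weighted and relative to the last primary component, and the helper names and the node names other than n5 are hypothetical.

```python
def has_quorum(partition_size, cluster_size):
    """Strict majority rule: more than half of the cluster must be reachable."""
    return 2 * partition_size > cluster_size


def writable_partitions(partitions):
    """Return the partitions that would be allowed to accept writes."""
    cluster_size = sum(len(p) for p in partitions)
    return [p for p in partitions if has_quorum(len(p), cluster_size)]


# Hypothetical split of a 5-node cluster; only n5 is named in this report.
split = [["n1", "n2", "n3"], ["n4"], ["n5"]]
print(writable_partitions(split))  # [['n1', 'n2', 'n3']]
```

Under this rule an isolated n5 has 1 of 5 nodes and should reject writes, which is why the observed write progress on n5 is unexpected.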
description: updated
Changed in codership-mysql:
status: New → Invalid
https://groups.google.com/forum/#!topic/codership-team/MOuSg_tiIOI