inconsistency in multi-master test when NOT using binlogging
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL patches by Codership | Fix Released | Critical | Seppo Jaakola |
5.1 | Fix Released | Critical | Seppo Jaakola |
5.5 | Fix Released | Critical | Seppo Jaakola |
Bug Description
yes, this is a sibling bug to: https:/
A similar inconsistency occurs in the multi-master test when binlogging has not been enabled. However, the cause is different.
How to reproduce:
1. start two nodes and make sure that neither log-bin nor log-slave-updates is enabled
2. start a load balancer (glbd) to distribute load with round robin policy
3. launch randgen load against the load balancer
The randgen command line used was:
./gentest.pl --gendata=
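Step 1 depends on both nodes running without binlogging. As a rough aid, the sketch below scans the text of a MySQL option file and reports whether `log-bin` or `log-slave-updates` appears in the `[mysqld]` section. The function name `binlog_options_enabled` is hypothetical, and the parser is deliberately simplistic: it assumes one option per line and does not follow `!include` directives or handle `skip-` prefixed options.

```python
def binlog_options_enabled(cnf_text):
    """Return the binlog-related options found in the [mysqld] section."""
    watched = {"log-bin", "log-slave-updates"}
    found = set()
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which [section] we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "mysqld":
            continue
        # MySQL accepts both dashes and underscores in option names.
        name = line.split("=", 1)[0].strip().lower().replace("_", "-")
        if name in watched:
            found.add(name)
    return found
```

An empty result for both nodes' option files suggests the precondition of step 1 holds, although options passed on the server command line are not covered by this check.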
One node will fail while applying a write set, with an error like:
111006 9:41:10 [ERROR] Slave SQL: Could not execute Delete_rows event on table randgen.BB; Can't find record in 'BB', Error_code: 1032; handler error HA_ERR_
111006 9:41:10 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 18623
111006 9:42:10 [ERROR] WSREP: Failed to apply trx: source: 91e241d8-
id: 17 trx_id: 295490 seqnos (l: 1941, g: 18623, s: 18620, d: 18599, ts: 131788317997432
111006 9:42:10 [ERROR] WSREP: Failed to apply app buffer: (M<8D>N^S, seqno: 18623, status: WSREP_FATAL
The anatomy of this bug is as follows:
1. A large transaction running on node A suffers a statement rollback.
2. The failing statement is rolled back on node A, and the transaction continues.
3. The large transaction commits, but the populated write set (WS) still contains the failed statement.
4. Node B applies this WS, including the failed statement, and the nodes are now inconsistent.
5. Node B commits another transaction that uses data received through the failed statement.
6. When node A tries to apply the WS from node B, the inconsistency caused by the failed statement makes applying impossible.
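The six steps above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the actual wsrep code: `Node`, `run_transaction`, and `demo` are hypothetical names, a table is modelled as a set of primary keys, and a write set is a list of `(operation, key)` row events. With `buggy=True`, the events of a rolled-back statement stay in the write set, as described in step 3.

```python
class Node:
    def __init__(self, name, rows):
        self.name = name
        self.table = set(rows)  # primary keys present in the table

    def apply(self, write_set):
        """Apply row events; a delete of a missing row fails, like error 1032."""
        for op, pk in write_set:
            if op == "insert":
                self.table.add(pk)
            elif op == "delete":
                if pk not in self.table:
                    raise LookupError(f"{self.name}: can't find record {pk}")
                self.table.remove(pk)


def run_transaction(node, statements, buggy):
    """Run statements locally and return the write set to replicate.

    Each statement is (events, fails). A failing statement leaves no local
    changes (statement rollback). With buggy=True its events are still
    left in the write set.
    """
    write_set = []
    for events, fails in statements:
        if fails:
            if buggy:
                write_set.extend(events)  # the bug: rolled-back events leak
            continue
        node.apply(events)
        write_set.extend(events)
    return write_set


def demo(buggy):
    """Return True if both nodes end up consistent, False if applying fails."""
    a, b = Node("A", {1, 2}), Node("B", {1, 2})
    # Steps 1-3: the second statement on node A fails and is rolled back.
    ws_a = run_transaction(
        a, [([("insert", 3)], False), ([("insert", 4)], True)], buggy
    )
    b.apply(ws_a)  # step 4
    # Step 5: node B deletes its newest row, which may exist only on B.
    ws_b = run_transaction(b, [([("delete", max(b.table))], False)], buggy)
    try:
        a.apply(ws_b)  # step 6
    except LookupError:
        return False
    return a.table == b.table
```

In this model, `demo(True)` ends with node A unable to find the row that node B deleted, which mirrors the `Delete_rows` failure in the log above, while `demo(False)` leaves both nodes with the same rows.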