Crash resistant replication breaks with binlog XA transaction recovery
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | Fix Released | High | Laurynas Biveinis |
5.1 | Fix Released | High | Laurynas Biveinis |
5.5 | Fix Released | High | Laurynas Biveinis |
Bug Description
Moved from bug 937852:
There seem to be several potential issues with crash-resistant replication. One way to make 5.1 fail, which does not involve slave-relay.info being overwritten:
1) Add a crash injection site at trx_commit_
commit protocol in the COMMIT phase.
2) Replicate an event from master to slave that will trigger this crash.
3) At the time of the crash the relay log master log position will point to the crashed
prepared transaction at position X, relay log pos will point to Y, InnoDB transactional fields
will point to the same master log position and relay log position Z, Z < Y.
4) On the InnoDB crash recovery InnoDB will undo the prepared transaction.
5) On the binlog crash recovery InnoDB will redo and commit the prepared transaction.
6) The slave will attempt to start replication assuming position X for the master log and
position Y for the relay log.
7) Thus it will attempt to re-execute the transaction that was committed in step 5.
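The sequence above can be illustrated with a toy model. This is only a sketch, not Percona Server code: `ToySlave`, `replay`, and the single integer position are hypothetical simplifications (the real scenario tracks separate master log and relay log positions X, Y and Z). It assumes the position is stored only in the COMMIT phase, and that recovery commits whatever transaction was left prepared.

```python
from dataclasses import dataclass, field


@dataclass
class ToySlave:
    """Hypothetical slave model: one position, one in-flight transaction."""
    applied: list = field(default_factory=list)  # committed events, in order
    prepared: object = None                      # event left in PREPARE state by a crash
    stored_pos: int = 0                          # replication position kept in the engine

    def apply(self, pos, event, crash_in_commit=False):
        self.prepared = event                    # PREPARE phase
        if crash_in_commit:
            return False                         # crash: position was never stored
        self.applied.append(event)               # COMMIT phase
        self.stored_pos = pos + 1                # position is written only here
        self.prepared = None
        return True

    def recover(self):
        # Step 5: recovery commits the prepared transaction.
        if self.prepared is not None:
            self.applied.append(self.prepared)
            self.prepared = None
        # Step 6: the slave resumes from the (stale) stored position.
        return self.stored_pos


def replay(events, crash_at):
    """Apply events in order, crashing once while committing events[crash_at]."""
    slave = ToySlave()
    pos = 0
    while pos < len(events):
        ok = slave.apply(pos, events[pos], crash_in_commit=(pos == crash_at))
        if not ok:
            crash_at = None                      # crash only once
            pos = slave.recover()
            continue
        pos = slave.stored_pos
    return slave.applied


if __name__ == "__main__":
    print(replay(["t1", "t2", "t3"], crash_at=1))
```

In this model the crashed event ends up applied twice (step 7), because recovery commits it while the stored position still points before it.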
Related branches
- Alexey Kopytov (community): Approve
- Laurynas Biveinis: Pending requested
- Percona core: Pending requested
Diff: 878 lines (+527/-170), 7 files modified
Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result (+54/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt (+1/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test (+117/-0)
Percona-Server/storage/innodb_plugin/handler/ha_innodb.cc (+225/-113)
Percona-Server/storage/innodb_plugin/include/trx0sys.h (+15/-1)
Percona-Server/storage/innodb_plugin/trx/trx0sys.c (+96/-54)
Percona-Server/storage/innodb_plugin/trx/trx0trx.c (+19/-2)
- Alexey Kopytov (community): Approve
Diff: 869 lines (+523/-165), 7 files modified
Percona-Server/mysql-test/suite/rpl/r/rpl_percona_crash_resistant_rpl.result (+54/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl-slave.opt (+1/-0)
Percona-Server/mysql-test/suite/rpl/t/rpl_percona_crash_resistant_rpl.test (+117/-0)
Percona-Server/storage/innobase/handler/ha_innodb.cc (+221/-108)
Percona-Server/storage/innobase/include/trx0sys.h (+15/-1)
Percona-Server/storage/innobase/trx/trx0sys.c (+96/-54)
Percona-Server/storage/innobase/trx/trx0trx.c (+19/-2)
One of the causes is that the log positions are written in the COMMIT phase of the two-phase commit (2PC), whereas they should be written in the PREPARE phase.
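A rough sketch of why the phase matters, under stated assumptions: the function `run`, its `write_pos_in_prepare` flag, and the single integer position are hypothetical and do not reflect the actual patch. The idea modeled here is that if the position travels with the prepared transaction, recovery that commits the transaction also brings the position forward.

```python
def run(events, crash_at, write_pos_in_prepare):
    """Toy 2PC apply loop; crashes once after PREPARE of events[crash_at]."""
    applied = []      # events committed on the slave, in order
    stored_pos = 0    # position persisted together with committed transactions
    pos = 0
    crashed = False
    while pos < len(events):
        # PREPARE: optionally record the next position inside the prepared transaction.
        prepared = {
            "event": events[pos],
            "next_pos": pos + 1 if write_pos_in_prepare else None,
        }
        if pos == crash_at and not crashed:
            crashed = True
            # Recovery commits the prepared transaction exactly as it was prepared.
            applied.append(prepared["event"])
            if prepared["next_pos"] is not None:
                stored_pos = prepared["next_pos"]
            pos = stored_pos          # the slave resumes from the persisted position
            continue
        # COMMIT on the normal path: the transaction and position become durable.
        applied.append(prepared["event"])
        stored_pos = pos + 1
        pos = stored_pos
    return applied


if __name__ == "__main__":
    print(run(["t1", "t2", "t3"], crash_at=1, write_pos_in_prepare=False))
    print(run(["t1", "t2", "t3"], crash_at=1, write_pos_in_prepare=True))
```

With the position written only at COMMIT, the model re-applies the crashed event after recovery; with the position recorded at PREPARE, each event is applied once.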
A second potential issue (not confirmed yet) is that slave-relay.info is overwritten too early: after the InnoDB crash recovery has run, but before the binlog crash recovery. It is possible, however, that this is a purely theoretical issue right now.