Percona Server with XtraDB

Crash resistant replication breaks with binlog XA transaction recovery

Reported by Laurynas Biveinis on 2012-06-13
This bug affects 3 people
Affects          Importance   Assigned to
Percona Server   High         Laurynas Biveinis
5.1              High         Laurynas Biveinis
5.5              High         Laurynas Biveinis

Bug Description

Moved from bug 937852:

There appear to be several potential issues with crash-resistant replication. One way to make it fail on 5.1 does not involve the case where slave-relay.info is overwritten:

1) Add a crash injection site at trx_commit_off_kernel. This will trigger during the XA 2PC
commit protocol in the COMMIT phase.
2) Replicate an event from master to slave that will trigger this crash.
3) At the time of the crash, the relay log master log position will point to the crashed
prepared transaction at position X, the relay log position will point to Y, and the InnoDB
transactional fields will point to the same master log position and relay log position Z, Z < Y.
4) During InnoDB crash recovery, InnoDB will undo the prepared transaction.
5) During binlog crash recovery, InnoDB will redo and commit the prepared transaction.
6) The slave will attempt to start replication assuming position X for the master log and
position Y for the relay log.
7) Thus it will attempt to re-execute the transaction that was committed in 5).

One of the causes is that the log positions are written in the COMMIT phase of the 2PC protocol, whereas they should be written in the PREPARE phase.
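The failure sequence and the proposed fix can be illustrated with a toy model. This is only a sketch and not Percona Server code: ToySlave, InjectedCrash, replay and the position_in_prepare flag are hypothetical names, and the separate X, Y and Z positions from the steps above are collapsed into a single stored position to keep the example short. The assumption is that a prepared transaction survives the crash and is committed by binlog XA recovery, while the replication position survives only if it was recorded together with the prepared transaction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class InjectedCrash(Exception):
    """Stands in for the crash injected during the COMMIT phase."""


@dataclass
class ToySlave:
    # Hypothetical switch: record the position in PREPARE (proposed fix)
    # or in COMMIT (behaviour described in this report).
    position_in_prepare: bool
    committed: List[str] = field(default_factory=list)
    stored_pos: int = 0  # replication position that survives a crash
    prepared: Optional[Tuple[str, Optional[int]]] = None

    def apply(self, event: str, end_pos: int, crash: bool = False) -> None:
        # PREPARE phase: the prepared record is durable.
        pos = end_pos if self.position_in_prepare else None
        self.prepared = (event, pos)
        if crash:
            raise InjectedCrash(event)
        # COMMIT phase.
        self._commit_prepared()
        if not self.position_in_prepare:
            self.stored_pos = end_pos

    def _commit_prepared(self) -> None:
        event, pos = self.prepared
        self.committed.append(event)
        if pos is not None:
            self.stored_pos = pos
        self.prepared = None

    def recover(self) -> None:
        # Binlog XA recovery commits whatever was left prepared.
        if self.prepared is not None:
            self._commit_prepared()


def replay(position_in_prepare: bool) -> List[str]:
    relay_log = [("trx1", 100), ("trx2", 200)]  # (event, end position)
    slave = ToySlave(position_in_prepare)
    slave.apply(*relay_log[0])
    try:
        slave.apply(*relay_log[1], crash=True)
    except InjectedCrash:
        slave.recover()
    # Restart replication from the stored position.
    for event, end_pos in relay_log:
        if end_pos > slave.stored_pos:
            slave.apply(event, end_pos)
    return slave.committed


print(replay(position_in_prepare=False))
print(replay(position_in_prepare=True))
```

With the position written in COMMIT, recovery commits trx2 but the stored position still precedes it, so the restarted slave applies trx2 a second time. With the position recorded in PREPARE, recovery restores the position together with the transaction and nothing is re-executed.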

A second potential issue (not confirmed yet) is that slave-relay.info is overwritten too early: after the InnoDB crash recovery has run, but before the binlog crash recovery. It is possible, however, that this is a purely theoretical issue right now.
