Crash-Resistant Replication feature works incorrectly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | Invalid | High | Unassigned |
5.1 | Won't Fix | High | Unassigned |
5.5 | Incomplete | High | Unassigned |
5.6 | Invalid | High | Unassigned |
Bug Description
After crash recovery with option innodb_
I wrote about this strange behaviour on the forum, with proper formatting: http://
Repeating the same text here.
What I did:
I tested with Percona Server 5.5.20 and 5.1.59. In both cases replication was running at about 100 qps.
I set the following safe variables:
sync_binlog = 1
innodb_
innodb_
skip-slave-start
Is anything else needed for this test?
I emulated a power failure via SysRq-B:
echo 1 > /proc/sys/
echo b > /proc/sysrq-trigger
After the server rebooted, I checked the info files:
[root@db40 airo]# cat /local/
15
log-bin.013888
868868572
10.x.xx.xx
replication
replication
3306
60
0
0
[root@db40 airo]# cat /local/
/local/
868868715
log-bin.013888
868868572
2
Then I started the MySQL server. The log after recovery:
InnoDB: relay-log.info is detected.
InnoDB: relay log: position 868868715, file name /local/
InnoDB: master log: position 868868572, file name log-bin.013888
InnoDB: The InnoDB memory heap is disabled
..........
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 869698571, file name log-bin.013888
InnoDB: and relay log file
InnoDB: position 0 869698714, file name /local/
InnoDB: Last MySQL binlog file position 0 1071255772, file name /local/
120221 13:15:20 InnoDB: Restoring buffer pool pages from ib_lru_dump
120221 13:15:20 Percona XtraDB (http://
InnoDB: relay-log.info was overwritten.
120221 13:15:21 [Note] Recovering after a crash using /local/
120221 13:17:19 [Note] Starting crash recovery...
120221 13:17:19 [Note] Crash recovery finished.
Recovery completed fine. Replication did not start because skip-slave-start is set.
10.8.60.
*******
Replicate
Replicate_
Master_
1 row in set (0.00 sec)
As can be seen, Exec_Master_Log_Pos was taken from the binlog and differs from master.info. The position from the binlog is larger than the one in master.info, and it is the correct one.
OK, everything looks great.
Then I started replication and got an error: Duplicate entry.
10.8.60.
*******
Replicate
Replicate_
Master_
We see that the Exec position jumped back from 869698571 to 868868572. Replication took this position from mysql.info.
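To make the discrepancy concrete, here is a minimal Python sketch that extracts the master binlog coordinates InnoDB printed during recovery and compares them with relay-log.info. It assumes that the `position HIGH LOW` pair in the InnoDB log holds the high and low 32-bit words of the offset, and that relay-log.info starts with the four lines seen above (relay log name, relay log position, master log name, master log position). The helper names are hypothetical, not part of MySQL or Percona Server.

```python
import re

# Matches the two InnoDB recovery lines that report the slave's master binlog
# coordinates. Assumption: "position HIGH LOW" are the high and low 32-bit
# halves of a 64-bit offset.
MASTER_RE = re.compile(
    r"last master binlog file\s+"
    r"InnoDB: position (\d+) (\d+), file name (\S+)"
)


def innodb_master_pos(log_text):
    """Return (binlog_name, offset) recorded by InnoDB, or None if absent."""
    m = MASTER_RE.search(log_text)
    if m is None:
        return None
    high, low, name = int(m.group(1)), int(m.group(2)), m.group(3)
    return name, (high << 32) | low


def relay_info_master_pos(info_text):
    """Return (master_log_name, master_log_pos) from relay-log.info.

    Assumes the layout shown in this report; any extra trailing lines
    are ignored.
    """
    lines = info_text.splitlines()
    return lines[2], int(lines[3])


def would_replay(innodb_pos, info_pos):
    """True if the slave would restart behind what InnoDB already committed.

    Tuple comparison assumes binlog names share a prefix and use
    zero-padded sequence numbers (e.g. log-bin.013888).
    """
    return info_pos < innodb_pos
```

With the values from this report (the relay log name is a placeholder because the real path is truncated above), InnoDB yields `("log-bin.013888", 869698571)` while relay-log.info yields `("log-bin.013888", 868868572)`, so `would_replay` returns `True`: the slave restarts 829,999 bytes behind the committed position and re-applies those events.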
Tags added: crash-resistant-slave-5.5
There seem to be several potential issues with crash-resistant replication. One way to make 5.1 fail does not involve the case where slave-relay.info is overwritten:
1) Add a crash injection site at trx_commit_off_kernel. This will trigger during the COMMIT phase of the XA 2PC commit protocol.
2) Replicate an event from the master to the slave that triggers this crash.
3) At the time of the crash, the relay log master log position will point to the crashed prepared transaction at position X and the relay log position will point to Y; the InnoDB transactional fields will point to the same master log position and to relay log position Z, where Z < Y.
4) During InnoDB crash recovery, InnoDB will undo the prepared transaction.
5) During binlog crash recovery, InnoDB will redo and commit the prepared transaction.
6) The slave will attempt to start replication assuming position X for the master log and position Y for the relay log.
7) Thus it will attempt to re-execute the transaction that was committed in 5).
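The failure sequence above can be illustrated with a toy Python model. It is a simplified sketch, not InnoDB or MySQL code: all names are hypothetical, integer positions stand in for master binlog offsets, relay log positions Y and Z are not modelled, and steps 4 and 5 are collapsed into their net effect (a prepared transaction found in the slave's binlog ends up committed).

```python
from dataclasses import dataclass, field


@dataclass
class SlaveState:
    committed: set = field(default_factory=set)     # events durably applied in InnoDB
    prepared: set = field(default_factory=set)      # events left in XA PREPARED state
    slave_binlog: set = field(default_factory=set)  # events written to the slave's binlog
    resume_pos: int = 0  # master log position the SQL thread restarts from


def apply_event(state, pos, crash_in_commit=False):
    """Apply one event using a simplified XA two-phase commit."""
    state.prepared.add(pos)       # phase 1: InnoDB PREPARE
    state.slave_binlog.add(pos)   # binlog write (the commit decision point)
    if crash_in_commit:
        # Steps 1-3: crash during the COMMIT phase. The transaction stays
        # prepared and the restart position still points at it (X).
        state.resume_pos = pos
        return
    state.prepared.discard(pos)   # phase 2: InnoDB COMMIT
    state.committed.add(pos)
    state.resume_pos = pos + 1    # next event to execute


def crash_recovery(state):
    """Steps 4-5 (net effect): prepared transactions found in the binlog are committed."""
    for pos in sorted(state.prepared):
        if pos in state.slave_binlog:
            state.committed.add(pos)
        state.prepared.discard(pos)


def restart_replication(state):
    """Steps 6-7: return the events that would be executed a second time."""
    return [state.resume_pos] if state.resume_pos in state.committed else []
```

Applying event 1 normally, then event 2 with `crash_in_commit=True`, then running `crash_recovery` leaves `committed == {1, 2}` while `resume_pos` is still 2, so `restart_replication` returns `[2]`: the same transaction is executed twice, which is where a duplicate-key error like the one reported would come from.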