Reconcile relay log rollback in do_apply_event for 5.6.21
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL patches by Codership | Fix Committed | Medium | Seppo Jaakola |
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.5 | Invalid | Undecided | Unassigned |
5.6 | Fix Released | Undecided | Raghavendra D Prabhu |
Bug Description
<<<<<<< TREE
if(error)
#ifdef WITH_WSREP
=======
if (error)
>>>>>>> MERGE-SOURCE
{
<<<<<<< TREE
/* rollback to saved relay log positions */
rli_
rli_
rli_
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE
rli-
<<<<<<< TREE
#ifdef WITH_WSREP
=======
rli_
rli_
rli_
rli_
rli_
rli_
DBUG_
/*
If relay log repository is TABLE, we do not have to revert back to
original positions in TABLE, since the new position changes will not be
persisted in TABLE with failed commit; In case of FILE, we need to
revert back the new positions, hence we need to flush original positions
into FILE.
*/
if (!rli_ptr-
rli_
>>>>>>> MERGE-SOURCE
}
<<<<<<< TREE
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE
%%%%%%%
The former (TREE) is from wsrep, the latter (MERGE-SOURCE) is from upstream MySQL 5.6.
The rli parts in MERGE-SOURCE were added in vanilla MySQL 5.6 for 5.6.21:
=======
Bug#17450876: REPLICATION STOP WITH "ERROR IN XID_LOG_EVENT:
COMMIT COULD NOT BE COMPLETED"
Problem:
========
When a SQL thread which is waiting for commit lock is killed
and restarted it causes a transaction to be skipped on slave.
Analysis:
========
When the SQL thread is killed while a DML statement is waiting
for the MDL commit lock, the positions are still updated in
memory. In the existing design, positions are flushed before
the actual commit, so the rli object has its positions updated
although the transaction is yet to be committed. When the SQL
thread is restarted, it reads the positions from the rli object,
and hence the last transaction gets skipped on the slave.
Fix:
===
When the SQL thread is killed at a stage where it is waiting for
the commit lock, the commit fails and an error is reported back
saying "Commit could not be completed and Query execution
was interrupted". As part of the fix, the SQL thread's positions
that existed before the commit are persisted, and they are
restored on error.
Similar symptoms exist in case of MTS as well. In MTS
"The slave coordinator and worker threads are stopped,
possibly leaving data in inconsistent state" error is
reported. In MTS a bitmap is maintained for successful
commits. This bit map is cleared on error and the old
positions are retrieved from the checkpoint which points to
old positions.
%%%%%%%
The wsrep bits were added for lp:1309669, "Cluster node acting as async slave stops with the wrong position after hitting max write set size".
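Both sides of the conflict do the same basic thing on a failed commit: they put the relay log positions back to what they were before the event was applied. The following is a minimal, self-contained sketch of that idea, not the actual do_apply_event code. RelayPositions, RelayInfo, flush_to_file and apply_commit are hypothetical names made up for illustration, and the FILE/TABLE handling only models what the comment in MERGE-SOURCE describes.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins; these are NOT MySQL's Relay_log_info types.
enum class RepoType { File, Table };

struct RelayPositions {
  std::string log_name;
  unsigned long long log_pos;
};

inline bool operator==(const RelayPositions &a, const RelayPositions &b) {
  return a.log_name == b.log_name && a.log_pos == b.log_pos;
}

struct RelayInfo {
  RepoType repo;
  RelayPositions pos;        // in-memory positions
  RelayPositions persisted;  // what the repository currently holds

  // Models writing the in-memory positions to a FILE repository.
  void flush_to_file() { persisted = pos; }
};

// Advances the positions, then runs the commit. `commit` returns true on error.
// On error the saved positions are restored, and re-flushed for FILE only.
bool apply_commit(RelayInfo &rli, const RelayPositions &next,
                  const std::function<bool()> &commit) {
  const RelayPositions saved = rli.pos;  // positions before the commit

  rli.pos = next;
  if (rli.repo == RepoType::File)
    rli.flush_to_file();  // positions are flushed before the actual commit

  const bool error = commit();
  if (error) {
    rli.pos = saved;  // roll back to the saved in-memory positions
    // TABLE: the new positions were never persisted by the failed commit.
    // FILE: the new positions are already on disk, so write the old ones back.
    if (rli.repo == RepoType::File)
      rli.flush_to_file();
  } else if (rli.repo == RepoType::Table) {
    rli.persisted = rli.pos;  // persisted together with the transaction
  }
  return error;
}
```

With a commit callback that always fails, both a FILE and a TABLE RelayInfo end up with pos and persisted equal to the positions they started with, so a restarted SQL thread would re-read the event rather than skip it.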