MySQL patches by Codership

Reconcile relay log rollback in do_apply_event for 5.6.21

Bug #1377226 reported by Raghavendra D Prabhu on 2014-10-03

Affects	Status	Importance	Assigned to	Milestone
MySQL patches by Codership	Fix Committed	Medium	Seppo Jaakola	5.6.21-25.9
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
  5.5	Invalid	Undecided	Unassigned
  5.6	Fix Released	Undecided	Raghavendra D Prabhu	5.6.21-25.8

Bug Description

<<<<<<< TREE
  if(error)
#ifdef WITH_WSREP
=======
  if (error)
>>>>>>> MERGE-SOURCE
  {
<<<<<<< TREE
    /* rollback to saved relay log positions */
    rli_ptr->set_group_master_log_pos(wsrep_log_pos_save);
    rli_ptr->set_group_relay_log_pos(wsrep_relay_log_pos_save);
    rli_ptr->set_group_relay_log_name(wsrep_relay_log_name_save);
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE
    rli->report(ERROR_LEVEL, thd->get_stmt_da()->sql_errno(),
                "Error in Xid_log_event: Commit could not be completed, '%s'",
                thd->get_stmt_da()->message());
<<<<<<< TREE
#ifdef WITH_WSREP
=======

    rli_ptr->set_group_master_log_name(saved_group_master_log_name);
    rli_ptr->notify_group_master_log_name_update();
    rli_ptr->set_group_master_log_pos(saved_group_master_log_pos);
    rli_ptr->set_group_relay_log_name(saved_group_relay_log_name);
    rli_ptr->notify_group_relay_log_name_update();
    rli_ptr->set_group_relay_log_pos(saved_group_relay_log_pos);

    DBUG_PRINT("info", ("Rolling back to group master %s %llu group relay %s"
                        " %llu\n", rli_ptr->get_group_master_log_name(),
                        rli_ptr->get_group_master_log_pos(),
                        rli_ptr->get_group_relay_log_name(),
                        rli_ptr->get_group_relay_log_pos()));

    /*
      If relay log repository is TABLE, we do not have to revert back to
      original positions in TABLE, since the new position changes will not be
      persisted in TABLE with failed commit; In case of FILE, we need to
      revert back the new positions, hence we need to flush original positions
      into FILE.
    */
    if (!rli_ptr->is_transactional())
      rli_ptr->flush_info(false);
>>>>>>> MERGE-SOURCE
  }
<<<<<<< TREE
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In each conflict above, the TREE side is the wsrep code and the MERGE-SOURCE side is the upstream MySQL 5.6 code.

Raghavendra D Prabhu (raghavendra-prabhu) on 2014-10-03

description:

updated

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-10-03:

The rli parts in MERGE-SOURCE were added in vanilla mysql 5.6 for 5.6.21:
===================================================================

Bug#17450876:REPLICATION STOP WITH "ERROR IN XID_LOG_EVENT:
COMMIT COULD NOT BE COMPLETED"

Problem:
========
When a SQL thread which is waiting for commit lock is killed
and restarted it causes a transaction to be skipped on slave.

Analysis:
========
When the SQL thread is killed while a DML statement is
waiting for the MDL commit lock, the positions have already
been updated in memory. In the existing design, positions
are flushed before the actual commit, so the rli object has
its positions updated although the transaction is yet to be
committed. When the SQL thread is restarted, it reads the
positions from the rli object, and hence the last
transaction gets skipped on the slave.

Fix:
===
When the SQL thread is killed at a stage where it is waiting
for the commit lock, the commit fails and an error is
reported back saying "Commit could not be completed and
Query execution was interrupted". As part of the fix, the
SQL thread's positions that existed before the commit are
persisted and restored on error.

Similar symptoms exist in case of MTS as well. In MTS
"The slave coordinator and worker threads are stopped,
possibly leaving data in inconsistent state" error is
reported. In MTS a bitmap is maintained for successful
commits. This bit map is cleared on error and the old
positions are retrieved from the checkpoint which points to
old positions.
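
The save-and-restore idea described in the Fix section above, together with the TABLE versus FILE distinction from the comment in the MERGE-SOURCE code, can be sketched as a small standalone model. This is only an illustration under stated assumptions: `GroupPositions`, `RelayInfoModel` and `apply_commit_event` are hypothetical names invented here, not the server's `Relay_log_info` API, and the `commit_ok` flag merely simulates whether the commit succeeded.

```cpp
#include <string>

// Hypothetical stand-in for the four group positions tracked per SQL thread.
struct GroupPositions {
  std::string master_log_name;
  unsigned long long master_log_pos;
  std::string relay_log_name;
  unsigned long long relay_log_pos;
};

// Hypothetical model of the relay log info object and its repository.
struct RelayInfoModel {
  GroupPositions current;    // in-memory positions
  GroupPositions persisted;  // what the repository (FILE or TABLE) holds
  bool transactional;        // true models a TABLE repository
};

// Simplified model of applying a commit event.
// Returns true on error, mirroring the `if (error)` branch in the conflict.
bool apply_commit_event(RelayInfoModel &rli, const GroupPositions &next,
                        bool commit_ok) {
  const GroupPositions saved = rli.current;  // positions before the commit
  rli.current = next;                        // positions advance before commit
  if (!rli.transactional)
    rli.persisted = next;  // FILE: new positions are flushed right away

  if (commit_ok) {
    if (rli.transactional)
      rli.persisted = next;  // TABLE: positions commit with the transaction
    return false;
  }

  // Commit failed: restore the in-memory positions saved earlier.
  rli.current = saved;
  // TABLE: the failed transaction never persisted the new positions.
  // FILE: the original positions have to be flushed back explicitly.
  if (!rli.transactional)
    rli.persisted = saved;
  return true;
}
```

In this model a failed commit leaves both the in-memory and the persisted positions at their pre-commit values for either repository type, so a restarted SQL thread would re-read the old positions instead of skipping the transaction.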

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The wsrep bits were added for lp:1309669, "Cluster node acting as async slave stops with the wrong position after hitting max write set size".

summary:

- Reconcile relay log rollback in do_apply_event
+ Reconcile relay log rollback in do_apply_event for 5.6.21

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-10-03:

From the looks of it, the wsrep patch here is not required now that the upstream MySQL fix is in place.

Seppo Jaakola (seppo-jaakola) wrote on 2014-10-08:

Yes, Oracle has fixed this bug, and we can remove the wsrep patch for it.

This was done in the wsrep 5.6 tree as part of the actual merge: lp:1378686

Changed in codership-mysql:
assignee:	nobody → Seppo Jaakola (seppo-jaakola)
importance:	Undecided → Medium
status:	New → In Progress
milestone:	none → 5.6.21-25.7
status:	In Progress → Fix Committed

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1747
