Reconcile relay log rollback in do_apply_event for 5.6.21

Bug #1377226 reported by Raghavendra D Prabhu
This bug affects 1 person
Affects                       Status         Importance  Assigned to
MySQL patches by Codership    Fix Committed  Medium      Seppo Jaakola
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                         Invalid        Undecided   Unassigned
  5.6                         Fix Released   Undecided   Raghavendra D Prabhu

Bug Description

<<<<<<< TREE
  if(error)
#ifdef WITH_WSREP
=======
  if (error)
>>>>>>> MERGE-SOURCE
  {
<<<<<<< TREE
    /* rollback to saved relay log positions */
    rli_ptr->set_group_master_log_pos(wsrep_log_pos_save);
    rli_ptr->set_group_relay_log_pos(wsrep_relay_log_pos_save);
    rli_ptr->set_group_relay_log_name(wsrep_relay_log_name_save);
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE
    rli->report(ERROR_LEVEL, thd->get_stmt_da()->sql_errno(),
                "Error in Xid_log_event: Commit could not be completed, '%s'",
                thd->get_stmt_da()->message());
<<<<<<< TREE
#ifdef WITH_WSREP
=======

    rli_ptr->set_group_master_log_name(saved_group_master_log_name);
    rli_ptr->notify_group_master_log_name_update();
    rli_ptr->set_group_master_log_pos(saved_group_master_log_pos);
    rli_ptr->set_group_relay_log_name(saved_group_relay_log_name);
    rli_ptr->notify_group_relay_log_name_update();
    rli_ptr->set_group_relay_log_pos(saved_group_relay_log_pos);

    DBUG_PRINT("info", ("Rolling back to group master %s %llu group relay %s"
                        " %llu\n", rli_ptr->get_group_master_log_name(),
                        rli_ptr->get_group_master_log_pos(),
                        rli_ptr->get_group_relay_log_name(),
                        rli_ptr->get_group_relay_log_pos()));

    /*
      If relay log repository is TABLE, we do not have to revert back to
      original positions in TABLE, since the new position changes will not be
      persisted in TABLE with failed commit; In case of FILE, we need to
      revert back the new positions, hence we need to flush original positions
      into FILE.
    */
    if (!rli_ptr->is_transactional())
      rli_ptr->flush_info(false);
>>>>>>> MERGE-SOURCE
  }
<<<<<<< TREE
#endif /* WITH_WSREP */
=======
>>>>>>> MERGE-SOURCE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The TREE side of the conflict is from wsrep; the MERGE-SOURCE side is from upstream MySQL 5.6.

description: updated
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The rli parts in MERGE-SOURCE were added in vanilla mysql 5.6 for 5.6.21:
===================================================================

Bug#17450876:REPLICATION STOP WITH "ERROR IN XID_LOG_EVENT:
COMMIT COULD NOT BE COMPLETED"

Problem:
========
When a SQL thread which is waiting for commit lock is killed
and restarted it causes a transaction to be skipped on slave.

Analysis:
========
When the SQL thread is killed while a DML statement is waiting
for the MDL commit lock, the positions have already been updated
in memory. That is, in the existing design positions are flushed
before the actual commit, so the rli object has its positions
updated although the transaction is yet to be committed. When
the SQL thread is restarted, it reads the positions from the rli
object, and hence the last transaction gets skipped on the slave.

Fix:
===
When SQL thread is killed at a stage where it is waiting for
commit lock, the commit fails and an error is reported back
saying "Commit could not be completed and Query execution
was interrupted". As part of fix SQL threads positions that
existed before the commit are persisted and they are
restored back on error.
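
The save-and-restore behaviour described in the fix can be sketched in isolation. This is a minimal illustration only, not the actual server code: `ToyRelayInfo`, `apply_xid`, and the `flush_count` counter are hypothetical stand-ins, and the `transactional` flag is assumed to model a TABLE repository in the sense of the `is_transactional()` check in the MERGE-SOURCE hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the slave's relay log info object.
// It is NOT the real Relay_log_info class; only the fields needed here exist.
struct ToyRelayInfo {
  std::string group_master_log_name;
  unsigned long long group_master_log_pos = 0;
  std::string group_relay_log_name;
  unsigned long long group_relay_log_pos = 0;
  bool transactional = false;  // assumption: true models a TABLE repository
  int flush_count = 0;         // counts writes to a FILE repository

  void flush_info() { ++flush_count; }
};

// Advance the positions, attempt the commit, and restore the saved
// positions if the commit fails. Returns true on failure, like `error`.
bool apply_xid(ToyRelayInfo &rli, unsigned long long new_master_pos,
               unsigned long long new_relay_pos, bool commit_fails) {
  assert(new_master_pos >= rli.group_master_log_pos);

  // Snapshot the positions that existed before the commit.
  const ToyRelayInfo saved = rli;

  // Positions are updated (and, for FILE, flushed) before the commit.
  rli.group_master_log_pos = new_master_pos;
  rli.group_relay_log_pos = new_relay_pos;
  if (!rli.transactional) rli.flush_info();

  if (commit_fails) {
    // Roll the in-memory positions back to the snapshot.
    rli.group_master_log_name = saved.group_master_log_name;
    rli.group_master_log_pos = saved.group_master_log_pos;
    rli.group_relay_log_name = saved.group_relay_log_name;
    rli.group_relay_log_pos = saved.group_relay_log_pos;
    // A FILE repository already holds the new positions, so the old ones
    // must be written back; a TABLE repository rolls back with the commit.
    if (!rli.transactional) rli.flush_info();
  }
  return commit_fails;
}
```

Calling `apply_xid` with `commit_fails` set to true on a non-transactional `ToyRelayInfo` leaves the positions at their pre-commit values and records two flushes: the one made before the commit and the one that reverts it.
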

Similar symptoms exist in case of MTS as well. In MTS
"The slave coordinator and worker threads are stopped,
possibly leaving data in inconsistent state" error is
reported. In MTS a bitmap is maintained for successful
commits. This bit map is cleared on error and the old
positions are retrieved from the checkpoint which points to
old positions.
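
The MTS behaviour described above (a bitmap of successful commits that is cleared on error, with a restart from the checkpoint) can be illustrated with a toy model. Everything in it is an assumption made for illustration: `ToyCoordinator`, its members, and the fixed-size window are hypothetical and do not reflect the server's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an MTS coordinator's commit tracking.
struct ToyCoordinator {
  unsigned long long checkpoint_pos = 0;  // last position known to be safe
  std::vector<bool> committed;            // one bit per group since checkpoint

  explicit ToyCoordinator(std::size_t window) : committed(window, false) {}

  // A worker reports that the group at this index committed successfully.
  void mark_committed(std::size_t group) {
    assert(group < committed.size());
    committed[group] = true;
  }

  std::size_t committed_count() const {
    return static_cast<std::size_t>(
        std::count(committed.begin(), committed.end(), true));
  }

  // On error the bitmap is cleared, so no group after the checkpoint
  // is treated as applied.
  void on_error() { committed.assign(committed.size(), false); }

  // After an error, execution resumes from the checkpoint position.
  unsigned long long restart_pos() const { return checkpoint_pos; }
};
```

In this model, marking a few groups as committed and then calling `on_error()` leaves every bit cleared, and `restart_pos()` still returns the checkpoint position rather than anything derived from the discarded bits.
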

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The wsrep bits were added for lp:1309669, "Cluster node acting as async slave stops with the wrong position after hitting max write set size".

summary: - Reconcile relay log rollback in do_apply_event
+ Reconcile relay log rollback in do_apply_event for 5.6.21
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From the looks of it, the wsrep patch here is no longer required now that the upstream MySQL fix is in place.

Seppo Jaakola (seppo-jaakola) wrote :

Yes, Oracle has fixed this bug, and we can remove the wsrep patch for it.

This was done in the wsrep 5.6 tree as part of the actual merge: lp:1378686

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
importance: Undecided → Medium
status: New → In Progress
milestone: none → 5.6.21-25.7
status: In Progress → Fix Committed
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1747
