Server hangs in binary log group commit

Bug #1412037 reported by George Ormond Lorch III
This bug affects 1 person
Affects          Status      Importance  Assigned to              Milestone
Percona Server moved to https://jira.percona.com/projects/PS (Status tracked in 5.7)
  5.1            Won't Fix   Undecided   Unassigned
  5.5            New         High        George Ormond Lorch III
  5.6            New         Undecided   Unassigned
  5.7            New         Undecided   Unassigned

Bug Description

PS 5.5.28-29.4 introduced a fix for bug 1070856 that created another issue.

Binlog group commit can now hang in MYSQL_BIN_LOG::write_cache. The function contains the following loop:
      while (hdr_offs < length)
      {
        /*
          partial header only? save what we can get, process once
          we get the rest.
        */

        if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
        {
          carry= length - hdr_offs;
          memcpy(header, (char *)cache->read_pos + hdr_offs, carry);
          length= hdr_offs;
        }
        else
        {
          /* we've got a full event-header, and it came in one piece */

          uchar *log_pos= (uchar *)cache->read_pos + hdr_offs + LOG_POS_OFFSET;

          /* fix end_log_pos */
          val= uint4korr(log_pos) + group;
          int4store(log_pos, val);

          /* next event header at ... */
>>        log_pos= (uchar *)cache->read_pos + hdr_offs + EVENT_LEN_OFFSET;
>>        hdr_offs += uint4korr(log_pos);

        }
      }

The lines marked with >> above point log_pos at the event-length field of the current header (EVENT_LEN_OFFSET). In the failing case that field contains 0x00000000, so hdr_offs += uint4korr(log_pos) does not advance hdr_offs, and the loop never terminates.
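To make the failure mode concrete, here is a minimal standalone sketch of the same header walk with a guard against a zero event length. It is not the server code and not the actual fix: fix_end_log_pos, make_event, WalkResult, read_u32_le and write_u32_le are hypothetical names introduced for illustration, the little-endian helpers stand in for uint4korr/int4store, and the three offsets are assumed to match the v4 binlog event header. The partial-header (carry) branch of the original is reduced to an early return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed to mirror the v4 binlog event header layout used by the loop above.
constexpr std::size_t LOG_EVENT_HEADER_LEN = 19;
constexpr std::size_t EVENT_LEN_OFFSET = 9;
constexpr std::size_t LOG_POS_OFFSET = 13;

// Little-endian 4-byte read, standing in for uint4korr.
inline std::uint32_t read_u32_le(const unsigned char *p) {
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Little-endian 4-byte write, standing in for int4store.
inline void write_u32_le(unsigned char *p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v);
  p[1] = static_cast<unsigned char>(v >> 8);
  p[2] = static_cast<unsigned char>(v >> 16);
  p[3] = static_cast<unsigned char>(v >> 24);
}

enum class WalkResult { Ok, PartialHeader, ZeroLengthEvent };

// Hypothetical helper: walks event headers in buf, adds group to each
// end_log_pos, and refuses to continue when an event length of zero would
// leave hdr_offs unchanged (the condition that makes the original loop spin).
WalkResult fix_end_log_pos(unsigned char *buf, std::size_t length,
                           std::uint32_t group, std::size_t *events_seen) {
  std::size_t hdr_offs = 0;
  *events_seen = 0;
  while (hdr_offs < length) {
    if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
      return WalkResult::PartialHeader;  // the original saves a carry here

    unsigned char *hdr = buf + hdr_offs;
    const std::uint32_t event_len = read_u32_le(hdr + EVENT_LEN_OFFSET);
    if (event_len == 0)
      return WalkResult::ZeroLengthEvent;  // hdr_offs would never advance

    // fix end_log_pos
    write_u32_le(hdr + LOG_POS_OFFSET,
                 read_u32_le(hdr + LOG_POS_OFFSET) + group);

    // next event header
    hdr_offs += event_len;
    ++*events_seen;
  }
  return WalkResult::Ok;
}

// Hypothetical test helper: a zero-filled event of at least header size with
// the given event length and end_log_pos written into its header.
std::vector<unsigned char> make_event(std::uint32_t event_len,
                                      std::uint32_t end_log_pos) {
  const std::size_t size =
      event_len < LOG_EVENT_HEADER_LEN ? LOG_EVENT_HEADER_LEN : event_len;
  std::vector<unsigned char> ev(size, 0);
  write_u32_le(ev.data() + EVENT_LEN_OFFSET, event_len);
  write_u32_le(ev.data() + LOG_POS_OFFSET, end_log_pos);
  return ev;
}
```

Feeding the helper a buffer whose first header has an event length of zero returns ZeroLengthEvent immediately, whereas the unguarded loop in the excerpt would revisit the same offset forever. Whether the server should abort the write, report an error, or treat this as cache corruption is a separate design decision that this sketch does not settle.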

By bisecting releases and hand-building with specific commits, we identified this fix as the cause: the customer can reproduce the hang on builds that include it and cannot on builds that do not. The customer's reproduction scripts are large, contain private data, and take 30 minutes to run; they could not be reduced to a specific series of events that triggers the issue.

The customer's scripts make heavy use of SAVEPOINT and ROLLBACK TO SAVEPOINT. For example, one of the query patterns executing around the time of the hang is:

BEGIN;
SAVEPOINT `IQe40KFaDEKZ3lTJNWZFNQ`;
SAVEPOINT `5B2D12rdYk6lX_PYv9VsaQ`;
SAVEPOINT `0dkjUeJqxkWFzJ9U86CzSg`;
SAVEPOINT `9WOYybwgO0qQBypK_62jMQ`;
ROLLBACK TO SAVEPOINT `9WOYybwgO0qQBypK_62jMQ`;
RELEASE SAVEPOINT `9WOYybwgO0qQBypK_62jMQ`;
ROLLBACK;
BEGIN;
SAVEPOINT `76G0bv_RyU21Lsdo-SIw4g`;
SAVEPOINT `kGjeq1CdA0aKN_Xd4i6dGQ`;
SAVEPOINT `ea4aItMo-EKerImaRHh_wg`;
SAVEPOINT `1tXgKAMp70iHY-lf-eFVEw`;
ROLLBACK TO SAVEPOINT `1tXgKAMp70iHY-lf-eFVEw`;
RELEASE SAVEPOINT `1tXgKAMp70iHY-lf-eFVEw`;
COMMIT;

Tags: i47576
no longer affects: percona-server
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3248
