MySQL patches by Codership

Cluster node acting as async slave stops with the wrong position after hitting max write set size

Bug #1309669 reported by Ovais Tariq on 2014-04-18

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
MySQL patches by Codership	Fix Released	Low	Seppo Jaakola	MySQL patches by Codership 5.6.21-25.9
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.6.19-25.6

Bug Description

PXC version:
Server version: 5.6.15-rel63.0-log 5.6.15-, Percona XtraDB Cluster - binary (GPL), Release: 25.5, Revision 759 wsrep_25.5.r4061, wsrep_25.5.r4061

I have a three node cluster and one of the nodes is acting as an async slave of MySQL 5.6.15 master. When a write set exceeds the wsrep_max_ws_size limit and replication fails on the PXC node upon commit, then the node reports the wrong stopping position. It basically skips over to the next binlog event.

Suppose the wsrep_max_ws_size variable is set as follows on node1 which is replicating off the async master:
node1 [localhost] {msandbox} (test) > set global wsrep_max_ws_size=1024;
Query OK, 0 rows affected (0.00 sec)

And suppose following sequence of events are executed on the master:
master [localhost] {msandbox} (test) > insert into t1 values(null, 'test');
Query OK, 1 row affected (0.00 sec)

master [localhost] {msandbox} (test) > create table L (i int primAry key auto_increment, s varchar(1024));
Query OK, 0 rows affected (0.14 sec)

master [localhost] {msandbox} (test) > insert into L (s) values (repeat('i',1024));
Query OK, 1 row affected (0.03 sec)

master [localhost] {msandbox} (test) > insert into t1 values(null, 'boo');
Query OK, 1 row affected (0.00 sec)

master [localhost] {msandbox} (test) > select * from t1;
+---+------+
| i | c |
+---+------+
| 1 | test |
| 2 | boo |
+---+------+
2 rows in set (0.00 sec)

When the insert into the table `L` is replicated it causes the replication to break on PXC node node1.
mysql [localhost] {msandbox} (test) > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 19680
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000040
          Read_Master_Log_Pos: 20059609
               Relay_Log_File: mysql_sandbox56100-relay-bin.000003
                Relay_Log_Pos: 20013435
        Relay_Master_Log_File: mysql-bin.000040
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1180
                   Last_Error: Error in Xid_log_event: Commit could not be completed, 'Got error 5 during COMMIT'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 20059609
              Relay_Log_Space: 20056166
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1180
               Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'Got error 5 during COMMIT'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 66093ad0-7ee1-11e3-be39-6c626d957cff
             Master_Info_File: /work/sandboxes/pxc_5_6_15-25.5.759/node01/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 140418 19:29:22
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0

Check the value up to which the binlog events have been executed as reported by slave status:
Exec_Master_Log_Pos: 20059609

This is where replication will restart from when "START SLAVE" is executed again. Looking at the master's binlogs we see the following:
# at 20058959
#140418 19:28:25 server id 1 end_log_pos 20059038 CRC32 0x72427f1f Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842105/*!*/;
BEGIN
/*!*/;
# at 20059038
# at 20059070
#140418 19:28:25 server id 1 end_log_pos 20059070 CRC32 0x04f4f858 Intvar
SET INSERT_ID=1/*!*/;
#140418 19:28:25 server id 1 end_log_pos 20059179 CRC32 0x02bc455a Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842105/*!*/;
insert into t1 values(null, 'test')
/*!*/;
# at 20059179
#140418 19:28:25 server id 1 end_log_pos 20059210 CRC32 0x39d62eab Xid = 1832
COMMIT/*!*/;
# at 20059210
#140418 19:29:06 server id 1 end_log_pos 20059350 CRC32 0x6a880863 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842146/*!*/;
create table L (i int primAry key auto_increment, s varchar(1024))
/*!*/;
# at 20059350
#140418 19:29:22 server id 1 end_log_pos 20059429 CRC32 0x20d2f809 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842162/*!*/;
BEGIN
/*!*/;
# at 20059429
# at 20059461
#140418 19:29:22 server id 1 end_log_pos 20059461 CRC32 0x08179788 Intvar
SET INSERT_ID=1/*!*/;
#140418 19:29:22 server id 1 end_log_pos 20059578 CRC32 0x7b6cc5b6 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842162/*!*/;
insert into L (s) values (repeat('i',1024))
/*!*/;
# at 20059578
#140418 19:29:22 server id 1 end_log_pos 20059609 CRC32 0xec70565a Xid = 1834
COMMIT/*!*/;
# at 20059609
#140418 19:30:37 server id 1 end_log_pos 20059688 CRC32 0xd93485b0 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
BEGIN
/*!*/;
# at 20059688
# at 20059720
#140418 19:30:37 server id 1 end_log_pos 20059720 CRC32 0xd65c0e86 Intvar
SET INSERT_ID=2/*!*/;
#140418 19:30:37 server id 1 end_log_pos 20059828 CRC32 0x228ecfc3 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
insert into t1 values(null, 'boo')
/*!*/;
# at 20059828
#140418 19:30:37 server id 1 end_log_pos 20059859 CRC32 0x8e3d246d Xid = 1836
COMMIT/*!*/;

Now we can see that the position 20059609 is the position of the next binlog event group and not the one that failed:
# at 20059609
#140418 19:30:37 server id 1 end_log_pos 20059688 CRC32 0xd93485b0 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
BEGIN
/*!*/;
# at 20059688
# at 20059720
#140418 19:30:37 server id 1 end_log_pos 20059720 CRC32 0xd65c0e86 Intvar
SET INSERT_ID=2/*!*/;
#140418 19:30:37 server id 1 end_log_pos 20059828 CRC32 0x228ecfc3 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
insert into t1 values(null, 'boo')
/*!*/;
# at 20059828
#140418 19:30:37 server id 1 end_log_pos 20059859 CRC32 0x8e3d246d Xid = 1836
COMMIT/*!*/;

So executing a subsequent "START SLAVE" on the failed slave, causes it to restart the replication from the next binlog event skipping the one where it failed:
mysql [localhost] {msandbox} (test) > start slave;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {msandbox} (test) > select * from t1;
+---+------+
| i | c |
+---+------+
| 1 | test |
| 2 | boo |
+---+------+
2 rows in set (0.00 sec)

As we can see that the slave proceeded to execute the next INSERT into the table `t` after the failed one. And this leads to data inconsistencies.