Cluster node acting as async slave stops with the wrong position after hitting max write set size

Bug #1309669 reported by Ovais Tariq
This bug affects 1 person
Affects                         Status        Importance  Assigned to      Milestone
MySQL patches by Codership      Fix Released  Low         Seppo Jaakola
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                           Invalid       Undecided   Unassigned
  5.6                           Fix Released  Undecided   Unassigned

Bug Description

PXC version:
Server version: 5.6.15-rel63.0-log, Percona XtraDB Cluster - binary (GPL), Release: 25.5, Revision 759, wsrep_25.5.r4061

I have a three-node cluster and one of the nodes is acting as an async slave of a MySQL 5.6.15 master. When a write set exceeds the wsrep_max_ws_size limit and replication fails on the PXC node at commit time, the node reports the wrong stopping position: it effectively skips over to the next binlog event group.

Suppose the wsrep_max_ws_size variable is set as follows on node1, which is replicating from the async master:
node1 [localhost] {msandbox} (test) > set global wsrep_max_ws_size=1024;
Query OK, 0 rows affected (0.00 sec)
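
To confirm the new limit is in effect (a verification step added here for completeness, not part of the original report):
node1 [localhost] {msandbox} (test) > show global variables like 'wsrep_max_ws_size';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_max_ws_size | 1024  |
+-------------------+-------+
1 row in set (0.00 sec)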

And suppose the following sequence of statements is executed on the master:
master [localhost] {msandbox} (test) > insert into t1 values(null, 'test');
Query OK, 1 row affected (0.00 sec)

master [localhost] {msandbox} (test) > create table L (i int primAry key auto_increment, s varchar(1024));
Query OK, 0 rows affected (0.14 sec)

master [localhost] {msandbox} (test) > insert into L (s) values (repeat('i',1024));
Query OK, 1 row affected (0.03 sec)

master [localhost] {msandbox} (test) > insert into t1 values(null, 'boo');
Query OK, 1 row affected (0.00 sec)

master [localhost] {msandbox} (test) > select * from t1;
+---+------+
| i | c    |
+---+------+
| 1 | test |
| 2 | boo |
+---+------+
2 rows in set (0.00 sec)

When the insert into table `L` is replicated, it breaks replication on PXC node node1:
mysql [localhost] {msandbox} (test) > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 19680
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000040
          Read_Master_Log_Pos: 20059609
               Relay_Log_File: mysql_sandbox56100-relay-bin.000003
                Relay_Log_Pos: 20013435
        Relay_Master_Log_File: mysql-bin.000040
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1180
                   Last_Error: Error in Xid_log_event: Commit could not be completed, 'Got error 5 during COMMIT'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 20059609
              Relay_Log_Space: 20056166
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1180
               Last_SQL_Error: Error in Xid_log_event: Commit could not be completed, 'Got error 5 during COMMIT'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 66093ad0-7ee1-11e3-be39-6c626d957cff
             Master_Info_File: /work/sandboxes/pxc_5_6_15-25.5.759/node01/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 140418 19:29:22
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0

Check the position up to which binlog events have been executed, as reported by the slave status:
Exec_Master_Log_Pos: 20059609

This is where replication will restart when "START SLAVE" is executed again. Looking at the master's binlogs, we see the following:
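(The dump below can be reproduced with mysqlbinlog; the exact invocation is assumed here, with the start position taken from the first event shown.)
$ mysqlbinlog --start-position=20058959 mysql-bin.000040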
# at 20058959
#140418 19:28:25 server id 1 end_log_pos 20059038 CRC32 0x72427f1f Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842105/*!*/;
BEGIN
/*!*/;
# at 20059038
# at 20059070
#140418 19:28:25 server id 1 end_log_pos 20059070 CRC32 0x04f4f858 Intvar
SET INSERT_ID=1/*!*/;
#140418 19:28:25 server id 1 end_log_pos 20059179 CRC32 0x02bc455a Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842105/*!*/;
insert into t1 values(null, 'test')
/*!*/;
# at 20059179
#140418 19:28:25 server id 1 end_log_pos 20059210 CRC32 0x39d62eab Xid = 1832
COMMIT/*!*/;
# at 20059210
#140418 19:29:06 server id 1 end_log_pos 20059350 CRC32 0x6a880863 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842146/*!*/;
create table L (i int primAry key auto_increment, s varchar(1024))
/*!*/;
# at 20059350
#140418 19:29:22 server id 1 end_log_pos 20059429 CRC32 0x20d2f809 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842162/*!*/;
BEGIN
/*!*/;
# at 20059429
# at 20059461
#140418 19:29:22 server id 1 end_log_pos 20059461 CRC32 0x08179788 Intvar
SET INSERT_ID=1/*!*/;
#140418 19:29:22 server id 1 end_log_pos 20059578 CRC32 0x7b6cc5b6 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842162/*!*/;
insert into L (s) values (repeat('i',1024))
/*!*/;
# at 20059578
#140418 19:29:22 server id 1 end_log_pos 20059609 CRC32 0xec70565a Xid = 1834
COMMIT/*!*/;
# at 20059609
#140418 19:30:37 server id 1 end_log_pos 20059688 CRC32 0xd93485b0 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
BEGIN
/*!*/;
# at 20059688
# at 20059720
#140418 19:30:37 server id 1 end_log_pos 20059720 CRC32 0xd65c0e86 Intvar
SET INSERT_ID=2/*!*/;
#140418 19:30:37 server id 1 end_log_pos 20059828 CRC32 0x228ecfc3 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
insert into t1 values(null, 'boo')
/*!*/;
# at 20059828
#140418 19:30:37 server id 1 end_log_pos 20059859 CRC32 0x8e3d246d Xid = 1836
COMMIT/*!*/;

Now we can see that position 20059609 is the start of the next binlog event group, not of the group that failed; the correct stopping position would have been 20059350, the start of the failed transaction's event group:
# at 20059609
#140418 19:30:37 server id 1 end_log_pos 20059688 CRC32 0xd93485b0 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
BEGIN
/*!*/;
# at 20059688
# at 20059720
#140418 19:30:37 server id 1 end_log_pos 20059720 CRC32 0xd65c0e86 Intvar
SET INSERT_ID=2/*!*/;
#140418 19:30:37 server id 1 end_log_pos 20059828 CRC32 0x228ecfc3 Query thread_id=577 exec_time=0 error_code=0
SET TIMESTAMP=1397842237/*!*/;
insert into t1 values(null, 'boo')
/*!*/;
# at 20059828
#140418 19:30:37 server id 1 end_log_pos 20059859 CRC32 0x8e3d246d Xid = 1836
COMMIT/*!*/;

So executing a subsequent "START SLAVE" on the failed slave causes it to restart replication from the next binlog event group, skipping the one where it failed:
mysql [localhost] {msandbox} (test) > start slave;
Query OK, 0 rows affected (0.01 sec)

mysql [localhost] {msandbox} (test) > select * from t1;
+---+------+
| i | c    |
+---+------+
| 1 | test |
| 2 | boo |
+---+------+
2 rows in set (0.00 sec)

As we can see, the slave proceeded to execute the next INSERT into table `t1` after the failed one, which leads to data inconsistencies.
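
A possible manual recovery (a sketch based on the positions above, not part of the original report) is to stop the slave, raise wsrep_max_ws_size again (the value below is illustrative), and repoint the SQL thread at the start of the failed event group, 20059350 in this example:
mysql [localhost] {msandbox} (test) > stop slave;
mysql [localhost] {msandbox} (test) > set global wsrep_max_ws_size=1073741824;
mysql [localhost] {msandbox} (test) > change master to master_log_file='mysql-bin.000040', master_log_pos=20059350;
mysql [localhost] {msandbox} (test) > start slave;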

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

"Got error 5 during COMMIT" is certainly due to max writeset size excess. It was added recently.

However, that is still a valid error to return to clients, so it probably needs to be handled by the SQL thread somehow (or not).

It would be interesting to test this with http://www.percona.com/doc/percona-xtradb-cluster/5.6/wsrep-system-index.html#wsrep_preordered
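
(For reference, wsrep_preordered is a dynamic global variable, so such a test could enable it like so; the session below is illustrative, not from the report:)
node1 [localhost] {msandbox} (test) > set global wsrep_preordered=ON;
Query OK, 0 rows affected (0.00 sec)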

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

I think the SQL thread needs to reset its position to that of the last successfully executed binlog event group. Currently, it just moves forward to the next one regardless of the fact that the COMMIT failed because the maximum write set size was reached.

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

This issue happens only in the 5.6 branch and looks like a bug in the binlog group commit code.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
status: New → In Progress
importance: Undecided → Low
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

A fix was pushed to wsrep-5.6: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4082
...where relay log positions are rolled back to the group start position if the commit fails.
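
With that change, a failed commit should leave the slave pointing at the start of the failed event group rather than past it; for the example above, the slave status would then be expected to show (an expected value inferred from the fix description, not reproduced output):
Exec_Master_Log_Pos: 20059350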

Changed in codership-mysql:
status: In Progress → Fix Committed
milestone: none → 5.6.17-25.6
Changed in codership-mysql:
status: Fix Committed → Fix Released
milestone: 5.6.20-25.6 → 5.6.21-25.9
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1671
