Transaction applied partially on a slave with relay_log_info_repository=table

Bug #1477894 reported by Przemek
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Invalid
Undecided
Unassigned
5.6
Triaged
High
Unassigned
5.7
Triaged
High
Unassigned

Bug Description

Description:
A transaction may be executed partially(!) on a slave running with relay_log_info_repository=TABLE, when a slave SQL thread is blocked for more then innodb_lock_wait_timeout. Instead of re-trying just a timeouted statement it re-tries the whole transaction. As by default the innodb_rollback_on_timeout is OFF, this leads to data inconsistency. Happens for both statement and ROW format.

The workaround for that is switching innodb_rollback_on_timeout to ON, or using relay_log_info_repository=FILE.
Tested on: 5.6.25-73.1-log Percona Server (GPL)

How to repeat:
In slave's my.cnf, add:
relay_log_info_repository = TABLE

--master:
use test
create table t1 (id int auto_increment primary key, a varchar(50));
create table t2 (id int auto_increment primary key, a varchar(50));
insert into t1 values (null,"aa"),(null,"bb"),(null,"cc");
insert into t2 values (null,"AA"),(null,"BB"),(null,"CC");

-- slave:
begin; select * from t2 for update;

-- master:
begin;
insert into t1 values (10,"dd");
insert into t2 values (10,"DD");
commit;

As a result, the slave ends up with error:
2015-07-24 09:57:53 14775 [Warning] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'insert into t2 values (10,"DD")', Error_code: 1205
2015-07-24 09:57:53 14775 [ERROR] Slave SQL: Error 'Duplicate entry '10' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t1 values (10,"dd")', Error_code: 1062
2015-07-24 09:57:53 14775 [Warning] Slave: Duplicate entry '10' for key 'PRIMARY' Error_code: 1062
2015-07-24 09:57:53 14775 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000002' position 946

And inconsistent data comparing to the master:
--master:
master [localhost] {msandbox} (test) > select * from t1; select * from t2;
+----+------+
| id | a |
+----+------+
| 1 | aa |
| 2 | bb |
| 3 | cc |
| 10 | dd |
+----+------+
4 rows in set (0.00 sec)

+----+------+
| id | a |
+----+------+
| 1 | AA |
| 2 | BB |
| 3 | CC |
| 10 | DD |
+----+------+
4 rows in set (0.00 sec)

--slave:
slave1 [localhost] {msandbox} (test) > select * from t1; select * from t2;
+----+------+
| id | a |
+----+------+
| 1 | aa |
| 2 | bb |
| 3 | cc |
| 10 | dd |
+----+------+
4 rows in set (0.00 sec)

+----+------+
| id | a |
+----+------+
| 1 | AA |
| 2 | BB |
| 3 | CC |
+----+------+
3 rows in set (0.00 sec)

Corresponding upstream bug:
http://bugs.mysql.com/bug.php?id=77826

Przemek (pmalkowski)
tags: added: i55029 i57050
tags: added: upstream
Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

This is easy to reproduce with snadbox:

...
2015-07-27 11:37:07 3937 [Note] /home/openxs/5.6.25/bin/mysqld: ready for connections.
Version: '5.6.25-log' socket: '/tmp/mysql_sandbox21692.sock' port: 21692 MySQL Community Server (GPL)
2015-07-27 11:37:07 3937 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 178696, relay log './mysql_sandbox21692-relay-bin.000002' position: 178859
2015-07-27 11:39:44 3937 [Warning] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'insert into t2 values (10,"DD")', Error_code: 1205
2015-07-27 11:39:44 3937 [ERROR] Slave SQL: Error 'Duplicate entry '10' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t1 values (10,"dd")', Error_code: 1062
2015-07-27 11:39:44 3937 [Warning] Slave: Duplicate entry '10' for key 'PRIMARY' Error_code: 1062
2015-07-27 11:39:44 3937 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000002' position 946

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-917

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.