Slave lock wait timeout and duplicate entry errors

Bug #1339276 reported by KS
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Incomplete
Undecided
Unassigned
5.6
Incomplete
Undecided
Unassigned

Bug Description

Hi, we've recently upgraded our database to 5.6.17 and notice what may be a bug.

Server version: 5.6.17-65.0-rel65.0-log Percona Server with XtraDB (GPL), Release rel65.0, Revision 587

Summary: We have some reporting queries running on a slave that cause an occasional "Lock wait timeout exceeded; try restarting transaction" (even before upgrading to 5.6) .We are in the process of moving reporting to a different architecture to avoid such timeouts. However, the problem is - while in 5.5, the slave stopped after trying slave_transaction_retries times. In 5.6, some cases that have resulted in timeout followed by a duplicate key error. This has never happened in 5.5.

GENERAL LOG:
**************

                183288 Query BEGIN
                183288 Query UPDATE `table1`
                        SET type='web'
                        WHERE ( table1_id = '123 )
                183288 Query INSERT INTO `table2`( queue_id, data, created )
                                VALUES ( '28', 'abc', '1404418635' )
Comment: "Lock wait timeout exceeded, try restarting transaction" occurs on the following UPDATE statement:
140703 20:18:06 183288 Query UPDATE table3
                        SET on_hold = IF( on_hold + '1' < 0, 0, on_hold + '1' ), modified = '1404418635'
                        WHERE item='2397'
Comment: "Transaction is retried":
                183288 Query BEGIN
                183288 Query UPDATE `table1`
                        SET type='web'
                        WHERE ( table1_id = '123 )
Comment: SQL thread stops with "Duplicate entry":
                183288 Query INSERT INTO `table2`( queue_id, data, created )
                                VALUES ( '28', 'abc', '1404418635' )

ERROR LOG:
***********

2014-07-03 20:18:06 15493 [Warning] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'test'. Query: 'UPDATE table3
                        SET on_hold = IF( on_hold + '1' < 0, 0, on_hold + '1' ), modified = '1404418635'
                        WHERE item='2397'', Error_code: 1205
2014-07-03 20:18:06 15493 [ERROR] Slave SQL: Error 'Duplicate entry '97102448' for key 'PRIMARY'' on query. Default database: 'thinkgeek'. Query: 'INSERT INTO `table2`( queue_id, data, created )
                                VALUES ( '28', 'abc', '1404418635' )', Error_code: 1062
2014-07-03 20:18:06 15493 [Warning] Slave: Duplicate entry '97102448' for key 'PRIMARY' Error_code: 1062
2014-07-03 20:18:06 15493 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'bin-log.003698' position 689390092

DDL for table2:

+------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+----------------+
| queue_element_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| queue_id | int(11) | NO | MUL | NULL | |
| data | text | YES | | NULL | |
| lock_time | int(10) unsigned | YES | | NULL | |
| requeue_count | int(3) unsigned | YES | MUL | 0 | |
| created | int(10) unsigned | NO | | 0 | |
+------------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

It seems like when a timeout occurs and the last statement in transaction is rolled back (as stated here http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_rollback_on_timeout), the auto_increment value for queue_element_id column (value 97102448 in the error log) does not increment and when the transaction is retried it results in a duplicate key.

 We never had --innodb_rollback_on_timeout turned on, but this has never happened in 5.5. Has something changed in 5.6? This has currently stalled our upgrade plan. Please look into this. If you need additional information, just let me know.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

I wonder if both table1 and table2 in the above are InnoDB? Check http://bugs.mysql.com/bug.php?id=42165

Please, send your my.cnf file content also. I wonder if MTS is enabled. Check http://bugs.mysql.com/bug.php?id=68465

Revision history for this message
KS (ksou) wrote :

Table1, Table2 and Table3 are InnoDB. I've uploaded the config file to this report. MTS is not enabled.

Revision history for this message
KS (ksou) wrote :
Revision history for this message
KS (ksou) wrote :

Any progress on this bug? We are on a tight upgrade timeline and have halted upgrades until we know what is causing the problem and fix it.

Thank you!

Revision history for this message
KS (ksou) wrote :

Just wondering if there is any update?

Revision history for this message
Muhammad Irfan (muhammad-irfan) wrote :

Can you please also share table definitions for table1, table3 too please.

mysql> SHOW CREATE TABLE table1\G
mysql> SHOW CREATE TABLE table3\G

I tried with populating all tables with test data and run INSERT/UPDATE on all three tables parallel on master and at the same time few SELECT queries on all 3 tables on slave. Didn't get any DUPLICATE KEY errors though. I will try again once i have missing info.

Revision history for this message
KS (ksou) wrote :

Just wanted to update this ticket.

Ever since we removed the crash-safe replication parameters from the slave instance, the sporadic duplicate key issue has never happened again. These three were the only replication parameters that I changed while upgrading to 5.6 so the thought was that they likely caused the issue.

CHANGED FROM:

relay-log-recovery = 1
relay-log-info-repository = TABLE
master_info_repository = TABLE

TO:

relay-log-recovery = 0
relay-log-info-repository = FILE
master_info_repository = FILE

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3165

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.