"Deadlock" on START SLAVE with crash-safe former MTS slave and autocommit=OFF
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
MySQL Server |
Unknown
|
Unknown
|
||||
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | |||||
5.6 |
Fix Released
|
Medium
|
Unassigned | |||
5.7 |
Fix Released
|
Medium
|
Unassigned |
Bug Description
It seems START SLAVE can hang in a session with autocommit=OFF waiting for a lock at InnoDB level if other session with autocommit=OFF checks InnoDB locks in the information_schema.
To reproduce this, set up usual replication sandboxes using MySQL Sandbox, start all nodes and on one of slaves add the following lines to default my.sandbox.cnf:
slave_parallel_
master_
relay_log_
Now, stop and start the node:
1008 ./stop
1009 ./start
1010 ./my sql
start two sessions, and in one session do:
set @@session.
show slave status\G
select count(*) from mysql.slave_
stop slave;
set global slave_parallel_
SELECT t.trx_id , t.trx_mysql_
Now in other session, do:
set @@session.
start slave; -- it hangs
Now in the first session:
slave2 [localhost] {msandbox} ((none)) > SELECT t.trx_id , t.trx_mysql_
+------
| trx_id | trx_mysql_thread_id | USER | Host | TIME | trx_state | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_rows_modified | trx_started | Query |
+------
| 21056 | 14 | system user | | 3 | LOCK WAIT | starting index read | 1 | 1 | 0 | 2015-07-03 11:15:10 | NULL |
| 21055 | 12 | msandbox | localhost | 3 | RUNNING | NULL | 0 | 0 | 0 | 2015-07-03 11:15:10 | start slave |
+------
2 rows in set (0.00 sec)
slave2 [localhost] {msandbox} ((none)) > show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 11 | msandbox | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 12 | msandbox | localhost | NULL | Query | 9 | Waiting for slave thread to start | start slave | 0 | 0 |
| 13 | system user | | NULL | Connect | 9 | Connecting to master | NULL | 0 | 0 |
| 14 | system user | | NULL | Connect | 9 | System lock | NULL | 0 | 0 |
+----+-
4 rows in set (0.00 sec)
slave2 [localhost] {msandbox} ((none)) > show engine innodb status\G
*******
Type: InnoDB
Name:
Status:
=======
2015-07-03 11:15:30 7f059fbbe700 INNODB MONITOR OUTPUT
=======
Per second averages calculated from the last 32 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 152 srv_idle
srv_master_thread log flush and writes: 154
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 6
OS WAIT ARRAY INFO: signal count 6
Mutex spin waits 8, rounds 497, OS waits 2
RW-shared spins 3, rounds 90, OS waits 3
RW-excl spins 0, rounds 0, OS waits 0
Spin rounds per wait: 62.12 mutex, 30.00 RW-shared, 0.00 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 21057
Purge done for trx's n:o < 21054 undo n:o < 0 state: running but idle
History list length 73
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 21052, not started
MySQL thread id 11, OS thread handle 0x7f059fbbe700, query id 18 localhost msandbox init
show engine innodb status
---TRANSACTION 21056, ACTIVE 20 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 14, OS thread handle 0x7f05c0617700, query id 0 System lock
------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5 page no 3 n bits 80 index `PRIMARY` of table `mysql`
------------------
---TRANSACTION 21055, ACTIVE 20 sec
4 lock struct(s), heap size 1184, 16 row lock(s)
MySQL thread id 12, OS thread handle 0x7f059fb7d700, query id 15 localhost msandbox Waiting for slave thread to start
start slave
--------
...
After some time in a second session you'll get:
slave2 [localhost] {msandbox} ((none)) > start slave;
ERROR 2013 (HY000): Lost connection to MySQL server during query
and in the first one:
slave2 [localhost] {msandbox} ((none)) > show processlist; +----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Connect | 666 | Waiting for master to send event | NULL | 0 | 0 |
| 2 | system user | | NULL | Connect | 666 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0 | 0 |
| 3 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 4 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 5 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 6 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 7 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 8 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 9 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 10 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 11 | msandbox | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 12 | msandbox | localhost | NULL | Sleep | 333 | | NULL | 1 | 0 |
+----+-
12 rows in set (0.00 sec)
The deadlock is not detected in InnoDB and slave actually gets started.
Upstream?