Comment 0 for bug 1471141

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

It seems START SLAVE can hang in a session withj autocommit=OFF waiting for a lock at InnoDB level if other session with autocommit=OFF checks InnoDB locks in the information_schema.

To reproduce this, set up usual replication sandboxes using MySQL Sandbox, start all nodes and one one of slaves add the following lines to default my.sandbox.cnf:

slave_parallel_workers=8
master_info_repository=TABLE
relay_log_info_repository = TABLE

Now, stop and start the node:

1008 ./stop
1009 ./start
1010 ./my sql

start two sessions, and in one session do:

set @@session.autocommit=OFF;
show slave status\G
select count(*) from mysql.slave_worker_info; -- you should see 8 here
stop slave;
set global slave_parallel_workers=0;
 SELECT t.trx_id , t.trx_mysql_thread_id , p.USER , p.Host , p.TIME , t.trx_state , t.trx_operation_state , t.trx_tables_in_use , t.trx_tables_locked , t.trx_rows_modified , t.trx_started , SUBSTRING(t.trx_query, 1, 50) Query FROM information_schema.innodb_trx t INNER JOIN information_schema.processlist p ON t.trx_mysql_thread_id = p.ID;

Now in other session, do:

set @@session.autocommit = OFF;
start slave; -- it hangs

Now in the first session:

slave2 [localhost] {msandbox} ((none)) > SELECT t.trx_id , t.trx_mysql_thread_id , p.USER , p.Host , p.TIME , t.trx_state , t.trx_operation_state , t.trx_tables_in_use , t.trx_tables_locked , t.trx_rows_modified , t.trx_started , SUBSTRING(t.trx_query, 1, 50) Query FROM information_schema.innodb_trx t INNER JOIN information_schema.processlist p ON t.trx_mysql_thread_id = p.ID;
+--------+---------------------+-------------+-----------+------+-----------+---------------------+-------------------+-------------------+-------------------+---------------------+-------------+
| trx_id | trx_mysql_thread_id | USER | Host | TIME | trx_state | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_rows_modified | trx_started | Query |
+--------+---------------------+-------------+-----------+------+-----------+---------------------+-------------------+-------------------+-------------------+---------------------+-------------+
| 21056 | 14 | system user | | 3 | LOCK WAIT | starting index read | 1 | 1 | 0 | 2015-07-03 11:15:10 | NULL |
| 21055 | 12 | msandbox | localhost | 3 | RUNNING | NULL | 0 | 0 | 0 | 2015-07-03 11:15:10 | start slave |
+--------+---------------------+-------------+-----------+------+-----------+---------------------+-------------------+-------------------+-------------------+---------------------+-------------+
2 rows in set (0.00 sec)

slave2 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------+------+---------+------+-----------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+-----------------------------------+------------------+-----------+---------------+
| 11 | msandbox | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 12 | msandbox | localhost | NULL | Query | 9 | Waiting for slave thread to start | start slave | 0 | 0 |
| 13 | system user | | NULL | Connect | 9 | Connecting to master | NULL | 0 | 0 |
| 14 | system user | | NULL | Connect | 9 | System lock | NULL | 0 | 0 |
+----+-------------+-----------+------+---------+------+-----------------------------------+------------------+-----------+---------------+
4 rows in set (0.00 sec)

slave2 [localhost] {msandbox} ((none)) > show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2015-07-03 11:15:30 7f059fbbe700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 32 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 152 srv_idle
srv_master_thread log flush and writes: 154
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 6
OS WAIT ARRAY INFO: signal count 6
Mutex spin waits 8, rounds 497, OS waits 2
RW-shared spins 3, rounds 90, OS waits 3
RW-excl spins 0, rounds 0, OS waits 0
Spin rounds per wait: 62.12 mutex, 30.00 RW-shared, 0.00 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 21057
Purge done for trx's n:o < 21054 undo n:o < 0 state: running but idle
History list length 73
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 21052, not started
MySQL thread id 11, OS thread handle 0x7f059fbbe700, query id 18 localhost msandbox init
show engine innodb status
---TRANSACTION 21056, ACTIVE 20 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 14, OS thread handle 0x7f05c0617700, query id 0 System lock
------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5 page no 3 n bits 80 index `PRIMARY` of table `mysql`.`slave_worker_info` trx id 21056 lock mode S waiting
------------------
---TRANSACTION 21055, ACTIVE 20 sec
4 lock struct(s), heap size 1184, 16 row lock(s)
MySQL thread id 12, OS thread handle 0x7f059fb7d700, query id 15 localhost msandbox Waiting for slave thread to start
start slave
--------

...

After some time in a second session you'll get:

slave2 [localhost] {msandbox} ((none)) > start slave;
ERROR 2013 (HY000): Lost connection to MySQL server during query

and in the first one:

slave2 [localhost] {msandbox} ((none)) > show processlist; +----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
| 1 | system user | | NULL | Connect | 666 | Waiting for master to send event | NULL | 0 | 0 |
| 2 | system user | | NULL | Connect | 666 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0 | 0 |
| 3 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 4 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 5 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 6 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 7 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 8 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 9 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 10 | system user | | NULL | Connect | 666 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 11 | msandbox | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 12 | msandbox | localhost | NULL | Sleep | 333 | | NULL | 1 | 0 |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
12 rows in set (0.00 sec)

The deadlock is not detected in InnoDB and slave actually gets started.