Nodes getting blocked with "flush tables" command
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status tracked in 5.6 | | |
5.6 | Fix Released | Undecided | Krunal Bauskar |
Bug Description
When "flush tables" is issued on one of the cluster nodes, it is replicated and blocks other nodes that were performing writes at the same time. I suspect this may be a side effect of the fix for bug lp:1421360.
This is easily reproducible:
```
percona1 mysql> select @@version,
*******
@@version: 5.6.26-74.0-56-log
@@version_comment: Percona XtraDB Cluster (GPL), Release rel74.0, Revision 1, WSREP version 25.12, wsrep_25.12
1 row in set (0.00 sec)
percona1 mysql> show create table test.tt1\G
*******
Table: tt1
Create Table: CREATE TABLE `tt1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ts` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=
1 row in set (0.00 sec)
[root@percona1 ~]# mysqlslap --password=cmon --delimiter=";" --number-
Warning: Using a password on the command line interface can be insecure.
-- with the above still running:
percona2 mysql> flush tables;
Query OK, 0 rows affected (0.11 sec)
```
```
percona1 mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 90 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 90 | NULL | NULL | 0 | 0 |
| 5 | system user | | | Sleep | 9 | Waiting for table flush | flush tables | 0 | 0 |
| 6 | system user | | NULL | Sleep | 89 | NULL | NULL | 0 | 0 |
| 7 | system user | | NULL | Sleep | 89 | NULL | NULL | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 9 | root | localhost | NULL | Sleep | 22 | | NULL | 0 | 0 |
| 10 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 11 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 12 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 13 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
+----+-
11 rows in set (0.00 sec)
percona2 mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 46 | committed 1426038 | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 972 | wsrep aborter idle | NULL | 0 | 0 |
| 3 | system user | | NULL | Sleep | 46 | committed 1426037 | NULL | 0 | 0 |
| 4 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+----+-
4 rows in set (0.00 sec)
percona1 mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 639 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 20 | Waiting for table flush | NULL | 0 | 0 |
| 5 | system user | | | Sleep | 558 | Waiting for table flush | flush tables | 0 | 0 |
| 6 | system user | | NULL | Sleep | 19 | Waiting for table flush | NULL | 0 | 0 |
| 7 | system user | | NULL | Sleep | 19 | Waiting for table flush | NULL | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 10 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 11 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 12 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 13 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
+----+-
10 rows in set (0.00 sec)
percona1 mysql> kill 5;
ERROR 1095 (HY000): You are not owner of thread 5
```
As a result, the percona1 node is completely blocked for writes, and the whole cluster eventually stalls once Flow Control is triggered. The only way to recover is to restart the blocked node.
I am not able to reproduce this in PXC 5.6.24, which appears to confirm the regression.
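The stall has a recognisable signature in the percona1 listing: a replicated `flush tables` applier thread (system user) stuck in "Waiting for table flush" while the local writers sit in "wsrep in pre-commit stage". A minimal Python sketch for spotting that signature in a captured SHOW PROCESSLIST dump might look like this. It assumes the mysql client's tabular output with the column order shown in this report; the helper names, the `SAMPLE` excerpt (trimmed from the percona1 listing) and the 60-second threshold are hypothetical choices, not anything provided by PXC.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical row type; column order follows the SHOW PROCESSLIST output above:
# Id, User, Host, db, Command, Time, State, Info, Rows_sent, Rows_examined.
@dataclass
class ProcRow:
    thread_id: int
    user: str
    time: int
    state: str
    info: str


def parse_processlist(text: str) -> list[ProcRow]:
    """Parse the pipe-delimited data rows of a captured SHOW PROCESSLIST table."""
    rows: list[ProcRow] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders, prompts and the "rows in set" footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 8 or cells[0] == "Id":
            continue  # skip the header row and truncated lines
        rows.append(ProcRow(int(cells[0]), cells[1], int(cells[5]), cells[6], cells[7]))
    return rows


def find_flush_stall(rows: list[ProcRow], min_seconds: int = 60) -> tuple[list[int], list[int]]:
    """Return (flush waiters, blocked writers) that have been in that state >= min_seconds.

    The 60-second default is an arbitrary, hypothetical threshold.
    """
    flush_waiters = [r.thread_id for r in rows
                     if r.state == "Waiting for table flush" and r.time >= min_seconds]
    blocked_writers = [r.thread_id for r in rows
                       if r.state == "wsrep in pre-commit stage" and r.time >= min_seconds]
    return flush_waiters, blocked_writers


# Excerpt of the second percona1 listing from this report.
SAMPLE = """
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
| 2 | system user | | NULL | Sleep | 20 | Waiting for table flush | NULL | 0 | 0 |
| 5 | system user | | | Sleep | 558 | Waiting for table flush | flush tables | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 10 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 11 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
"""

if __name__ == "__main__":
    waiters, writers = find_flush_stall(parse_processlist(SAMPLE))
    print("flush waiters:", waiters, "blocked writers:", writers)
```

With the default threshold this prints `flush waiters: [5] blocked writers: [10, 11]`; thread 2 (20 s) is below the cut-off. Matching on the State column rather than on Command is deliberate, since in this report the stuck applier threads all show Command `Sleep`.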