FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.6 | Fix Released | Undecided | Unassigned |
Bug Description
Bug #1520491 was fixed for the case where FLUSH TABLES is initially run on a cluster node. However, if the cluster replicates from an asynchronous master and that master issues FLUSH TABLES, the behavior described in bug #1520491 still occurs.
How to repeat.
1. Start cluster from MTR:
./mtr --suite=
(ports 13000, 13004, 13008)
2. Start mysqld from MTR (use any other Percona Server directory, in my case this is 5.6 server):
./mtr --start --suite=rpl rpl_alter &
(ports 13011, 13012)
3. Connect to 13011 and change server_id:
set global server_id=11;
4. Connect to 13000, set up and start replication:
CHANGE MASTER TO master_
start slave;
5. Run load on 13000 as described in bug #1520491:
----<q>----
percona1 mysql> show create table test.tt1\G
*******
Table: tt1
Create Table: CREATE TABLE `tt1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ts` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=
1 row in set (0.00 sec)
[root@percona1 ~]# mysqlslap --password=cmon --delimiter=";" --number-
----</q>----
6. Connect to 13011 and run FLUSH TABLES
7. Connect to 13000 and observe:
mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 462 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 462 | NULL | NULL | 0 | 0 |
| 9 | system user | | | Connect | 230 | Waiting for table flush | flush tables | 0 | 0 |
| 15 | root | localhost:52870 | NULL | Sleep | 249 | | NULL | 0 | 0 |
| 16 | root | localhost:52872 | test | Query | 228 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 17 | root | localhost:52876 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 18 | root | localhost:52874 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 19 | root | localhost:52878 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 22 | root | localhost:53038 | NULL | Query | 0 | init | show processlist | 0 | 0 |
+----+-
9 rows in set (0.00 sec)
mysql> kill 9;
ERROR 1095 (HY000): You are not owner of thread 9
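The stall in the output above shows up as one replication thread in "Waiting for table flush" alongside several client threads in "wsrep in pre-commit stage". The following is a minimal sketch that picks those threads out of the rows shown above. The helper names (`parse_rows`, `stall_signature`) are hypothetical, and treating the co-occurrence of these two states as the stall signature is an assumption based only on this report, not a PXC diagnostic.

```python
# Column names are taken from the processlist header above.
COLUMNS = ["Id", "User", "Host", "db", "Command", "Time",
           "State", "Info", "Rows_sent", "Rows_examined"]

# Rows copied from the processlist output above.
ROWS = """\
| 9 | system user | | | Connect | 230 | Waiting for table flush | flush tables | 0 | 0 |
| 15 | root | localhost:52870 | NULL | Sleep | 249 | | NULL | 0 | 0 |
| 16 | root | localhost:52872 | test | Query | 228 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 17 | root | localhost:52876 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 18 | root | localhost:52874 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 19 | root | localhost:52878 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 22 | root | localhost:53038 | NULL | Query | 0 | init | show processlist | 0 | 0 |
"""


def parse_rows(text):
    """Turn pipe-delimited processlist rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip()[1:-1].split("|")]
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def stall_signature(rows):
    """Return (flush thread ids, blocked DML thread ids). Heuristic only."""
    flush = [int(r["Id"]) for r in rows
             if r["State"] == "Waiting for table flush"]
    blocked = [int(r["Id"]) for r in rows
               if r["State"] == "wsrep in pre-commit stage"]
    return flush, blocked


flush_ids, blocked_ids = stall_signature(parse_rows(ROWS))
print("FLUSH threads:", flush_ids)
print("blocked DML threads:", blocked_ids)
```

For the rows above this reports thread 9 as the flushing thread and threads 16 to 19 as the blocked inserts.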
Suggested fix: fix FLUSH TABLES behavior properly, no matter which server it originates from. Having PXC replicate from an asynchronous master is a common situation.
summary:
- Bug #1520491 not properly fixed
+ FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload
commit 0f737386c7fbb2e89d0c31f90b7711f161b5af54
Merge: 0c5743d ccbbfd8
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 15:09:06 2016 +0530
Merge pull request #324 from kbauskar/5.6-pxc-707
- PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave)…
commit ccbbfd882d9a72a22226e8d41442bea98b37cb43
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 09:45:04 2016 +0530
- PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload
The use-case has a master that replicates (using MySQL asynchronous replication) to
a slave node that is also a pxc-node in a cluster.
* pxc-node is running a DML workload which keeps opening and closing the table.
* master executes FLUSH TABLE, which is then replicated to the slave/pxc-node.
* FLUSH TABLE is replicated further within the cluster by the wsrep-replication
logic of the pxc-node. This replication is done using the TOI replication protocol.
TOI needs CommitMonitor, so FLUSH TABLE holds CommitMonitor before it starts
execution.
* FLUSH TABLE execution waits for removal of the old table-share from the table-cache.
But this table-share can be removed only when the active DML statements are done.
* DML statements can't proceed as they too need CommitMonitor to complete, which is
currently held by the FLUSH TABLE thread.
Fix is to skip removal of the table share by passing the REFRESH_FAST option
while closing the table.
(A fix for this was added in bug#1520491, but it only considered the use-case of
replication from one pxc-node to another pxc-node, whereas this use-case
exercises replication from master to pxc-node.)
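
The circular wait described in this commit message can be illustrated with a toy wait-for graph. This is only a sketch of the reasoning, not PXC code: the thread and resource labels and the functions `build_wait_graph` and `find_cycle` are hypothetical, and the effect of REFRESH_FAST is modelled simply as "the FLUSH thread no longer waits for the table share".

```python
# Toy wait-for-graph model of the stall. Labels are illustrative only;
# they are not PXC or MySQL identifiers.

def build_wait_graph(refresh_fast):
    """Return {waiting_thread: thread_it_waits_for} for one FLUSH and one DML."""
    holders = {
        "CommitMonitor": "flush_tables",    # TOI takes it before executing
        "table_share(tt1)": "dml_insert",   # the active INSERT has the table open
    }
    wants = {"dml_insert": "CommitMonitor"}  # DML needs it to complete
    if not refresh_fast:
        # Without REFRESH_FAST, FLUSH waits for the old share to be removed.
        wants["flush_tables"] = "table_share(tt1)"
    return {thread: holders[res] for thread, res in wants.items()}


def find_cycle(wait_for):
    """Follow wait-for edges from each thread; return a cycle as a list, or None."""
    for start in wait_for:
        path = [start]
        node = wait_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = wait_for.get(node)
        if node is not None:
            return path[path.index(node):]
    return None


print(find_cycle(build_wait_graph(refresh_fast=False)))
print(find_cycle(build_wait_graph(refresh_fast=True)))
```

In this model the first call finds the cycle between the DML thread and the FLUSH thread, and the second finds none: the DML thread still waits for CommitMonitor, but the FLUSH thread can now finish and release it.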