FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload

Bug #1629296 reported by Sveta Smirnova
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC), status tracked in 5.6
Series 5.6: Status: Fix Released, Importance: Undecided, Assigned to: Unassigned

Bug Description

Bug #1520491 was fixed for the case where FLUSH TABLES is initially run on a cluster node. However, if the cluster replicates from an asynchronous master and that master issues FLUSH TABLES, the behavior described in bug #1520491 still occurs.

How to repeat.

1. Start cluster from MTR:

./mtr --suite=galera_3nodes --mysqld=--wsrep-provider=$HOME/build/galera/libgalera_smm.so --start galera_parallel_apply_3nodes &

(ports 13000, 13004, 13008)

2. Start mysqld from MTR (use any other Percona Server directory; in my case this is a 5.6 server):

./mtr --start --suite=rpl rpl_alter &

(ports 13011, 13012)

3. Connect to 13011 and change server_id:

set global server_id=11;

4. Connect to 13000, set up and start replication:

CHANGE MASTER TO master_host='127.0.0.1', master_port=13011, master_user='root';
start slave;
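
Before running the load it may be worth confirming that both replication threads on 13000 are actually running. A minimal sketch, assuming the MTR servers accept root without a password over TCP on 127.0.0.1 (as the CHANGE MASTER statement above implies); `slave_threads` is a hypothetical helper name, not something MTR provides:

```shell
# Hypothetical helper: print only the two replication thread states
# from SHOW SLAVE STATUS on the given port.
# MYSQL can be overridden to point at a different client binary.
slave_threads() {
    ${MYSQL:-mysql} --host=127.0.0.1 --port="$1" --user=root \
        -e 'SHOW SLAVE STATUS\G' |
        grep -E 'Slave_(IO|SQL)_Running:'
}

# Usage (PXC node acting as slave):
#   slave_threads 13000
```

Both lines should report Yes before continuing with step 5.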

5. Run load on 13000 as described in bug #1520491:

----<q>----
percona1 mysql> show create table test.tt1\G
*************************** 1. row ***************************
       Table: tt1
Create Table: CREATE TABLE `tt1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=16775 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

[root@percona1 ~]# mysqlslap --password=cmon --delimiter=";" --number-of-queries=600 --create-schema=test --concurrency=4 --query="insert into tt1 set ts=now()"
----</q>----

6. Connect to 13011 and run FLUSH TABLES
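
For scripting this step, something like the following should work. It is only a sketch: `run_sql` is a hypothetical helper, and it assumes the same passwordless root over TCP that the rest of this setup uses.

```shell
# Port taken from the MTR output in step 2; adjust if your run differs.
MASTER_PORT=13011

# run_sql is a hypothetical helper, not part of MTR.
# Set MYSQL to another client binary (or "echo mysql" for a dry run).
run_sql() {
    port=$1
    shift
    ${MYSQL:-mysql} --host=127.0.0.1 --port="$port" --user=root -e "$*"
}

# Step 6 would then be:
#   run_sql "$MASTER_PORT" "FLUSH TABLES"
```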

7. Connect to 13000 and observe:

mysql> show processlist;
+----+-------------+-----------------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 462 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 462 | NULL | NULL | 0 | 0 |
| 9 | system user | | | Connect | 230 | Waiting for table flush | flush tables | 0 | 0 |
| 15 | root | localhost:52870 | NULL | Sleep | 249 | | NULL | 0 | 0 |
| 16 | root | localhost:52872 | test | Query | 228 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 17 | root | localhost:52876 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 18 | root | localhost:52874 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 19 | root | localhost:52878 | test | Query | 229 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 22 | root | localhost:53038 | NULL | Query | 0 | init | show processlist | 0 | 0 |
+----+-------------+-----------------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
9 rows in set (0.00 sec)

mysql> kill 9;
ERROR 1095 (HY000): You are not owner of thread 9
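
To spot the stall without reading the whole processlist, the two states seen above can be filtered from information_schema.processlist. This is a rough sketch under the same connection assumptions as the steps above; `stalled_threads` is a hypothetical helper name, and the state strings are copied from the output in this report.

```shell
# Hypothetical helper: list threads stuck in the two states shown in the
# processlist above, longest-waiting first.
stalled_threads() {
    ${MYSQL:-mysql} --host=127.0.0.1 --port="$1" --user=root \
        --batch --skip-column-names -e \
        "SELECT id, time, state, info
           FROM information_schema.processlist
          WHERE state IN ('Waiting for table flush', 'wsrep in pre-commit stage')
          ORDER BY time DESC"
}

# Usage (PXC node acting as slave):
#   stalled_threads 13000
```

If the flush thread and the inserts all show up here with a growing time value, the node has most likely hit the stall described in this report.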

Suggested fix: fix FLUSH TABLES behavior properly, no matter which server it originates from. Having PXC replicate from an asynchronous master is a common situation.

Tags: i140950
summary: - Bug #1520491 not properly fixed
+ FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML
+ workload
Krunal Bauskar (krunal-bauskar) wrote :

commit 0f737386c7fbb2e89d0c31f90b7711f161b5af54
Merge: 0c5743d ccbbfd8
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 15:09:06 2016 +0530

    Merge pull request #324 from kbauskar/5.6-pxc-707

    - PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave)…

commit ccbbfd882d9a72a22226e8d41442bea98b37cb43
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 09:45:04 2016 +0530

    - PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload

      Use case: a master replicates (using MySQL asynchronous replication) to
      a slave node that is also a pxc-node of a cluster.

      * The pxc-node is running a DML workload that keeps opening and closing the table.
      * The master executes FLUSH TABLE, which is then replicated to the slave/pxc-node.
      * FLUSH TABLE is further replicated within the cluster by the wsrep replication
        logic of the pxc-node. This replication uses the TOI replication protocol.
        TOI needs CommitMonitor, so FLUSH TABLE holds CommitMonitor before it starts
        execution.
      * FLUSH TABLE execution waits for removal of the old table share from the
        table cache, but this table share can be removed only when active DML is done.
      * DML can't proceed because it too needs CommitMonitor to complete, and that is
        currently held by the FLUSH TABLE thread.

        The fix is to skip removal of the table share by passing the REFRESH_FAST option
        while closing the table.
        (A fix for this was added in bug#1520491 that considered the use case of
         replication from one pxc-node to another pxc-node, but this use case
         exercises replication from a master to a pxc-node.)

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-707
