Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload

Bug #1629296 reported by Sveta Smirnova on 2016-09-30

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)	Status tracked in 5.6
5.6	Fix Released	Undecided	Unassigned	5.6.34-26.19

Bug Description

Bug #1520491 was fixed for the case where FLUSH TABLES is initially run on a cluster node. However, if the cluster replicates from an asynchronous master and that master issues FLUSH TABLES, the behavior described in bug #1520491 still occurs.

How to repeat.

1. Start cluster from MTR:

./mtr --suite=galera_3nodes --mysqld=--wsrep-provider=$HOME/build/galera/libgalera_smm.so --start galera_parallel_apply_3nodes &

(ports 13000, 13004, 13008)

2. Start mysqld from MTR (use any other Percona Server directory, in my case this is 5.6 server):

./mtr --start --suite=rpl rpl_alter &

(ports 13011, 13012)

3. Connect to 13011 and change server_id:

set global server_id=11;

4. Connect to 13000, set up and start replication:

CHANGE MASTER TO master_host='127.0.0.1', master_port=13011, master_user='root';
start slave;

5. Run load on 13000 as described in bug #1520491:

----<q>----
percona1 mysql> show create table test.tt1\G
*************************** 1. row ***************************
       Table: tt1
Create Table: CREATE TABLE `tt1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=16775 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

[root@percona1 ~]# mysqlslap --password=cmon --delimiter=";" --number-of-queries=600 --create-schema=test --concurrency=4 --query="insert into tt1 set ts=now()"
----</q>----

6. Connect to 13011 and run FLUSH TABLES

7. Connect to 13000 and observe:

mysql> kill 9;
ERROR 1095 (HY000): You are not owner of thread 9

Suggested fix: fix FLUSH TABLES behavior properly, no matter which server it originates from. Having PXC replicate from an asynchronous master is a common situation.

Krunal Bauskar (krunal-bauskar) on 2016-10-12

summary:

- Bug #1520491 not properly fixed
+ FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML
+ workload

Krunal Bauskar (krunal-bauskar) wrote on 2016-10-12:

commit 0f737386c7fbb2e89d0c31f90b7711f161b5af54
Merge: 0c5743d ccbbfd8
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 15:09:06 2016 +0530

Merge pull request #324 from kbauskar/5.6-pxc-707

- PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave)…

commit ccbbfd882d9a72a22226e8d41442bea98b37cb43
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 12 09:45:04 2016 +0530

- PXC#707: FLUSH TABLE on master can cause pxc-node (acting as slave) to stall DML workload

Use-case has master that replicates (using mysql asynchronous replication) to
a slave node that is also a pxc-node from cluster.

      * pxc-node is running a DML workload which keeps on opening and closing the table.
      * master executes FLUSH TABLE which is then replicated on slave/pxc-node.
      * FLUSH TABLE is further replicated by the wsrep-replication logic of pxc-node
        within the cluster. This replication is done using the TOI replication protocol.
        TOI needs CommitMonitor so FLUSH TABLE holds CommitMonitor before it starts
        execution.
      * FLUSH TABLE execution waits for removal of the old table share from the table cache.
        But this table share can be removed only when active DML are done.
      * DML can't proceed as they too need CommitMonitor to complete, which is currently
        held by the FLUSH TABLE thread.

        Fix is to skip removal of the table share by passing REFRESH_FAST option
        while closing the table.
        (Fix for this was added in bug#1520491 that considered use-case of
         replication from one-pxc-node to other pxc-node but this use-case
         is trying to exercise replication from master to pxc-node).
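
The circular wait described in the commit message can be illustrated with a toy model. The sketch below is not server code: `simulate`, `skip_share_removal`, and every lock and event name are hypothetical stand-ins, and a timeout replaces the indefinite stall so the example terminates. It only assumes what the commit message states: FLUSH TABLE takes the CommitMonitor first and then waits for the table share, while the DML holds the share and needs the CommitMonitor to finish.

```python
import threading


def simulate(skip_share_removal: bool, timeout: float = 0.5) -> bool:
    """Toy model: return True if FLUSH TABLE completes, False if it stalls."""
    commit_monitor = threading.Lock()        # stand-in for the CommitMonitor
    dml_has_share = threading.Event()        # DML has opened the table
    flush_holds_monitor = threading.Event()  # FLUSH TABLE entered TOI
    share_released = threading.Event()       # DML finished and closed the table
    give_up = threading.Event()              # lets the DML thread exit after the run

    def dml() -> None:
        dml_has_share.set()                  # DML keeps the table share in use
        flush_holds_monitor.wait()           # force the problematic interleaving
        # The DML needs the CommitMonitor to complete.
        while not commit_monitor.acquire(timeout=0.05):
            if give_up.is_set():
                return
        commit_monitor.release()
        share_released.set()                 # only now can the share be removed

    worker = threading.Thread(target=dml)
    worker.start()

    dml_has_share.wait()
    commit_monitor.acquire()                 # TOI: FLUSH TABLE grabs the monitor first
    try:
        flush_holds_monitor.set()
        if skip_share_removal:
            finished = True                  # models the fix: do not wait for the share
        else:
            # Models the bug: wait for a share the blocked DML cannot release.
            finished = share_released.wait(timeout)
    finally:
        commit_monitor.release()
    give_up.set()
    worker.join()
    return finished
```

In this model `simulate(skip_share_removal=False)` times out and returns `False`, because neither side can make progress, while `simulate(skip_share_removal=True)` returns `True`. The second case corresponds loosely to closing the table with REFRESH_FAST so that FLUSH TABLE no longer waits for the share to be removed.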

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-707
