Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Nodes getting blocked with "flush tables" command

Bug #1520491 reported by Przemek on 2015-11-27

This bug affects 2 people

Affects                  Status                  Importance   Assigned to      Milestone
Percona XtraDB Cluster   Status tracked in 5.6
5.6                      Fix Released            Undecided    Krunal Bauskar   5.6.27-25.13

Bug Description

When "flush tables" is issued on one of the cluster nodes, it gets replicated and blocks other nodes that were doing writes at the same time. I suspect this may be a side effect of the fix for bug lp:1421360.
Easily reproducible:

percona1 mysql> select @@version,@@version_comment\G
*************************** 1. row ***************************
@@version: 5.6.26-74.0-56-log
@@version_comment: Percona XtraDB Cluster (GPL), Release rel74.0, Revision 1, WSREP version 25.12, wsrep_25.12
1 row in set (0.00 sec)

percona1 mysql> show create table test.tt1\G
*************************** 1. row ***************************
       Table: tt1
Create Table: CREATE TABLE `tt1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=16775 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

[root@percona1 ~]# mysqlslap --password=cmon --delimiter=";" --number-of-queries=600 --create-schema=test --concurrency=4 --query="insert into tt1 set ts=now()"
Warning: Using a password on the command line interface can be insecure.
-- (while the above is still running)

percona2 mysql> flush tables;
Query OK, 0 rows affected (0.11 sec)

percona1 mysql> show processlist;
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 90 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 90 | NULL | NULL | 0 | 0 |
| 5 | system user | | | Sleep | 9 | Waiting for table flush | flush tables | 0 | 0 |
| 6 | system user | | NULL | Sleep | 89 | NULL | NULL | 0 | 0 |
| 7 | system user | | NULL | Sleep | 89 | NULL | NULL | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 9 | root | localhost | NULL | Sleep | 22 | | NULL | 0 | 0 |
| 10 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 11 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 12 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 13 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
11 rows in set (0.00 sec)

percona1 mysql> show processlist;
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 639 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 20 | Waiting for table flush | NULL | 0 | 0 |
| 5 | system user | | | Sleep | 558 | Waiting for table flush | flush tables | 0 | 0 |
| 6 | system user | | NULL | Sleep | 19 | Waiting for table flush | NULL | 0 | 0 |
| 7 | system user | | NULL | Sleep | 19 | Waiting for table flush | NULL | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 10 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 11 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 12 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
| 13 | root | localhost | test | Query | 558 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
+----+-------------+-----------+------+---------+------+---------------------------+------------------------------+-----------+---------------+
10 rows in set (0.00 sec)

percona1 mysql> kill 5;
ERROR 1095 (HY000): You are not owner of thread 5

So the percona1 node gets completely blocked for writes, which eventually stalls the whole cluster once Flow Control is triggered. The only way to recover is to restart the blocked node.
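
To spot this state without reading the processlist by eye, the pattern above can be checked mechanically. The following is only a rough sketch: the function names and the `min_seconds` threshold are made up for illustration, and it assumes the plain ASCII table layout printed by the mysql client (an Info value containing "|" would break the naive split).

```python
def parse_processlist(text):
    """Parse the ASCII table printed by the mysql client into a list of dicts."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and the row-count footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def looks_blocked(rows, min_seconds=5):
    """Heuristic: an applier waits on a table flush while clients sit in pre-commit."""
    flush_waiters = [
        r for r in rows
        if r["User"] == "system user" and r["State"] == "Waiting for table flush"
    ]
    stuck_writers = [
        r for r in rows
        if r["State"] == "wsrep in pre-commit stage" and int(r["Time"]) >= min_seconds
    ]
    return bool(flush_waiters) and bool(stuck_writers)


# Three rows copied from the first processlist in this report.
SAMPLE = """
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
| 5 | system user | | | Sleep | 9 | Waiting for table flush | flush tables | 0 | 0 |
| 8 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
| 10 | root | localhost | test | Query | 9 | wsrep in pre-commit stage | insert into tt1 set ts=now() | 0 | 0 |
"""

rows = parse_processlist(SAMPLE)
print(looks_blocked(rows))
```

This is a heuristic only: a short-lived "Waiting for table flush" can be normal, so the Time column growing across successive samples is the stronger signal.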

Przemek (pmalkowski) wrote on 2015-11-27:

Stacktrace of the blocked node made with pt-stalk --collect-gdb (2.5 KiB, application/octet-stream)

Przemek (pmalkowski) wrote on 2015-11-27:

I am not able to reproduce this in PXC 5.6.24, which seems to confirm that this is a regression.

Krunal Bauskar (krunal-bauskar) wrote on 2015-12-02:

  Enabling replication of FLUSH TABLES causes a hang on the node
  where the statement gets replicated while a parallel DML
  workload is active.

Issue:
-----

  The replicated FLUSH TABLES event is executed as a background
  action. Before clearing the table cache, FLUSH TABLES
  marks the existing table cache as invalid by incrementing
  refresh_version (which then != share->version()), so
  newly starting transactions will detect share->has_old_version() and will
  wait for the old version to die.
  (FLUSH TABLES MDL-based coordination logic.)

  During this process, if there are already active DML statements
  (started before FLUSH TABLES incremented refresh_version), then the FLUSH TABLES
  thread will wait for the ref-count of such a shared version to drop to 0.
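
The version/ref-count handshake described above can be pictured with a toy model. This is a sketch for illustration only, not the server code: `refresh_version` and `has_old_version` follow the names used in this comment, while `TableCache`, `TableShare`, `open_table`, `close_table`, `begin_flush` and `flush_can_finish` are hypothetical names.

```python
class TableShare:
    """Toy stand-in for a cached table definition (not MySQL's real structure)."""

    def __init__(self, version):
        self.version = version
        self.ref_count = 0


class TableCache:
    def __init__(self):
        self.refresh_version = 1
        self.shares = {}

    def has_old_version(self, share):
        # A share is stale once the global refresh_version has moved past it.
        return share.version != self.refresh_version

    def open_table(self, name):
        share = self.shares.get(name)
        if share is None:
            share = self.shares[name] = TableShare(self.refresh_version)
        if self.has_old_version(share):
            return None  # a new statement would have to wait for the old version to die
        share.ref_count += 1
        return share

    def close_table(self, share):
        share.ref_count -= 1

    def begin_flush(self):
        # FLUSH TABLES invalidates every cached share by bumping the version.
        self.refresh_version += 1

    def flush_can_finish(self):
        # The flush may only proceed once no statement still uses a stale share.
        return all(
            s.ref_count == 0 for s in self.shares.values() if self.has_old_version(s)
        )


cache = TableCache()
dml = cache.open_table("tt1")           # a DML statement is already running
cache.begin_flush()                     # FLUSH TABLES arrives
new_stmt = cache.open_table("tt1")      # a later statement sees the old version
blocked_before = cache.flush_can_finish()
cache.close_table(dml)                  # the running DML finishes
blocked_after = cache.flush_can_finish()
print(new_stmt, blocked_before, blocked_after)
```

In this model the flush can only complete after the running DML releases its reference, which is the dependency that matters for the scenario below.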

Now, with Galera replication involved, let's consider a use case:

  a. A DML statement is actively executing.
  b. FLUSH TABLES starts execution as a replicated event.
  c. FLUSH TABLES gets a galera-execution-token (seqno) assigned.
  d. The DML statement now enters Galera space for commit certification
     and gets a galera-execution-token assigned (that is > the FLUSH TABLES
     token).
  e. The FLUSH TABLES thread waits for the ref-count to drop to 0.
  f. The DML statement waits in Galera commit ordering because an action with a
     lower token is still pending execution. This prevents the ref-count from dropping to 0.

DEADLOCK.
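
Steps e and f above form a two-node cycle in a wait-for graph. A minimal sketch of that view follows; the seqno values 10 and 11 and all names are made up for illustration, and each participant is assumed to wait on at most one other.

```python
def find_cycle(waits_for):
    """Return the participants forming a cycle in the wait-for graph, or None.

    waits_for maps each waiter to the single participant it is waiting on.
    """
    for start in waits_for:
        path, node = [start], waits_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[path.index(node):]  # trim any lead-in before the cycle
    return None


FLUSH = "FLUSH TABLES applier (seqno 10)"
DML = "client INSERT (seqno 11)"

waits_for = {
    FLUSH: DML,  # step e: the ref-count on the old table version is held by the INSERT
    DML: FLUSH,  # step f: commit ordering makes seqno 11 wait for seqno 10
}

cycle = find_cycle(waits_for)
print(" -> ".join(cycle + cycle[:1]))
```

Neither edge can be released from inside its own subsystem: the MDL side does not know about commit ordering and the Galera side does not know about the table ref-count, which is why the node does not recover on its own.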

This issue was reported in the FTWRL context as bug#1280768.

  A proper fix would be to sort out this double coordination
  (MDL-based coordination in MySQL and
   commit-ordering-based coordination in Galera).

  The current fix (workaround) is to avoid FLUSH TABLES replication, as was the case earlier.
  Replication was enabled to avoid GTID inconsistency: the FLUSH TABLES command is written
  to the binlog but not replicated, which then causes issues if an async slave's master
  (that is, a Galera node used as master) is switched from galera-node-a
  to galera-node-b.

  Skipping FLUSH TABLES replication (if the node is operating as a Galera node,
  that is, WSREP_ON) will have an effect on async replication, as FLUSH TABLES
  will not be replicated to the async slave.

Related bugs: lp:1520491, lp:1280768

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1867