SHOW STATUS blocked when wsrep_local_recv_queue>0 and wsrep_sync_wait=1

Bug #1428541 reported by Przemek on 2015-03-05
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Confirmed
Wishlist
Unassigned
5.6
Confirmed
Wishlist
Unassigned

Bug Description

A node who reached max recv_queue and sending Flow Control pause messages, is blocked from executing SHOW STATUS commands until Flow Control is released. This makes monitoring of this host impossible.
Also SHOW STATUS commands are increasing the 'wsrep_causal_reads' counter.
Happens on both PXC 5.6.21 and 5.5.39.

percona10 mysql> select @@version,@@version_comment;
+--------------------+---------------------------------------------------------------------------------------------------+
| @@version | @@version_comment |
+--------------------+---------------------------------------------------------------------------------------------------+
| 5.6.21-70.1-56-log | Percona XtraDB Cluster (GPL), Release rel70.1, Revision 939, WSREP version 25.8, wsrep_25.8.r4150 |
+--------------------+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

percona10 mysql> select @@wsrep_causal_reads,@@wsrep_sync_wait;
+----------------------+-------------------+
| @@wsrep_causal_reads | @@wsrep_sync_wait |
+----------------------+-------------------+
| 1 | 1 |
+----------------------+-------------------+
1 row in set (0.01 sec)

percona10 mysql> show status like 'wsrep_causal_reads';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_causal_reads | 31 |
+--------------------+-------+
1 row in set (0.00 sec)

percona10 mysql> show status like 'wsrep_causal_reads';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_causal_reads | 32 |
+--------------------+-------+
1 row in set (0.00 sec)

percona30 mysql> select @@wsrep_causal_reads,@@wsrep_sync_wait;
+----------------------+-------------------+
| @@wsrep_causal_reads | @@wsrep_sync_wait |
+----------------------+-------------------+
| 1 | 1 |
+----------------------+-------------------+
1 row in set (0.00 sec)

-- block one node for writes:

percona30 mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

-- run some write workload on other nodes to trigger Flow Control:

percona20 mysql> show processlist;
+----+-------------+-------------------+--------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-------------------+--------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 162 | NULL | NULL | 0 | 0 |
----
| 12 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
| 13 | app | 192.168.3.1:59781 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 14 | app | 192.168.3.1:59780 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 15 | app | 192.168.3.1:59782 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 16 | app | 192.168.3.1:59777 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 17 | app | 192.168.3.1:59776 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 18 | app | 192.168.3.1:59774 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 19 | app | 192.168.3.1:59775 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
| 20 | app | 192.168.3.1:59778 | sbtest | Query | 24 | wsrep in pre-commit stage | COMMIT | 0 | 0 |
+----+-------------+-------------------+--------+---------+------+-----------------------------------------------------------------------------+------------------+-----------+---------------+
20 rows in set (0.00 sec)

-- show commands work on nodes where wsrep_local_recv_queue is 0:

percona10 mysql> show status like 'ws%flo%';
+------------------------------+-------------+
| Variable_name | Value |
+------------------------------+-------------+
| wsrep_flow_control_paused_ns | 57132383323 |
| wsrep_flow_control_paused | 0.245948 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 1 |
+------------------------------+-------------+
4 rows in set (0.00 sec)

percona10 mysql> show status like 'ws%recv_queue%';
+----------------------------+----------+
| Variable_name | Value |
+----------------------------+----------+
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 3 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.071429 |
+----------------------------+----------+
4 rows in set (0.00 sec)

percona20 mysql> show status like 'ws%flo%';
+------------------------------+-------------+
| Variable_name | Value |
+------------------------------+-------------+
| wsrep_flow_control_paused_ns | 38937652971 |
| wsrep_flow_control_paused | 0.219784 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 1 |
+------------------------------+-------------+
4 rows in set (0.00 sec)

-- but show status commands are blocked on the node where wsrep_local_recv_queue exceeded fc_limit due to FTWRL:

percona30 mysql> show processlist;
+----+-------------+-----------+------+---------+------+--------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-------------+-----------+------+---------+------+--------------------+------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 52 | NULL | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 52 | wsrep aborter idle | NULL | 0 | 0 |
| 3 | system user | | NULL | Sleep | 50 | NULL | NULL | 0 | 0 |
| 4 | system user | | NULL | Sleep | 50 | NULL | NULL | 0 | 0 |
| 5 | system user | | NULL | Sleep | 50 | NULL | NULL | 0 | 0 |
| 6 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+----+-------------+-----------+------+---------+------+--------------------+------------------+-----------+---------------+
6 rows in set (0.00 sec)

percona30 mysql> show status like 'ws%flo%';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

percona30 mysql> show status like 'ha%';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

percona30 mysql> show global status like 'ws%flo%';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Przemek (pmalkowski) wrote :

I apologize, it seems I misunderstood the status of related former reports:
lp:1126316 and lp:1271177
as if this was fixed already and blocking should no longer occur.

However it seems that this behavior for SHOW STATUS is intended when causal reads are set, and one should disable it per session to make SHOW STATUS to work under Flow Control condition.
I don't understand why blocking of reading the status variables is needed for causal reads, but if this is indeed necessary and intended, let's close this one with duplicate status.

Anyways, the workaround and the only method for any monitoring tools is to do:

percona30 mysql> set session wsrep_sync_wait=0;
Query OK, 0 rows affected (0.00 sec)

percona30 mysql> show status like 'ws%recv_q%';
+----------------------------+-----------+
| Variable_name | Value |
+----------------------------+-----------+
| wsrep_local_recv_queue | 29 |
| wsrep_local_recv_queue_max | 30 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 12.584416 |
+----------------------------+-----------+
4 rows in set (0.00 sec)

Przemek (pmalkowski) wrote :

If this behavior is not going to be changed, please consider this report as a feature request in order to add an additional bitmask option to the http://www.percona.com/doc/percona-xtradb-cluster/5.6/wsrep-system-index.html#wsrep_sync_wait variable to disable causality checks for SHOW STATUS commands. This would allow all the existing monitoring tools to work in Flow Control conditions.

Przemek (pmalkowski) wrote :

One correction, it is not about Flow Control here, my bad.
Any replication lag makes SHOW STATUS to block or be delayed, so any time the wsrep_local_recv_queue > 0.

summary: - SHOW STATUS commands blocked when Flow Control triggers and
- wsrep_sync_wait=1
+ SHOW STATUS blocked when wsrep_local_recv_queue>0 and wsrep_sync_wait=1
Teemu Ollakka (teemu-ollakka) wrote :

Corresponding mysql-wsrep ticket: https://github.com/codership/mysql-wsrep/issues/86

This is similar to the issue (which was fixed) that was occurring with wsrep-causal-reads. Not just with receive queue, but whenever the provider was paused (as with FTWRL for instance), show status blocked.

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1182

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers