Hanging slave status and async replication threads
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MySQL patches by Codership |
New
|
Undecided
|
Unassigned | ||
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC |
New
|
Undecided
|
Unassigned |
Bug Description
I have a two node cluster running the following version:
Server version: 5.6.15-rel63.0-log 5.6.15-, Percona XtraDB Cluster - binary (GPL), Release: 25.5, Revision 759 wsrep_25.5.r4061, wsrep_25.5.r4061
One of the node acts as an async slave of the MySQL 5.5 master. Today, several threads that were trying to execute "SHOW SLAVE STATUS" hung in the init state, while the slave IO thread and the slave SQL thread are also hung. The node is not processing any replication event.
Below is the output of "show processlist"
mysql> show processlist;
+------
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------
| 1 | system user | | NULL | Sleep | 285373 | NULL | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 285373 | wsrep aborter idle | NULL | 0 | 0 |
| 3 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 4 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 5 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 6 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 7 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 8 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 9 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 10 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 11 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 12 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 13 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 14 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 15 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 16 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 17 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 18 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 19 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 20 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 21 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 22 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 23 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 24 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 25 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 26 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 27 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 28 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 29 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 30 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 31 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 32 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 33 | system user | | NULL | Sleep | 285098 | NULL | NULL | 0 | 0 |
| 40 | system user | | NULL | Connect | 284967 | Queueing master event to the relay log | NULL | 0 | 0 |
| 41 | system user | | NULL | Connect | 56761 | wsrep in pre-commit stage | NULL | 0 | 0 |
| 43 | root | localhost | NULL | Query | 53891 | init | SHOW SLAVE STATUS | 0 | 0 |
| 26339 | cacti | xx.xx.xx.xx:52597 | NULL | Query | 53898 | init | SHOW SLAVE STATUS | 0 | 0 |
| 76345 | cacti | xx.xx.xx.xx:59872 | NULL | Query | 28542 | init | SHOW SLAVE STATUS | 0 | 0 |
| 76347 | cacti | xx.xx.xx.xx:59874 | NULL | Query | 28542 | init | SHOW SLAVE STATUS | 0 | 0 |
| 76636 | cacti | xx.xx.xx.xx:59961 | NULL | Query | 28397 | init | SHOW SLAVE STATUS | 0 | 0 |
| 114011 | root | localhost | db | Query | 9446 | init | show slave status | 0 | 0 |
| 123780 | cacti | xx.xx.xx.xx:40094 | NULL | Query | 4500 | init | SHOW SLAVE STATUS | 0 | 0 |
| 123781 | cacti | xx.xx.xx.xx:40095 | NULL | Query | 4500 | init | SHOW SLAVE STATUS | 0 | 0 |
| 128886 | root | localhost | db | Killed | 1884 | init | show slave status | 0 | 0 |
| 128983 | root | localhost | db | Query | 0 | init | show processlist | 0 | 0 |
+------
45 rows in set (0.00 sec)
The status variables show that there is no DML getting executed since the slave status hangs.
The relevant wsrep variables are as follows:
wsrep_cluster_
wsrep_provider = /usr/local/
wsrep_slave_threads = 32
wsrep_cluster_name = xyz_cluster
wsrep_sst_method = xtrabackup-v2
wsrep_node_name = node1
wsrep_node_address = 192.168.0.10
wsrep_provider_
wsrep_max_ws_rows = 1048576
wsrep_max_ws_size = 4294901759
tags: | added: i38783 |
Attached is the output of "thread apply all bt"