Threads stuck in permanent statistics state with innodb_thread_concurrency>0 and RW workload on two nodes

Bug #1722560 reported by Przemek on 2017-10-10
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Fix Committed
Undecided
Unassigned
5.7
Fix Committed
Undecided
Unassigned

Bug Description

A PXC node gets completely locked when I execute OLTP workload against two nodes, with concurrency higher then innodb_thread_concurrency.
Simple SELECT queries enter a state where they NEVER finish while the server is not doing anything and there are no blocking queries.

How to reproduce:

* make a 3 node cluster with recent PXC 5.7, and limit the innodb_thread_concurrency to 4 or 8:

 mysql> select @@version,@@version_comment,@@innodb_thread_concurrency\G
*************************** 1. row ***************************
                  @@version: 5.7.19-17-57-log
          @@version_comment: Percona XtraDB Cluster (GPL), Release rel17, Revision c10027a, WSREP version 29.22, wsrep_29.22
@@innodb_thread_concurrency: 4
1 row in set (0.00 sec)

* prepare sysbench table (I tried both sysbench 0.5 and 1.0):
~# sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.1 --mysql-db=test --mysql-user=root --mysql-password=*** --threads=50 --max-requests=50000 --report-interval=1 --db-driver=mysql --tx-rate=50 prepare

* run the test against two nodes in the same time:
1)
sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.1 --mysql-db=test --mysql-user=root --mysql-password=cmon --threads=50 --report-interval=1 --db-driver=mysql --tx-rate=50 --time=100 run

2)
sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.3 --mysql-db=test --mysql-user=root --mysql-password=cmon --threads=50 --report-interval=1 --db-driver=mysql --tx-rate=50 --time=100 run

As a result, one test finishes, but other is stuck. One node looks like this:
mysql> show status like 'ws%flow%';
+----------------------------------+--------------+
| Variable_name | Value |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 141, 141 ] |
| wsrep_flow_control_interval_low | 141 |
| wsrep_flow_control_interval_high | 141 |
| wsrep_flow_control_status | OFF |
+----------------------------------+--------------+
8 rows in set (0.01 sec)

mysql> show status like 'Threads%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_cached | 7 |
| Threads_connected | 44 |
| Threads_created | 51 |
| Threads_running | 44 |
+-------------------+-------+
4 rows in set (0.00 sec)

mysql> select id,db,command,time,state,info from information_schema.processlist where id<>6 order by id;
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
| id | db | command | time | state | info |
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
| 1 | NULL | Sleep | 404 | wsrep: aborter idle | NULL |
| 2 | NULL | Sleep | 198 | wsrep: committed write set (1332479) | NULL |
| 5 | NULL | Sleep | 198 | wsrep: committed write set (1332480) | NULL |
| 57 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 58 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 59 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 60 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 61 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 62 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 63 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 64 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 65 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 66 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 68 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 69 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 71 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 72 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 73 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 74 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 75 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 76 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 77 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 78 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 79 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 80 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 81 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 82 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 83 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 84 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 85 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 86 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 88 | test | Execute | 206 | Sending data | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 89 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 91 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 93 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 94 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 95 | test | Execute | 206 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 96 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 97 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 98 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 99 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 100 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 101 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 103 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 105 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 106 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
46 rows in set (0.00 sec)

mysql> pager grep 'queries inside InnoDB'
PAGER set to 'grep 'queries inside InnoDB''
mysql> show engine innodb status\G
6 queries inside InnoDB, 44 queries in queue
1 row in set (0.00 sec)

mysql> select * from information_schema.INNODB_TRX where trx_mysql_thread_id=88\G
*************************** 1. row ***************************
                    trx_id: 421623406094864
                 trx_state: RUNNING
               trx_started: 2017-10-10 15:10:55
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 0
       trx_mysql_thread_id: 88
                 trx_query: SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?
       trx_operation_state: sleeping before entering InnoDB
         trx_tables_in_use: 1
         trx_tables_locked: 0
          trx_lock_structs: 0
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 0
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

mysql> pager grep 421623406094864
PAGER set to 'grep 421623406094864'
mysql> show engine innodb status\G
1 row in set (0.00 sec)

Threads don't respond to kill, and the only way to stop the node is kill -9 on pid.

I was NOT able to reproduce the same issue on standalone MySQL nor on semi-sync replication pair.

Changed in percona-xtradb-cluster:
status: New → Confirmed
Krunal Bauskar (krunal-bauskar) wrote :

commit 74bf8771d7dd639ee2100ffa5782263f5a4ab2aa
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 11 18:46:47 2017 +0530

    - PXC#878: Threads stuck in permanent statistics state with innodb_thread_concurrency>0

      * Upstream merge accidentally got a wrong code that looks like a copy-paste
        mistake.

      * When the threads leaves InnoDB it should mark exit from innodb server
        concurrency. It was doing exactly opposite

Changed in percona-xtradb-cluster:
status: Confirmed → Fix Committed

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-878

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers