Threads stuck in permanent statistics state with innodb_thread_concurrency>0 and RW workload on two nodes

Bug #1722560 reported by Przemek
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Fix Committed
Undecided
Unassigned
5.7
Fix Committed
Undecided
Unassigned

Bug Description

A PXC node gets completely locked when I execute OLTP workload against two nodes, with concurrency higher then innodb_thread_concurrency.
Simple SELECT queries enter a state where they NEVER finish while the server is not doing anything and there are no blocking queries.

How to reproduce:

* make a 3 node cluster with recent PXC 5.7, and limit the innodb_thread_concurrency to 4 or 8:

 mysql> select @@version,@@version_comment,@@innodb_thread_concurrency\G
*************************** 1. row ***************************
                  @@version: 5.7.19-17-57-log
          @@version_comment: Percona XtraDB Cluster (GPL), Release rel17, Revision c10027a, WSREP version 29.22, wsrep_29.22
@@innodb_thread_concurrency: 4
1 row in set (0.00 sec)

* prepare sysbench table (I tried both sysbench 0.5 and 1.0):
~# sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.1 --mysql-db=test --mysql-user=root --mysql-password=*** --threads=50 --max-requests=50000 --report-interval=1 --db-driver=mysql --tx-rate=50 prepare

* run the test against two nodes in the same time:
1)
sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.1 --mysql-db=test --mysql-user=root --mysql-password=cmon --threads=50 --report-interval=1 --db-driver=mysql --tx-rate=50 --time=100 run

2)
sysbench /usr/share/sysbench/oltp_read_write.lua --auto_inc=off --mysql_storage_engine=innodb --table-size=20000 --tables=1 --mysql-host=172.17.0.3 --mysql-db=test --mysql-user=root --mysql-password=cmon --threads=50 --report-interval=1 --db-driver=mysql --tx-rate=50 --time=100 run

As a result, one test finishes, but other is stuck. One node looks like this:
mysql> show status like 'ws%flow%';
+----------------------------------+--------------+
| Variable_name | Value |
+----------------------------------+--------------+
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_interval | [ 141, 141 ] |
| wsrep_flow_control_interval_low | 141 |
| wsrep_flow_control_interval_high | 141 |
| wsrep_flow_control_status | OFF |
+----------------------------------+--------------+
8 rows in set (0.01 sec)

mysql> show status like 'Threads%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_cached | 7 |
| Threads_connected | 44 |
| Threads_created | 51 |
| Threads_running | 44 |
+-------------------+-------+
4 rows in set (0.00 sec)

mysql> select id,db,command,time,state,info from information_schema.processlist where id<>6 order by id;
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
| id | db | command | time | state | info |
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
| 1 | NULL | Sleep | 404 | wsrep: aborter idle | NULL |
| 2 | NULL | Sleep | 198 | wsrep: committed write set (1332479) | NULL |
| 5 | NULL | Sleep | 198 | wsrep: committed write set (1332480) | NULL |
| 57 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 58 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 59 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 60 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 61 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 62 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 63 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 64 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 65 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 66 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 68 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 69 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 71 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 72 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 73 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 74 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 75 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 76 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 77 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 78 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 79 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 80 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 81 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 82 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 83 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 84 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 85 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 86 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 88 | test | Execute | 206 | Sending data | SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ? |
| 89 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 91 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 93 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 94 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 95 | test | Execute | 206 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 96 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 97 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 98 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 99 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 100 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 101 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 103 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 105 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
| 106 | test | Execute | 205 | statistics | SELECT c FROM sbtest1 WHERE id=? |
+-----+------+---------+------+--------------------------------------+------------------------------------------------+
46 rows in set (0.00 sec)

mysql> pager grep 'queries inside InnoDB'
PAGER set to 'grep 'queries inside InnoDB''
mysql> show engine innodb status\G
6 queries inside InnoDB, 44 queries in queue
1 row in set (0.00 sec)

mysql> select * from information_schema.INNODB_TRX where trx_mysql_thread_id=88\G
*************************** 1. row ***************************
                    trx_id: 421623406094864
                 trx_state: RUNNING
               trx_started: 2017-10-10 15:10:55
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 0
       trx_mysql_thread_id: 88
                 trx_query: SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?
       trx_operation_state: sleeping before entering InnoDB
         trx_tables_in_use: 1
         trx_tables_locked: 0
          trx_lock_structs: 0
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 0
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)

mysql> pager grep 421623406094864
PAGER set to 'grep 421623406094864'
mysql> show engine innodb status\G
1 row in set (0.00 sec)

Threads don't respond to kill, and the only way to stop the node is kill -9 on pid.

I was NOT able to reproduce the same issue on standalone MySQL nor on semi-sync replication pair.

Tags: i207570
Changed in percona-xtradb-cluster:
status: New → Confirmed
Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

commit 74bf8771d7dd639ee2100ffa5782263f5a4ab2aa
Author: Krunal Bauskar <email address hidden>
Date: Wed Oct 11 18:46:47 2017 +0530

    - PXC#878: Threads stuck in permanent statistics state with innodb_thread_concurrency>0

      * Upstream merge accidentally got a wrong code that looks like a copy-paste
        mistake.

      * When the threads leaves InnoDB it should mark exit from innodb server
        concurrency. It was doing exactly opposite

Changed in percona-xtradb-cluster:
status: Confirmed → Fix Committed
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-878

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.