server hang on writes after thread-pool turned on

Bug #1613084 reported by Siyuan Fu
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I can't fully confirm it's related to switching to "thread_handling=pool-of-threads": our Percona servers had run for tens of thousands of hours and never had a similar issue before, yet within a few hundred hours of turning on the thread pool feature the issue happened twice, on different instances with different data sets.

The symptom was that our web service went down because it could not connect to the MySQL server, which reported 'too many connections'. (The extra port could not be accessed either, so the admin console was unavailable.) The server has a 32K connection limit, but the connection count is typically only a few thousand. lsof showed the process had tons of connections in the CLOSE_WAIT state, which means mysqld did not call close() on them. Meanwhile, the load / CPU / disk I/O on this MySQL server was extremely low. I attached gdb to the process and saw that a lot of threads were in:
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00000000008d5fd7 in ?? ()
#2 0x00000000008d6532 in thr_lock ()
#3 0x00000000008d6bdb in thr_multi_lock ()
#4 0x00000000007d2c99 in mysql_lock_tables(THD*, TABLE**, unsigned int, unsigned int) ()
...
(I attached the full gdb output)

We have an in-house MySQL traffic monitoring tool based on in/out packet sniffing; it shows that many write queries got stuck all of a sudden, and no response packets left the server after that.

We finally had to kill the server and restart it.

+-------------------------+------------------------------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------------------------------+
| innodb_version | 5.6.31-77.0 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1.1,TLSv1.2 |
| version | 5.6.31-77.0-log |
| version_comment | Percona Server (GPL), Release 77.0, Revision 5c1061c |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+------------------------------------------------------+

thread pool variables I set in my.cnf:

thread_handling=pool-of-threads
extra_max_connections = 5
extra_port=3307
thread_pool_stall_limit=30

other variables:
sql_mode = STRICT_ALL_TABLES
key_buffer = 32M
max_allowed_packet = 16M
thread_stack = 256K
thread_cache_size = 64
default_storage_engine = InnoDB
max_connections = 32000
table_open_cache = 10240
innodb_buffer_pool_size = 11G
innodb_buffer_pool_instances = 16
innodb_stats_on_metadata = OFF
innodb_checksum_algorithm = crc32
innodb_flush_log_at_trx_commit = 2
innodb_log_file_size = 2G
innodb_file_per_table
innodb_flush_method = O_DIRECT
innodb_flush_neighbors = 0
innodb_io_capacity_max = 10000
innodb_io_capacity = 10000
max_connect_errors = 1000000
#thread_concurrency = 10
back_log = 8192

Revision history for this message
Siyuan Fu (fusiyuan2010) wrote :

I found the cause: it was running mysqldump while there were a lot of write operations.

Basically, mysqldump acquires a read lock at the beginning, and by default the queries issued by mysqldump go to the low-priority queue, so the later UNLOCK TABLES query has no chance to be scheduled once too many writes have arrived and become blocked behind the lock.
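
To make the starvation concrete, here is a rough model of the pre-fix scheduling decision described above. It is an illustration of the behavior, not the actual Percona code; every name in it is a hypothetical stand-in except the tickets counter and the transaction check, which mirror the report.

#include <queue>

/* Hypothetical model of the thread pool's pre-fix queue choice. */
struct Connection {
  int  tickets;             /* high-priority tickets left (c->tickets) */
  bool transaction_active;  /* stands in for thd_is_transaction_active() */
};

static std::queue<Connection*> high_prio_queue, low_prio_queue;

/* Only sessions inside an open transaction stay high priority. mysqldump
   holds a global read lock but no active transaction, so its UNLOCK TABLES
   falls into the low-priority queue and starves behind the flood of
   blocked writes. */
static void enqueue(Connection *c) {
  if (c->tickets > 0 && c->transaction_active)
    high_prio_queue.push(c);
  else
    low_prio_queue.push(c);
}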

My suggested fix: in sql/threadpool_unix.cc:connection_is_high_prio(), change

c->tickets > 0 && thd_is_transaction_active(c->thd)

to

c->tickets > 0 && (thd_is_transaction_active(c->thd) || thd_in_lock_tables(c->thd))
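
For context, a minimal sketch of how the changed condition might read inside connection_is_high_prio(). The struct layout, function shape, and declared signatures are assumptions for illustration; only the condition itself comes from the suggestion above. thd_is_transaction_active() and thd_in_lock_tables() are existing server API functions.

class THD;  /* opaque server session object */
extern "C" int thd_is_transaction_active(THD *thd);  /* assumed signature */
extern "C" int thd_in_lock_tables(THD *thd);         /* assumed signature */

/* Hypothetical sketch of sql/threadpool_unix.cc:connection_is_high_prio()
   after the suggested change; connection_t's layout is assumed. */
struct connection_t { THD *thd; int tickets; };

static bool connection_is_high_prio(connection_t *c)
{
  /* Stay high priority while tickets remain and the session either has an
     open transaction (original check) or holds explicit table locks, such
     as mysqldump under its global read lock (suggested addition), so its
     UNLOCK TABLES cannot be starved by blocked writers. */
  return c->tickets > 0 &&
         (thd_is_transaction_active(c->thd) ||
          thd_in_lock_tables(c->thd));
}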

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-3515
