server hang on writes after thread-pool turned on

Bug #1613084 reported by Siyuan Fu
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I can't fully confirm it's related to switching to "thread_handling=pool-of-threads": our Percona servers had run for tens of thousands of hours and never had a similar issue before, yet within a few hundred hours of turning on the thread pool feature the issue happened twice, on different instances with different data sets.

The symptom was that our web service went down because it could not connect to the MySQL server, which reported 'too many connections'. (The extra port could not be accessed either, so the admin console was unavailable.) The server has a 32K connection limit, but the connection count is typically only a few thousand. lsof showed the process had tons of connections in the CLOSE_WAIT state, which means mysqld did not call close() on them. Meanwhile, the load / CPU / disk I/O on this MySQL server was extremely low. I attached gdb to the process and saw that a lot of threads were in:
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00000000008d5fd7 in ?? ()
#2 0x00000000008d6532 in thr_lock ()
#3 0x00000000008d6bdb in thr_multi_lock ()
#4 0x00000000007d2c99 in mysql_lock_tables(THD*, TABLE**, unsigned int, unsigned int) ()
...
(I attached the full gdb output)

We have an in-house MySQL traffic monitoring tool based on in/out packet sniffing; it shows that many write queries got stuck all of a sudden, and no response packets left the server after that.

We finally had to kill the server and restart it.

+-------------------------+------------------------------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------------------------------+
| innodb_version | 5.6.31-77.0 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1.1,TLSv1.2 |
| version | 5.6.31-77.0-log |
| version_comment | Percona Server (GPL), Release 77.0, Revision 5c1061c |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+------------------------------------------------------+

thread pool variables I set in my.cnf:

thread_handling=pool-of-threads
extra_max_connections = 5
extra_port=3307
thread_pool_stall_limit=30

other variables:
sql_mode = STRICT_ALL_TABLES
key_buffer = 32M
max_allowed_packet = 16M
thread_stack = 256K
thread_cache_size = 64
default_storage_engine = InnoDB
max_connections = 32000
table_open_cache = 10240
innodb_buffer_pool_size = 11G
innodb_buffer_pool_instances = 16
innodb_stats_on_metadata = OFF
innodb_checksum_algorithm = crc32
innodb_flush_log_at_trx_commit = 2
innodb_log_file_size = 2G
innodb_file_per_table
innodb_flush_method = O_DIRECT
innodb_flush_neighbors = 0
innodb_io_capacity_max = 10000
innodb_io_capacity = 10000
max_connect_errors = 1000000
#thread_concurrency = 10
back_log = 8192

Revision history for this message
Siyuan Fu (fusiyuan2010) wrote :

I found the cause: it was running mysqldump while there were a lot of write operations.

Basically, mysqldump acquires a read lock at the beginning, and by default the queries issued by mysqldump go to the low-priority queue, so the later UNLOCK TABLES query has no chance to be scheduled once too many writes have arrived and become blocked behind the lock.
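
To make the starvation concrete, here is a rough model of the pre-fix scheduling decision described above. It is an illustration of the behavior, not the actual Percona code; every name in it is a hypothetical stand-in except the tickets counter and the transaction check, which mirror the report.

#include <queue>

/* Hypothetical model of the thread pool's pre-fix queue choice. */
struct Connection {
  int  tickets;             /* high-priority tickets left (c->tickets) */
  bool transaction_active;  /* stands in for thd_is_transaction_active() */
};

static std::queue<Connection*> high_prio_queue, low_prio_queue;

/* Only sessions inside an open transaction stay high priority. mysqldump
   holds a global read lock but no active transaction, so its UNLOCK TABLES
   falls into the low-priority queue and starves behind the flood of
   blocked writes. */
static void enqueue(Connection *c) {
  if (c->tickets > 0 && c->transaction_active)
    high_prio_queue.push(c);
  else
    low_prio_queue.push(c);
}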

My suggested fix: in sql/threadpool_unix.cc:connection_is_high_prio(), change

c->tickets > 0 && thd_is_transaction_active(c->thd)

to

c->tickets > 0 && (thd_is_transaction_active(c->thd) || thd_in_lock_tables(c->thd))
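
For context, a minimal sketch of how the changed condition might read inside connection_is_high_prio(). The struct layout, function shape, and declared signatures are assumptions for illustration; only the condition itself comes from the suggestion above. thd_is_transaction_active() and thd_in_lock_tables() are existing server API functions.

class THD;  /* opaque server session object */
extern "C" int thd_is_transaction_active(THD *thd);  /* assumed signature */
extern "C" int thd_in_lock_tables(THD *thd);         /* assumed signature */

/* Hypothetical sketch of sql/threadpool_unix.cc:connection_is_high_prio()
   after the suggested change; connection_t's layout is assumed. */
struct connection_t { THD *thd; int tickets; };

static bool connection_is_high_prio(connection_t *c)
{
  /* Stay high priority while tickets remain and the session either has an
     open transaction (original check) or holds explicit table locks, such
     as mysqldump under its global read lock (suggested addition), so its
     UNLOCK TABLES cannot be starved by blocked writers. */
  return c->tickets > 0 &&
         (thd_is_transaction_active(c->thd) ||
          thd_in_lock_tables(c->thd));
}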

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-3515
