Comment 18 for bug 1007268

Chehai Wu (wuchehai) wrote :

We ran into a similar issue recently. Our Percona Server 5.5.24-rel26.0 instance, running with innodb_lazy_drop_table disabled, hung indefinitely.

The error log showed a single-thread deadlock: the waiting thread and the thread holding the latch in exclusive mode have the same ID.

InnoDB: Warning: a long semaphore wait:
--Thread 1258592576 has waited at buf0buf.c line 2529 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x10296528 '&buf_pool->page_hash_latch'
a writer (thread id 1258592576) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file buf0flu.c line 1481
Last time write locked in file /home/jenkins/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.23-rel25.3/storage/innobase/buf/buf0lru.c line 629
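
To spot this pattern in a longer error log, the waiter and writer thread IDs can be compared mechanically. The following is a minimal sketch, assuming the warning always uses the wording shown above; find_self_waits and SAMPLE are hypothetical names introduced only for illustration.

```python
import re

# Two lines copied from the warning above.
SAMPLE = """\
--Thread 1258592576 has waited at buf0buf.c line 2529 for 241.00 seconds the semaphore:
a writer (thread id 1258592576) has reserved it in mode exclusive
"""


def find_self_waits(log_text):
    """Return IDs of threads that wait on a latch they themselves hold in exclusive mode."""
    hits = []
    waiter = None
    for line in log_text.splitlines():
        m = re.search(r"--Thread (\d+) has waited at", line)
        if m:
            waiter = m.group(1)  # remember who is waiting
            continue
        m = re.search(r"a writer \(thread id (\d+)\) has reserved it in mode\s+exclusive", line)
        if m and waiter is not None and m.group(1) == waiter:
            hits.append(int(waiter))  # waiter and writer are the same thread
    return hits


print(find_self_waits(SAMPLE))
```

An empty result means every reported wait is on a latch held by some other thread.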

I ran pstack against the running Percona instance. Thread 1258592576 (0x4b049940) had the following stack trace:

Thread 23 (Thread 0x4b049940 (LWP 21020)):
#0 0x0000003af320aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000008f4527 in os_event_wait_low ()
#2 0x0000000000825352 in sync_array_wait_event ()
#3 0x0000000000825737 in rw_lock_s_lock_spin ()
#4 0x000000000086ca82 in buf_page_get_gen ()
#5 0x0000000000863983 in btr_search_drop_page_hash_when_freed ()
#6 0x000000000087bc3c in buf_LRU_flush_or_remove_pages ()
#7 0x00000000008b138b in fil_delete_tablespace ()
#8 0x00000000008b153d in fil_discard_tablespace ()
#9 0x0000000000808323 in row_truncate_table_for_mysql ()
#10 0x00000000007edc56 in ha_innobase::truncate() ()
#11 0x00000000007b971b in Truncate_statement::handler_truncate(THD*, TABLE_LIST*, bool) ()
#12 0x00000000007b9edd in Truncate_statement::truncate_table(THD*, TABLE_LIST*) ()
#13 0x00000000007b9f7e in Truncate_statement::execute(THD*) ()
#14 0x000000000057eab8 in mysql_execute_command(THD*) ()
#15 0x0000000000580d13 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
#16 0x00000000007554db in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned int) ()
#17 0x0000000000518192 in apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) ()
#18 0x00000000005216b6 in exec_relay_log_event(THD*, Relay_log_info*) ()
#19 0x00000000005229c9 in handle_slave_sql ()
#20 0x0000003af320673d in start_thread () from /lib64/libpthread.so.0
#21 0x0000003af2ad44bd in clone () from /lib64/libc.so.6
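
The error log prints the thread ID in decimal while pstack prints it in hexadecimal. A quick check that both refer to the same thread (the variable names are only illustrative):

```python
# Thread ID as printed in the InnoDB semaphore warning (decimal).
innodb_thread_id = 1258592576
# Thread ID as printed by pstack (hexadecimal).
pstack_thread_id = 0x4B049940

same_thread = innodb_thread_id == pstack_thread_id
print(hex(innodb_thread_id), same_thread)  # prints "0x4b049940 True"
```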

After reading the Percona Server source, I can see that a single-thread deadlock is possible when innodb_lazy_drop_table is disabled:

- buf_LRU_flush_or_remove_pages() calls buf_LRU_remove_all_pages() (its frame is missing from the stack trace above, I believe because the compiler optimized it away).
- buf_LRU_remove_all_pages() executes rw_lock_x_lock(&buf_pool->page_hash_latch), which locks page_hash_latch in exclusive mode.
- buf_LRU_remove_all_pages() calls btr_search_drop_page_hash_when_freed().
- btr_search_drop_page_hash_when_freed() calls buf_page_get_gen().
- buf_page_get_gen() executes rw_lock_s_lock(&buf_pool->page_hash_latch), which locks page_hash_latch in shared mode.
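- Deadlock! The shared lock can never be granted, because the thread requesting it is the same thread that holds page_hash_latch in exclusive mode.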
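
The sequence above can be reproduced with a toy model. This is only a sketch and not InnoDB's rw_lock_t: ToyRWLatch and the three helper functions are hypothetical stand-ins for the real code, and the model assumes that a shared request is refused while any thread, including the requester, holds the latch in exclusive mode, which matches the warning in the error log. Instead of blocking, the model reports what would happen.

```python
class ToyRWLatch:
    """Toy read/write latch; an S request is never granted while a writer holds it."""

    def __init__(self):
        self.writer = None  # ID of the thread holding X mode, or None
        self.readers = 0

    def x_lock(self, tid):
        assert self.writer is None and self.readers == 0, "toy model: latch must be free"
        self.writer = tid

    def x_unlock(self, tid):
        assert self.writer == tid
        self.writer = None

    def try_s_lock(self, tid):
        if self.writer is not None:  # refused even if the writer is the requester
            return False
        self.readers += 1
        return True

    def s_unlock(self):
        self.readers -= 1


def page_get(latch, tid):
    """Stand-in for buf_page_get_gen(): needs the latch in shared mode."""
    if latch.try_s_lock(tid):
        latch.s_unlock()
        return "ok"
    return "self-deadlock" if latch.writer == tid else "wait for other thread"


def drop_page_hash_when_freed(latch, tid):
    """Stand-in for btr_search_drop_page_hash_when_freed()."""
    return page_get(latch, tid)


def remove_all_pages(latch, tid):
    """Stand-in for buf_LRU_remove_all_pages(): takes the latch in exclusive mode first."""
    latch.x_lock(tid)
    try:
        return drop_page_hash_when_freed(latch, tid)
    finally:
        latch.x_unlock(tid)


latch = ToyRWLatch()
print(page_get(latch, 1))          # shared request on a free latch
print(remove_all_pages(latch, 1))  # shared request while the same thread holds X mode
```

In the real server the second case does not return a status; the thread sleeps in rw_lock_s_lock_spin() waiting for a release that only it could perform, which is what the stack trace shows.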

If innodb_lazy_drop_table is enabled, the execution path is very different, and this deadlock is unlikely to occur.