Long semaphore wait crashes PS 5.7

Bug #1701390 reported by Chehai Wu
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
New
Undecided
Unassigned

Bug Description

Occasionally a thread doing purge operation for undo log can stuck in deadlock and then blocks other threads because the thread holds the data dictionary lock. After 600 seconds, MySQL intentionally crashes.

I managed to capture "show engine innodb status" output and stack trace using GDB (see below). I am not very familiar with MySQL source code. Here is my rudimentary analysis (which could be wrong):

Thread 140039501625088 is stuck. The thread is trying to acquire an x-lock on a buffer pool page whose s-lock and sx-lock are already acquired by it. Lockword: ffffffffefffffff means "S locked, with a waiting writer which has SX lock. The number of S lock is 1.". Thread 140039501625088 holds the data dictionary lock while being stuck, so other threads are blocked.

s-lock is acquired when the thread is searching undo record in a secondary index.

sx-lock is acquired when the thread is either searching undo record or doing recursive merge for B-Tree nodes.

x-lock is being acquired when the thread is doing recursive merge of B-Tree nodes.

I thought the thread would upgrade the s-lock to a sx-lock then to a x-lock since the thread is the only thread has the s-lock.

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2017-06-27 20:27:15 0x7f5dad5ef700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 18 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 125618 srv_active, 0 srv_shutdown, 1529371 srv_idle
srv_master_thread log flush and writes: 1654882
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 455165
--Thread 140039518410496 has waited at srv0srv.cc line 2320 for 624.00 seconds the semaphore:
X-lock on RW-latch at 0x7f5dae7ce398 created in file dict0dict.cc line 1198
a writer (thread id 140040582629120) has reserved it in mode wait exclusive
number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
Last time read locked in file row0purge.cc line 865
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/handler/ha_innodb.cc line 13943
--Thread 140039501625088 has waited at btr0cur.cc line 1911 for 625.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7f5da3efe0d0 created in file buf0buf.cc line 1456
a writer (thread id 140039501625088) has reserved it in mode wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffefffffff
Last time read locked in file row0row.cc line 1073
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/include/mtr0mtr.ic line 117
--Thread 140039585486592 has waited at buf0flu.cc line 1224 for 622.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f5da3ff4790 created in file buf0buf.cc line 1456
a writer (thread id 140039501625088) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/fsp/fsp0fsp.cc line 167
--Thread 140039484839680 has waited at row0row.cc line 1073 for 625.00 seconds the semaphore:
S-lock on RW-latch at 0x7f5da4002950 created in file buf0buf.cc line 1456
a writer (thread id 140039501625088) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdfffffff
Last time read locked in file row0row.cc line 1073
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc line 1911
--Thread 140039455500032 has waited at dict0stats.cc line 2375 for 616.00 seconds the semaphore:
X-lock on RW-latch at 0x7f5dae7ce398 created in file dict0dict.cc line 1198
a writer (thread id 140040582629120) has reserved it in mode wait exclusive
number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
Last time read locked in file row0purge.cc line 865
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/handler/ha_innodb.cc line 13943
--Thread 140040582629120 has waited at dict0stats.cc line 3623 for 625.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7f5dae7ce398 created in file dict0dict.cc line 1198
a writer (thread id 140040582629120) has reserved it in mode wait exclusive
number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
Last time read locked in file row0purge.cc line 865
Last time write locked in file /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/handler/ha_innodb.cc line 13943
OS WAIT ARRAY INFO: signal count 456490
RW-shared spins 0, rounds 861311, OS waits 415991
RW-excl spins 0, rounds 712712, OS waits 8934
RW-sx spins 77047, rounds 893910, OS waits 3275
Spin rounds per wait: 861311.00 RW-shared, 712712.00 RW-excl, 11.60 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 31438233
Purge done for trx's n:o < 31438233 undo n:o < 0 state: running
History list length 9
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421515275276992, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421515275275800, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421515275274608, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
435768 OS file reads, 2565558 OS file writes, 1647024 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 62, seg size 64, 1552 merges
merged operations:
 insert 396, delete mark 10642, delete 594
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
Hash table size 138389, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 31798899922
Log flushed up to 31798899922
Pages flushed up to 31792654680
Last checkpoint at 31792082305
Max checkpoint age 651585393
Checkpoint age target 631223350
Modified age 6245242
Checkpoint age 6817617
0 pending log flushes, 0 pending chkp writes
731006 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 558891008
Dictionary memory allocated 516115751
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 8890560 (8856896 + 33664)
    Page hash 553912 (buffer pool 0 only)
    Dictionary cache 518329975 (2214224 + 516115751)
    File system 18523272 (812272 + 17711000)
    Lock system 1332584 (1329176 + 3408)
    Recovery system 0 (0 + 0)
Buffer pool size 32764
Buffer pool size, bytes 0
Free buffers 1024
Database pages 31740
Old database pages 11696
Modified db pages 1682
Pending reads 0
Pending writes: LRU 0, flush list 2, single page 0
Pages made young 106936, not young 10668248
0.00 youngs/s, 0.00 non-youngs/s
Pages read 431180, created 616607, written 1291654
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 31740, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
0 RW transactions active inside InnoDB
1 tablespace extents now reserved for B-tree split operations
Process ID=19203, Main thread ID=140039518410496, state: enforcing dict cache limit
Number of rows inserted 14188343, updated 531664, deleted 30275, read 63724948
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.00 sec)

GDB Stacktrace for Thread 140039501625088:

#0 0x00007f5dbcd196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000fa802b in wait (this=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/include/os0event.h:156
#2 os_event::wait_low (this=0x7f5da3efe168, reset_sig_count=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/os/os0event.cc:131
#3 0x0000000000fa837a in os_event_wait_low (event=<optimized out>, reset_sig_count=<optimized out>)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/os/os0event.cc:328
#4 0x00000000010658f1 in sync_array_wait_event (arr=arr@entry=0x7f5db03d0938, cell=@0x7f5d7cbf7970: 0x7f5db131c058)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/sync/sync0arr.cc:475
#5 0x000000000106805d in rw_lock_x_lock_wait_func (line=1911, file_name=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc",
    threshold=-268435456, lock=0x7f5da3efe0d0) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/sync/sync0rw.cc:504
#6 rw_lock_x_lock_low (line=1911, file_name=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc", pass=0, lock=0x7f5da3efe0d0)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/sync/sync0rw.cc:587
#7 rw_lock_x_lock_func (lock=0x7f5da3efe0d0, pass=pass@entry=0, file_name=file_name@entry=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc",
    line=line@entry=1911) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/sync/sync0rw.cc:723
#8 0x00000000010f041d in pfs_rw_lock_x_lock_func (lock=lock@entry=0x7f5da3efe0d0,
    file_name=file_name@entry=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc", line=line@entry=1911, pass=0)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/include/sync0rw.ic:711
#9 0x00000000010f72d1 in buf_page_get_gen (page_id=..., page_size=..., rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10,
    file=file@entry=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc", line=1911, mtr=0x7f5d7cbf9f40,
    dirty_with_no_latch=dirty_with_no_latch@entry=false) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/buf/buf0buf.cc:4750
#10 0x00000000010cda9d in btr_block_get_func (page_id=..., page_size=..., mode=mode@entry=2, line=line@entry=1911, mtr=<optimized out>,
    file=0x15e15e0 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc")
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/include/btr0btr.ic:63
#11 0x00000000010d8ae3 in btr_cur_search_to_nth_level (index=index@entry=0x7f5daf7bd988, level=level@entry=2, tuple=0x7f5d56da8388, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=34, cursor=cursor@entry=0x7f5d7cbf8ed0,
    has_search_latch=has_search_latch@entry=0, file=file@entry=0x15e05d8 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc", line=line@entry=855,
    mtr=mtr@entry=0x7f5d7cbf9f40) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc:1911
#12 0x00000000010c0619 in btr_page_get_father_node_ptr_func (offsets=0x0, heap=0x7f5d35a41918, cursor=cursor@entry=0x7f5d7cbf8ed0, latch_mode=latch_mode@entry=34, line=line@entry=855, mtr=mtr@entry=0x7f5d7cbf9f40,
    file=0x15e05d8 "/mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc")
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc:788
#13 0x00000000010c0b35 in btr_page_get_father_block (offsets=offsets@entry=0x0, heap=<optimized out>, index=<optimized out>, block=block@entry=0x7f5da40028c0, mtr=mtr@entry=0x7f5d7cbf9f40, cursor=cursor@entry=0x7f5d7cbf8ed0)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc:855
#14 0x00000000010c7a5b in btr_compress (cursor=cursor@entry=0x7f5d7cbf9760, adjust=adjust@entry=0, mtr=mtr@entry=0x7f5d7cbf9f40)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc:3413
#15 0x00000000010d5a6c in btr_cur_compress_if_useful (cursor=cursor@entry=0x7f5d7cbf9760, adjust=adjust@entry=0, mtr=mtr@entry=0x7f5d7cbf9f40)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc:5138
#16 0x00000000010db9e9 in btr_cur_pessimistic_delete (err=err@entry=0x7f5d7cbf96c0, has_reserved_extents=has_reserved_extents@entry=1, cursor=cursor@entry=0x7f5d7cbf9760, flags=flags@entry=16, rollback=rollback@entry=false,
    mtr=mtr@entry=0x7f5d7cbf9f40) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc:5452
#17 0x00000000010c847e in btr_compress (cursor=cursor@entry=0x7f5d7cbf9ce0, adjust=adjust@entry=0, mtr=mtr@entry=0x7f5d7cbf9f40)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0btr.cc:3737
#18 0x00000000010d5a6c in btr_cur_compress_if_useful (cursor=cursor@entry=0x7f5d7cbf9ce0, adjust=adjust@entry=0, mtr=mtr@entry=0x7f5d7cbf9f40)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc:5138
---Type <return> to continue, or q <return> to quit---
#19 0x00000000010db9e9 in btr_cur_pessimistic_delete (err=err@entry=0x7f5d7cbf9ca0, has_reserved_extents=has_reserved_extents@entry=0, cursor=cursor@entry=0x7f5d7cbf9ce0, flags=flags@entry=0, rollback=rollback@entry=false,
    mtr=mtr@entry=0x7f5d7cbf9f40) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/btr/btr0cur.cc:5452
#20 0x000000000101aac9 in row_purge_remove_sec_if_poss_tree (node=node@entry=0x7f5daf797b50, index=index@entry=0x7f5daf7bd988, entry=entry@entry=0x7f5d7ac2e688)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:379
#21 0x000000000101c158 in row_purge_remove_sec_if_poss (entry=0x7f5d7ac2e688, index=0x7f5daf7bd988, node=0x7f5daf797b50)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:596
#22 row_purge_upd_exist_or_extern_func (node=node@entry=0x7f5daf797b50, undo_rec=undo_rec@entry=0x7f5d39978788 "\034y\f\004\v")
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:713
#23 0x000000000101c869 in row_purge_record_func (updated_extern=<optimized out>, undo_rec=0x7f5d39978788 "\034y\f\004\v", node=<optimized out>)
    at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:1005
#24 row_purge (thr=0x7f5d7cbfab00, undo_rec=<optimized out>, node=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:1049
#25 row_purge_step (thr=thr@entry=0x7f5daf797a88) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/row/row0purge.cc:1128
#26 0x0000000000fca11c in que_thr_step (thr=0x6) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/que/que0que.cc:1066
#27 que_run_threads_low (thr=0x6) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/que/que0que.cc:1128
#28 que_run_threads (thr=thr@entry=0x7f5daf797a88) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/que/que0que.cc:1168
#29 0x0000000001053c11 in srv_task_execute () at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/srv/srv0srv.cc:2822
#30 srv_worker_thread (arg=<optimized out>) at /mnt/workspace/percona-server-5.7-binaries-release/label_exp/centos6-64_static/percona-server-5.7.18-15/storage/innobase/srv/srv0srv.cc:2884
#31 0x00007f5dbcd15dc5 in start_thread () from /lib64/libpthread.so.0
#32 0x00007f5dbaf66ced in clone () from /lib64/libc.so.6

Chehai Wu (wuchehai)
description: updated
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3714

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.