Percona Server with XtraDB

InnoDB: Error: semaphore wait has lasted > 600 seconds. InnoDB: We intentionally crash the server, because it appears to be hung. 2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread <nr> in file srv0srv.cc line 2124 (II)

Reported by Roel Van de Paar on 2013-10-08
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server
Status tracked in 5.6
5.1
Undecided
Unassigned
5.5
Undecided
Unassigned
5.6
High
Laurynas Biveinis

Bug Description

Thread 1 (LWP 15538):
+bt
#0 0x000000363720c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000687d8e in handle_fatal_signal (sig=6) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/sql/signal_handler.cc:254
#2 <signal handler called>
#3 0x0000003636e328a5 in raise () from /lib64/libc.so.6
#4 0x0000003636e34085 in abort () from /lib64/libc.so.6
#5 0x0000000000b58c9d in srv_error_monitor_thread (arg=<optimized out>) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/srv/srv0srv.cc:2124
#6 0x0000003637207851 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003636ee890d in clone () from /lib64/libc.so.6

===========

Version: '5.6.13-rc60.6-debug-log' socket: '/sde//024793/current1_4/tmp/master.sock' port: 13060 Percona Server with XtraDB (GPL), Release rc60.6, Revision 462-debug
2013-10-07 16:11:56 15516 [Warning] InnoDB: changed page bitmap file './ib_modified_log_1_0.xdb' does not contain a complete run at the end.
2013-10-07 16:13:06 15516 [Warning] InnoDB: changed page bitmap file './ib_modified_log_1_0.xdb' does not contain a complete run at the end.
InnoDB: Warning: a long semaphore wait:
--Thread 140479423301376 has waited at srv0srv.cc line 2589 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x1562560 '&dict_operation_lock'
a writer (thread id 140479530440448) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 720
Last time write locked in file /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/handler/ha_innodb.cc line 10034
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2013-10-07 16:17:43 7fc3eabfd700 INNODB MONITOR OUTPUT
=====================================
[...]
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread 140479454770944 in file srv0srv.cc line 2124

Related branches

lp:~laurynas-biveinis/percona-server/bug1236696
Merged into lp:percona-server at revision 483
Alexey Kopytov: Approve on 2013-10-22
Vlad Lesin: Pending (g2) requested 2013-10-15
Laurynas Biveinis: Pending requested 2013-10-15
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :

<laurynas> unless further debugging changes something, it happens if AHI is enabled and some of thread priorities (which are experimental) are set

Roel Van de Paar (roel11) wrote :
tags: added: ahi
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Download full text (9.1 KiB)

The interesting thread is 37:

Thread 37 (LWP 15806):
+bt
#0 0x000000363720b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000ac4143 in os_cond_wait (fast_mutex=0x27e6c30, cond=0x27e6c70) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/os/os0sync.cc:214
#2 os_event_wait_low (event=0x27e6c30, reset_sig_count=264) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/os/os0sync.cc:610
#3 0x0000000000b63601 in sync_array_wait_event (arr=0x20e3460, index=0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/sync/sync0arr.cc:433
#4 0x0000000000b653a6 in rw_lock_s_lock_spin (_lock=0x27e6a38, pass=0, priority_lock=true, high_priority=false, file_name=0xe1eb90 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc", line=954) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/sync/sync0rw.cc:537
#5 0x0000000000bc60ae in pfs_rw_lock_s_lock_func (line=954, file_name=0xe1eb90 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc", lock=0x27e6a38, pass=<optimized out>) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/include/sync0rw.ic:1090
#6 btr_search_guess_on_hash (index=0x280c2b8, info=0x280c670, tuple=0x7fc2e40595d8, mode=4, latch_mode=2, cursor=0x7fc3f08230b0, has_search_latch=0, mtr=0x7fc3f08228c0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0sea.cc:954
#7 0x0000000000bb7680 in btr_cur_search_to_nth_level (index=0x280c2b8, level=0, tuple=0x7fc2e40595d8, mode=4, latch_mode=2, cursor=0x7fc3f08230b0, has_search_latch=0, file=0xe04590 "/bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc", line=2370, mtr=0x7fc3f08228c0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/btr/btr0cur.cc:539
#8 0x0000000000b09b8e in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x280c2b8, n_uniq=2, entry=0x7fc2e40595d8, n_ext=0, thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:2370
#9 0x0000000000b0e825 in row_ins_clust_index_entry (index=0x280c2b8, entry=0x7fc2e40595d8, thr=0x7fc2e4050370, n_ext=0) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:2909
#10 0x0000000000b0ee07 in row_ins_index_entry (thr=0x7fc2e4050370, entry=0x7fc2e40595d8, index=0x280c2b8) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3003
#11 row_ins_index_entry_step (thr=0x7fc2e4050370, node=0x7fc2e40d3d20) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3080
#12 row_ins (thr=0x7fc2e4050370, node=0x7fc2e40d3d20) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3220
#13 row_ins_step (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/row/row0ins.cc:3345
#14 0x0000000000aec6c8 in que_thr_step (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/que/que0que.cc:1095
#15 que_run_threads_low (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Server-5.6.13-rc60.6/storage/innobase/que/que0que.cc:1177
#16 0x0000000000aece50 in que_run_threads (thr=0x7fc2e4050370) at /bzr/5.6_dbg/Percona-Serve...

Read more...

Lock word and high_priority_wait_ex_waiter values are incompatible. This might have have happened if rw_lock_x_lock_wait() succeeds in its last locking attempt, after having the set flag already. As there can be only wait wait_ex waiter, it is safe to reset the flag if locking succeeded there.

    Race conditions between successful locks after having set the
    waiters flag and unlocks exist. If a lock (both priority mutex or
    priority RW lock not as a next-waiter) is successfully locked this
    way, the corresponding waiter flag is set even though other
    waiters on the corresponding event do not necessarily exist.
    Then, on unlock, this higher-priority event with no waiters might
    be signaled while a lower-priority event with waiters will not
    be signaled.

    This is fixable by adjusting priority mutex high_priority_waiters
    and rw lock high_priority_x_waiters and high_priority_s_waiters
    flags to be atomically incremented instead of setting to 1 and
    atomically decremented instead of doing nothing if locking
    succeeds after setting the flag. high_priority_wait_ex_waiter
    flag can be maintained as 0/1 as described in the previous
    comment.

    The already-existing mutex::waiters and lock::waiters flag need
    not to be adjusted this way as there are no lower-priority events
    below them. Although such adjustment would result in a bit less
    spurious wakeup events under load and might be done as needed in
    the future.

summary: InnoDB: Error: semaphore wait has lasted > 600 seconds. InnoDB: We
intentionally crash the server, because it appears to be hung.
2013-10-07 16:29:16 7fc3ebfff700 InnoDB: Assertion failure in thread
- <nr> in file srv0srv.cc line 2124
+ <nr> in file srv0srv.cc line 2124 (II)
Roel Van de Paar (roel11) wrote :

Re-testing with patch in place (on revid 5.6.13-rc60.6-462-debug) shows bug 1239062 (III)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers