Deadlock with Aria around translog_flush_wait_for_end()

Bug #619666 reported by Philip Stoev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
New
Medium
Michael Widenius

Bug Description

When executing a DML workload, Aria 5.2 deadlocked with one thread with the following backtrace:

#2 0x08747ec2 in safe_cond_wait (cond=0x9278954, mp=0x92788f4, file=0x88d5d60 "ma_loghandler.c", line=7689) at thr_mutex.c:496
#3 0x0851b058 in translog_flush_wait_for_end (lsn=4303371829) at ma_loghandler.c:7688
#4 0x0851bb43 in translog_flush (lsn=4303371829) at ma_loghandler.c:7967
#5 0x0856d7b2 in flush_log_for_bitmap (page=0xb6079008 "\002", page_no=0, data_ptr=0xb39e708 "\376\376\t\003\f") at ma_bitmap.c:2558
#6 0x0851dd94 in pagecache_fwrite (pagecache=0x9281fa0, filedesc=0xb5f03684, buffer=0xb6079008 "\002", pageno=0, type=PAGECACHE_PLAIN_PAGE, flags=36)
    at ma_pagecache.c:618
#7 0x0852695f in flush_cached_blocks (pagecache=0x9281fa0, file=0xb39ece8, cache=0xab790b70, end=0xab790b78, type=FLUSH_RELEASE, first_errno=0xab790b5c)
    at ma_pagecache.c:4240
#8 0x085271b9 in flush_pagecache_blocks_int (pagecache=0x9281fa0, file=0xb39ece8, type=FLUSH_RELEASE, filter=0, filter_arg=0x0) at ma_pagecache.c:4530
---Type <return> to continue, or q <return> to quit---
#9 0x08527560 in flush_pagecache_blocks_with_filter (pagecache=0x9281fa0, file=0xb39ece8, type=FLUSH_RELEASE, filter=0, filter_arg=0x0)
    at ma_pagecache.c:4652
#10 0x085583db in _ma_once_end_block_record (share=0xb39e708) at ma_blockrec.c:432
#11 0x0857b96f in maria_close (info=0xab55978) at ma_close.c:79
#12 0x0852acc8 in ha_maria::close (this=0xab731e0) at ha_maria.cc:1051
#13 0x082df4e8 in closefrm (table=0xb2aab48, free_share=true) at table.cc:2451
#14 0x082c6d9a in intern_close_table (table=0xb2aab48) at sql_base.cc:786
#15 0x082c6e3b in free_cache_entry (table=0xb2aab48) at sql_base.cc:808
#16 0x0872a39c in my_hash_delete (hash=0x8a6e7a0, record=0xb2aab48 "\310ܺ\n\340\061\267\n\220\221\325\n8k\203\n") at hash.c:562
#17 0x082c7aa8 in close_open_tables (thd=0xb140730) at sql_base.cc:1208
#18 0x082c7e04 in close_thread_tables (thd=0xb140730) at sql_base.cc:1361
#19 0x0827faff in dispatch_command (command=COM_QUERY, thd=0xb140730, packet=0xb142739 "", packet_length=100) at sql_parse.cc:1636
#20 0x0827dee7 in do_command (thd=0xb140730) at sql_parse.cc:898
#21 0x0827aea8 in handle_one_connection (arg=0xb140730) at sql_connect.cc:1154
#22 0x00a08919 in start_thread () from /lib/libpthread.so.0
#23 0x001ede5e in clone () from /lib/libc.so.6

and several threads with:

#2 0x08747ec2 in safe_cond_wait (cond=0x9278954, mp=0x92788f4, file=0x88d5d60 "ma_loghandler.c", line=7689) at thr_mutex.c:496
#3 0x0851b058 in translog_flush_wait_for_end (lsn=4303370971) at ma_loghandler.c:7688
#4 0x0851bb43 in translog_flush (lsn=4303370971) at ma_loghandler.c:7967
#5 0x085b5247 in ma_commit (trn=0xae140c0) at ma_commit.c:61
#6 0x0852f62f in ha_maria::implicit_commit (thd=0xb13e330, new_trn=false) at ha_maria.cc:2647
#7 0x0827fab4 in dispatch_command (command=COM_QUERY, thd=0xb13e330,
    packet=0xb0e0789 "REPLACE INTO smf_log_boards (`id_msg`,`id_member`,`id_board`) VALUES (209,4,147)", packet_length=80) at sql_parse.cc:1625
#8 0x0827dee7 in do_command (thd=0xb13e330) at sql_parse.cc:898
#9 0x0827aea8 in handle_one_connection (arg=0xb13e330) at sql_connect.cc:1154
#10 0x00a08919 in start_thread () from /lib/libpthread.so.0
#11 0x001ede5e in clone () from /lib/libc.so.6

Tags: rqg aria deadlock
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :
Download full text (19.0 KiB)

Dump of all threads:

Thread 20 (Thread 0xb78e6b70 (LWP 6939)):
#0 0x0088c424 in __kernel_vsyscall ()
#1 0x00a0efa9 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x00a0a330 in _L_lock_697 () from /lib/libpthread.so.0
#3 0x00a0a201 in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x08747807 in safe_mutex_lock (mp=0x92819c0, my_flags=0, file=0x88e9128 "ma_checkpoint.c", line=748) at thr_mutex.c:297
#5 0x085ac20c in collect_tables (str=0xb78e622c, checkpoint_start_log_horizon=4303371832) at ma_checkpoint.c:748
#6 0x085ab3f5 in really_execute_checkpoint () at ma_checkpoint.c:195
#7 0x085ab21a in ma_checkpoint_execute (level=CHECKPOINT_MEDIUM, no_wait=1 '\001') at ma_checkpoint.c:132
#8 0x085abec1 in ma_checkpoint_background (arg=0x1e) at ma_checkpoint.c:601
#9 0x00a08919 in start_thread () from /lib/libpthread.so.0
#10 0x001ede5e in clone () from /lib/libc.so.6

Thread 19 (Thread 0xafb0cb70 (LWP 6940)):
#0 0x0088c424 in __kernel_vsyscall ()
#1 0x00a0c5e4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x085feefe in xt_timed_wait_cond (self=0x0, cond=0x9279310, mutex=0x92792f8, milli_sec=500) at thread_xt.cc:1989
#3 0x085eb640 in tabc_fr_wait_for_cache (self=0xa80f728, msecs=500) at tabcache_xt.cc:873
#4 0x085ec24e in tabc_fr_main (self=0xa80f728) at tabcache_xt.cc:1194
#5 0x085ec3b3 in tabc_fr_run_thread (self=0xa80f728) at tabcache_xt.cc:1213
#6 0x085fcdb1 in xt_thread_main (data=0xbffdd52c) at thread_xt.cc:1036
#7 0x00a08919 in start_thread () from /lib/libpthread.so.0
#8 0x001ede5e in clone () from /lib/libc.so.6

Thread 18 (Thread 0xae70ab70 (LWP 6942)):
#0 0x0088c424 in __kernel_vsyscall ()
#1 0x00a10550 in sigwait () from /lib/libpthread.so.0
#2 0x0826efb4 in signal_hand (arg=0x0) at mysqld.cc:2936
#3 0x00a08919 in start_thread () from /lib/libpthread.so.0
#4 0x001ede5e in clone () from /lib/libc.so.6

Thread 17 (Thread 0xae30db70 (LWP 6944)):
#0 0x0088c424 in __kernel_vsyscall ()
#1 0x00a0c5e4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x085feefe in xt_timed_wait_cond (self=0xa8f9468, cond=0xae6b8914, mutex=0xae6b88fc, milli_sec=100) at thread_xt.cc:1989
#3 0x0860b161 in xn_sw_wait_for_xact (self=0xa8f9468, db=0xae697010, hsecs=10) at xaction_xt.cc:2789
#4 0x0860aca7 in xn_sw_main (self=0xa8f9468) at xaction_xt.cc:2688
#5 0x0860ae60 in xn_sw_run_thread (self=0xa8f9468) at xaction_xt.cc:2733
#6 0x085fcdb1 in xt_thread_main (data=0xaf10912c) at thread_xt.cc:1036
#7 0x00a08919 in start_thread () from /lib/libpthread.so.0
#8 0x001ede5e in clone () from /lib/libc.so.6

Thread 16 (Thread 0xad8cbb70 (LWP 6945)):
#0 0x0088c424 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#1 0x00a0c5e4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x085feefe in xt_timed_wait_cond (self=0xa906ab0, cond=0xae6b89e8, mutex=0xae6b89d0, milli_sec=120000) at thread_xt.cc:1989
#3 0x0862b19e in dl_co_wait (self=0xa906ab0, db=0xae697010, secs=120) at datalog_xt.cc:1714
#4 0x0862c56d in dl_co_main (self=0xa906ab0, once_off=0) at datalog_xt.cc:2013
#5 0x0862c6d0 in dl_run_co_thread (self=0xa906ab0) at datalog_xt.cc:2051
...

Changed in maria:
milestone: none → 5.2
importance: Undecided → Medium
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

To reproduce:

MALLOC_CHECK_=1 perl runall-new.pl --basedir=/home/philips/bzr/maria-5.2/ --grammar=conf/smf/smf2.yy --mysqld=--init_file=/home/philips/bzr/randgen-smf/conf/smf/smf2.sql --queries=10M --duration=360 --engine=Maria --mysqld=--default-storage-engine=Maria --mysqld=--skip-innodb

The .sql and the .yy file can be found on hasky in /home/pstoev/bug619666

You can disregard the output of the test, and within a minute or two you will observe that all queries have hanged in SHOW PROCESSLIST.

Changed in maria:
assignee: nobody → Michael Widenius (monty)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.