Deadlock between xt_pwrite_fmap, xt_flush_fmap and xt_lock_fmap_ptr

Bug #316368 reported by Philip Stoev
Affects: PBXT
Status: Fix Committed
Importance: Undecided
Assigned to: Vladimir Kolesnikov

Bug Description

While running a random workload with 100 threads, PBXT deadlocked. The significant threads were:

Thread #1

#1 0x00000000001988fe in xt_timed_wait_cond (self=0x0, cond=0x36c7270, mutex=0x36c7248, milli_sec=10000) at thread_xt.cc:1896
#2 0x00000000001af8d0 in xt_rwmutex_xlock (xsl=0x36c7238, thd_id=4) at lock_xt.cc:509
#3 0x0000000000152c48 in xt_pwrite_fmap (map=0x3f03ac8, offset=27262239, size=1405, data=0x3638ad7, stat=0x7f14b518eea8, thread=0x7f14b518ad88)
    at filesys_xt.cc:1378
#4 0x0000000000179bee in xres_apply_change (self=0x7f14b518ad88, ot=0x3f2d8f8, record=0x3638ac6, in_sequence=1, check_index=0, rec_buf=0x7f14b518b818)
    at restart_xt.cc:655
#5 0x000000000017bebb in xt_xres_apply_in_order (self=0x7f14b518ad88, ws=0x7f14b518b7a0, log_id=2, log_offset=13797428, record=0x3638ac6)
    at restart_xt.cc:1461
#6 0x00000000001ac194 in xlog_wr_main (self=0x7f14b518ad88) at xactlog_xt.cc:2421
#7 0x00000000001ac39f in xlog_wr_run_thread (self=0x7f14b518ad88) at xactlog_xt.cc:2462
#8 0x000000000019aec0 in thr_main (data=0x7fffc15b4d20) at thread_xt.cc:1004

Thread #2

#1 0x00000000001988fe in xt_timed_wait_cond (self=0x0, cond=0x36c7270, mutex=0x36c7248, milli_sec=10000) at thread_xt.cc:1896
#2 0x00000000001af492 in xt_rwmutex_slock (xsl=0x36c7238, thd_id=5) at lock_xt.cc:552
#3 0x00000000001527a4 in xt_flush_fmap (map=0x41fd688, stat=0x7f14b5193108, thread=0x7f14b518efe8) at filesys_xt.cc:1543
#4 0x000000000018de97 in xt_flush_record_row (ot=0x523f0d8, bytes_flushed=0x7f149e006fc0, have_table_lock=0) at table_xt.cc:2078
#5 0x0000000000178e1c in xres_cp_checkpoint (self=0x7f14b518efe8, db=0x7f14b4d955b8, curr_writer_total=1496, force_checkpoint=1) at restart_xt.cc:2352
#6 0x00000000001791ff in xres_cp_main (self=0x7f14b518efe8) at restart_xt.cc:2483
#7 0x000000000017951f in xres_cp_run_thread (self=0x7f14b518efe8) at restart_xt.cc:2536
#8 0x000000000019aec0 in thr_main (data=0x7fffc15b4d20) at thread_xt.cc:1004

Thread #3

#1 0x00000000001988fe in xt_timed_wait_cond (self=0x0, cond=0x36c7270, mutex=0x36c7248, milli_sec=10000) at thread_xt.cc:1896
#2 0x00000000001af492 in xt_rwmutex_slock (xsl=0x36c7238, thd_id=11) at lock_xt.cc:552
#3 0x000000000015267f in xt_lock_fmap_ptr (map=0x3670808, offset=2057944, size=32772, stat=0x3ae8af8, thread=0x3ae49d8) at filesys_xt.cc:1584
#4 0x000000000018b252 in xt_tab_seq_next (ot=0x3a391e8, buffer=0x3b7b680 "\177ШЪ", eof=0x7f149c7fc038) at table_xt.cc:4838
#5 0x0000000000158e59 in ha_pbxt::rnd_next (this=0x3ae4450, buf=0x3b7b680 "\177ШЪ") at ha_pbxt.cc:3099
#6 0x0000000000808de3 in rr_sequential (info=0x7f149c7fc4b0) at records.cc:390
#7 0x00000000007957d5 in mysql_delete (thd=0x7f14b539d128, table_list=0x3a67860, conds=0x3a68018, order=0x7f14b539f318, limit=7, options=0,
    reset_auto_increment=false) at sql_delete.cc:284
#8 0x00000000006d427b in mysql_execute_command (thd=0x7f14b539d128) at sql_parse.cc:3244
#9 0x00000000006d927f in mysql_parse (thd=0x7f14b539d128,
    inBuf=0x3a67410 "DELETE FROM `table1000_pbxt_int_autoinc` WHERE `enum_latin1` > '20080204082841' LIMIT 8", length=88, found_semicolon=0x7f149c7fdf00)
    at sql_parse.cc:5735
#10 0x00000000006d9e6a in dispatch_command (command=COM_QUERY, thd=0x7f14b539d128,
    packet=0x3a33d19 "DELETE FROM `table1000_pbxt_int_autoinc` WHERE `enum_latin1` > '20080204082841' LIMIT 8", packet_length=88) at sql_parse.cc:1007
#11 0x00000000006db393 in do_command (thd=0x7f14b539d128) at sql_parse.cc:690
#12 0x00000000006c949d in handle_one_connection (arg=0x7f14b539d128) at sql_connect.cc:1145

All other threads were stuck in xt_xn_wait_for_xact.
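All three backtraces are blocked on the same lock: xsl=0x36c7238, with cond=0x36c7270 and mutex=0x36c7248, even though the three fmap pointers differ. Thread #1 wants it exclusively (xt_rwmutex_xlock), threads #2 and #3 want it shared (xt_rwmutex_slock), and each sleeps in a timed condition wait with milli_sec=10000. The C++ below is a hypothetical sketch of a reader/writer lock of that general shape, written only to illustrate the pattern. It is not PBXT's lock_xt.cc code, and the class RWLockSketch, its method names and its writer-preference policy are all assumptions. It shows one way such a pile-up can form: if some thread keeps a shared lock and never releases it, a waiting writer blocks, new readers queue behind that writer, and in this sketch the bounded waits simply time out and wait again, so the timeout alone does not break the deadlock.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch only: NOT PBXT's lock_xt.cc implementation.
// Each wait is bounded, mirroring milli_sec=10000 in the backtraces.
constexpr std::chrono::milliseconds kWaitSlice{10000};

class RWLockSketch {
public:
    // Shared lock. Writer preference (an assumption): new readers also
    // queue behind a writer that is merely waiting.
    void slock() {
        std::unique_lock<std::mutex> guard(mutex_);
        while (writer_ || writers_waiting_ > 0)
            cond_.wait_for(guard, kWaitSlice);  // on timeout, just wait again
        ++readers_;
    }

    // Exclusive lock: wait until no writer and no readers hold the lock.
    // The loop never gives up, so a reader that never releases its shared
    // lock keeps this thread (and the readers queued behind it) asleep forever.
    void xlock() {
        std::unique_lock<std::mutex> guard(mutex_);
        ++writers_waiting_;
        while (writer_ || readers_ > 0)
            cond_.wait_for(guard, kWaitSlice);
        --writers_waiting_;
        writer_ = true;
    }

    // Like xlock(), but gives up after `timeout`; useful for detecting a hang.
    bool try_xlock_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> guard(mutex_);
        ++writers_waiting_;
        const bool acquired = cond_.wait_for(
            guard, timeout, [this] { return !writer_ && readers_ == 0; });
        --writers_waiting_;
        if (acquired)
            writer_ = true;
        else
            cond_.notify_all();  // readers queued behind us may proceed now
        return acquired;
    }

    void unlock_shared() {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(readers_ > 0);
        --readers_;
        cond_.notify_all();  // a waiting writer may now be able to proceed
    }

    void unlock_exclusive() {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(writer_);
        writer_ = false;
        cond_.notify_all();  // wake both waiting readers and writers
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool writer_ = false;
    int readers_ = 0;
    int writers_waiting_ = 0;
};
```

Letting new readers wait behind a queued writer avoids writer starvation, but in this sketch it also means a single shared lock that is never released stalls readers and writers alike, which is consistent with the mix of slock and xlock waiters in the backtraces.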

Philip Stoev (pstoev) wrote:
Vladimir Kolesnikov (vkolesnikov) wrote:

Philip,

How can I reproduce this?

Changed in pbxt:
assignee: nobody → vkolesnikov
status: New → Incomplete
Philip Stoev (pstoev) wrote:

I am afraid this happened only once. I will keep trying to nail down a reproducible test case.

Philip Stoev (pstoev)
Changed in pbxt:
assignee: vkolesnikov → pstoev
status: Incomplete → In Progress
Vladimir Kolesnikov (vkolesnikov) wrote:

Philip,
I believe I have fixed this problem. The fix is already in lp:pbxt.

Changed in pbxt:
assignee: pstoev → vkolesnikov
status: In Progress → Fix Committed