FTWRL - BF aborted transaction deadlock

Bug #1083162 reported by Teemu Ollakka
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Fix Released
High
Teemu Ollakka

Bug Description

Running FTWRL in presence of BF aborted transactions caused the following deadlock situation:

Thread 5 (Thread 0x7f9308655700 (LWP 24704)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000008dd0a1 in os_cond_wait (mutex=0x2aaeff0, cond=0x2aaf028)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0x2aaeff0, reset_sig_count=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/os/os0sync.c:609
#3 0x0000000000815103 in srv_suspend_mysql_thread (thr=0x7f92d8047c18)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/srv/srv0srv.c:1721
#4 0x00000000007fa8e8 in row_mysql_handle_errors (new_err=0x7f9308651350, trx=0x7f92d80102c8, thr=<optimized out>,
    savept=<optimized out>) at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/row/row0mysql.c:598
#5 0x0000000000808cfc in row_search_for_mysql (buf=0x7f92d804fcd0 "\341\004", mode=2, prebuilt=0x7f92d8047528,
    match_mode=<optimized out>, direction=0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/row/row0sel.c:4662
#6 0x00000000007e0094 in ha_innobase::index_read (this=0x7f92d8045aa0, buf=0x7f92d804fcd0 "\341\004",
    key_ptr=0x7f92d8050060 "\004", key_len=4, find_flag=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/handler/ha_innodb.cc:6461
#7 0x00000000007e79db in ha_innobase::rnd_pos (this=0x7f92d8045aa0, buf=0x7f92d804fcd0 "\341\004", pos=0x7f92d8050060 "\004")
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/storage/innobase/handler/ha_innodb.cc:6937
#8 0x000000000074ad95 in Rows_log_event::find_row (this=0x7f92d804dbb0, rli=0x7f92d800b060)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/log_event.cc:9559
#9 0x000000000074b6fe in Delete_rows_log_event::do_exec_row (this=0x7f92d804dbb0, rli=0x7f92d800b060)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/log_event.cc:9901
#10 0x00000000007415b4 in Rows_log_event::do_apply_event (this=0x7f92d804dbb0, rli=0x7f92d800b060)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/log_event.cc:7931
#11 0x000000000058a6ae in apply_event (rli=<optimized out>, this=0x7f92d804dbb0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/log_event.h:1139
#12 wsrep_apply_rbr (thd=0x7f92d8000990, rbr_buf=<optimized out>, buf_len=0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:7876
#13 0x00000000005909f7 in wsrep_apply_cb (ctx=<optimized out>, buf=0x7f92d804d4a0, buf_len=387, global_seqno=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:7955
#14 0x00007f93060c9347 in apply_wscoll (trx=..., apply_cb=0x590970 <wsrep_apply_cb(void*, void const*, unsigned long, long)>,
    recv_ctx=0x7f92d8000990) at galera/src/replicator_smm.cpp:37
#15 apply_trx_ws (recv_ctx=0x7f92d8000990, apply_cb=0x590970 <wsrep_apply_cb(void*, void const*, unsigned long, long)>,
    commit_cb=0x59c420 <wsrep_commit_cb(void*, long, bool)>, trx=...) at galera/src/replicator_smm.cpp:79
#16 0x00007f93060cfc28 in galera::ReplicatorSMM::apply_trx (this=0x30fa5d0, recv_ctx=0x7f92d8000990, trx=0x7f92d80403b0)
    at galera/src/replicator_smm.cpp:466
#17 0x00007f93060d0a56 in galera::ReplicatorSMM::process_trx (this=0x30fa5d0, recv_ctx=0x7f92d8000990, trx=0x7f92d80403b0)
    at galera/src/replicator_smm.cpp:1054
#18 0x00007f93060ac8e3 in galera::GcsActionSource::dispatch (this=0x30fab88, recv_ctx=0x7f92d8000990, act=...)
    at galera/src/gcs_action_source.cpp:110
#19 0x00007f93060ad2d3 in galera::GcsActionSource::process (this=0x30fab88, recv_ctx=0x7f92d8000990)
    at galera/src/gcs_action_source.cpp:168
---Type <return> to continue, or q <return> to quit---
#20 0x00007f93060d2055 in galera::ReplicatorSMM::async_recv (this=0x30fa5d0, recv_ctx=0x7f92d8000990)
    at galera/src/replicator_smm.cpp:371
#21 0x0000000000590cda in wsrep_replication_process (thd=0x7f92d8000990)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:8108
#22 0x0000000000517c9e in start_wsrep_THD (arg=0x590c90)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:4403
#23 0x00007f931a419e9a in start_thread (arg=0x7f9308655700) at pthread_create.c:308
#24 0x00007f931a146cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#25 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f9308614700 (LWP 24705)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f93060cae28 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 pre_enter (lock=..., obj=..., this=0x30faf30) at galera/src/monitor.hpp:369
#3 galera::Monitor<galera::ReplicatorSMM::LocalOrder>::enter (this=0x30faf30, obj=...) at galera/src/monitor.hpp:113
#4 0x00007f93060cd428 in galera::ReplicatorSMM::cert (this=0x30fa5d0, trx=0x7f92dc039d40) at galera/src/replicator_smm.cpp:1465
#5 0x00007f93060d0973 in galera::ReplicatorSMM::process_trx (this=0x30fa5d0, recv_ctx=0x7f92dc000990, trx=0x7f92dc039d40)
    at galera/src/replicator_smm.cpp:1047
#6 0x00007f93060ac8e3 in galera::GcsActionSource::dispatch (this=0x30fab88, recv_ctx=0x7f92dc000990, act=...)
    at galera/src/gcs_action_source.cpp:110
#7 0x00007f93060ad2d3 in galera::GcsActionSource::process (this=0x30fab88, recv_ctx=0x7f92dc000990)
    at galera/src/gcs_action_source.cpp:168
#8 0x00007f93060d2055 in galera::ReplicatorSMM::async_recv (this=0x30fa5d0, recv_ctx=0x7f92dc000990)
    at galera/src/replicator_smm.cpp:371
#9 0x0000000000590cda in wsrep_replication_process (thd=0x7f92dc000990)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:8108
#10 0x0000000000517c9e in start_wsrep_THD (arg=0x590c90)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:4403
#11 0x00007f931a419e9a in start_thread (arg=0x7f9308614700) at pthread_create.c:308
#12 0x00007f931a146cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f9304b33700 (LWP 25461)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f93060cae28 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 pre_enter (lock=..., obj=..., this=0x30faf30) at galera/src/monitor.hpp:369
#3 galera::Monitor<galera::ReplicatorSMM::LocalOrder>::enter (this=0x30faf30, obj=...) at galera/src/monitor.hpp:113
#4 0x00007f93060cd428 in galera::ReplicatorSMM::cert (this=0x30fa5d0, trx=0x7f92d4072100) at galera/src/replicator_smm.cpp:1465
#5 0x00007f93060cea56 in galera::ReplicatorSMM::pre_commit (this=0x30fa5d0, trx=0x7f92d4072100)
    at galera/src/replicator_smm.cpp:687
#6 0x00007f93060e8301 in galera_pre_commit (gh=<optimized out>, conn_id=<optimized out>, trx_handle=<optimized out>,
    rbr_data=<optimized out>, rbr_data_len=387, flags=1, global_seqno=0x3069bc0) at galera/src/wsrep_provider.cpp:396
#7 0x0000000000655fc0 in wsrep_run_wsrep_commit (thd=0x30682a0, hton=<optimized out>, all=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/wsrep_hton.cc:332
#8 0x0000000000656d43 in wsrep_prepare (hton=0x293abf0, thd=0x30682a0, all=true)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/wsrep_hton.cc:118
#9 0x00000000006a2611 in ha_commit_trans (thd=0x30682a0, all=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/handler.cc:1317
#10 0x000000000064c355 in trans_commit (thd=0x30682a0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/transaction.cc:196
#11 0x000000000059420d in mysql_execute_command (thd=0x30682a0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:4108
#12 0x00000000005986a4 in mysql_parse (thd=0x30682a0, parser_state=0x7f9304b32210, length=<optimized out>, rawbuf=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:6190
#13 0x0000000000598fab in mysql_parse (parser_state=0x7f9304b32210, length=6, rawbuf=0x7f92d4045230 "COMMIT", thd=0x30682a0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:6141
#14 wsrep_mysql_parse (thd=0x30682a0, rawbuf=0x7f92d4045230 "COMMIT", length=6, parser_state=0x7f9304b32210)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:5955
#15 0x000000000059b227 in dispatch_command (command=COM_QUERY, thd=0x30682a0, packet=<optimized out>, packet_length=78848760)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:1212
#16 0x000000000059b83d in do_command (thd=0x30682a0) at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:869
#17 0x000000000063d415 in do_handle_one_connection (thd_arg=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_connect.cc:865
#18 0x000000000063d62c in handle_one_connection (arg=0x30682a0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_connect.cc:777
#19 0x00007f931a419e9a in start_thread (arg=0x7f9304b33700) at pthread_create.c:308
#20 0x00007f931a146cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
---Type <return> to continue, or q <return> to quit---
#21 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f9304af2700 (LWP 25464)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007f93060d777c in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::drain_common (this=0x30fafe8, seqno=<optimized out>, lock=...)
    at galera/src/monitor.hpp:470
#3 0x00007f93060d7aad in galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::drain (this=0x30fafe8, seqno=8310543)
    at galera/src/monitor.hpp:254
#4 0x00007f93060cc226 in galera::ReplicatorSMM::pause (this=0x30fa5d0) at galera/src/replicator_smm.cpp:1374
#5 0x00007f93060e794b in galera_pause (gh=<optimized out>) at galera/src/wsrep_provider.cpp:742
#6 0x000000000073bf4c in Global_read_lock::make_global_read_lock_block_commit (this=0x307dfb0, thd=0x307c9d0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/lock.cc:1079
#7 0x000000000065433c in reload_acl_and_cache (thd=0x307c9d0, options=16388, tables=0x0, write_to_binlog=0x7f9304aefa20)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_reload.cc:218
#8 0x00000000005920b1 in mysql_execute_command (thd=0x307c9d0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:3998
#9 0x00000000005986a4 in mysql_parse (thd=0x307c9d0, parser_state=0x7f9304af1210, length=<optimized out>, rawbuf=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:6190
#10 0x0000000000598fab in mysql_parse (parser_state=0x7f9304af1210, length=27,
    rawbuf=0x7f92c8021aa0 "FLUSH TABLES WITH READ LOCK", thd=0x307c9d0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:6141
#11 wsrep_mysql_parse (thd=0x307c9d0, rawbuf=0x7f92c8021aa0 "FLUSH TABLES WITH READ LOCK", length=27, parser_state=0x7f9304af1210)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:5955
#12 0x000000000059b227 in dispatch_command (command=COM_QUERY, thd=0x307c9d0, packet=<optimized out>, packet_length=78582520)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:1212
#13 0x000000000059b83d in do_command (thd=0x307c9d0) at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_parse.cc:869
#14 0x000000000063d415 in do_handle_one_connection (thd_arg=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_connect.cc:865
#15 0x000000000063d62c in handle_one_connection (arg=0x307c9d0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/sql_connect.cc:777
#16 0x00007f931a419e9a in start_thread (arg=0x7f9304af2700) at pthread_create.c:308
#17 0x00007f931a146cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#18 0x0000000000000000 in ?? ()

Thread 5 was trying to BF abort thread 3, which in turn was waiting in galera local monitor pre_enter(). However, FTWRL in thread 2 had already locked galera local monitor disallowing thread 3 to proceed into local monitor enter() method where it would had discovered that it was aborted.

To prevent this happening, FTWRL should acquire global read lock before pausing provider.

Related branches

Changed in codership-mysql:
milestone: none → 5.5.28-23.7
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Teemu Ollakka (teemu-ollakka)
Revision history for this message
Teemu Ollakka (teemu-ollakka) wrote :
Changed in codership-mysql:
status: Confirmed → Fix Committed
Changed in codership-mysql:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.