Percona XtraDB Cluster - HA scalable solution for MySQL

Deadlock with wsrep_kill_victim

Reported by Raghavendra D Prabhu on 2013-09-30
This bug affects 1 person

Affects:
- MySQL patches by Codership: Importance Undecided; Assigned to Seppo Jaakola
- Percona XtraDB Cluster (status tracked in Trunk):
  - 5.6: Importance Undecided; Assigned to Raghavendra D Prabhu
  - Trunk: Importance Undecided; Unassigned

Bug Description

########################################
DEADLOCK of threads detected!
Mutex 0x7f2ecafc0088 owned by thread 139838908270336 file /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/debug/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.13/storage/innobase/lock/lock0lock.cc line 6734
--Thread 139838908270336 has waited at ha_innodb.cc line 4309 for 0.0000 seconds the semaphore:
Mutex at 0x7f2ecafc0088 '&lock_sys->mutex', lock var 1
Last time reserved in file /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/debug/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.13/storage/innobase/lock/lock0lock.cc line 6734, waiters flag 1
########################################
2013-09-30 20:09:55 7f2ec8740700 InnoDB: Assertion failure in thread 139838908270336 in file sync0arr.cc line 426

=====================================

There is a deadlock caused by the following call chain:

Thread 30 (Thread 0x7f0e924ef700 (LWP 32799)):
#0 0x00007f0ea7ee603f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1 0x00000000009c5b1a in os_cond_wait (cond=cond@entry=0x2e2eb10, fast_mutex=fast_mutex@entry=0x2e2ead0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/os/os0sync.cc:214
#2 0x00000000009c5dd7 in os_event_wait_low (event=event@entry=0x2e2ead0, reset_sig_count=1) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/os/os0sync.cc:610
#3 0x0000000000a5ab4d in sync_array_wait_event (arr=arr@entry=0x2d6b4c0, index=0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/sync/sync0arr.cc:424
#4 0x0000000000a5c70a in mutex_spin_wait (mutex=mutex@entry=0x7f0e91016088, file_name=file_name@entry=0xcf5e48 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc", line=line@entry=4309) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/sync/sync0sync.cc:584
#5 0x000000000095d38b in mutex_enter_func (mutex=mutex@entry=0x7f0e91016088, file_name=file_name@entry=0xcf5e48 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc", line=line@entry=4309) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/include/sync0sync.ic:222
#6 0x000000000095d3e8 in pfs_mutex_enter_func (mutex=0x7f0e91016088, file_name=file_name@entry=0xcf5e48 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc", line=line@entry=4309) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/include/sync0sync.ic:247
#7 0x000000000095e69c in innobase_kill_connection (hton=0x2d6b280, thd=0x2feaf60) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc:4309
#8 0x000000000063cf14 in kill_handlerton (thd=thd@entry=0x2feaf60, plugin=plugin@entry=0x7f0e924ea198) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:899
#9 0x0000000000761b11 in plugin_foreach_with_mask (thd=thd@entry=0x2feaf60, func=func@entry=0x63cedb <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=4294967287, state_mask@entry=8, arg=arg@entry=0x0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_plugin.cc:2087
#10 0x000000000063f16e in ha_kill_connection (thd=thd@entry=0x2feaf60) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:907
#11 0x000000000071cb93 in THD::awake (this=this@entry=0x2feaf60, state_to_set=state_to_set@entry=THD::KILL_QUERY) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_class.cc:2171
#12 0x000000000071cc8f in wsrep_thd_awake (thd=0x2feaf60, signal=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_class.cc:1021
#13 0x000000000096990d in wsrep_innobase_kill_one_trx (bf_trx=bf_trx@entry=0x7f0e7c018578, victim_trx=0x7f0e4c00bab8, signal=signal@entry=1) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc:17228
#14 0x00000000009a0510 in wsrep_kill_victim (trx=trx@entry=0x7f0e7c018578, lock=lock@entry=0x7f0e4c00c160) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/lock/lock0lock.cc:1630
#15 0x00000000009a0a2f in lock_rec_other_has_conflicting (mode=mode@entry=1027, block=block@entry=0x7f0e91205350, heap_no=heap_no@entry=4, trx=trx@entry=0x7f0e7c018578) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/lock/lock0lock.cc:1665
#16 0x00000000009a5fae in lock_rec_lock_slow (impl=impl@entry=0, mode=mode@entry=1027, block=block@entry=0x7f0e91205350, heap_no=heap_no@entry=4, index=index@entry=0x7f0e44016b38, thr=thr@entry=0x7f0e4000b1a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/lock/lock0lock.cc:2436
#17 0x00000000009a621f in lock_rec_lock (impl=impl@entry=0, mode=mode@entry=1027, block=block@entry=0x7f0e91205350, heap_no=heap_no@entry=4, index=index@entry=0x7f0e44016b38, thr=thr@entry=0x7f0e4000b1a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/lock/lock0lock.cc:2524
#18 0x00000000009a6ce6 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7f0e91205350, rec=rec@entry=0x7f0e916cc219 "", index=index@entry=0x7f0e44016b38, offsets=offsets@entry=0x7f0e924eaa30, mode=mode@entry=LOCK_X, gap_mode=gap_mode@entry=1024, thr=thr@entry=0x7f0e4000b1a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/lock/lock0lock.cc:6742
#19 0x0000000000a3660d in sel_set_rec_lock (block=0x7f0e91205350, rec=rec@entry=0x7f0e916cc219 "", index=index@entry=0x7f0e44016b38, offsets=offsets@entry=0x7f0e924eaa30, mode=mode@entry=3, type=type@entry=1024, thr=thr@entry=0x7f0e4000b1a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/row/row0sel.cc:1011
#20 0x0000000000a3c811 in row_search_for_mysql (buf=buf@entry=0x7f0e4000a280 "\377", mode=mode@entry=2, prebuilt=0x7f0e4000aac8, match_mode=match_mode@entry=1, direction=direction@entry=0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/row/row0sel.cc:4459
#21 0x0000000000971773 in ha_innobase::index_read (this=0x7f0e40008550, buf=0x7f0e4000a280 "\377", key_ptr=0x7f0e7c026fe8 "\306\301", key_len=<optimized out>, find_flag=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc:8504
#22 0x0000000000648bae in handler::index_read_map (this=0x7f0e40008550, buf=0x7f0e4000a280 "\377", key=0x7f0e7c026fe8 "\306\301", keypart_map=<optimized out>, find_flag=HA_READ_KEY_EXACT) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.h:2264
#23 0x000000000064152d in handler::ha_index_read_map (this=this@entry=0x7f0e40008550, buf=0x7f0e4000a280 "\377", key=0x7f0e7c026fe8 "\306\301", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:2851
#24 0x00000000006451ac in handler::read_range_first (this=0x7f0e40008550, start_key=0x7f0e40008630, end_key=0x7f0e40008650, eq_range_arg=<optimized out>, sorted=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:6895
#25 0x000000000063cca7 in handler::multi_range_read_next (this=0x7f0e40008550, range_info=range_info@entry=0x7f0e924eb4a8) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:6066
#26 0x0000000000644c6b in DsMrr_impl::dsmrr_next (this=0x7f0e4000a228, range_info=0x7f0e924eb4a8) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/handler.cc:6425
#27 0x000000000095f8e2 in ha_innobase::multi_range_read_next (this=<optimized out>, range_info=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/storage/innobase/handler/ha_innodb.cc:18754
#28 0x000000000085bae1 in QUICK_RANGE_SELECT::get_next (this=0x7f0e7c01a590) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/opt_range.cc:10585
#29 0x0000000000879e2e in rr_quick (info=0x7f0e924eb690) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/records.cc:367
#30 0x00000000007b98a4 in mysql_update (thd=thd@entry=0x7f0e7c0009a0, table_list=<optimized out>, fields=..., values=..., conds=0x7f0e7c009f30, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=found_return@entry=0x7f0e924ebad0, updated_return=updated_return@entry=0x7f0e924ec030) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_update.cc:741
#31 0x0000000000755561 in mysql_execute_command (thd=thd@entry=0x7f0e7c0009a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_parse.cc:3898
#32 0x0000000000759da3 in mysql_parse (thd=thd@entry=0x7f0e7c0009a0, rawbuf=<optimized out>, length=length@entry=39, parser_state=parser_state@entry=0x7f0e924ec840) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_parse.cc:7136
#33 0x00000000008de388 in Query_log_event::do_apply_event (this=0x7f0e7c017730, rli=0x7f0e7c00bee0, query_arg=0x7f0e7c01a409 "UPDATE sbtest1 SET k=k+1 WHERE id=49606", q_len_arg=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/log_event.cc:4824
#34 0x00000000008dea16 in Query_log_event::do_apply_event (this=<optimized out>, rli=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/log_event.cc:4528
#35 0x00000000008d9c05 in Log_event::apply_event (this=this@entry=0x7f0e7c017730, rli=0x7f0e7c00bee0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/log_event.cc:3072
#36 0x000000000074cc88 in wsrep_apply_rbr (thd=thd@entry=0x7f0e7c0009a0, rbr_buf=rbr_buf@entry=0x7f0e95638e0a "\360\202DR\002\001", buf_len=638, buf_len@entry=818) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_parse.cc:9064
#37 0x000000000075000d in wsrep_apply_cb (ctx=0x7f0e7c0009a0, buf=0x7f0e95638e0a, buf_len=818, meta=<optimized out>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_parse.cc:9128
#38 0x00007f0e9d7b5e56 in galera::TrxHandle::apply (this=this@entry=0x7f0e7c01e2c0, recv_ctx=recv_ctx@entry=0x7f0e7c0009a0, apply_cb=apply_cb@entry=0x74ff60 <wsrep_apply_cb(void*, void const*, unsigned long, wsrep_trx_meta const*)>, meta=...) at galera/src/trx_handle.cpp:298
#39 0x00007f0e9d7edafd in apply_trx_ws (recv_ctx=recv_ctx@entry=0x7f0e7c0009a0, apply_cb=0x74ff60 <wsrep_apply_cb(void*, void const*, unsigned long, wsrep_trx_meta const*)>, commit_cb=0x750257 <wsrep_commit_cb(void*, wsrep_trx_meta const*, bool*, bool)>, trx=..., meta=...) at galera/src/replicator_smm.cpp:38
#40 0x00007f0e9d7f09b0 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x2cf7690, recv_ctx=recv_ctx@entry=0x7f0e7c0009a0, trx=trx@entry=0x7f0e7c01e2c0) at galera/src/replicator_smm.cpp:470
#41 0x00007f0e9d7f2efe in galera::ReplicatorSMM::process_trx (this=0x2cf7690, recv_ctx=0x7f0e7c0009a0, trx=0x7f0e7c01e2c0) at galera/src/replicator_smm.cpp:1242
#42 0x00007f0e9d7d1a19 in galera::GcsActionSource::dispatch (this=this@entry=0x2cf7c70, recv_ctx=recv_ctx@entry=0x7f0e7c0009a0, act=..., exit_loop=@0x7f0e924ee310: false) at galera/src/gcs_action_source.cpp:118
#43 0x00007f0e9d7d290c in galera::GcsActionSource::process (this=0x2cf7c70, recv_ctx=0x7f0e7c0009a0, exit_loop=@0x7f0e924ee310: false) at galera/src/gcs_action_source.cpp:177
#44 0x00007f0e9d7f311b in galera::ReplicatorSMM::async_recv (this=0x2cf7690, recv_ctx=0x7f0e7c0009a0) at galera/src/replicator_smm.cpp:346
#45 0x00007f0e9d803c68 in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at galera/src/wsrep_provider.cpp:207
#46 0x00000000007504f0 in wsrep_replication_process (thd=0x7f0e7c0009a0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/sql_parse.cc:9349
#47 0x0000000000624bae in start_wsrep_THD (arg=0x75049c <wsrep_replication_process(THD*)>) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc-5.6-test/Percona-Server/sql/mysqld.cc:5378
#48 0x00007f0ea7ee20a2 in start_thread () from /usr/lib/libpthread.so.0
#49 0x00007f0ea632743d in clone () from /usr/lib/libc.so.6

=============================

Full error attached.

=========================================

The deadlock is on lock_sys->mutex, which is newly taken in this code path in 5.6.

no longer affects: percona-xtradb-cluster

lock_mutex_enter is taken in lock_clust_rec_read_check_and_lock, after which it calls lock_rec_lock (as per the call trace), eventually reaching innobase_kill_connection, which calls lock_mutex_enter again; since lock_sys->mutex is not recursive, the thread deadlocks on itself.
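The re-entry can be sketched with a simplified model (TrackedMutex, check_and_lock, and kill_connection are illustrative stand-ins, not InnoDB code): the wrapper records the owning thread, as lock_mutex_own() does, so the second acquisition can be detected instead of blocking forever as the real non-recursive mutex does.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// TrackedMutex records the owning thread, as lock_mutex_own() does in
// InnoDB, so a same-thread re-acquisition can be detected instead of
// blocking forever on the non-recursive mutex.
struct TrackedMutex {
    std::mutex m;
    std::thread::id owner{};

    // Returns false on same-thread re-entry; the real InnoDB mutex would
    // simply block forever here.
    bool enter() {
        if (owner == std::this_thread::get_id())
            return false;
        m.lock();
        owner = std::this_thread::get_id();
        return true;
    }
    void exit() {
        owner = std::thread::id{};
        m.unlock();
    }
};

TrackedMutex lock_sys_mutex;  // stand-in for lock_sys->mutex

// Stand-in for innobase_kill_connection(): takes lock_sys->mutex again.
bool kill_connection() {
    if (!lock_sys_mutex.enter())
        return false;  // this is the deadlock in the real code path
    lock_sys_mutex.exit();
    return true;
}

// Stand-in for lock_clust_rec_read_check_and_lock(): holds the mutex and,
// via wsrep_kill_victim() and THD::awake(), ends up in kill_connection().
bool check_and_lock() {
    lock_sys_mutex.enter();
    bool ok = kill_connection();
    lock_sys_mutex.exit();
    return ok;
}
```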

Quick bt:

pthread_cond_wait@@GLIBC_2.3.2
os_cond_wait
os_event_wait_low
sync_array_wait_event
mutex_spin_wait
mutex_enter_func
pfs_mutex_enter_func
innobase_kill_connection --------------------------------------------------------------------------> lock_mutex_enter deadlocks here again.
kill_handlerton
plugin_foreach_with_mask
ha_kill_connection
THD::awake
wsrep_thd_awake
wsrep_innobase_kill_one_trx
wsrep_kill_victim
lock_rec_other_has_conflicting
lock_rec_lock_slow
lock_rec_lock
lock_clust_rec_read_check_and_lock -------------------------------------------------> lock_mutex_enter
sel_set_rec_lock
row_search_for_mysql
ha_innobase::index_read
handler::index_read_map
handler::ha_index_read_map
handler::read_range_first
handler::multi_range_read_next
DsMrr_impl::dsmrr_next
ha_innobase::multi_range_read_next
QUICK_RANGE_SELECT::get_next
rr_quick
mysql_update
mysql_execute_command
mysql_parse
Query_log_event::do_apply_event
Query_log_event::do_apply_event
Log_event::apply_event
wsrep_apply_rbr
wsrep_apply_cb
galera::TrxHandle::apply
apply_trx_ws
galera::ReplicatorSMM::apply_trx
galera::ReplicatorSMM::process_trx
galera::GcsActionSource::dispatch
galera::GcsActionSource::process
galera::ReplicatorSMM::async_recv
galera_recv
wsrep_replication_process
start_wsrep_THD
start_thread
clone

The solution would be to release lock_sys->mutex (lock_mutex_exit) before the kill, so that innobase_kill_connection can take it, and re-acquire it afterwards:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

=== modified file 'Percona-Server/sql/sql_class.cc'
--- Percona-Server/sql/sql_class.cc 2013-09-27 11:15:01 +0000
+++ Percona-Server/sql/sql_class.cc 2013-09-30 18:31:25 +0000
@@ -1017,6 +1017,7 @@
 {
   if (signal)
   {
+    ut_ad(!lock_mutex_own());
     mysql_mutex_lock(&thd->LOCK_thd_data);
     thd->awake(THD::KILL_QUERY);
     mysql_mutex_unlock(&thd->LOCK_thd_data);

=== modified file 'Percona-Server/storage/innobase/lock/lock0lock.cc'
--- Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 08:59:29 +0000
+++ Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 18:30:52 +0000
@@ -1662,7 +1662,9 @@

   if (lock_rec_has_to_wait(trx, mode, lock, is_supremum)) {
 #ifdef WITH_WSREP
+  lock_mutex_exit();
   wsrep_kill_victim(trx, lock);
+  lock_mutex_enter();
 #endif
    return(lock);
   }

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

should correct this, but the thread safety of the above needs to be looked into.

=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-09-30 13:09:29 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-09-30 18:36:19 +0000
@@ -4301,6 +4301,7 @@
         handlerton* hton, /*!< in: innobase handlerton */
  THD* thd) /*!< in: handle to the MySQL thread being killed */
 {
+ ut_ad(!lock_mutex_own());
  trx_t* trx;

  DBUG_ENTER("innobase_kill_connection");

=== modified file 'Percona-Server/storage/innobase/lock/lock0lock.cc'
--- Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 08:59:29 +0000
+++ Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 18:30:52 +0000
@@ -1662,7 +1662,9 @@

   if (lock_rec_has_to_wait(trx, mode, lock, is_supremum)) {
 #ifdef WITH_WSREP
+  lock_mutex_exit();
   wsrep_kill_victim(trx, lock);
+  lock_mutex_enter();
 #endif
    return(lock);
   }

=============================================================

is better, since lock_mutex is InnoDB-only (the earlier ut_ad in sql_class.cc would reference an InnoDB-internal mutex from the SQL layer).

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)

The above fix works, but another deadlock is hit:

pthread_cond_wait@@GLIBC_2.3.2
os_cond_wait
os_event_wait_low
sync_array_wait_event
mutex_spin_wait
mutex_enter_func
pfs_mutex_enter_func
innobase_kill_connection -----------------------------> trx_mutex_enter deadlock
kill_handlerton
plugin_foreach_with_mask
ha_kill_connection
THD::awake
wsrep_thd_awake
wsrep_innobase_kill_one_trx
wsrep_kill_victim
lock_rec_other_has_conflicting
lock_rec_lock_slow ---------------------------------------------> trx_mutex_enter
lock_rec_lock
lock_clust_rec_read_check_and_lock
sel_set_rec_lock
row_search_for_mysql
ha_innobase::index_read
handler::index_read_map
handler::ha_index_read_map
handler::read_range_first
handler::multi_range_read_next
DsMrr_impl::dsmrr_next
ha_innobase::multi_range_read_next
QUICK_RANGE_SELECT::get_next
rr_quick
mysql_update
mysql_execute_command
mysql_parse
Query_log_event::do_apply_event
Query_log_event::do_apply_event
Log_event::apply_event
wsrep_apply_rbr
wsrep_apply_cb
galera::TrxHandle::apply
apply_trx_ws
galera::ReplicatorSMM::apply_trx
galera::ReplicatorSMM::process_trx
galera::GcsActionSource::dispatch
galera::GcsActionSource::process
galera::ReplicatorSMM::async_recv
galera_recv
wsrep_replication_process
start_wsrep_THD
start_thread
clone

------------------------------------------

The above fails with:

InnoDB: sync levels should be > 299 but a level is 297
Mutex '&trx->mutex'
InnoDB: Locked mutex: addr 0x7f4ad8018360 thread 139960774665984 file /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/debug/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.13/storage/innobase/lock/lock0lock.cc line 2411
InnoDB: sync_thread_levels_g(array, 299) does not hold!
2013-09-30 22:15:57 7f4b2841a700 InnoDB: Assertion failure in thread 139960774665984 in file sync0sync.cc line 1250
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
19:15:57 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=11
max_threads=153
thread_count=11
connection_count=11
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62039 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f4ad80009a0
Attempting backtrace. You can use the following information to find out
where...

The above requires similar trx_mutex_{enter,exit} handling; however, it would be better to rewrite the whole BF-abort logic to avoid this if possible, because, as the stack indicates, two separate InnoDB stacks (interleaved with handlerton calls) are involved here, so the trx and lock_sys mutexes can conflict.
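The latch-order rule behind the "sync_thread_levels_g(array, 299) does not hold" assertion can be sketched as follows (a hypothetical simplification of InnoDB's sync0sync level checking; may_acquire/acquire are illustrative names, only the level values 297 and 299 come from the log): a latch at level L may only be taken while every latch already held has a strictly higher level, so holding trx->mutex (level 297) forbids taking lock_sys->mutex (level 299).

```cpp
#include <cassert>
#include <vector>

// Each thread keeps the levels of the latches it currently holds.
thread_local std::vector<int> held_levels;

// A latch at `level` may be acquired only if every latch already held has
// a strictly greater level (this is what sync_thread_levels_g verifies).
bool may_acquire(int level) {
    for (int held : held_levels)
        if (held <= level)
            return false;  // latch-order violation
    return true;
}

void acquire(int level) {
    assert(may_acquire(level));
    held_levels.push_back(level);
}
```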

Fixed #7 with

------------------------------------------------------------
revno: 464
committer: Raghavendra D Prabhu <email address hidden>
branch nick: pxc56
timestamp: Tue 2013-10-01 01:37:57 +0530
message:
  Bug#1233301: Extending the mutex fixes

  Additionally,

  a) Added lock_mutex_{exit,enter} around lock_table_other_has_incompatible.

  b) Added trx_mutex_exit(trx) in wsrep_kill_victim.
diff:
=== modified file 'Percona-Server/storage/innobase/lock/lock0lock.cc'
--- Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 18:42:28 +0000
+++ Percona-Server/storage/innobase/lock/lock0lock.cc 2013-09-30 20:07:57 +0000
@@ -1626,6 +1626,8 @@
      lock_table_print(stderr, lock);
     }
    }
+   if (trx_mutex_own(lock->trx))
+    trx_mutex_exit(lock->trx);
    wsrep_innobase_kill_one_trx(
     (const trx_t*) trx, lock->trx, TRUE);
   }
@@ -4654,7 +4656,9 @@
 #ifdef WITH_WSREP
    if (wsrep_debug)
     fprintf(stderr, "WSREP: table lock abort");
+   lock_mutex_exit();
    wsrep_kill_victim((trx_t *)trx, (lock_t *)lock);
+   lock_mutex_enter();
 #endif
    return(lock);
   }

======================================================================================

Need to test this on jenkins.

 + if (trx_mutex_own(lock->trx))
+ trx_mutex_exit(lock->trx);

doesn't work because trx_mutex_own() is not available in non-debug builds; the check needs to move above.

The last commit fixes this - http://jenkins.percona.com/job/PXC-5.6-msysbench/23/

The other 'red' crashes are there for different reasons (though they may be related to this fix).

Even though this works, the drop and re-acquire of mutexes can lead to race conditions (as discussed on Skype), so a better solution may be required.
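The race window can be sketched abstractly (LockSys, victim_trx, and the helper names are hypothetical stand-ins, not InnoDB structures): any state selected under the mutex may be invalidated by another thread while the mutex is dropped, so the caller re-acquires the mutex holding a stale reference.

```cpp
#include <cassert>
#include <optional>

// Stand-in for the lock system state examined under lock_sys->mutex.
struct LockSys {
    std::optional<int> victim_trx;  // the victim transaction, if still alive
};

// What another thread may do while the mutex is dropped: the victim
// commits or rolls back, and its trx object goes away.
void other_thread_work(LockSys& ls) {
    ls.victim_trx.reset();
}

// After re-acquiring the mutex, the caller must re-validate its choice.
bool victim_still_valid(const LockSys& ls, int chosen) {
    return ls.victim_trx.has_value() && *ls.victim_trx == chosen;
}
```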

innobase_kill_connection (called by ha_kill_connection) is not present in upstream code (hence the codership tree is not affected). It was implemented in https://blueprints.launchpad.net/percona-server/+spec/port-max-statement-time

Also, lock_cancel_waiting_and_release, called by wsrep_innobase_kill_one_trx, requires both the trx and lock_sys mutexes, hence dropping them is not feasible in that code path.

All that innobase_kill_connection does is call lock_cancel_waiting_and_release, which wsrep_innobase_kill_one_trx already does before calling wsrep_thd_awake, so innobase_kill_connection is redundant in this case.

This can also affect 5.5 PXC, adding it.

Changed in percona-xtradb-cluster:
milestone: none → 5.5.34-23.7.6

Reported as https://bugs.launchpad.net/percona-server/+bug/1233690 for possible fix in Percona Server itself.

Reverting the earlier mutex patches as well.

Tested with an overload of THD::awake (after reverting the mutex commits) that does not call ha_kill_connection; sysbench and SST tests are successful without any issues.
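A minimal sketch of that kind of overload, assuming simplified stand-in classes (Engine and THD::awake_skip_engines are illustrative names, not the actual Percona patch): the wsrep path marks the victim killed without re-entering the storage-engine kill hook, since InnoDB-side cleanup has already been done.

```cpp
#include <cassert>

// Stand-in for a storage engine's kill hook (innobase_kill_connection).
struct Engine {
    int kill_hook_calls = 0;
    void kill_connection() { ++kill_hook_calls; }
};

// Stand-in for THD with a second awake path for the wsrep BF abort, which
// has already cancelled the victim's lock wait inside InnoDB.
struct THD {
    Engine* eng;
    bool killed = false;

    void awake() {               // normal kill path: notify engines too
        killed = true;
        eng->kill_connection();
    }
    void awake_skip_engines() {  // wsrep path: engine work already done
        killed = true;
    }
};
```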
