Percona XtraDB Cluster - HA scalable solution for MySQL

Issue with lock_sys->mutex when wsrep_log_conflicts

Reported by Raghavendra D Prabhu on 2013-10-02
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Status tracked in 5.6
5.6
High
Seppo Jaakola
Percona XtraDB Cluster
Status tracked in Trunk
5.6
Undecided
Unassigned
Trunk
Undecided
Unassigned

Bug Description

trx_print in wsrep_kill_victim is called when wsrep_log_conflicts is enabled. However,it causes issues with lock_mutex

*** Priority TRANSACTION:
########################################
DEADLOCK of threads detected!
Mutex 0x7fead4026088 owned by thread 140646291601152 file /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc line 6738
--Thread 140646291601152 has waited at trx0trx.cc line 2071 for 0.0000 seconds the semaphore:
Mutex at 0x7fead4026088 '&lock_sys->mutex', lock var 1
Last time reserved in file /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc line 6738, waiters flag 1
########################################
2013-10-03 01:49:01 7feac43ff700 InnoDB: Assertion failure in thread 140646291601152 in file sync0arr.cc line 426
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
20:19:01 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=20
max_threads=153
thread_count=19
connection_count=19
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69207 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x39ee140
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7feac43feed8 thread_stack 0x40000
/pxc56/bin/mysqld(my_print_stacktrace+0x2c)[0x942498]
/pxc56/bin/mysqld(handle_fatal_signal+0x304)[0x6d8248]
/usr/lib/libpthread.so.0(+0xf870)[0x7fead9e2e870]
/usr/lib/libc.so.6(gsignal+0x39)[0x7fead81bc3d9]
/usr/lib/libc.so.6(abort+0x148)[0x7fead81bd7d8]
/pxc56/bin/mysqld[0xa60cbf]
/pxc56/bin/mysqld[0xa63a5a]
/pxc56/bin/mysqld[0xa7d0d9]
/pxc56/bin/mysqld[0xa7d15d]
/pxc56/bin/mysqld[0xa7ea35]
/pxc56/bin/mysqld[0x9a1b25]
/pxc56/bin/mysqld[0x9a2104]
/pxc56/bin/mysqld[0x9a775e]
/pxc56/bin/mysqld[0x9a79d6]
/pxc56/bin/mysqld[0x9a84a0]
/pxc56/bin/mysqld[0xa3b157]
/pxc56/bin/mysqld[0xa41567]
/pxc56/bin/mysqld[0x972173]
/pxc56/bin/mysqld[0x95e149]
/pxc56/bin/mysqld[0x9727d7]
/pxc56/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x101)[0x643c55]
/pxc56/bin/mysqld(_ZN12ha_partition8rnd_nextEPh+0x86)[0xbc7c1e]
/pxc56/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x101)[0x643c55]
/pxc56/bin/mysqld[0x83f7ac]
/pxc56/bin/mysqld(_Z8filesortP3THDP5TABLEP8FilesortbPyS5_+0x74c)[0x840304]
/pxc56/bin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0x9af)[0x888d17]
/pxc56/bin/mysqld(_Z21mysql_execute_commandP3THD+0x2880)[0x7567d4]
/pxc56/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x6ef)[0x75a84a]
/pxc56/bin/mysqld[0x75ae96]
/pxc56/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xbab)[0x75be6d]
/pxc56/bin/mysqld(_Z10do_commandP3THD+0x50c)[0x75d3c3]
/pxc56/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x129)[0x729a7d]
/pxc56/bin/mysqld(handle_one_connection+0x20)[0x729ba5]
/usr/lib/libpthread.so.0(+0x80a2)[0x7fead9e270a2]
/usr/lib/libc.so.6(clone+0x6d)[0x7fead826c43d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fea5c004c50): DELETE FROM `table1000_innodb_key_pk_parts_2_int_autoinc` ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 3
Connection ID (thread ID): 12
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file

Backtrace:

=============================================================
(gdb) bt
#0 0x00007fead9e2c0b1 in pthread_kill () from /usr/lib/libpthread.so.0
#1 0x0000000000941cbf in my_write_core (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/mysys/stacktrace.c:422
#2 0x00000000006d83ec in handle_fatal_signal (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/signal_handler.cc:254
#3 <signal handler called>
#4 0x00007fead81bc3d9 in raise () from /usr/lib/libc.so.6
#5 0x00007fead81bd7d8 in abort () from /usr/lib/libc.so.6
#6 0x0000000000a60cbf in sync_array_wait_event (arr=arr@entry=0x3268060, index=2) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/sync/sync0arr.cc:426
#7 0x0000000000a63a5a in mutex_spin_wait (_mutex=_mutex@entry=0x7fead4026088, high_priority=high_priority@entry=false,
    file_name=file_name@entry=0xd40718 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/trx/trx0trx.cc", line=line@entry=2071)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/sync/sync0sync.cc:660
#8 0x0000000000a7d0d9 in mutex_enter_func (mutex=0x7fead4026088, file_name=file_name@entry=0xd40718 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/trx/trx0trx.cc",
    line=2071) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/include/sync0sync.ic:274
#9 0x0000000000a7d15d in pfs_mutex_enter_func (mutex=0x7fead4026088, file_name=file_name@entry=0xd40718 "/media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/trx/trx0trx.cc",
    line=line@entry=2071) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/include/sync0sync.ic:351
#10 0x0000000000a7ea35 in trx_print (f=0x7fead852d060 <_IO_2_1_stderr_>, trx=trx@entry=0x7fea5c00bbd8, max_query_len=max_query_len@entry=3000)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/trx/trx0trx.cc:2071
#11 0x00000000009a1b25 in wsrep_kill_victim (trx=trx@entry=0x7fea5c00bbd8, lock=lock@entry=0x7fea38008460)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:1610
#12 0x00000000009a2104 in lock_rec_other_has_conflicting (mode=mode@entry=1027, block=block@entry=0x7feaa7fad420, heap_no=heap_no@entry=2, trx=trx@entry=0x7fea5c00bbd8)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:1665
#13 0x00000000009a775e in lock_rec_lock_slow (impl=impl@entry=0, mode=mode@entry=1027, block=block@entry=0x7feaa7fad420, heap_no=heap_no@entry=2, index=index@entry=0x7fea74214488, thr=thr@entry=0x7fea3804f2b0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:2438
#14 0x00000000009a79d6 in lock_rec_lock (impl=impl@entry=0, mode=mode@entry=1027, block=block@entry=0x7feaa7fad420, heap_no=heap_no@entry=2, index=index@entry=0x7fea74214488, thr=thr@entry=0x7fea3804f2b0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:2526
#15 0x00000000009a84a0 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7feaa7fad420, rec=rec@entry=0x7feaaf890080 "\200", index=index@entry=0x7fea74214488,
    offsets=offsets@entry=0x7feac43fbc50, mode=mode@entry=LOCK_X, gap_mode=gap_mode@entry=1024, thr=thr@entry=0x7fea3804f2b0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:6746
#16 0x0000000000a3b157 in sel_set_rec_lock (block=0x7feaa7fad420, rec=rec@entry=0x7feaaf890080 "\200", index=index@entry=0x7fea74214488, offsets=offsets@entry=0x7feac43fbc50, mode=mode@entry=3,
    type=type@entry=1024, thr=thr@entry=0x7fea3804f2b0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/row/row0sel.cc:1011
#17 0x0000000000a41567 in row_search_for_mysql (buf=buf@entry=0x7fea38049d68 "\367\002", mode=mode@entry=1, prebuilt=0x7fea3804eba8, match_mode=match_mode@entry=0, direction=direction@entry=0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/row/row0sel.cc:4459
#18 0x0000000000972173 in ha_innobase::index_read (this=0x7fea3804a140, buf=0x7fea38049d68 "\367\002", key_ptr=0x0, key_len=<optimized out>, find_flag=<optimized out>)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:8504
#19 0x000000000095e149 in ha_innobase::index_first (this=0x7fea3804a140, buf=0x7fea38049d68 "\367\002")
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:8901
#20 0x00000000009727d7 in ha_innobase::rnd_next (this=0x7fea3804a140, buf=0x7fea38049d68 "\367\002")
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:8998
#21 0x0000000000643c55 in handler::ha_rnd_next (this=this@entry=0x7fea3804a140, buf=buf@entry=0x7fea38049d68 "\367\002") at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/handler.cc:2786
#22 0x0000000000bc7c1e in ha_partition::rnd_next (this=0x7fea38012300, buf=0x7fea38049d68 "\367\002") at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/ha_partition.cc:4688
#23 0x0000000000643c55 in handler::ha_rnd_next (this=this@entry=0x7fea38012300, buf=0x7fea38049d68 "\367\002") at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/handler.cc:2786
#24 0x000000000083f7ac in find_all_keys (param=param@entry=0x7feac43fc9e0, select=select@entry=0x0, fs_info=fs_info@entry=0x7feac43fc970, buffpek_pointers=buffpek_pointers@entry=0x7feac43fcb80,
    tempfile=tempfile@entry=0x7feac43fca60, pq=0x0, found_rows=found_rows@entry=0x7feac43fcde0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/filesort.cc:768
#25 0x0000000000840304 in filesort (thd=thd@entry=0x39ee140, table=table@entry=0x7fea38011a10, filesort=filesort@entry=0x7feac43fce20, sort_positions=sort_positions@entry=true,
    examined_rows=examined_rows@entry=0x7feac43fcdc0, found_rows=found_rows@entry=0x7feac43fcde0) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/filesort.cc:340
#26 0x0000000000888d17 in mysql_delete (thd=thd@entry=0x39ee140, table_list=0x7fea5c004e68, conds=0x0, order_list=order_list@entry=0x39f0fe0, limit=3, options=0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_delete.cc:303
#27 0x00000000007567d4 in mysql_execute_command (thd=thd@entry=0x39ee140) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:4168
#28 0x000000000075a84a in mysql_parse (thd=thd@entry=0x39ee140,
    rawbuf=rawbuf@entry=0x7fea5c004c50 "DELETE FROM `table1000_innodb_key_pk_parts_2_int_autoinc` ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 3", length=length@entry=137,
    parser_state=parser_state@entry=0x7feac43fe820) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:7048
#29 0x000000000075ae96 in wsrep_mysql_parse (thd=thd@entry=0x39ee140,
    rawbuf=0x7fea5c004c50 "DELETE FROM `table1000_innodb_key_pk_parts_2_int_autoinc` ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 3", length=length@entry=137,
    parser_state=parser_state@entry=0x7feac43fe820) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:6800
#30 0x000000000075be6d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x39ee140,
    packet=packet@entry=0x395dd11 "DELETE FROM `table1000_innodb_key_pk_parts_2_int_autoinc` ORDER BY `col_char_12`,`col_char_12_key`,`col_int`,`col_int_key`,`pk` LIMIT 3",
    packet_length=packet_length@entry=137) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1628
---Type <return> to continue, or q <return> to quit---
#31 0x000000000075d3c3 in do_command (thd=0x39ee140) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1133
#32 0x0000000000729a7d in do_handle_one_connection (thd_arg=thd_arg@entry=0x39ee140) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1640
#33 0x0000000000729ba5 in handle_one_connection (arg=0x39ee140) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1544
#34 0x00007fead9e270a2 in start_thread () from /usr/lib/libpthread.so.0
#35 0x00007fead826c43d in clone () from /usr/lib/libc.so.6
===========================================================================

The cause is lock_clust_rec_read_check_and_lock which does lock_mutex_enter which then conflicts when trx_print is entered (which tries it again).

Following should here:

=== modified file 'Percona-Server/storage/innobase/lock/lock0lock.cc'
--- Percona-Server/storage/innobase/lock/lock0lock.cc 2013-10-03 10:17:48 +0000
+++ Percona-Server/storage/innobase/lock/lock0lock.cc 2013-10-03 10:51:57 +0000
@@ -1607,7 +1607,9 @@
     else
      fputs("\n*** Victim TRANSACTION:\n",
            stderr);
- trx_print(stderr, trx, 3000);
+ mutex_enter(&trx_sys->mutex);
+ trx_print_latched(stderr, trx, 3000);
+ mutex_exit(&trx_sys->mutex);

     if (bf_other)
      fputs("\n*** Priority TRANSACTION:\n",
@@ -1615,7 +1617,9 @@
     else
      fputs("\n*** Victim TRANSACTION:\n",
            stderr);
- trx_print(stderr, lock->trx, 3000);
+ mutex_enter(&trx_sys->mutex);
+ trx_print_latched(stderr, lock->trx, 3000);
+ mutex_exit(&trx_sys->mutex);

     fputs("*** WAITING FOR THIS LOCK TO BE GRANTED:\n",
           stderr);

Need to test on jenkins to see if there are any other conflicts.

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)

trx_sys->mutex in #1 conflicts with trx->mutex held earlier.

=== modified file 'Percona-Server/storage/innobase/lock/lock0lock.cc'
--- Percona-Server/storage/innobase/lock/lock0lock.cc 2013-10-06 16:00:48 +0000
+++ Percona-Server/storage/innobase/lock/lock0lock.cc 2013-10-08 22:00:44 +0000
@@ -1608,9 +1608,7 @@
     else
      fputs("\n*** Victim TRANSACTION:\n",
            stderr);
- mutex_enter(&trx_sys->mutex);
     trx_print_latched(stderr, trx, 3000);
- mutex_exit(&trx_sys->mutex);

     if (bf_other)
      fputs("\n*** Priority TRANSACTION:\n",
@@ -1618,9 +1616,7 @@
     else
      fputs("\n*** Victim TRANSACTION:\n",
            stderr);
- mutex_enter(&trx_sys->mutex);
     trx_print_latched(stderr, lock->trx, 3000);
- mutex_exit(&trx_sys->mutex);

     fputs("*** WAITING FOR THIS LOCK TO BE GRANTED:\n",
           stderr);
@@ -2414,6 +2410,8 @@

  DBUG_EXECUTE_IF("innodb_report_deadlock", return(DB_DEADLOCK););

+ if (wsrep_log_conflicts)
+ mutex_enter(&trx_sys->mutex);
  trx = thr_get_trx(thr);
  trx_mutex_enter(trx);

@@ -2458,11 +2456,14 @@
   ut_ad(lock == NULL);
 enqueue_waiting:
 #ifdef WITH_WSREP
+ if (wsrep_log_conflicts)
+ mutex_exit(&trx_sys->mutex);
   /* c_lock is NULL here if jump to enqueue_waiting happened
   but it's ok because lock is not NULL in that case and c_lock
   is not used. */
   err = lock_rec_enqueue_waiting(c_lock,
    mode, block, heap_no, lock, index, thr);
+ goto released;
 #else
   err = lock_rec_enqueue_waiting(
    mode, block, heap_no, lock, index, thr);
@@ -2477,7 +2478,10 @@
   err = DB_SUCCESS_LOCKED_REC;
  }

- trx_mutex_exit(trx);
+ if (wsrep_log_conflicts)
+ mutex_exit(&trx_sys->mutex);
+released:
+ trx_mutex_exit(trx);

  return(err);
 }

=======================================================

seems to be fixing it.

Changed in codership-mysql:
importance: Undecided → High
milestone: none → 5.6.14-24.1
status: New → Confirmed
Changed in codership-mysql:
status: Confirmed → In Progress
Seppo Jaakola (seppo-jaakola) wrote :

Pushed a fix into wsrep-5.6, in revision:
http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4012

Note that this fix which is different than what was proposed above, trx_print_latched() turned out to give simple workaround for this occasion.

@Seppo,

Yes, trx_print_latched was indeed added in earlier fix (though didn't add that in comment).

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers