Bug #1233383 “innodb: unprotected access to was_chosen_as_deadlo...” : Bugs : MySQL patches by Codership

Teemu Ollakka (teemu-ollakka) on 2013-09-30

Changed in codership-mysql:
milestone:	none → 5.6.14-24.1
assignee:	nobody → Teemu Ollakka (teemu-ollakka)
importance:	Undecided → Critical

Teemu Ollakka (teemu-ollakka) on 2013-10-02

Changed in codership-mysql:
assignee:	Teemu Ollakka (teemu-ollakka) → Seppo Jaakola (seppo-jaakola)

Seppo Jaakola (seppo-jaakola) wrote on 2013-10-02:

#1

A potential fix has been pushed here: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/3953
Tested with a 2-node cluster under a heavily conflicting sqlgen load through glb.

Changed in codership-mysql:
status:	New → In Progress

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-10-02:

#2

There is another related crash with following assertion:

2013-10-03 01:15:02 7ff4cd778700  InnoDB: Assertion failure in thread 140689395910400 in file lock0lock.cc line 7101
InnoDB: Failing assertion: mutex_own(&lock_sys->mutex)

======================================================================================================
(gdb) bt
#0  0x00007ff4f1bb40b1 in pthread_kill () from /usr/lib/libpthread.so.0
#1  0x0000000000941cbf in my_write_core (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/mysys/stacktrace.c:422
#2  0x00000000006d83ec in handle_fatal_signal (sig=6) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/signal_handler.cc:254
#3  <signal handler called>
#4  0x00007ff4eff443d9 in raise () from /usr/lib/libc.so.6
#5  0x00007ff4eff457d8 in abort () from /usr/lib/libc.so.6
#6  0x00000000009a3a9a in lock_cancel_waiting_and_release (lock=lock@entry=0x7ff44c0c1e40) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:7101
#7  0x000000000096a55d in wsrep_innobase_kill_one_trx (bf_trx=bf_trx@entry=0x7ff45c007cf8, victim_trx=victim_trx@entry=0x7ff44c0182d8, signal=signal@entry=1)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:17276
#8  0x000000000096ad60 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=<optimized out>, victim_thd=0x2d40800, signal=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:17368
#9  0x0000000000648e1a in ha_wsrep_abort_transaction (bf_thd=bf_thd@entry=0x2d2ab40, victim_thd=victim_thd@entry=0x2d40800, signal=signal@entry=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/handler.cc:7747
#10 0x000000000063b073 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x2d2ab40, victim_thd_ptr=victim_thd_ptr@entry=0x2d40800, signal=signal@entry=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/wsrep_thd.cc:457
#11 0x00000000006312d4 in wsrep_grant_mdl_exception (requestor_ctx=requestor_ctx@entry=0x2d2ac78, ticket=ticket@entry=0x7ff44c017ef0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/wsrep_mysqld.cc:1386
#12 0x00000000006cae19 in MDL_lock::can_grant_lock (this=this@entry=0x2c90e10, type_arg=MDL_EXCLUSIVE, requestor_ctx=requestor_ctx@entry=0x2d2ac78, ignore_lock_priority=ignore_lock_priority@entry=false)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/mdl.cc:1804
#13 0x00000000006cbc68 in MDL_context::try_acquire_lock_impl (this=this@entry=0x2d2ac78, mdl_request=mdl_request@entry=0x7ff45c005160, out_ticket=out_ticket@entry=0x7ff4cd775fb8)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/mdl.cc:2123
#14 0x00000000006cbddd in MDL_context::acquire_lock (this=this@entry=0x2d2ac78, mdl_request=0x7ff45c005160, lock_wait_timeout=lock_wait_timeout@entry=31536000)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/mdl.cc:2275
#15 0x00000000006cc974 in MDL_context::acquire_locks (this=this@entry=0x2d2ac78, mdl_requests=mdl_requests@entry=0x7ff4cd7760c0, lock_wait_timeout=lock_wait_timeout@entry=31536000)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/mdl.cc:2442
#16 0x000000000070dc93 in lock_table_names (thd=thd@entry=0x2d2ab40, tables_start=tables_start@entry=0x7ff45c004dc0, tables_end=tables_end@entry=0x0, lock_wait_timeout=31536000, flags=flags@entry=0)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_base.cc:4856
#17 0x00000000007a6e48 in mysql_rm_table (thd=thd@entry=0x2d2ab40, tables=tables@entry=0x7ff45c004dc0, if_exists=<optimized out>, drop_temporary=<optimized out>)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_table.cc:2110
#18 0x0000000000756b26 in mysql_execute_command (thd=thd@entry=0x2d2ab40) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:4253
#19 0x000000000075a84a in mysql_parse (thd=thd@entry=0x2d2ab40, rawbuf=rawbuf@entry=0x7ff45c004c50 "DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`", length=length@entry=55,
    parser_state=parser_state@entry=0x7ff4cd777820) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:7048
#20 0x000000000075ae96 in wsrep_mysql_parse (thd=thd@entry=0x2d2ab40, rawbuf=0x7ff45c004c50 "DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc`", length=length@entry=55,
    parser_state=parser_state@entry=0x7ff4cd777820) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:6800
#21 0x000000000075be6d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x2d2ab40, packet=packet@entry=0x2cdaaf1 " DROP TABLE `table100_innodb_key_pk_parts_2_int_autoinc` ",
    packet_length=packet_length@entry=57) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1628
#22 0x000000000075d3c3 in do_command (thd=0x2d2ab40) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_parse.cc:1133
#23 0x0000000000729a7d in do_handle_one_connection (thd_arg=thd_arg@entry=0x2d2ab40) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1640
#24 0x0000000000729ba5 in handle_one_connection (arg=0x2d2ab40) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/sql_connect.cc:1544
#25 0x00007ff4f1baf0a2 in start_thread () from /usr/lib/libpthread.so.0
#26 0x00007ff4efff443d in clone () from /usr/lib/libc.so.6

=====================================================================================================================================

As the backtrace shows, the cause is that lock_sys->mutex is not held
anywhere in the call chain.

lock_cancel_waiting_and_release() requires the caller to hold it.

So lock_sys->mutex has to be acquired with lock_mutex_enter(), either
well before wsrep_innobase_kill_one_trx() or right before the call
to lock_cancel_waiting_and_release().

(I can report this as a separate bug if required)
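A minimal C sketch of the invariant behind `mutex_own(&lock_sys->mutex)`, assuming POSIX threads and C11 atomics. The hypothetical `owned_mutex_t` records which thread holds it (via the address of a thread-local byte), so `owned_mutex_own()` can answer the question InnoDB's `mutex_own()` asks. The hypothetical `cancel_waiting_and_release()` stands in for `lock_cancel_waiting_and_release()` and asserts ownership on entry. These names and layouts are illustrative assumptions, not InnoDB's actual implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Address of a thread-local byte: a unique tag for the running thread. */
static _Thread_local char thread_tag;

/* Hypothetical mutex wrapper that remembers its owner, so callees can
   check ownership the way InnoDB's mutex_own() does. */
typedef struct {
    pthread_mutex_t mtx;
    _Atomic(const char *) owner; /* NULL while unlocked */
} owned_mutex_t;

static void owned_mutex_init(owned_mutex_t *m) {
    pthread_mutex_init(&m->mtx, NULL);
    atomic_init(&m->owner, NULL);
}

static void owned_mutex_enter(owned_mutex_t *m) {
    pthread_mutex_lock(&m->mtx);
    atomic_store(&m->owner, &thread_tag);
}

static void owned_mutex_exit(owned_mutex_t *m) {
    atomic_store(&m->owner, NULL);
    pthread_mutex_unlock(&m->mtx);
}

/* True only if the calling thread currently holds m. */
static bool owned_mutex_own(owned_mutex_t *m) {
    return atomic_load(&m->owner) == &thread_tag;
}

/* Hypothetical stand-ins for lock_sys->mutex and a waiting record lock. */
typedef struct {
    bool waiting;
} wait_lock_t;

static owned_mutex_t lock_sys_mutex;

/* Mirrors the precondition of lock_cancel_waiting_and_release():
   the caller must already hold the lock-system mutex. */
static void cancel_waiting_and_release(wait_lock_t *lock) {
    assert(owned_mutex_own(&lock_sys_mutex));
    lock->waiting = false;
}

/* Caller-side fix: acquire the mutex before calling the callee. */
static void kill_victim(wait_lock_t *lock) {
    owned_mutex_enter(&lock_sys_mutex);
    cancel_waiting_and_release(lock);
    owned_mutex_exit(&lock_sys_mutex);
}
```

After calling `owned_mutex_init(&lock_sys_mutex)` once, `kill_victim()` passes the check, while calling `cancel_waiting_and_release()` without a preceding `owned_mutex_enter()` trips the assert: the analogue of the failure at lock0lock.cc line 7101.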

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-10-02:

#3

The victim's trx_mutex also needs to be held, not just lock_mutex:

=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-10-02 09:14:44 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-10-02 20:17:33 +0000
@@ -17273,7 +17273,11 @@
    if (wait_lock) {
     WSREP_DEBUG("canceling wait lock");
     victim_trx->lock.was_chosen_as_deadlock_victim= TRUE;
+    lock_mutex_enter();
+    trx_mutex_enter(victim_trx);
     lock_cancel_waiting_and_release(wait_lock);
+    trx_mutex_exit(victim_trx);
+    lock_mutex_exit();
    }

wsrep_thd_awake(thd, signal);

This fixes it (though there is another crash, which is unrelated).
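The patch acquires lock_mutex before trx_mutex and releases them in reverse order. A minimal C sketch of that discipline, assuming POSIX threads; `lock_sys_mtx`, `victim_trx_t`, and the event log are hypothetical stand-ins, not InnoDB code. Keeping one global acquisition order is what prevents two threads from deadlocking on the pair. Note that in the patch above the `was_chosen_as_deadlock_victim` assignment still precedes `lock_mutex_enter()`; this sketch performs the write inside the critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for lock_sys->mutex and a victim trx. */
static pthread_mutex_t lock_sys_mtx = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    pthread_mutex_t mtx;
    int was_chosen_as_deadlock_victim;
} victim_trx_t;

/* Event log used to inspect acquire/release order. Every record() call
   happens while lock_sys_mtx is held, so the log itself is race-free. */
enum { EV_LOCK_SYS_IN = 1, EV_TRX_IN, EV_TRX_OUT, EV_LOCK_SYS_OUT };
static int events[16];
static size_t n_events;

static void record(int ev) {
    if (n_events < sizeof events / sizeof events[0])
        events[n_events++] = ev;
}

/* Fixed order: the lock-system mutex always comes first. */
static void enter_lock_then_trx(victim_trx_t *trx) {
    pthread_mutex_lock(&lock_sys_mtx);
    record(EV_LOCK_SYS_IN);
    pthread_mutex_lock(&trx->mtx);
    record(EV_TRX_IN);
}

/* Release in reverse (LIFO) order. */
static void exit_trx_then_lock(victim_trx_t *trx) {
    record(EV_TRX_OUT);
    pthread_mutex_unlock(&trx->mtx);
    record(EV_LOCK_SYS_OUT);
    pthread_mutex_unlock(&lock_sys_mtx);
}

static void mark_victim(victim_trx_t *trx) {
    enter_lock_then_trx(trx);
    trx->was_chosen_as_deadlock_victim = 1; /* written under both mutexes */
    exit_trx_then_lock(trx);
}
```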

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-10-02:

#4

Getting

""
2013-10-03 02:35:50 13788 [ERROR] /pxc56/bin/mysqld: Sort aborted: Deadlock found when trying to get lock; try restarting transaction
WSREP: conc slot cancel not supported
WSREP: conc slot cancel not supported
WSREP: conc slot cancel not supported
2013-10-03 02:35:52 13788 [ERROR] /pxc56/bin/mysqld: Sort aborted: Query execution was interrupted
2013-10-03 02:35:52 7fb430336700 InnoDB: Assertion failure in thread 140411879515904 in file lock0lock.cc line 861
InnoDB: Failing assertion: lock_get_wait(lock)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

""
from:

#5 0x00007fb4429e97d8 in abort () from /usr/lib/libc.so.6
#6 0x000000000099c5d0 in lock_reset_lock_and_trx_wait (lock=lock@entry=0x7fb3c408e2e8) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:861
#7 0x00000000009a3bac in lock_cancel_waiting_and_release (lock=lock@entry=0x7fb3c408e2e8) at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/lock/lock0lock.cc:7123
#8 0x000000000096a595 in wsrep_innobase_kill_one_trx (bf_trx=bf_trx@entry=0x7fb3bc007cf8, victim_trx=victim_trx@entry=0x7fb3c40435b8, signal=signal@entry=1)
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:17278
#9 0x000000000096ada9 in wsrep_abort_transaction (hton=<optimized out>, bf_thd=<optimized out>, victim_thd=0x32b2dd0, signal=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/storage/innobase/handler/ha_innodb.cc:17372
#10 0x0000000000648e1a in ha_wsrep_abort_transaction (bf_thd=bf_thd@entry=0x3246100, victim_thd=victim_thd@entry=0x32b2dd0, signal=signal@entry=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/handler.cc:7747
#11 0x000000000063b073 in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x3246100, victim_thd_ptr=victim_thd_ptr@entry=0x32b2dd0, signal=signal@entry=1 '\001')
    at /media/Tintin/Work/code/percona-xtradb-cluster/pxc56/Percona-Server/sql/wsrep_thd.cc:457

This implies that lock_mutex is needed earlier.
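One possible reading of this assertion (an assumption, not confirmed in this thread) is a check-then-act race: the kill path saw a non-NULL wait lock, but by the time it canceled that lock it was no longer waiting. A minimal C sketch, with hypothetical types and functions, of why taking the mutex before reading `wait_lock` closes that window: `grant()` models another thread granting the lock, and `kill_one_trx()` reads `wait_lock` and cancels it inside one critical section, so a lock that was already granted is simply skipped instead of tripping the assert.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a transaction may be waiting on one lock. */
typedef struct {
    bool is_waiting;        /* analogue of lock_get_wait(lock) */
} rec_lock_t;

typedef struct {
    rec_lock_t *wait_lock;  /* NULL when the trx is not waiting */
} trx_t;

static pthread_mutex_t lock_sys_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Callee: expects lock_sys_mtx to be held (not checked here) and asserts
   the lock is still waiting, like lock_reset_lock_and_trx_wait(). */
static void reset_lock_and_trx_wait(trx_t *trx, rec_lock_t *lock) {
    assert(lock->is_waiting);
    lock->is_waiting = false;
    trx->wait_lock = NULL;
}

/* Grant path that another thread might run, also under lock_sys_mtx. */
static void grant(trx_t *trx) {
    pthread_mutex_lock(&lock_sys_mtx);
    if (trx->wait_lock != NULL) {
        trx->wait_lock->is_waiting = false;
        trx->wait_lock = NULL;
    }
    pthread_mutex_unlock(&lock_sys_mtx);
}

/* Kill path: the wait_lock check and the cancel share one critical
   section, so grant() cannot run between them. Returns true if a
   waiting lock was canceled. */
static bool kill_one_trx(trx_t *trx) {
    bool canceled = false;
    pthread_mutex_lock(&lock_sys_mtx);
    rec_lock_t *lock = trx->wait_lock;
    if (lock != NULL) {
        reset_lock_and_trx_wait(trx, lock);
        canceled = true;
    }
    pthread_mutex_unlock(&lock_sys_mtx);
    return canceled;
}
```

If `trx->wait_lock` were read before `pthread_mutex_lock()`, `grant()` could run in between and the assert in `reset_lock_and_trx_wait()` would fire on a lock that is no longer waiting.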

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-10-02:

#5

lock_mutex and trx_mutex must be acquired much earlier (otherwise it fails as above):

================================================================================

=== modified file 'Percona-Server/storage/innobase/handler/ha_innodb.cc'
--- Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-10-02 09:14:44 +0000
+++ Percona-Server/storage/innobase/handler/ha_innodb.cc 2013-10-02 21:30:59 +0000
@@ -17364,8 +17364,12 @@

        if (victim_trx)
        {
+               lock_mutex_enter();
+               trx_mutex_enter(victim_trx);
                int rcode = wsrep_innobase_kill_one_trx(bf_trx, victim_trx,
                                                        signal);
+               trx_mutex_exit(victim_trx);
+               lock_mutex_exit();
                wsrep_srv_conc_cancel_wait(victim_trx);
                DBUG_RETURN(rcode);
        } else {

================================================
This is because the other callers of wsrep_innobase_kill_one_trx (in lock0lock.cc) already hold lock_mutex and trx_mutex.
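A minimal C sketch of the resulting calling convention, assuming POSIX threads and C11 `_Thread_local`: the hypothetical `kill_one_trx()` has a single precondition (both mutexes held by the calling thread), and every caller, whether the lock-manager path or the abort path patched in #5, establishes it before the call. The per-thread flags are illustrative bookkeeping, not how InnoDB tracks latches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-thread record of which mutexes this thread holds. */
static _Thread_local bool holds_lock_sys;
static _Thread_local bool holds_trx;

static pthread_mutex_t lock_sys_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t trx_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Acquire in the fixed order: lock-system mutex, then trx mutex. */
static void enter_both(void) {
    pthread_mutex_lock(&lock_sys_mtx);
    holds_lock_sys = true;
    pthread_mutex_lock(&trx_mtx);
    holds_trx = true;
}

/* Release in reverse order. */
static void exit_both(void) {
    holds_trx = false;
    pthread_mutex_unlock(&trx_mtx);
    holds_lock_sys = false;
    pthread_mutex_unlock(&lock_sys_mtx);
}

static int kills; /* only modified while both mutexes are held */

/* Stand-in for wsrep_innobase_kill_one_trx(): one precondition for all callers. */
static int kill_one_trx(void) {
    assert(holds_lock_sys && holds_trx);
    kills++;
    return 0;
}

/* Lock-manager caller (lock0lock.cc analogue): already holds both mutexes
   while it detects the conflict, then calls the kill function. */
static int lock_manager_path(void) {
    enter_both();
    int rc = kill_one_trx();
    exit_both();
    return rc;
}

/* Abort-path caller (wsrep_abort_transaction analogue), as in the #5 patch:
   take both mutexes around the call and do follow-up work afterwards. */
static int abort_path(void) {
    enter_both();
    int rc = kill_one_trx();
    exit_both();
    /* follow-up such as wsrep_srv_conc_cancel_wait() would run here, unlocked */
    return rc;
}
```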

Seppo Jaakola (seppo-jaakola) wrote on 2013-11-05:

#7

Conflict tests are passing now. The later fixes were pushed in revisions:

http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/3957 and
http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/3958

Changed in codership-mysql:
status:	In Progress → Fix Committed

Teemu Ollakka (teemu-ollakka) on 2013-11-18

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#8

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1459

Affects	Status	Importance	Assigned to	Milestone
MySQL patches by Codership	Fix Released	Critical	Seppo Jaakola	MySQL patches by Codership 5.6.14-25.1
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.6.14-25.1

MySQL patches by Codership

innodb: unprotected access to was_chosen_as_deadlock_victim
