InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED in file trx0trx.ic line 60 | handle_fatal_signal (sig=6) in raise > abort > trx_state_eq

Bug #1206008 reported by Roel Van de Paar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Yura Sorokin
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Fix Released
High
Yura Sorokin

Bug Description

Thread 1 (LWP 23532):
+bt
#0 0x000000340cc0c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000686bde in handle_fatal_signal (sig=6) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/sql/signal_handler.cc:251
#2 <signal handler called>
#3 0x000000340c4328a5 in raise () from /lib64/libc.so.6
#4 0x000000340c434085 in abort () from /lib64/libc.so.6
#5 0x0000000000afe474 in trx_state_eq (trx=<optimized out>, state=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/trx0trx.ic:60
#6 0x0000000000b01e0c in srv_error_monitor_thread (arg=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/srv/srv0srv.cc:2054
#7 0x000000340cc07851 in start_thread () from /lib64/libpthread.so.0
#8 0x000000340c4e890d in clone () from /lib64/libc.so.6

2013-07-29 04:32:58 23517 [Note] /ssd/Percona-Server-5.6.12-rc60.4-406-debug.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.6.12-rc60.4-debug-log' socket: '/ssd/329723/current1_8/tmp/master.sock' port: 13140 Percona Server with XtraDB (GPL), Release rc60.4, Revision 406-debug
2013-07-29 04:36:56 7fa2eb991700 InnoDB: Assertion failure in thread 140337714108160 in file trx0trx.ic line 60
InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

Adding another occurence

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

This is caused by the kill idle transaction feature.

One thread is waiting at
...
#5 0x0000000000b29a98 in pfs_mutex_enter_func (mutex=0x36522d8, file_name=0xdde870 "/bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/read0read.ic", line=117) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/sync0sync.ic:251
#6 0x0000000000b2c298 in read_view_remove (own_mutex=false, view=0x7f522c0419a0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/read0read.ic:117
#7 trx_commit_in_memory (trx=0x7f522c000b28, lsn=0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1352
#8 0x0000000000b2e278 in trx_commit_low (trx=0x7f522c000b28, mtr=0x0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1567
#9 0x0000000000b2e616 in trx_commit (trx=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1588
#10 0x0000000000b2e6ab in trx_commit_for_mysql (trx=0x7f522c000b28) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1816
...

at this point its trx->state == TRX_STATE_NOT_STARTED, but it's going to present in the InnoDB trx lists. Then kill idle transaction feature at the InnoDB error monitor thread finds this thread and calls, among other things, trx_state_eq(trx, TRX_STATE_ACTIVE), which asserts.

One possible fix is to relax trx_state_eq() asserts.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Or not using trx_state_eq(), just like 5.5 does.

tags: added: 56qual
Revision history for this message
Roel Van de Paar (roel11) wrote :

kill idle trx feature, the fix is clear, the bug is debug only, so --56qual, ++target next release

tags: added: kit
tags: removed: 56qual
tags: added: low-hanging-fruit
tags: added: kill-idle-trx
removed: kit low-hanging-fruit
Revision history for this message
Yura Sorokin (yura-sorokin) wrote :

The problem turned out to be with a race condition between "srv_error_monitor_thread" in srv0srv.cc
while (trx) {
  if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
       && trx_state_eq(trx, TRX_STATE_ACTIVE)
       && trx->mysql_thd
       && innobase_thd_is_idle(trx->mysql_thd))
  {...}
  ...
}

and a query execution thread like the one in comment #11, in particular "trx_commit_in_memory" in trx0trx.cc
...
ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE));
trx->state = TRX_STATE_NOT_STARTED;
read_view_remove(trx->global_read_view, false);
MONITOR_INC(MONITOR_TRX_NL_RO_COMMIT);
...

The case which causes the assertion (and the crash) is the following
1. thread 1("srv_error_monitor_thread") is running with trx->state == TRX_STATE_ACTIVE
2. thread 1 executes the first part in the "if" statement condition
      !trx_state_eq(trx, TRX_STATE_NOT_STARTED)
3. trx_state_eq() in thread 1 returns "false" and therefore the next part of the && operator is going to be evaluated.
4. context switches to thread 2 (query execution thread)
5. thread 2 changes transaction state to TRX_STATE_NOT_STARTED
      trx->state = TRX_STATE_NOT_STARTED;
6. at some point context changes back to thread 1 ("srv_error_monitor_thread")
7. the next part of the && condition of the "if" statement is evaluated
      && trx_state_eq(trx, TRX_STATE_ACTIVE)
      with trx->state == TRX_STATE_NOT_STARTED (changed by thread 2)
8. inside trx_state_eq() we run into an assertion
      switch (trx->state) {
        ...
        case TRX_STATE_NOT_STARTED:
          /* This state is not allowed for running transactions. */
          ut_a(state == TRX_STATE_NOT_STARTED);
          ...
       }

At the same time the following two assertions
  ut_ad(!trx->in_rw_trx_list);
  ut_ad(!trx->in_ro_trx_list);
pass without any problems.

In other words it was
  ut_a(state == TRX_STATE_NOT_STARTED);
causing the crash, not the following two "in_rw_trx_list" and "in_ro_trx_list" checks.

The suggested fix is to rework
  if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
       && trx_state_eq(trx, TRX_STATE_ACTIVE)
       && ...)
statement
to just
  if (trx_state_eq(trx, TRX_STATE_ACTIVE)
      && ...)

and to remove
  ut_a(state == TRX_STATE_NOT_STARTED);
check from the trx_state_eq()

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-705

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.