InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED in file trx0trx.ic line 60 | handle_fatal_signal (sig=6) in raise > abort > trx_state_eq

Bug #1206008 reported by Roel Van de Paar on 2013-07-29
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Yura Sorokin
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Fix Released
High
Yura Sorokin

Bug Description

Thread 1 (LWP 23532):
+bt
#0 0x000000340cc0c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000686bde in handle_fatal_signal (sig=6) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/sql/signal_handler.cc:251
#2 <signal handler called>
#3 0x000000340c4328a5 in raise () from /lib64/libc.so.6
#4 0x000000340c434085 in abort () from /lib64/libc.so.6
#5 0x0000000000afe474 in trx_state_eq (trx=<optimized out>, state=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/trx0trx.ic:60
#6 0x0000000000b01e0c in srv_error_monitor_thread (arg=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/srv/srv0srv.cc:2054
#7 0x000000340cc07851 in start_thread () from /lib64/libpthread.so.0
#8 0x000000340c4e890d in clone () from /lib64/libc.so.6

2013-07-29 04:32:58 23517 [Note] /ssd/Percona-Server-5.6.12-rc60.4-406-debug.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.6.12-rc60.4-debug-log' socket: '/ssd/329723/current1_8/tmp/master.sock' port: 13140 Percona Server with XtraDB (GPL), Release rc60.4, Revision 406-debug
2013-07-29 04:36:56 7fa2eb991700 InnoDB: Assertion failure in thread 140337714108160 in file trx0trx.ic line 60
InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED

Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :

Adding another occurence

Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :

This is caused by the kill idle transaction feature.

One thread is waiting at
...
#5 0x0000000000b29a98 in pfs_mutex_enter_func (mutex=0x36522d8, file_name=0xdde870 "/bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/read0read.ic", line=117) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/sync0sync.ic:251
#6 0x0000000000b2c298 in read_view_remove (own_mutex=false, view=0x7f522c0419a0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/include/read0read.ic:117
#7 trx_commit_in_memory (trx=0x7f522c000b28, lsn=0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1352
#8 0x0000000000b2e278 in trx_commit_low (trx=0x7f522c000b28, mtr=0x0) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1567
#9 0x0000000000b2e616 in trx_commit (trx=<optimized out>) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1588
#10 0x0000000000b2e6ab in trx_commit_for_mysql (trx=0x7f522c000b28) at /bzr/bug1191582_dbg/Percona-Server-5.6.12-rc60.4/storage/innobase/trx/trx0trx.cc:1816
...

at this point its trx->state == TRX_STATE_NOT_STARTED, but it's going to present in the InnoDB trx lists. Then kill idle transaction feature at the InnoDB error monitor thread finds this thread and calls, among other things, trx_state_eq(trx, TRX_STATE_ACTIVE), which asserts.

One possible fix is to relax trx_state_eq() asserts.

Or not using trx_state_eq(), just like 5.5 does.

tags: added: 56qual
Roel Van de Paar (roel11) wrote :

kill idle trx feature, the fix is clear, the bug is debug only, so --56qual, ++target next release

tags: added: kit
tags: removed: 56qual
tags: added: low-hanging-fruit
tags: added: kill-idle-trx
removed: kit low-hanging-fruit
Yura Sorokin (yura-sorokin) wrote :

The problem turned out to be with a race condition between "srv_error_monitor_thread" in srv0srv.cc
while (trx) {
  if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
       && trx_state_eq(trx, TRX_STATE_ACTIVE)
       && trx->mysql_thd
       && innobase_thd_is_idle(trx->mysql_thd))
  {...}
  ...
}

and a query execution thread like the one in comment #11, in particular "trx_commit_in_memory" in trx0trx.cc
...
ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE));
trx->state = TRX_STATE_NOT_STARTED;
read_view_remove(trx->global_read_view, false);
MONITOR_INC(MONITOR_TRX_NL_RO_COMMIT);
...

The case which causes the assertion (and the crash) is the following
1. thread 1("srv_error_monitor_thread") is running with trx->state == TRX_STATE_ACTIVE
2. thread 1 executes the first part in the "if" statement condition
      !trx_state_eq(trx, TRX_STATE_NOT_STARTED)
3. trx_state_eq() in thread 1 returns "false" and therefore the next part of the && operator is going to be evaluated.
4. context switches to thread 2 (query execution thread)
5. thread 2 changes transaction state to TRX_STATE_NOT_STARTED
      trx->state = TRX_STATE_NOT_STARTED;
6. at some point context changes back to thread 1 ("srv_error_monitor_thread")
7. the next part of the && condition of the "if" statement is evaluated
      && trx_state_eq(trx, TRX_STATE_ACTIVE)
      with trx->state == TRX_STATE_NOT_STARTED (changed by thread 2)
8. inside trx_state_eq() we run into an assertion
      switch (trx->state) {
        ...
        case TRX_STATE_NOT_STARTED:
          /* This state is not allowed for running transactions. */
          ut_a(state == TRX_STATE_NOT_STARTED);
          ...
       }

At the same time the following two assertions
  ut_ad(!trx->in_rw_trx_list);
  ut_ad(!trx->in_ro_trx_list);
pass without any problems.

In other words it was
  ut_a(state == TRX_STATE_NOT_STARTED);
causing the crash, not the following two "in_rw_trx_list" and "in_ro_trx_list" checks.

The suggested fix is to rework
  if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
       && trx_state_eq(trx, TRX_STATE_ACTIVE)
       && ...)
statement
to just
  if (trx_state_eq(trx, TRX_STATE_ACTIVE)
      && ...)

and to remove
  ut_a(state == TRX_STATE_NOT_STARTED);
check from the trx_state_eq()

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-705

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers