The problem turned out to be a race condition between "srv_error_monitor_thread" in srv0srv.cc:
    while (trx) {
        if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
            && trx_state_eq(trx, TRX_STATE_ACTIVE)
            && trx->mysql_thd
            && innobase_thd_is_idle(trx->mysql_thd))
        {...}
        ...
    }
and a query execution thread like the one in comment #11, specifically "trx_commit_in_memory" in trx0trx.cc:
    ...
    ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE));
    trx->state = TRX_STATE_NOT_STARTED;
    read_view_remove(trx->global_read_view, false);
    MONITOR_INC(MONITOR_TRX_NL_RO_COMMIT);
    ...
The sequence that triggers the assertion (and the crash) is the following:
1. thread 1 ("srv_error_monitor_thread") is running with trx->state == TRX_STATE_ACTIVE
2. thread 1 evaluates the first operand of the "if" condition
    !trx_state_eq(trx, TRX_STATE_NOT_STARTED)
3. trx_state_eq() in thread 1 returns "false", so the negated first operand is true and the next operand of && is going to be evaluated
4. the context switches to thread 2 (query execution thread)
5. thread 2 changes the transaction state to TRX_STATE_NOT_STARTED
    trx->state = TRX_STATE_NOT_STARTED;
6. at some point the context switches back to thread 1 ("srv_error_monitor_thread")
7. the next operand of the && condition in the "if" statement is evaluated
    && trx_state_eq(trx, TRX_STATE_ACTIVE)
with trx->state == TRX_STATE_NOT_STARTED (changed by thread 2)
8. inside trx_state_eq() the switch takes the TRX_STATE_NOT_STARTED branch while the requested state is TRX_STATE_ACTIVE, so the assertion fails
    switch (trx->state) {
    ...
    case TRX_STATE_NOT_STARTED:
        /* This state is not allowed for running transactions. */
        ut_a(state == TRX_STATE_NOT_STARTED);
    ...
    }
At the same time, the following two assertions
    ut_ad(!trx->in_rw_trx_list);
    ut_ad(!trx->in_ro_trx_list);
pass without any problems.
In other words, it was
    ut_a(state == TRX_STATE_NOT_STARTED);
that caused the crash, not the "in_rw_trx_list" and "in_ro_trx_list" checks that follow it.
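The interleaving in steps 1 to 8 can be replayed deterministically on a single thread. The sketch below is only a simplified model, not the InnoDB source: the two-value enum, the reduced trx_t, the exception standing in for a failed ut_a(), and the helper replay_race() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified stand-ins for the InnoDB types (assumption: only two states).
enum trx_state_t { TRX_STATE_NOT_STARTED, TRX_STATE_ACTIVE };

struct trx_t {
    trx_state_t state;
};

// Model of trx_state_eq(). A failed ut_a() is modelled as an exception
// so that the outcome can be observed instead of aborting the process.
bool trx_state_eq(const trx_t* trx, trx_state_t state) {
    switch (trx->state) {
    case TRX_STATE_ACTIVE:
        return state == trx->state;
    case TRX_STATE_NOT_STARTED:
        if (state != TRX_STATE_NOT_STARTED) {
            throw std::logic_error("ut_a(state == TRX_STATE_NOT_STARTED)");
        }
        return true;
    }
    return false;
}

// Hypothetical helper: replays the monitor thread's two checks in order.
// When thread2_runs_between is true, the state change made by the query
// thread (step 5) is applied between the two checks.
// Returns true if the modelled assertion fired.
bool replay_race(bool thread2_runs_between) {
    trx_t trx{TRX_STATE_ACTIVE};                                   // step 1
    try {
        bool first = !trx_state_eq(&trx, TRX_STATE_NOT_STARTED);   // steps 2-3
        if (thread2_runs_between) {
            trx.state = TRX_STATE_NOT_STARTED;                     // steps 4-5
        }
        bool second = first && trx_state_eq(&trx, TRX_STATE_ACTIVE); // steps 6-8
        (void) second;
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model, replay_race(true) reports the assertion while replay_race(false) does not, which matches the observation that the crash depends on the state changing between the two trx_state_eq() calls.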
The suggested fix is to simplify the
    if (!trx_state_eq(trx, TRX_STATE_NOT_STARTED)
        && trx_state_eq(trx, TRX_STATE_ACTIVE)
        && ...)
statement
to just
    if (trx_state_eq(trx, TRX_STATE_ACTIVE)
        && ...)
and to remove the
    ut_a(state == TRX_STATE_NOT_STARTED);
check from trx_state_eq().
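A minimal sketch of what the suggested change amounts to, again using a simplified model rather than the real InnoDB code. The names trx_state_eq_relaxed() and monitor_should_inspect() are hypothetical, and the two boolean parameters stand in for the trx->mysql_thd and innobase_thd_is_idle() operands of the original condition.

```cpp
#include <cassert>

// Simplified stand-ins for the InnoDB types (assumption: only two states).
enum trx_state_t { TRX_STATE_NOT_STARTED, TRX_STATE_ACTIVE };

struct trx_t {
    trx_state_t state;
};

// Model of trx_state_eq() with the NOT_STARTED assertion removed:
// asking "is it ACTIVE?" about a NOT_STARTED transaction simply
// yields false instead of aborting.
bool trx_state_eq_relaxed(const trx_t* trx, trx_state_t state) {
    return trx->state == state;
}

// Model of the simplified monitor condition: the state is examined once,
// so there is no window between two separate state checks.
bool monitor_should_inspect(const trx_t* trx, bool has_thd, bool thd_idle) {
    return trx_state_eq_relaxed(trx, TRX_STATE_ACTIVE) && has_thd && thd_idle;
}
```

With this shape, a transaction that has already been moved to TRX_STATE_NOT_STARTED by the query thread is skipped by the monitor loop. The sketch says nothing about locking or memory ordering in the real server; it only illustrates the control flow of the proposed condition.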