WSREP: FSM: no such a transition COMMITTING -> REPLICATING or REPLICATING -> REPLICATING in shift_to call

Bug #1734312 reported by Sveta Smirnova on 2017-11-24
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Invalid
Undecided
Unassigned
5.6
Invalid
Undecided
Unassigned
5.7
Fix Committed
Medium
Unassigned

Bug Description

PXC crashes when executes events, calling stored routine with BEGIN TRANSACTION ... COMMIT block in parallel.

How to repeat:

1. cd YOUR_PXC_INSTALLATION_DIR/mysql-test
2. ./mtr --suite=galera_3nodes --mysqld=--wsrep-provider=`pwd`/../lib/libgalera_smm.so --start galera_certification_ccc &
3. alias mysqlmtr='../bin/mysql -h127.0.0.1 -P13000 -uroot'
4. mysqlmtr -e "set global event_scheduler=1"
5. for i in `seq 1 1 100`; do mysqlmtr -P13000 test -e "create table t_$i(id int not null auto_increment primary key, f1 int) engine=innodb"; mysqlmtr -P13000 test --delimiter='|' -e "create procedure p_$i() begin declare i int; start transaction; set i = 0; while i < 100 do insert into t_$i(f1) values(i);set i = i+1;end while;commit;end"; mysqlmtr -P13000 test --delimiter='|' -e "create event ev_$i on schedule every 1 minute do call p_$i()"; done
6. Wait for a crash

2017-11-24T10:28:23.468528Z 313 [Note] Event Scheduler: scheduler thread started with id 313
2017-11-24T10:28:27.481413Z 318 [ERROR] WSREP: FSM: no such a transition COMMITTING -> REPLICATING
10:28:27 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=3
max_threads=152
thread_count=6
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61228 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f12e400a790
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f133d4adc0f thread_stack 0x40000
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(my_print_stacktrace+0x2e)[0x103684e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(handle_fatal_signal+0x4a1)[0x959511]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f134fd01390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f134dd9b428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f134dd9d02a]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x1ae)[0x7f134925224e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9replicateEPNS_9TrxHandleEP14wsrep_trx_meta+0xbb)[0x7f1349246bfb]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/mysql-test/../lib/libgalera_smm.so(galera_replicate+0xfb)[0x7f134925a09b]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z15wsrep_replicateP3THD+0x9eb)[0xefae5b]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z14ha_prepare_lowP3THDb+0x76)[0x9c8936]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z15ha_commit_transP3THDbb+0x180)[0x9c7fb0]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z12trans_commitP3THD+0x56)[0xecf826]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x30f8)[0xe2c7b8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x5d)[0xdb4bdd]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x35c)[0xdb617c]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0x98)[0xdb6cf8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x268)[0xdb7a58]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head7executeEP3THDb+0x76a)[0xdb28aa]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x70e)[0xdb330e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x48b2)[0xe2df72]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x5d)[0xdb4bdd]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x35c)[0xdb617c]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0x98)[0xdb6cf8]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x268)[0xdb7a58]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head7executeEP3THDb+0x76a)[0xdb28aa]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x70e)[0xdb330e]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN14Event_job_data7executeEP3THDb+0x6bf)[0xef4b7f]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(_ZN19Event_worker_thread3runEP3THDP28Event_queue_element_for_exec+0x174)[0xfa3074]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(event_worker_thread+0x4f)[0xfa313f]
/home/sveta/mysql_packages/Percona-XtraDB-Cluster-5.7.19-rel17-29.22.3.Linux.x86_64.ssl100/bin/mysqld(pfs_spawn_thread+0x1b4)[0x10a9164]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f134fcf76ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f134de6d3dd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f12e001f1d0): commit
Connection ID (thread ID): 318
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file
safe_process[1046]: Child process: 1047, killed by signal: 6

Sveta Smirnova (svetasmirnova) wrote :

Bug is not repeatable with version 5.6

tags: added: regression
Sveta Smirnova (svetasmirnova) wrote :

Actually BEGIN TRANSACTION ... COMMIT is not required. Crash is repeatable with:

for i in `seq 1 1 100`; do mysqlmtr -P13000 test -e "create table t_$i(id int not null auto_increment primary key, f1 int) engine=innodb"; mysqlmtr -P13000 test --delimiter='|' -e "create procedure p_$i() begin declare i int; set i = 0; while i < 100 do insert into t_$i(f1) values(i);set i = i+1;end while;end"; mysqlmtr -P13000 test --delimiter='|' -e "create event ev_$i on schedule every 1 minute do call p_$i()"; done

Krunal Bauskar (krunal-bauskar) wrote :

commit 12a9e13401b4d9e7b781034eddf04ffe957c07d0
Author: Krunal Bauskar <email address hidden>
Date: Wed Nov 29 17:17:22 2017 +0530

    - PXC#806: Early read of query-id result in stale trx handle assignment.

      * During an execution of a Stored Procedure (SP) as a background
        action (say through event scheduler), query-id get assigned at
        later stage, before PXC captures it to derive wsrep_next_trx_id.

      * PXC uses this wsrep_next_trx_id as unique id to maintain
        TrxHandle and thread mapping.

      * This early capture of query-id resulted in assigning an invalid
        wsrep_next_trx_id (read 0). Problem starts if multiple threads
        gets the same invalid (read 0) id as their unique id to look
        for TrxHandle that eventually cause sharing of TrxHandle.

      Solution:
      --------

      * Fixed by ensuring that when PXC reads query_id, it is a valid id.
        If not, then a valid query_id is generated.
        (MySQL flow anyway generates it but at later stage).

      * We could have also corrected it by skipping setting of
        wsrep_next_trx_id at the said stage and waiting for MySQL to
        generate it but that means a drill-down code inside
        the main logic. So keeping it at outer layer
        makes it more manageable.

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-2020

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers