WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD DATA INFILE

Bug #1292842 reported by Raghavendra D Prabhu
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Won't Fix
Critical
Unassigned
5.6
Fix Released
Critical
Krunal Bauskar

Bug Description

This is reproduceable with load-data grammar in bzr+ssh://bazaar.launchpad.net/~raghavendra-prabhu/randgen/pxc/

#0 0x00007f69e86b90c1 in pthread_kill () from /usr/lib/libpthread.so.0
#1 0x00000000006a951f in handle_fatal_signal (sig=6) at /tmp/bld1/sql/signal_handler.cc:250
#2 <signal handler called>
#3 0x00007f69e6c53389 in raise () from /usr/lib/libc.so.6
#4 0x00007f69e6c54788 in abort () from /usr/lib/libc.so.6
#5 0x00007f69e61bc1bc in galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to (this=this@entry=0x7f69380d2cf0,
    state=state@entry=galera::TrxHandle::S_ROLLED_BACK) at galera/src/fsm.hpp:81
#6 0x00007f69e61b36c6 in set_state (state=galera::TrxHandle::S_ROLLED_BACK, this=0x7f69380d2c60) at galera/src/trx_handle.hpp:229
#7 galera::ReplicatorSMM::post_rollback (this=0x171b9d0, trx=0x7f69380d2c60) at galera/src/replicator_smm.cpp:964
#8 0x00007f69e61c4589 in galera_post_rollback (gh=<optimized out>, trx_handle=0x29eb7b8) at galera/src/wsrep_provider.cpp:380
#9 0x0000000000659cb8 in wsrep_rollback (thd=0x29e9b60, all=<optimized out>, hton=<optimized out>) at /tmp/bld1/sql/wsrep_hton.cc:208
#10 0x0000000000659d88 in wsrep_rollback (hton=<optimized out>, thd=<optimized out>, all=<optimized out>) at /tmp/bld1/sql/wsrep_hton.cc:196
#11 0x00000000006abe0e in ha_rollback_trans (thd=thd@entry=0x29e9b60, all=<optimized out>) at /tmp/bld1/sql/handler.cc:1615
#12 0x00000000006ac222 in ha_commit_trans (thd=thd@entry=0x29e9b60, all=all@entry=false) at /tmp/bld1/sql/handler.cc:1459
#13 0x0000000000650529 in trans_commit_stmt (thd=thd@entry=0x29e9b60) at /tmp/bld1/sql/transaction.cc:383
#14 0x000000000059cfe1 in mysql_execute_command (thd=thd@entry=0x29e9b60) at /tmp/bld1/sql/sql_parse.cc:5157
#15 0x00000000005a4cc1 in mysql_parse (thd=thd@entry=0x29e9b60, rawbuf=rawbuf@entry=0x7f6938004ba0 "LOAD DATA INFILE '/tmp/gentest18131.tmp' INTO TABLE `table1_innodb_compressed_key_pk_parts_2_int`",
    length=length@entry=97, parser_state=parser_state@entry=0x7f69a04f8620) at /tmp/bld1/sql/sql_parse.cc:6475
#16 0x00000000005a563c in wsrep_mysql_parse (thd=thd@entry=0x29e9b60, rawbuf=0x7f6938004ba0 "LOAD DATA INFILE '/tmp/gentest18131.tmp' INTO TABLE `table1_innodb_compressed_key_pk_parts_2_int`", length=97,
    parser_state=parser_state@entry=0x7f69a04f8620) at /tmp/bld1/sql/sql_parse.cc:6279
#17 0x00000000005a7c37 in dispatch_command (command=<optimized out>, command@entry=COM_QUERY, thd=thd@entry=0x29e9b60,
    packet=packet@entry=0x296e0c1 " LOAD DATA INFILE '/tmp/gentest18131.tmp' INTO TABLE `table1_innodb_compressed_key_pk_parts_2_int`", packet_length=packet_length@entry=98) at /tmp/bld1/sql/sql_parse.cc:1241
#18 0x00000000005a8c52 in do_command (thd=0x29e9b60) at /tmp/bld1/sql/sql_parse.cc:870
#19 0x0000000000642e98 in do_handle_one_connection (thd_arg=thd_arg@entry=0x29e9b60) at /tmp/bld1/sql/sql_connect.cc:1426
#20 0x00000000006430aa in handle_one_connection (arg=0x29e9b60) at /tmp/bld1/sql/sql_connect.cc:1338
#21 0x00007f69e86b40a2 in start_thread () from /usr/lib/libpthread.so.0
#22 0x00007f69e6d03d1d in clone () from /usr/lib/libc.so.6

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Backtraces

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Datadir

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

wsrep_load_data_splitting=ON?

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Actually I got it with/out wsrep_load_data_splitting=ON (it was first suspicion).

summary: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD
- DATA INFILE
+ DATA INFILE in 5.5
Revision history for this message
Mark Grennan (mgrennan) wrote : Re: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD DATA INFILE in 5.5
Download full text (3.5 KiB)

This is from the error log:

140618 15:28:21 [Warning] WSREP: SQL statement was ineffective, THD: 14204276, buf: 139952
QUERY: LOAD DATA LOCAL INFILE "tmp20131.txt.001" REPLACE INTO TABLE point_obs.metar_nc (date_time,stn,lat,lon,data_source,temp_F,dewpt_F,dir,vrb_wind,speed_mps,gust_mps,slp_mb,altim_mb,visib_m,cld_cover,cavok,precip_type,tmin6_F,tmax6_F,tmin24_F,tmax24_F,snow_mm,snow24_mm,precip_1hr_mm,precip_24hr_mm,weather,waveht_m,wavedir,waveper,sst_F,sensor_status,auto_indicator,raw_metar)
 => Skipping replication
140618 15:28:21 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK
15:28:21 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=501
max_threads=502
thread_count=66
connection_count=66
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1131562 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xd7dfb60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc8daa17d68 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7eca55]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6bf674]
/lib64/libpthread.so.0[0x392740f710]
/lib64/libc.so.6(gsignal+0x35)[0x3927032925]
/lib64/libc.so.6(abort+0x175)[0x3927034105]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x2d9)[0x7fcc8dd3dfb9]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM13post_rollbackEPNS_9TrxHandleE+0x2e)[0x7fcc8dd58cee]
/usr/lib64/libgalera_smm.so(galera_post_rollback+0x61)[0x7fcc8dd706e1]
/usr/sbin/mysqld[0x66d2c9]
/usr/sbin/mysqld(_Z17ha_rollback_transP3THDb+0xb9)[0x6c2709]
/usr/sbin/mysqld(_Z19trans_rollback_stmtP3THD+0x27)[0x656cf7]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2a4)[0x59e244]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x33b)[0x5a3cfb]
/usr/sbin/mysqld[0x5a3e62]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x254c)[0x5a731c]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x167)[0x5a7c67]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x14f)[0x6483ff]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x6485e1]
/lib64/libpthread.so.0[0x39274079d1]
/lib64/libc.so.6(clone+0x6d)[0x39270e8b6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc771f64850): is an invalid pointer
Connection ID (thread ID): 14203726
Status: NOT_KILLED

You may download the Percona XtraDB Cluster ope...

Read more...

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Mark,

a)
Is this reproducible with

LOAD DATA LOCAL INFILE "tmp20131.txt.001" REPLACE INTO TABLE point_obs.metar_nc
(date_time,stn,lat,lon,data_source,temp_F,dewpt_F,dir,vrb_wind,speed_mps,gust_mps,slp_mb,altim_mb,visib_m,cld_cover,cavok,precip_type,tmin6_F,tmax6_F,tmin24_F,tmax24_F,snow_mm,snow24_mm,precip_1hr_mm,precip_24hr_mm,weather,waveht_m,wavedir,waveper,sst_F,sensor_status,auto_indicator,raw_metar

in a repeatable way?

b) Are there more than one LOAD DATA INFILE running (the RQG does more than one
in concurrently).

c) Do you have wsrep-load-data-splitting ON or OFF?

d) Can you also share your my.cnf/error log etc.

Revision history for this message
Mark Grennan (mgrennan) wrote : RE: [Bug 1292842] Re: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD DATA INFILE in 5.5
Download full text (6.1 KiB)

a) Yes - in that it happens often. I have not tried to reduce it down to a simple example.

b) To be deterred. I don't think so at this time. I'll turn on slow logs with time set to zero and do a review.

c) +---------------------------+-------+
    | Variable_name | Value |
    +---------------------------+-------+
    | wsrep_load_data_splitting | ON |
    +---------------------------+-------+

d) all attached them to the report now
Mark Grennan | Database Administrator
Mobile: +1 (405) 343-9332 | <email address hidden>
Live 24/7 Support +1 (866) 429-5578 | <email address hidden>

Weather Decision Technologies, Inc - Office: +1 (405) 579-7675 260 | Fax: +1 (405) 579-7780
201 David L Boren Blvd, Suite 270 Norman OK 73072

READ CAREFULLY. The information in this electronic mail and any attachments may
be proprietary, private or otherwise confidential and are transmitted for the
exclusive and restricted use of the intended recipient. If the reader of this
message is not the intended recipient, you are notified that retention, use,
dissemination, distribution, or copying of this message or any attachments, or
the use of any information transmitted, is strictly prohibited. If you receive
this message in error, please notify us immediately by electronic mail or
telephone +1 (405) 579-7675 and delete this message and any attachments.

.

-----Original Message-----
From: <email address hidden> [mailto:<email address hidden>] On Behalf Of Raghavendra D Prabhu
Sent: Friday, June 20, 2014 9:54 AM
To: Mark Grennan
Subject: [Bug 1292842] Re: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD DATA INFILE in 5.5
Importance: High

@Mark,

a)
Is this reproducible with

LOAD DATA LOCAL INFILE "tmp20131.txt.001" REPLACE INTO TABLE point_obs.metar_nc (date_time,stn,lat,lon,data_source,temp_F,dewpt_F,dir,vrb_wind,speed_mps,gust_mps,slp_mb,altim_mb,visib_m,cld_cover,cavok,precip_type,tmin6_F,tmax6_F,tmin24_F,tmax24_F,snow_mm,snow24_mm,precip_1hr_mm,precip_24hr_mm,weather,waveht_m,wavedir,waveper,sst_F,sensor_status,auto_indicator,raw_metar

in a repeatable way?

b) Are there more than one LOAD DATA INFILE running (the RQG does more than one
in concurrently).

c) Do you have wsrep-load-data-splitting ON or OFF?

d) Can you also share your my.cnf/error log etc.

--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1292842

Title:
  WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD
  DATA INFILE in 5.5

Status in Percona XtraDB Cluster - HA scalable solution for MySQL:
  New
Status in Percona XtraDB Cluster 5.6 series:
  New
Status in Percona XtraDB Cluster trunk series:
  New

Bug description:
  This is reproduceable with load-data grammar in
  bzr+ssh://bazaar.launchpad.net/~raghavendra-prabhu/randgen/pxc/

  #0 0x00007f69e86b90c1 in pthread_kill () from /usr/lib/libpthread.so.0
  #1 0x00000000006a951f in handle_fatal_signal (sig=6) at /tmp/bld1/sql/signal_handler.cc:250
  #2 <signal handler called>
  #3 0x00007f69e6c53389 in raise () from /usr/lib/libc.so.6
...

Read more...

summary: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with LOAD
- DATA INFILE in 5.5
+ DATA INFILE
tags: added: load-data
tags: added: i59934
Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :
Download full text (6.6 KiB)

commit 794f3cddb0c194767a760dc51b30b00ab94c55ac
Merge: 304970e be2dd53
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 19:49:55 2015 +0530

    Merge pull request #33 from kbauskar/3.x-pxc-456

    - PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BAC…

commit be2dd5305479d621c94ba26992610efd84ca9752
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 10:58:42 2015 +0530

    - PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with
      LOAD DATA INFILE

      Issue:
      -----

      LDI for that matter DML statement can fail due to multiple reasons.
      Some probable reasons are:
      - Creating table w/o pk and setting wsrep_certify_nonPK = off
      - Existing bug that causes partitioned table LDI to fail.
      ....etc.

      Statement failure will skip append_key which besides appending key also
      set valid trx_id.
      Such failed statements are rolled back with trx_id = default.
      Galera-Plugin try to check if there is an existing Trx Object with
      given trx_id before creating a new one.

      If there are 2 independent connections (connected to same cluster node)
      and both of these connections execute a failing statement then
      both of them will try to rollback with trx_id = default.

      Logic that cached trx_id to trx-object never considered this situation
      and one of the such connection will get reference to a object that belongs
      to other connection which is logically wrong as both connection are unrelated.
      This also causes operational in-consistency as latter connection accesses
      state already modified by former connection.
      (Causing the famous ROLLBACK -> ROLLBACK assert).

      Solution(s):
      -----------

commit 794f3cddb0c194767a760dc51b30b00ab94c55ac
Merge: 304970e be2dd53
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 19:49:55 2015 +0530

    Merge pull request #33 from kbauskar/3.x-pxc-456

    - PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BAC…

commit be2dd5305479d621c94ba26992610efd84ca9752
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 10:58:42 2015 +0530

    - PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with
      LOAD DATA INFILE

      Issue:
      -----

      LDI for that matter DML statement can fail due to multiple reasons.
      Some probable reasons are:
      - Creating table w/o pk and setting wsrep_certify_nonPK = off
      - Existing bug that causes partitioned table LDI to fail.
      ....etc.

      Statement failure will skip append_key which besides appending key also
      set valid trx_id.
      Such failed statements are rolled back with trx_id = default.
      Galera-Plugin try to check if there is an existing Trx Object with
      given trx_id before creating a new one.

      If there are 2 independent connections (connected to same cluster node)
      and both of these connections execute a failing statement then
      both of them will try to rollback with trx_id = default.

      Logic that cached trx_id to trx...

Read more...

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-931

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.