Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

db crash during sql dump import

Bug #1294259 reported by John C. on 2014-03-18

This bug report is a duplicate of: Bug #1267507: cluster crashes on importing data.

This bug affects 1 person

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Full DB cluster crash while restoring an SQL dump. Possibly something to do with Galera replication or memory allocation?

Error log (the failure is inconsistent: sometimes I get signal 6, sometimes signal 11. Note also that I was able to restore the dump when running a single node at initial cluster startup; after that, SST to the second node was successful):

2014-03-18 12:17:09 55053 [Note] WSREP: rollbacker thread exiting
2014-03-18 12:17:09 55053 [Note] Giving 0 client threads a chance to die gracefully
2014-03-18 12:17:09 55053 [Note] Event Scheduler: Purging the queue. 0 events
2014-03-18 12:17:09 55053 [Note] Shutting down slave threads
2014-03-18 12:17:09 55053 [Note] Forcefully disconnecting 0 remaining clients
2014-03-18 12:17:09 55053 [Note] WSREP: dtor state: SYNCED
2014-03-18 12:17:09 55053 [Note] WSREP: Closing send monitor...
2014-03-18 12:17:09 55053 [Note] WSREP: Closed send monitor.
2014-03-18 12:17:09 55053 [Note] WSREP: mon: entered 1925 oooe fraction 0 oool fraction 0
2014-03-18 12:17:09 55053 [Note] WSREP: mon: entered 1925 oooe fraction 0 oool fraction 0
2014-03-18 12:17:09 55053 [Note] WSREP: mon: entered 1979 oooe fraction 0 oool fraction 0
2014-03-18 12:17:09 55053 [Note] WSREP: cert index usage at exit 0
2014-03-18 12:17:09 55053 [Note] WSREP: cert trx map usage at exit 129
2014-03-18 12:17:09 55053 [Note] WSREP: deps set usage at exit 0
2014-03-18 12:17:09 55053 [Note] WSREP: avg deps dist 66.0078
terminate called after throwing an instance of 'gu::Exception'
what(): Buffer too short: expected 17778, got 5888: 22 (Invalid argument)
at galera/src/key_set.cpp:throw_buffer_too_short():122
10:17:09 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69222 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x902445]
/usr/sbin/mysqld(handle_fatal_signal+0x4c4)[0x680114]
/lib64/libpthread.so.0(+0xf710)[0x7f7eb0375710]
/lib64/libc.so.6(gsignal+0x35)[0x7f7eae7bf925]
/lib64/libc.so.6(abort+0x175)[0x7f7eae7c1105]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x12d)[0x7f7eaf079a5d]
/usr/lib64/libstdc++.so.6(+0xbcbe6)[0x7f7eaf077be6]
/usr/lib64/libstdc++.so.6(+0xbcc13)[0x7f7eaf077c13]
/usr/lib64/libstdc++.so.6(+0xbcd0e)[0x7f7eaf077d0e]
/usr/lib64/libgalera_smm.so(_ZN2gu10ThrowErrorD2Ev+0x1ac)[0x7f7e8f079d2c]
/usr/lib64/libgalera_smm.so(_ZN6galera6KeySet7KeyPart22throw_buffer_too_shortEmm+0x1c7)[0x7f7e8f166597]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x7b)[0x7f7e8f173a1b]
/usr/lib64/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2)[0x7f7e8f17c252]
/usr/lib64/libgalera_smm.so(_ZN6galera13CertificationD2Ev+0x6c)[0x7f7e8f176c7c]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMMD1Ev+0xc1)[0x7f7e8f1a1e61]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMMD0Ev+0x9)[0x7f7e8f1a2499]
/usr/lib64/libgalera_smm.so(galera_tear_down+0x16)[0x7f7e8f1b5296]
/usr/sbin/mysqld(wsrep_unload+0x1f)[0xb8d12f]
/usr/sbin/mysqld(_Z12wsrep_deinitv+0x18)[0x5b1618]
/usr/sbin/mysqld[0x5ade25]
/usr/sbin/mysqld(kill_server_thread+0xe)[0x5ae03e]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb3bdca]
/lib64/libpthread.so.0(+0x79d1)[0x7f7eb036d9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f7eae875b6d]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
-------------------------------------------------------------------------------------------------------------------------------
16:11:04 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=3
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69222 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f8bd4000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f8bd8d20d98 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x902445]
/usr/sbin/mysqld(handle_fatal_signal+0x4c4)[0x680114]
/lib64/libpthread.so.0(+0xf710)[0x7f8c0581a710]
/lib64/libc.so.6(gsignal+0x35)[0x7f8c03c64925]
/lib64/libc.so.6(abort+0x175)[0x7f8c03c66105]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x265)[0x7f8be45e4c05]
/usr/lib64/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2)[0x7f8be45ed252]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x91)[0x7f8be45e7911]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x88)[0x7f8be4615858]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x315)[0x7f8be45f3be5]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f8be45f4073]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7f8be4612983]
/usr/lib64/libgalera_smm.so(galera_recv+0x23)[0x7f8be4627673]
/usr/sbin/mysqld[0x5be0af]
/usr/sbin/mysqld(start_wsrep_THD+0x480)[0x5ae4d0]
/lib64/libpthread.so.0(+0x79d1)[0x7f8c058129d1]
/lib64/libc.so.6(clone+0x6d)[0x7f8c03d1ab6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140318 18:11:05 mysqld_safe Number of processes running now: 0
140318 18:11:05 mysqld_safe WSREP: not restarting wsrep node automatically
140318 18:11:05 mysqld_safe mysqld from pid file /var/run/mysql/mysql.pid ended
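
Since the crash signal is reported as varying between runs, it may help to tally the crash headers in the error log. The following is a minimal sketch, assuming the header always has the form shown above ("HH:MM:SS UTC - mysqld got signal N ;"). The function name `crash_signals` is hypothetical, not part of any Percona tool. On Linux, signal 6 is SIGABRT and signal 11 is SIGSEGV.

```python
import re

# Assumed header format, taken from the log above:
#   "10:17:09 UTC - mysqld got signal 6 ;"
SIGNAL_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}) UTC - mysqld got signal (?P<signo>\d+)\b"
)


def crash_signals(log_lines):
    """Return a list of (time, signal number) for every crash header found."""
    hits = []
    for line in log_lines:
        match = SIGNAL_RE.match(line.strip())
        if match:
            hits.append((match["time"], int(match["signo"])))
    return hits
```

Typical use would be `crash_signals(open(path))` on the mysqld error log, where `path` is wherever the log lives on the node.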

-------------------------------------------------------------------------------------------------------------------------------

Two-node cluster with an arbitrator. The configs are just a basic install for the proof of concept:

node1:

[mysqld]
innodb_buffer_pool_size=8192M
max_allowed_packet=1024M
datadir=/var/lib/mysql
user=mysql
pid-file=/var/run/mysql/mysql.pid

# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://172.31.48.28,172.31.48.29

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=172.31.48.28

# SST method
wsrep_sst_method=xtrabackup

# Cluster name
wsrep_cluster_name=my_centos_cluster

# Authentication for SST method
wsrep_sst_auth="sstuser:s3cret"

node 2:
[mysqld]
innodb_buffer_pool_size=8192M
max_allowed_packet=1024M
datadir=/var/lib/mysql
user=mysql
pid-file=/var/run/mysql/mysql.pid

# Path to Galera library
wsrep_provider=/usr/lib64/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://172.31.48.28,172.31.48.29

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node #1 address
wsrep_node_address=172.31.48.28

# SST method
wsrep_sst_method=xtrabackup

# Cluster name
wsrep_cluster_name=my_centos_cluster

# Authentication for SST method
wsrep_sst_auth="sstuser:s3cret"
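
As posted, both node configurations set wsrep_node_address=172.31.48.28, and the node 2 comment still reads "Node #1 address". The report does not say whether this is a paste slip or the running configuration. A quick way to check such a thing across nodes is sketched below, assuming each my.cnf is simple enough for Python's `configparser` (no `!include` directives). The helper names are hypothetical.

```python
import configparser
from collections import defaultdict


def read_mysqld_option(cnf_text, option):
    """Return the value of `option` in the [mysqld] section, or None."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # tolerate bare flags such as "skip-name-resolve"
        interpolation=None,    # do not treat "%" in values specially
        strict=False,          # tolerate repeated options
    )
    parser.read_string(cnf_text)
    return parser.get("mysqld", option, fallback=None)


def duplicate_node_addresses(configs):
    """configs maps a node label to its my.cnf text.

    Returns {address: [labels]} for every wsrep_node_address that is
    used by more than one node.
    """
    seen = defaultdict(list)
    for label, text in configs.items():
        address = read_mysqld_option(text, "wsrep_node_address")
        if address is not None:
            seen[address].append(label)
    return {addr: labels for addr, labels in seen.items() if len(labels) > 1}
```

An empty result means every node advertises a distinct address; anything else lists the nodes that share one.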

arbitrator:

/usr/bin/garbd -d -a gcomm://172.31.48.28:4567,172.31.48.29:4567 -g my_centos_cluster -o gmcast.listen_addr=tcp://172.31.48.27:4567

packages:
Percona-XtraDB-Cluster-galera-3-3.3-1.207.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.15-25.4.731.rhel6.x86_64
Percona-Server-shared-51-5.1.73-rel14.11.603.rhel6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-56-5.6.15-25.4.731.rhel6.x86_64
percona-xtrabackup-2.1.8-733.rhel6.x86_64
libstdc++-4.4.7-4.el6.x86_64
glibc-common-2.12-1.132.el6.x86_64
glibc-2.12-1.132.el6.x86_64

OS:
CentOS release 6.5 (Final)
Linux db2.tn.telecom.lt 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Testing on virtual hosts with 1 vCPU and 12 GB RAM each. The database is ~6 GB.

John C. (jonas-ciucelis) wrote on 2014-03-20:

Is this a duplicate of https://bugs.launchpad.net/galera/+bug/1267507?

John C. (jonas-ciucelis) wrote on 2014-03-20:

#0 0x00007f81059a1925 in raise () from /lib64/libc.so.6
#1 0x00007f81059a3105 in abort () from /lib64/libc.so.6
#2 0x00007f81003cdc05 in base_size (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/key_set.hpp:267
#3 serial_size (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/key_set.hpp:282
#4 serial_size (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/key_set.hpp:208
#5 serial_size (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/key_set.hpp:212
#6 next_base<galera::KeySet::KeyPart> (this=0x1e97d58, trx=0x7f80c496c990) at galerautils/src/gu_rset.hpp:346
#7 next (this=0x1e97d58, trx=0x7f80c496c990) at galerautils/src/gu_rset.hpp:420
#8 next (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/key_set.hpp:713
#9 galera::Certification::purge_for_trx_v3 (this=0x1e97d58, trx=0x7f80c496c990) at galera/src/certification.cpp:120
#10 0x00007f81003d6252 in galera::Certification::PurgeAndDiscard::operator() (this=0x7f80da9ddff0, vt=...) at galera/src/certification.hpp:140
#11 0x00007f81003d0911 in for_each<std::_Rb_tree_iterator<std::pair<long const, galera::TrxHandle*> >, galera::Certification::PurgeAndDiscard> (this=0x1e97d58, seqno=69757, handle_gcache=true)
at /usr/include/c++/4.4.7/bits/stl_algo.h:4200
#12 galera::Certification::purge_trxs_upto_ (this=0x1e97d58, seqno=69757, handle_gcache=true) at galera/src/certification.cpp:937
#13 0x00007f81003fe858 in purge_trxs_upto (this=0x1e974c0, seq=69757, seqno_l=<value optimized out>) at galera/src/certification.hpp:75
#14 galera::ReplicatorSMM::process_commit_cut (this=0x1e974c0, seq=69757, seqno_l=<value optimized out>) at galera/src/replicator_smm.cpp:1247
#15 0x00007f81003dcbe5 in galera::GcsActionSource::dispatch (this=0x1e97aa8, recv_ctx=0x7f80d4000990, act=<value optimized out>, exit_loop=@0x7f80da9df04f) at galera/src/gcs_action_source.cpp:127
#16 0x00007f81003dd073 in galera::GcsActionSource::process (this=0x1e97aa8, recv_ctx=0x7f80d4000990, exit_loop=@0x7f80da9df04f) at galera/src/gcs_action_source.cpp:177
#17 0x00007f81003fb983 in galera::ReplicatorSMM::async_recv (this=0x1e974c0, recv_ctx=0x7f80d4000990) at galera/src/replicator_smm.cpp:354
#18 0x00007f8100410673 in galera_recv (gh=<value optimized out>, recv_ctx=<value optimized out>) at galera/src/wsrep_provider.cpp:226
#19 0x00000000005be0af in wsrep_replication_process (thd=0x7f80d4000990) at /usr/src/debug/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_thd.cc:309
#20 0x00000000005ae4d0 in start_wsrep_THD (arg=0x5be060) at /usr/src/debug/Percona-XtraDB-Cluster-5.6.15/sql/mysqld.cc:5484
#21 0x00007f810754f9d1 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f8105a57b6d in clone () from /lib64/libc.so.6
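
The gdb backtrace above shows resolved names, while the stack traces in the mysqld error log earlier in this report list mangled C++ symbols. The following sketch pulls those mangled symbols out of error-log frames so they can be demangled. It assumes frames have the form `module(symbol+0xOFFSET)[0xADDRESS]` as seen in this report. `parse_frame` and `mangled_symbols` are hypothetical helper names.

```python
import re

# Matches frames such as:
#   /usr/lib64/libgalera_smm.so(_ZN6galera...E+0x7b)[0x7f7e8f173a1b]
# Frames without a parenthesised part (e.g. "/usr/sbin/mysqld[0x5ade25]")
# do not match and are skipped.
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9a-f]+)\)"
    r"\[0x(?P<addr>[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Split one error-log stack frame into its parts, or return None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match["module"],
        "symbol": match["symbol"] or None,  # None when only an offset is given
        "offset": int(match["offset"], 16),
        "addr": int(match["addr"], 16),
    }


def mangled_symbols(lines):
    """Return the unique Itanium-ABI mangled symbols ("_Z...") in first-seen order."""
    found = []
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["symbol"] and frame["symbol"].startswith("_Z"):
            if frame["symbol"] not in found:
                found.append(frame["symbol"])
    return found
```

The resulting names can be fed to `c++filt` from binutils. For example, `_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE` corresponds to `galera::Certification::purge_for_trx_v3(galera::TrxHandle*)`, which matches frame #9 of the gdb output.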

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-04-24:

@John,

It indeed looks like a duplicate.

The fix is available here: http://www.percona.com/downloads/TESTING/Percona-XtraDB-Cluster-galera-56/galera-3.x/215/RPM/
