percona-xtradb-cluster-server-5.5 crash

Bug #1159837 reported by Niall Hallett
This bug report is a duplicate of:  Bug #1188641: Nodes Crashed.
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

5.5.29-23.7.2-389.squeeze

14:59:42 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=4
max_threads=151
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338001 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xa246560
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffffffec29437c thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x33)[0x84357c3]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x82f37c4]
[0xf774e400]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvjx+0xad)[0x81ca01d]
/usr/lib/libgalera_smm.so(+0x1b1ef7)[0xf6cb1ef7]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x260)[0xf6cbe370]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x4b)[0xf6cbf49b]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_action+0x3bf)[0xf6c8c9df]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPv+0xe0)[0xf6c8d470]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x8a)[0xf6cb3d8a]
/usr/lib/libgalera_smm.so(galera_recv+0x35)[0xf6cd4ae5]
/usr/sbin/mysqld(_Z25wsrep_replication_processP3THD+0x50)[0x81c9350]
/usr/sbin/mysqld(start_wsrep_THD+0x3c7)[0x8143d47]
/lib/i686/cmov/libpthread.so.0(+0x5955)[0xf7733955]
/lib/i686/cmov/libc.so.6(clone+0x5e)[0xf764b1de]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130325 10:59:42 mysqld_safe Number of processes running now: 0
130325 10:59:42 mysqld_safe WSREP: not restarting wsrep node automatically
130325 10:59:42 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
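The "could use up to" figure in the trace comes from the formula the server prints. A rough sketch of that arithmetic, assuming the MySQL 5.5 default sort_buffer_size of 2 MB (the trace does not print it, so the result differs slightly from the reported 338001 K):

```shell
# Recompute mysqld's worst-case memory estimate from the crash trace.
# sort_buffer_size is NOT in the trace; 2097152 bytes (the 5.5 default)
# is assumed here, which is why this comes out a little below 338001 K.
key_buffer_size=8388608
read_buffer_size=131072
sort_buffer_size=2097152   # assumed default, not from the trace
max_threads=151
echo $(( (key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads) / 1024 ))
```

With these assumed values the estimate is 336768 K, so the server on this node was evidently running with a sort_buffer_size somewhat above the default.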

affects: percona-server → percona-xtradb-cluster
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The crash trace looks similar to the one described here -- https://groups.google.com/forum/?fromgroups=#!topic/codership-team/dD9-D8BETTU

@Niall,

Can you upload the error log in entirety?

Revision history for this message
Niall Hallett (niall-hallett) wrote :
Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Unfortunately the log tells nothing.

1) Could you think of anything unusual (queries) executed on the cluster at the moment of crash?
2) Any reason why innodb_buffer_pool_size is only 128M?
3) Could you post the output of 'SHOW GLOBAL VARIABLES\G' from the crashed node?

Revision history for this message
Niall Hallett (niall-hallett) wrote :

1) I don't know what queries were taking place at that moment. The crashing node (apollo - node 3) isn't being directly used for anything except replication. Nor is node 2. Only the original node 1 (mustang) is being actively used. It's currently processing an average of 110 queries per second.

2) There's no reason apart from being the default. It should be more than 1GB as the innodb data size is 1.1GB.

3) I did restart mysql on node 3, which duly transferred all the data again and ran for about 12 hours before crashing for the second time. I've started the node as standalone to get the global variables attachment.

Revision history for this message
Niall Hallett (niall-hallett) wrote :

I upgraded the software to 5.5.30-23.7.4-405.squeeze on 21st April, re-synced to the cluster and it's been running without incident until today:

09:28:18 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 342362 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x9904418
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffffffea44a37c thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x33)[0x843c643]
/usr/sbin/mysqld(handle_fatal_signal+0x4bc)[0x82fa59c]
[0xf76f2400]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvjx+0xad)[0x81cc82d]
/usr/lib/libgalera_smm.so(+0x1a7664)[0xf4ca5664]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x25d)[0xf4cadbed]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x4b)[0xf4cb177b]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_action+0x387)[0xf4c82da7]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPv+0xe0)[0xf4c83540]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x8a)[0xf4ca731a]
/usr/lib/libgalera_smm.so(galera_recv+0x35)[0xf4cc79b5]
/usr/sbin/mysqld(_Z25wsrep_replication_processP3THD+0x50)[0x81cbb60]
/usr/sbin/mysqld(start_wsrep_THD+0x3c7)[0x8146247]
/lib/i686/cmov/libpthread.so.0(+0x5955)[0xf76d7955]
/lib/i686/cmov/libc.so.6(clone+0x5e)[0xf744a1de]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130503 05:28:18 mysqld_safe Number of processes running now: 0
130503 05:28:18 mysqld_safe WSREP: not restarting wsrep node automatically
130503 05:28:18 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

Revision history for this message
Niall Hallett (niall-hallett) wrote :

Our standalone node has now crashed (5.5.30-23.7.4-405.squeeze):

10:17:40 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=35
max_threads=153
thread_count=20
connection_count=20
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 343043 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fcad93c6f80
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fce0ea18e78 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7ed245]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6ba864]
/lib/libpthread.so.0(+0xeff0)[0x7fceb0f2eff0]
/usr/sbin/mysqld(my_b_safe_tell+0x11)[0x7da711]
/usr/sbin/mysqld(_ZN9Log_event12write_headerEP11st_io_cachem+0x118)[0x763408]
/usr/sbin/mysqld(_ZN15Query_log_event5writeEP11st_io_cache+0x328)[0x7666a8]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG5writeEP9Log_event+0x5be)[0x75749e]
/usr/sbin/mysqld(_ZN3THD12binlog_queryENS_22enum_binlog_query_typeEPKcmbbbi+0xb7)[0x57bde7]
/usr/sbin/mysqld(_ZN13select_insert8send_eofEv+0x140)[0x58bd10]
/usr/sbin/mysqld(_ZN13select_create8send_eofEv+0x1f)[0x58f24f]
/usr/sbin/mysqld[0x5d0bfe]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0xc62)[0x5e5ea2]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x12c)[0x5e766c]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1cd)[0x5e812d]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x4cd8)[0x5a9328]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x343)[0x5a9d33]
/usr/sbin/mysqld[0x5aadd2]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1a92)[0x5acf72]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x167)[0x5ad567]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x14f)[0x64b4cf]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x64b6b1]
/lib/libpthread.so.0(+0x68ca)[0x7fceb0f268ca]
/lib/libc.so.6(clone+0x6d)[0x7fceafbcfb6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fcb6bb017a0): is an invalid pointer
Connection ID (thread ID): 1075420
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130528 11:17:41 mysqld_safe Number of processes running now: 0
130528 11:17:41 mysqld_safe WSREP: not restarting wsrep node automatically
130528 11:17:41 ...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :
Revision history for this message
Niall Hallett (niall-hallett) wrote :

Yep, this was the probable culprit:

CREATE TEMPORARY TABLE tempFlexiTime (employee_id int unsigned, date_from date, time_from time, time_to time) as SELECT employee_id, date_from, time_from, time_to FROM webdb2.flexitime ORDER by employee_id ASC, date_from DESC
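If that CREATE TEMPORARY TABLE ... AS SELECT is indeed the trigger, a commonly suggested way to sidestep CTAS issues in replicated setups is to split it into an explicit CREATE followed by INSERT ... SELECT. A sketch only (column types taken from the statement above; whether this avoids the crash here is an assumption, not confirmed in this thread):

```sql
-- Hypothetical workaround: avoid CREATE ... AS SELECT by splitting
-- the statement into separate DDL and DML.
CREATE TEMPORARY TABLE tempFlexiTime (
  employee_id INT UNSIGNED,
  date_from   DATE,
  time_from   TIME,
  time_to     TIME
);

INSERT INTO tempFlexiTime (employee_id, date_from, time_from, time_to)
  SELECT employee_id, date_from, time_from, time_to
  FROM webdb2.flexitime
  ORDER BY employee_id ASC, date_from DESC;
```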

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Niall, you can watch https://bugs.launchpad.net/codership-mysql/+bug/1160854 for updates on the issue you reported in #6

For the original issue, I am marking this a duplicate of lp:1188641

Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

@Niall, your first variable output shows that you are using binlog_format=STATEMENT. Have you since changed to ROW format? Note that only ROW format is fully supported at the moment.
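A minimal my.cnf fragment for that change (section name standard; applying it to running sessions via SET GLOBAL, or restarting the node, is assumed):

```ini
# Galera/wsrep fully supports only row-based replication,
# so set the binlog format explicitly on every node.
[mysqld]
binlog_format = ROW
```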

Revision history for this message
Niall Hallett (niall-hallett) wrote :

I've set the binlog_format = ROW on all the servers.

I've just upgraded one of the unused nodes to 5.5.31-23.7.5-438.squeeze and it won't even start without crashing. I emptied the /var/lib/mysql directory to see if that made any difference and attached the err log.

