Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

SEGV in XtraDB Cluster

Bug #1078346 reported by Dan Rogers on 2012-11-13

This bug affects 6 people

Affects		Status	Importance	Assigned to	Milestone
	MySQL patches by Codership	Fix Released	Medium	Seppo Jaakola	MySQL patches by Codership 5.5.29-23.7.1
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.5.29-23.7.1

Bug Description

While in the process of switching a database server from Percona Server 5.5.28-rel29 (in a master-master relationship) to Percona XtraDB Cluster, I encountered a situation where any attempt to start the slave on the Cluster server would cause a SEGV.

A similar (or possibly identical) crash was reported as part of https://bugs.launchpad.net/galera/+bug/1036774 back in September.

As with the above report, the only way to get past that point in the binlogs was to switch back to the base Percona server version.

We've also seen 4.5 million "WSREP: skipping FK key append" messages, as described in https://bugs.launchpad.net/codership-mysql/+bug/1057910

Here's the dump from the logs. Config file is attached.

---

121108 12:24:45 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.002763' at position 84270110, relay log '/mysqlroot/mysql/relaylog/relay.000270' position: 84270256
121108 12:24:45 [Note] Slave I/O thread: connected to master '<email address hidden>:3306',replication started in log 'mysql-bin.002764' at position 402960631
18:24:45 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=16777216
read_buffer_size=524288
max_used_connections=45
max_threads=750
thread_count=35
connection_count=35
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1945858 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x8fd29400
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fcf47cb6e58 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7c5fb5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6a00f4]
/lib64/libpthread.so.0(+0xf500)[0x7fdbef6a6500]
/usr/sbin/mysqld(wsrep_append_foreign_key+0xa2)[0x816cc2]
/usr/sbin/mysqld[0x84dc80]
/usr/sbin/mysqld[0x85100e]
/usr/sbin/mysqld[0x85218a]
/usr/sbin/mysqld[0x83ba01]
/usr/sbin/mysqld[0x81bc2f]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x5e)[0x6a4aee]
/usr/sbin/mysqld(_ZN21Delete_rows_log_event11do_exec_rowEPK14Relay_log_info+0x148)[0x7428f8]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x22d)[0x7480fd]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x125)[0x5317b5]
/usr/sbin/mysqld[0x535af7]
/usr/sbin/mysqld(handle_slave_sql+0xa45)[0x537025]
/lib64/libpthread.so.0(+0x7851)[0x7fdbef69e851]
/lib64/libc.so.6(clone+0x6d)[0x7fdbee92411d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 783
Status: NOT_KILLED
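
Most resolved frames in the backtrace above carry mangled C++ symbol names. The following is a minimal sketch, not part of the original report, for pulling the symbol names out of such a log and demangling them with c++filt (from binutils) where it is installed. The function name `backtrace_symbols` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one symbol per resolved backtrace frame read
# from stdin. Frames without a symbol, e.g. "libpthread.so.0(+0xf500)" or
# "mysqld[0x84dc80]", are skipped.
backtrace_symbols() {
    sed -n 's/^[^(]*(\([^+)][^+)]*\)+0x[0-9a-f]*).*$/\1/p' |
    if command -v c++filt >/dev/null 2>&1; then
        c++filt    # demangle C++ names; plain C names pass through unchanged
    else
        cat        # c++filt not installed: leave the names mangled
    fi
}
```

Fed the error log on stdin, this lists frames such as wsrep_append_foreign_key as-is, while a mangled name like _ZN7handler13ha_delete_rowEPKh is shown by c++filt as handler::ha_delete_row(unsigned char const*).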

Dan Rogers (drogers-l) wrote on 2012-11-13:

my.cnf from affected server (4.2 KiB, text/plain)

pservit (pservit) wrote on 2012-11-21:

Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64

Today 2 of 3 nodes crashed with similar backtrace:

16:40:13 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=296
max_threads=1000
thread_count=91
connection_count=91
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1197416 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x588a2c00
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
121121 10:40:15 [Note] WSREP: (0cb18548-322e-11e2-0800-bf8bb6ded1ca, 'tcp://0.0.0.0:4567') reconnecting to b2aaa747-32f1-11e2-0800-9dbb5a52fe4a (tcp://10.0.1.11:4567), attempt 0
stack_bottom = 7fe9da186e48 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7c83f5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6a1e04]
/lib64/libpthread.so.0(+0xf500)[0x7ff77cf01500]
/usr/sbin/mysqld(wsrep_append_foreign_key+0xa2)[0x83baf2]
/usr/sbin/mysqld[0x874ab0]
/usr/sbin/mysqld[0x877e3e]
/usr/sbin/mysqld[0x878fba]
/usr/sbin/mysqld[0x862831]
/usr/sbin/mysqld[0x840daf]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x5e)[0x6a68ee]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xafd)[0x78653d]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2833)[0x595f93]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x333)[0x598de3]
/usr/sbin/mysqld[0x5997a0]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1a82)[0x59c072]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x59c499]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x15f)[0x63710f]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x6372d1]
/lib64/libpthread.so.0(+0x7851)[0x7ff77cef9851]
/lib64/libc.so.6(clone+0x6d)[0x7ff77c17f11d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe91833f960): is an invalid pointer
Connection ID (thread ID): 98095
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
121121 10:40:17 mysqld_safe Number of processes running now: 0
121121 10:40:17 mysqld_safe WSREP: not restarting wsrep node automatically
------------------------------------------------------------------

I tried to start mysql again but it crashed with same backtrace.
After downgrade to 5.5.24-23.6 it started successfully.

Alex Yurchenko (ayurchen) wrote on 2012-11-21:

> Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64

What is galera version there?
What is wsrep_slave_threads value?

Seppo Jaakola (seppo-jaakola) wrote on 2012-11-22:

The log message: "WSREP: skipping FK key append" has been dropped from the latest release. Do you still see these messages in your log, or are they from the time when you were running the previous PXC release?

The problem happens when processing a delete for a table with foreign key constraint (probably cascading). Can you pinpoint which table(s) might be accessed during this delete? Can you show the table definition for the table(s)? If the information is sensitive, you can send it by email to: <email address hidden>.

Seppo Jaakola (seppo-jaakola) wrote on 2012-11-22:

Do you have a 'core' file in your mysql data directory, related to this crash? That would help in troubleshooting this crash.

Changed in galera:
assignee:	nobody → Seppo Jaakola (seppo-jaakola)

pservit (pservit) wrote on 2012-11-22:

>> Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64

> What is galera version there?

wsrep_provider_version | 2.2(r137)

But it seems like a bug in Percona XtraDB Cluster rather than Galera, because after downgrading to 5.5.24-23.6 with the same Galera version I can start the server.

> What is wsrep_slave_threads value?

wsrep_slave_threads | 64

> The problem happens when processing a delete for a table with foreign key constraint (probably cascading).
> Can you pinpoint which table(s) might be accessed during this delete? Can you show the table definition for the table(s)?

Sorry, no clues in the log files. It's a very busy server, around 90G of data and 10k requests per second.

> Do you have a 'core' file in your mysql data directory, related to this crash? That would help in troubleshooting this crash.

No. I set ulimit -c unlimited, but no core file was generated.

Alex Yurchenko (ayurchen) wrote on 2012-11-23:

> No. I set ulimit -c unlimited, but no core file was generated.

Could you please set it in /etc/security/limits.conf and also add the "core-file" option to the [mysqld] section in my.cnf? A core file would be truly lovely.
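
As a minimal sketch of the above, not taken from the thread: the comments show what the two settings typically look like, assuming mysqld runs as a user named "mysql", and the hypothetical function `core_dump_report` only reads the current values of settings that commonly decide whether a crashing process leaves a core file on Linux. It changes nothing.

```shell
#!/bin/sh
# Assumed form of the settings discussed above (user name "mysql" is an
# assumption):
#
#   /etc/security/limits.conf
#       mysql  soft  core  unlimited
#       mysql  hard  core  unlimited
#
#   my.cnf
#       [mysqld]
#       core-file
#
# Hypothetical read-only helper: report the values currently in effect.
core_dump_report() {
    # Soft core-size limit of the current shell; a mysqld started from an
    # init script may run with a different limit.
    echo "core_limit=$(ulimit -c)"
    # Kernel settings; these files are absent on non-Linux systems.
    for f in /proc/sys/kernel/core_pattern /proc/sys/fs/suid_dumpable; do
        if [ -r "$f" ]; then
            echo "$(basename "$f")=$(cat "$f")"
        else
            echo "$(basename "$f")=unavailable"
        fi
    done
}
```

Calling `core_dump_report` prints three `name=value` lines. A suid_dumpable value of 0 is worth checking, because the kernel then suppresses core dumps for processes that have changed user ID, which is usually the case for a mysqld started as root.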

Seppo Jaakola (seppo-jaakola) wrote on 2012-11-23:

This is hard to diagnose with the information at hand. Can you double-check that you are no longer seeing warnings like
"WSREP: skipping FK key append"?
They should not be produced by PXC 5.5.28.

If you can show the full database schema, it may be possible to track down the offending foreign key construct from there; mysqldump with the --no-data option would produce that. (As the schema is probably confidential, you had better not attach it here.)
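
A minimal sketch of that schema-only dump, not taken from the thread. `--no-data` and `--all-databases` are standard mysqldump options; the function names and the output path are hypothetical, and connection options are left to the caller. The second function lists the foreign key clauses in the resulting file, which is where a cascading constraint would show up.

```shell
#!/bin/sh
# Dump table definitions only, no row data, into the file named by $1.
dump_schema() {
    mysqldump --no-data --all-databases > "$1"
}

# Hypothetical helper: list the foreign key clauses in such a dump, each
# prefixed with its line number in the file.
list_foreign_keys() {
    grep -n 'FOREIGN KEY' "$1"
}
```

Typical use would be `dump_schema schema.sql && list_foreign_keys schema.sql`, with the output sent privately rather than attached to a public report.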

Dan Rogers (drogers-l) wrote on 2012-11-26:

I've emailed Seppo a copy of our database schema. Hopefully this will help.

Seppo Jaakola (seppo-jaakola) on 2012-11-26

Changed in codership-mysql:
assignee:	nobody → Seppo Jaakola (seppo-jaakola)
status:	New → In Progress

Dan Rogers (drogers-l) wrote on 2012-11-28:

#10

I've reinstalled Percona XtraDB Cluster on one of our production slaves, and enabled core files. Hopefully I can get it to crash so a core file will be generated.

FYI, it was also necessary to perform the steps from http://www.mysqlperformanceblog.com/2011/08/26/getting-mysql-core-file-on-linux/ to cause a core file to be dumped.

Seppo Jaakola (seppo-jaakola) on 2012-11-28

Changed in codership-mysql:
status:	In Progress → Fix Committed
milestone:	none → 5.5.28-23.7
importance:	Undecided → Medium
no longer affects:	galera

Seppo Jaakola (seppo-jaakola) wrote on 2012-11-28:

#11

Thanks Dan!

I have pushed a potential fix for the problem discovered in pservit's case. It would be good to compare with the core analysis we can get from your setup.

Seppo Jaakola (seppo-jaakola) wrote on 2012-11-28:

#12

The fix was pushed in revision: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3826

Seppo Jaakola (seppo-jaakola) on 2012-11-28

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Seppo Jaakola (seppo-jaakola) on 2012-12-10

Changed in codership-mysql:
status:	Fix Released → In Progress

Seppo Jaakola (seppo-jaakola) wrote on 2012-12-10:

#13

The problem still happens with the suggested fix.

Dan proposed a new fix which seems to work in his case. Many thanks for this extended troubleshooting; I will push this fix in after some more testing.

The fundamental issue seems to be that Percona Server either does not load all tables into the dictionary cache, or drops some tables from it. Dan's fix will pull the referenced table back into the cache.

Seppo Jaakola (seppo-jaakola) on 2012-12-10

Changed in codership-mysql:
milestone:	5.5.28-23.7 → 5.5.28-23.8
status:	In Progress → Fix Committed

Vadim Tkachenko (vadim-tk) on 2013-01-09

Changed in percona-xtradb-cluster:
status:	New → Confirmed

Vadim Tkachenko (vadim-tk) on 2013-01-09

Changed in percona-xtradb-cluster:
milestone:	none → 5.5.29-23.8

Vadim Tkachenko (vadim-tk) on 2013-01-21

Changed in percona-xtradb-cluster:
milestone:	5.5.29-23.8 → none

Dan Rogers (drogers-l) wrote on 2013-01-22:

#14

Does that mean that this will no longer be in the next release of Percona Cluster?

This bug is a show stopper for us in one of our clusters, and will prevent us from upgrading to a version that doesn't have my hand-patched change.

Seppo Jaakola (seppo-jaakola) on 2013-01-22

Changed in codership-mysql:
milestone:	5.5.29-24.8 → 5.5.29-23.7.1

Vadim Tkachenko (vadim-tk) on 2013-01-22

Changed in percona-xtradb-cluster:
milestone:	none → 5.5.29-23.7.1
status:	Confirmed → Fix Committed

Dan Rogers (drogers-l) wrote on 2013-01-22:

#15

Thanks guys. :)

Vadim Tkachenko (vadim-tk) on 2013-01-31

Changed in percona-xtradb-cluster:
status:	Fix Committed → Fix Released

Alex Yurchenko (ayurchen) wrote on 2013-03-11:

#16

fix released in 23.7.3

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#17

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1251
