Percona XtraDB Cluster - HA scalable solution for MySQL

Standalone 5.5.29 wsrep_23.7.2.r3843 server crash at rem0rec.c line 564

Reported by Nickolay Ihalainen on 2013-02-27
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Undecided
Seppo Jaakola
Percona XtraDB Cluster
Status tracked in Trunk
5.6
Undecided
Unassigned
Trunk
Undecided
Raghavendra D Prabhu

Bug Description

version info: 5.5.29-log Percona XtraDB Cluster (GPL), wsrep_23.7.2.r3843

There is no cluster configured, just single host used.

The error happened only once and couldn't be easily reproduced.

130225 1:10:17 InnoDB: Assertion failure in thread 47035804027200 in file rem0rec.c line 564

Thread pointer: 0x2ac855100b60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2ac7606400c8 thread_stack 0x20000

0x7c51e5 my_print_stacktrace + 53
0x69e894 handle_fatal_signal + 1188
0x3f26a0ebe0 _end + 630504832
0x3f25a30265 _end + 613864453
0x3f25a31d10 _end + 613871280
0x7f14ad ha_innobase_add_index::~ha_innobase_add_index() + 48013
0x7f1e4c wsrep_rec_get_foreign_key + 108
0x7d479f wsrep_append_foreign_key + 911
0x9049c3 fil_space_is_corrupt + 416083
0x905662 fil_space_is_corrupt + 419314
0x80ed33 wsrep_rec_get_foreign_key + 118611
0x80ff47 wsrep_rec_get_foreign_key + 123239
0x8105b1 wsrep_rec_get_foreign_key + 124881
0x7fba11 wsrep_rec_get_foreign_key + 39985
0x7d7e4f ha_innobase::wsrep_append_keys(THD*, bool, unsigned char const*, unsigned char const*) + 8559
0x6a422c handler::ha_delete_row(unsigned char const*) + 92
0x78330d mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long) + 2813
0x594364 mysql_execute_command(THD*) + 10116
0x597173 mysql_parse(THD*, char*, unsigned int, Parser_state*) + 819
0x597b30 handle_bootstrap + 336
0x59a372 dispatch_command(enum_server_command, THD*, char*, unsigned int) + 6802
0x59a799 do_command(THD*) + 361
0x6356df do_handle_one_connection(THD*) + 351
0x6358a1 handle_one_connection + 81
0x3f26a0677d _end + 630470941
0x3f25ad49ad _end + 614538061

Query (2ac83e1e6300): is an invalid pointer
Connection ID (thread ID): 1048283
Status: NOT_KILLED

A similar bug was fixed in lp:1078346 , however, there is also a lp:1113073 but it is in Xtrabackup restore, may/not be related.

@Nickolay,

Had this box recently crashed or so? Just to see if there are/were any inconsistencies with this database

Adding codership-mysql as we have seen this crash before as well and it may be a known issue (as per previous discusssions on this).

tags: added: crash foreign-keys
removed: foreign keys
Nickolay Ihalainen (ihanick) wrote :

The server crashed just once and restarted automatically by mysqld_safe. The problem wasn't repeated after restart. There is no information about specific table schema or core dump.

Changed in codership-mysql:
status: New → Invalid
tags: added: i29137
tags: added: i29731
removed: rem0rec.c
Shadow Roldan (shadow-roldan) wrote :
Download full text (4.2 KiB)

We have also come across this bug as described here. I have a snapshot of our db in a state where I can consistently crash the server with a simple update statement. Please advise if I can provide more detailed information.

Things of note:
* The update works fine if I disable foreign key checks
* Forcing a recreation the foreign key seems to help get past this crash in some cases even though the foreign key itself remains the same
* Observations indicate that the length of the string used as a foreign key could be relevant, or possibly the contents of the string.

example update statement causing the crash:
mysql> update appglu_storage_files set url = concat('1', url);
ERROR 2013 (HY000): Lost connection to MySQL server during query

From error log:

130307 18:10:46 InnoDB: Assertion failure in thread 140317682054912 in file rem0rec.c line 564
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
18:10:46 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=8388608
max_used_connections=5
max_threads=5000
thread_count=4
connection_count=4
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 51296010 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x6305bb0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9e41982e70 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7d357e]
/usr/sbin/mysqld(handle_fatal_signal+0x484)[0x6aa084]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f9e40ba3cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f9e4080b425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f9e4080eb8b]
/usr/sbin/mysqld[0x815238]
/usr/sbin/mysqld(wsrep_rec_get_foreign_key+0x6c)[0x81744c]
/usr/sbin/mysqld(wsrep_append_foreign_key+0x407)[0x803907]
/usr/sbin/mysqld[0x93b18f]
/usr/sbin/mysqld[0x93ca7a]
/usr/sbin/mysqld[0x8315dc]
/usr/sbin/mysqld[0x832466]
/usr/sbin/mysqld[0x8359ce]
/usr/sbin/mysqld[0x81f1a1]
/usr/sbin/mysqld[0x804a02]
/usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhPh+0x66)[0x6b1ae6]
/usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x131d)[0x61f59d]
/usr/sbin/mysqld(_Z2...

Read more...

Shadow Roldan (shadow-roldan) wrote :

Server info was missing from previous comment, below

Server version: 5.5.29-55-log Percona XtraDB Cluster (GPL), wsrep_23.7.2.r3843

Changed in percona-xtradb-cluster:
assignee: nobody → Raghavendra D Prabhu (raghavendra-prabhu)

@Shadow,

Thanks for the details.

Since you are able to reproduce this easily, can you provide
either/or both of the following:

1) A backtrace from the core file. You can use procedure here -- http://www.mysqlperformanceblog.com/2011/08/26/getting-mysql-core-file-on-linux/ to produce one. Make sure to add 'core-file-size=unlimited' under mysqld_safe in my.cnf as well (not mentioned there).

To produce backtrace from the core:

  gdb `which mysqld` --core ${core} --batch --quiet -ex "thread apply all bt full" -ex "quit"

  and attach that file here.

2) If you can provide a table(s), the statement and a minimal configuration on which it crashes, that would be for the best, since then I will be able to replicate it on my box and debug it better.

Since it doesn't crash without foreign keys, it certainly looks
like an issue with wsrep and not with PS, however, we need to
test so to reliably concur that.

@Shadow, are you able to repeat it with MySQL-wsrep (ie.
mysql+wsrep distributed by codership)?

One more thing, the crash reported originally and the
penultimate crash seem to be originating in different locations -
the first one at ha_delete_row and the latter at ha_update_row.

@Shadow, another thing - make sure all debug symbol packages
related to Percona Server are installed (including the galera
debug libraries). You earlier stack-trace was lacking it.

@Nickolay, while resolving the stack-trace did you also consider
galera shared library into account? I presume the following may
be incorrect:
""
 0x9049c3 fil_space_is_corrupt + 416083
0x905662 fil_space_is_corrupt + 419314
0x80ed33 wsrep_rec_get_foreign_key + 118611
0x80ff47 wsrep_rec_get_foreign_key + 123239
0x8105b1 wsrep_rec_get_foreign_key + 124881
0x7fba11 wsrep_rec_get_foreign_key + 39985
""

because from other stack trace

 /usr/sbin/mysqld(wsrep_rec_get_foreign_key+0x6c)[0x81744c]
/usr/sbin/mysqld(wsrep_append_foreign_key+0x407)[0x803907]
/usr/sbin/mysqld[0x93b18f]
/usr/sbin/mysqld[0x93ca7a]
/usr/sbin/mysqld[0x8315dc]
/usr/sbin/mysqld[0x832466]
/usr/sbin/mysqld[0x8359ce]
/usr/sbin/mysqld[0x81f1a1]
/usr/sbin/mysqld[0x804a02]
/usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhPh+0x66)[0x6b1ae6]

The intermediate ones are unresolved because the galera debug
libs may not be present here.

From the code, it looks like code makes an entry into galera
after wsrep->append_key.

One more thing, also make sure to run 'CHECK TABLE' on the said
table (and the tables it is related to by FK relation) to rule
out a table corruption causing this crash.

Is this bug still present and/or recurring after the latest
release?

Changed in codership-mysql:
status: Invalid → Confirmed
milestone: none → 5.5.31-23.7.5
Seppo Jaakola (seppo-jaakola) wrote :

I was able to reproduce this problem while troubleshooting: https://mariadb.atlassian.net/browse/MDEV-4624

The full scenario is as follows:

1. use two nodes connected with MySQL native replication,
   * configure InnoDB to use row format COMPACT
   * configure both nodes to use binlog_format=ROW

2. use at least 3 tables having cyclic ON DELETE CASCADE foreign key constraints
    parent -> child1
    parent -> child2
    child1-> child2

3. run test load at MySQL master to populate the tables with sufficient rows and
    then start deleting rows from all tables through several conflicting sessions

4. in the slave node issue sequences of:
   STOP SLAVE
   ALTER TABLE child2 ROW_FORMAT=COMPRESSED
   START SLAVE
   STOP SLAVE
   ALTER TABLE child2 ROW_FORMAT=COMPACT
   START SLAVE
   ...

=> slave node should crash occasionally. I was able to reproduce with 5.5.29 based build, but not anymore with 5.5.31 RC build

Problem seems to relate to the lifetime of clustered record. When cascaded delete progresses, the state of target record gets altered and wsrep key population from the record can happen too late. I will merge the fix from 5.5.29 tree because 5.5.31 may also be vulnerable, although my testing did surface this.

Changed in codership-mysql:
status: Confirmed → In Progress
Seppo Jaakola (seppo-jaakola) wrote :

Pushed fix in revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3872

It turns out that cascading FK processing does some unnecessary FK checks, which are needed only for the purpose of getting FK parent reference for child table modifications. These can be optimized away, will continue on this part before closing this issue.

Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-24.8
Seppo Jaakola (seppo-jaakola) wrote :

Optimization to avoid excessive calls for 'wsrep_row_upd_check_foreign_constraints()' when processing cascading FKs was pushed in revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.5-23/revision/3873

Changed in codership-mysql:
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
status: New → Fix Released
Changed in codership-mysql:
status: Fix Committed → Fix Released
Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers