percona-server slave crashes when applying row-based binlog entries in cascading replication for Merge storage engine

Bug #1191263 reported by Mario Splivalo
This bug affects 2 people

Affects           Status        Importance  Assigned to  Milestone
MySQL Server      Unknown       Unknown
Percona Server    (moved to https://jira.percona.com/projects/PS; status tracked in 5.7)
  5.1             Invalid       Undecided   Unassigned
  5.5             Fix Released  Medium      Unassigned
  5.6             Fix Released  Medium      Unassigned
  5.7             Fix Released  Medium      Unassigned

Bug Description

In a cascading replication setup (master -> master/slave -> slave), the slave server crashes when applying row-based binlog entries that change data in MERGE engine tables. Note that the master/slave server (in the setup described) does not crash; only the slave (the last one in the chain) crashes.

Steps to reproduce:

- establish cascading replication between three servers: "master" -> "master/slave" -> "slave" (here, "master/slave" has both binlogs and relay logs enabled); a minimal configuration sketch is shown after these steps.
- create database 'mergetest' on "master" (it will propagate to all the slaves)
- create two MyISAM tables, t_1 and t_2, and a MERGE table t:

create table t_1 (id int primary key auto_increment, data varchar(200)) ENGINE=MyISAM;
create table t_2 (id int primary key auto_increment, data varchar(200)) ENGINE=MyISAM;
create table t (id int primary key auto_increment, data varchar(200)) ENGINE=MERGE UNION=(t_1,t_2) INSERT_METHOD=FIRST;

- fill the table with some data:
insert into t (data) values ('robor'), ('tobor'), ('dobor'), ('borbor');

- force a row-based binlog entry (an UPDATE with LIMIT is logged in row format under binlog_format=MIXED; alternatively, make sure binlog_format is set to 'row'):
update t set data = 'changed' limit 1;
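
For the first step (establishing the cascading replication), here is a minimal sketch of the replication-related settings for this topology. The hostnames, credentials and binlog coordinates below are placeholders, not values from the attached configs; the important part is log_slave_updates on the "master/slave" server, which makes it write the applied events into its own binlog so they reach the second slave.

# "master" my.cnf (replication-relevant parts only)
server_id     = 1
log_bin       = mysql-bin
binlog_format = ROW          # or MIXED together with the UPDATE ... LIMIT above

# "master/slave" my.cnf
server_id         = 2
log_bin           = mysql-bin
relay_log         = relay-bin
log_slave_updates = 1        # forwards applied events to the downstream slave
binlog_format     = ROW

# "slave" my.cnf (no binlogs, as in the attached config)
server_id = 3
relay_log = relay-bin

-- run on "master/slave" (pointing at "master"), then on "slave" (pointing at "master/slave"):
CHANGE MASTER TO
  MASTER_HOST='master.example.com',             -- placeholder host
  MASTER_USER='repl', MASTER_PASSWORD='repl',   -- placeholder replication account
  MASTER_LOG_FILE='mysql-bin.000001',           -- from SHOW MASTER STATUS on the upstream server
  MASTER_LOG_POS=4;
START SLAVE;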

The UPDATE above is propagated to the "master/slave" server with no issues; however, the "slave" server crashes when trying to apply it. Here is an excerpt from the slave's error log file:

------------
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe541da99d0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8bf70e]
/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x699061]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7fe555d93030]
/usr/sbin/mysqld(_ZNK9table_def15compatible_withEP3THDP14Relay_log_infoP5TABLEPS5_+0xc7)[0x882d97]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x748)[0x86a618]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x67)[0x8633f7]
/usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x1a6)[0x88e716]
/usr/sbin/mysqld(handle_slave_sql+0x9f5)[0x8910f5]
/usr/sbin/mysqld(pfs_spawn_thread+0x16f)[0x8f679f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7fe555d8ab50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe5544ada7d]
-------------
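
The backtrace shows the crash inside table_def::compatible_with() while Rows_log_event::do_apply_event() is applying the row event. To confirm which relay-log event the SQL thread was applying, the slave's relay log can be decoded with mysqlbinlog; this is only a sketch, with the relay-log file name and position as placeholders (take them from the error log or from SHOW SLAVE STATUS: Relay_Log_File / Relay_Log_Pos):

# decode the row event the SQL thread was about to apply (angle brackets are placeholders)
mysqlbinlog --verbose --base64-output=decode-rows \
    --start-position=<Relay_Log_Pos> <relay-log-file> | head -n 60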

This is verified and reproducible on:
Debian Squeeze with percona-server-server-5.5.31 ( 5.5.31-rel30.3-520.squeeze)
Debian Wheezy with percona-server-server-5.5.31 (5.5.31-rel30.3-520.wheezy)
Debian Wheezy with percona-server-server-5.6.11 (5.6.11-rc60.3-375.wheezy)

It is not reproducible with percona-server-5.1 (5.1.69-rel14.7-572.wheezy or 5.1.68-rel14.6-551.squeeze)

This bug seems(!) to be a duplicate of http://bugs.mysql.com/bug.php?id=47103 and/or of these bugs: https://bugs.launchpad.net/percona-server/+bug/1019125 and https://bugs.launchpad.net/percona-server/+bug/873044. The only difference is that all of those bugs mention only a master/slave setup; here the issue appears on the 'second' slave in the cascading replication setup.

Attached are the my.cnf files from all three servers (server_id and report_host are the only differences between the three files, and the slave-only server does not have binlogs enabled) as well as the error.log file from the slave (the last in the chain).

A cascading setup is used quite often in master-master scenarios where each master has several read-only slaves attached to it.

Tags: upstream
Mario Splivalo (mariosplivalo): description updated
Valerii Kravchuk (valerii-kravchuk) wrote:

I was able to reproduce this as described with PS 5.6.11 on a sandbox circular replication setup created like this:

make_replication_sandbox --circular=4 /home/openxs/builds/Percona-Server-5.6.11-rc60.3-387-valgrind.Linux.i686.tar.gz
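
The workload is the one from the bug description, run against a single node of the circle (presumably node1, two hops upstream of node3); condensed, it is:

create database mergetest;
use mergetest;
create table t_1 (id int primary key auto_increment, data varchar(200)) ENGINE=MyISAM;
create table t_2 (id int primary key auto_increment, data varchar(200)) ENGINE=MyISAM;
create table t (id int primary key auto_increment, data varchar(200)) ENGINE=MERGE UNION=(t_1,t_2) INSERT_METHOD=FIRST;
insert into t (data) values ('robor'), ('tobor'), ('dobor'), ('borbor');
update t set data = 'changed' limit 1;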

This is what I see in the error log on node3:

...
2013-06-25 16:10:39 5836 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 10825, relay log './mysql_sandbox15103-relay-bin.000002' position: 8443
13:10:39 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=3
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68426 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xab74cd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffffffa5951ef4 thread_stack 0x30000
/home/openxs/builds/5.6.11/bin/mysqld(my_print_stacktrace+0x33)[0x855e3b3]
/home/openxs/builds/5.6.11/bin/mysqld(handle_fatal_signal+0x4aa)[0x830040a]
[0xb77ac400]
/home/openxs/builds/5.6.11/bin/mysqld[0x851ef6d]
/home/openxs/builds/5.6.11/bin/mysqld(_ZNK9table_def15compatible_withEP3THDP14Relay_log_infoP5TABLEPS5_+0xc2)[0x8520432]
/home/openxs/builds/5.6.11/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x765)[0x8506de5]
/home/openxs/builds/5.6.11/bin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x57)[0x85018e7]
/home/openxs/builds/5.6.11/bin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x1b7)[0x852b907]
/home/openxs/builds/5.6.11/bin/mysqld[0x8531740]
/home/openxs/builds/5.6.11/bin/mysqld(handle_slave_sql+0x726)[0x8535346]
/home/openxs/builds/5.6.11/bin/mysqld(pfs_spawn_thread+0x209)[0x85ebb19]
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0(+0x5c39)[0xb778dc39]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(clone+0x5e)[0xb737078e]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 4
Status: NOT_KILLED
...

Valerii Kravchuk (valerii-kravchuk) wrote:

Same with PS 5.5.31:

...
130625 18:07:17 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 6746, relay log './mysql_sandbox17103-relay-bin.000002' position: 4843
15:07:17 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=2
max_threads=153
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 342315 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x95c4978
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = ffffffffa5963384 thread_stack 0x30000
/home/openxs/5.5.31/bin/mysqld(my_print_stacktrace+0x33)[0x843b473]
/home/openxs/5.5.31/bin/mysqld(handle_fatal_signal+0x43e)[0x82e51fe]
[0xb7717400]
/home/openxs/5.5.31/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x1098)[0x83b0428]
/home/openxs/5.5.31/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x193)[0x813f063]
/home/openxs/5.5.31/bin/mysqld[0x8147bd7]
/home/openxs/5.5.31/bin/mysqld(handle_slave_sql+0xea3)[0x8149293]
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0(+0x5c39)[0xb76f8c39]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(clone+0x5e)[0xb73e478e]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 10
Status: NOT_KILLED

Valerii Kravchuk (valerii-kravchuk) wrote:

It seems the problem comes from the upstream MySQL server; see http://bugs.mysql.com/bug.php?id=69574.

tags: added: upstream
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-1385
