Replication segfaults slave server

Bug #1380010 reported by Oscar Elfving
This bug affects 11 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
    Status: Fix Released  Importance: Critical  Assigned to: Laurynas Biveinis
Affects: 5.6
    Status: Fix Released  Importance: Critical  Assigned to: Laurynas Biveinis

Bug Description

After replication has been running for a while, the slave server segfaults and restarts. Here is the full stack trace:

#0 0x000000000086eb32 in unpack_row (rli=<optimized out>, table=0x7ffb6d9e6400, colcnt=<optimized out>, row_data=0x7ffb6d9a5e90 "", cols=0x7ffb6d9b7a98, current_row_end=0x7ffb6d9b7bb0, master_reclength=0x7ffb6d9b7b58, row_end=0x7ffb6d9a69d0 "\360\037")
    at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/rpl_record.cc:265
        conv_field = <optimized out>
        f = <optimized out>
        master_null_byte_count = <optimized out>
        pack_ptr = 0x7ffb6d9a5e92 "\t00209-85A\200"
        tabledef = <optimized out>
        conv_table = <optimized out>
        end_ptr = <optimized out>
        null_mask = <optimized out>
        null_bits = <optimized out>
        null_ptr = <optimized out>
        begin_ptr = 0x7ffb6d9e7a50
        field_ptr = 0x7ffb6d9e7a50
        i = <optimized out>
        max_cols = <optimized out>
#1 0x0000000000853324 in unpack_current_row (cols=0x7ffb6d9b7a98, rli=0x1c724d60, this=0x7ffb6d9b7940) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.h:4301
No locals.
#2 Write_rows_log_event::write_row (this=this@entry=0x7ffb6d9b7940, rli=0x1c724d60, overwrite=false) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.cc:12752
        table = 0x7ffb6d9e6400
        error = <optimized out>
        keynum = 0
        key = {m_ptr = 0x0}
#3 0x00000000008536b2 in Write_rows_log_event::do_exec_row (this=0x7ffb6d9b7940, rli=<optimized out>) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.cc:12970
        error = <optimized out>
#4 0x00000000008431ef in Rows_log_event::do_apply_row (this=this@entry=0x7ffb6d9b7940, rli=rli@entry=0x1c724d60) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.cc:10309
        error = 0
        old_thd = 0x1c755980
#5 0x000000000085188f in Rows_log_event::do_apply_event (this=0x7ffb6d9b7940, rli=0x1c724d60) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.cc:11422
        after_image = <optimized out>
        saved_sql_mode = 1073741824
        saved_m_curr_row = 0x7ffb6d9a49f0 ""
        do_apply_row_ptr = (int (Rows_log_event::*)(Rows_log_event * const, const Relay_log_info *)) 0x8431c0 <Rows_log_event::do_apply_row(Relay_log_info const*)>
        error = 0
        table = 0x7ffb6d9e6400
#6 0x000000000084abf7 in Log_event::apply_event (this=0x7ffb6d9b7940, rli=0x1c724d60) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/log_event.cc:3030
        skip = false
        parallel = <optimized out>
        actual_exec_mode = <optimized out>
        thd = 0x1c755980
#7 0x000000000087dc53 in apply_event_and_update_pos (ptr_ev=ptr_ev@entry=0x7ffb831299c0, thd=thd@entry=0x1c755980, rli=rli@entry=0x1c724d60)
    at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/rpl_slave.cc:3549
        exec_res = 0
        skip_event = false
        ev = 0x7ffb6d9b7940
        reason = <optimized out>
#8 0x0000000000880330 in exec_relay_log_event (rli=0x1c724d60, thd=0x1c755980) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/rpl_slave.cc:4045
        exec_res = <optimized out>
        ev = 0x7ffb6d9b7940
        ptr_ev = 0x7ffb831299c0
#9 handle_slave_sql (arg=arg@entry=0x1c6e2f20) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/sql/rpl_slave.cc:5894
        thd_added = true
        llbuff1 = "59365801", '\000' <repeats 13 times>
        saved_master_log_name = "/tmp/\000\000\000\300#7\001\000\000\000\000\001\000\000\000\000\000\000\000\360_7\001\000\000\000\000\260\235\022\203\373\177\000\000`\240/\001\000\000\000\000\060\235\022\203\373\177\000\000\342Z\261\000\000\000\000\000\300#7\001", '\000' <repeats 20 times>, "\001\000\001\000\000\000\000\000\002", '\000' <repeats 11 times>, "%\000\000\000\001\000\001\000\000\000\000\000\n\000\000\000\000\000\000\000\360_7\001\000\000\000\000pp0\001\000\000\000\000P\324\061\001\000\000\000\000mC\261\000\000\000\000\000\000\000\006\000\000\000\000\000\300#7\001", '\000' <repeats 20 times>"\260, \235\022\203\373\177\000\000@=\325\341\377\177\000\000`\236\022\203\373\177\000\000\367\217\254", '\000' <repeats 25 times>, "\004\000\000\000\001\000\000\000\000\000\000\000\037\024\352\245\345\317o\000\220B\325\341\377\177\000\000\\B\325"...
        saved_log_pos = 0
        saved_master_log_pos = 0
        rli = 0x1c724d60
        errmsg = 0x0
        llbuff = "59365638", '\000' <repeats 13 times>
        saved_log_name = "\003\000\000\000\016\000\000\000\000\060\215\344\377\177\000\000\260\004v\034\000\000\000\000\200[3\001\000\000\000\000@=\325\341\377\177\000\000\000\000\000\000\000\000\000\000\323i\v\217\321{%\000p[\261", '\000' <repeats 381 times>"\331, w\261\000\000\000\000\000\340\234\022\203\373\177\000\000\300#7\001\000\000\000\000\001\200\000\200\000\000\000\000\000C\035\001\000\000\000\000\220\234\022\203\373\177\000\000$Q\261\000\000\000\000\000\002\000\000\000\000\000\000\000\260\235\022\203\373\177\000"
        saved_skip = 0
        mts_inited = false
        __FUNCTION__ = "handle_slave_sql"
        thd = 0x1c755980
#10 0x0000000000af3e00 in pfs_spawn_thread (arg=0x1c71c1d0) at /mnt/workspace/percona-server-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-server-5.6-5.6.21-69.0/storage/perfschema/pfs.cc:1860
        typed_arg = 0x1c71c1d0
        user_arg = 0x1c6e2f20
        pfs = <optimized out>
        user_start_routine = 0x87fa50 <handle_slave_sql(void*)>
        klass = <optimized out>
#11 0x00007ffff7bc7b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#12 0x00007ffff5eb3e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Thank you for the problem report. Please also send your my.cnf file content from the slave and its entire error log content.

Have you identified the problematic table and statement that caused this crash while being applied by the slave (from the binary/relay log)?

Changed in percona-server:
status: New → Incomplete
Revision history for this message
Oscar Elfving (sofam84) wrote :

Hi, I have attached the my.cnf from the slave server; there was no error log except the stack trace, because I am running mysqld under gdb (also attached).
I haven't identified which statement broke the replication; how can I find that out? The server is stopped at the segmentation fault right now, so I should be able to find out exactly which statement it is.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

You wrote that the server restarts. Upon restart it has to write something to the error log about the binary/relay log positions. My idea was to check the binary log content around that position. This may be problematic because of the following settings:

sync_master_info = 0
sync_relay_log = 0
sync_relay_log_info = 0

So, after a crash, both the relay log content and the recorded positions may be incorrect.
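
For reference, a minimal sketch of that check (assuming the binary log file name and the start/stop positions are taken from the slave's error log or from SHOW SLAVE STATUS; the file name and offsets below are placeholders):

mysqlbinlog --start-position=<exec_master_log_pos> --stop-position=<exec_master_log_pos + 100000> \
            --verbose --base64-output=DECODE-ROWS <master-bin.00000N> > suspect_events.txt

The --verbose / --base64-output=DECODE-ROWS options make mysqlbinlog print the decoded row images as pseudo-SQL comments, which makes it easier to spot the table and row that were being applied at the crash point.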

Revision history for this message
Oscar Elfving (sofam84) wrote :

Yes, here are the daemon logs from when it restarts:
Oct 10 15:04:09 ngvm1725 mysqld: 13:04:09 UTC - mysqld got signal 11 ;
Oct 10 15:04:09 ngvm1725 mysqld: This could be because you hit a bug. It is also possible that this binary
Oct 10 15:04:09 ngvm1725 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Oct 10 15:04:09 ngvm1725 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Oct 10 15:04:09 ngvm1725 mysqld: We will try our best to scrape up some info that will hopefully help
Oct 10 15:04:09 ngvm1725 mysqld: diagnose the problem, but since we have already crashed,
Oct 10 15:04:09 ngvm1725 mysqld: something is definitely wrong and this may fail.
Oct 10 15:04:09 ngvm1725 mysqld: Please help us make Percona Server better by reporting any
Oct 10 15:04:09 ngvm1725 mysqld: bugs at http://bugs.percona.com/
Oct 10 15:04:09 ngvm1725 mysqld:
Oct 10 15:04:09 ngvm1725 mysqld: key_buffer_size=33554432
Oct 10 15:04:09 ngvm1725 mysqld: read_buffer_size=4194304
Oct 10 15:04:09 ngvm1725 mysqld: max_used_connections=0
Oct 10 15:04:09 ngvm1725 mysqld: max_threads=1002
Oct 10 15:04:09 ngvm1725 mysqld: thread_count=2
Oct 10 15:04:09 ngvm1725 mysqld: connection_count=0
Oct 10 15:04:09 ngvm1725 mysqld: It is possible that mysqld could use up to
Oct 10 15:04:09 ngvm1725 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6203781 K bytes of memory
Oct 10 15:04:09 ngvm1725 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Oct 10 15:04:09 ngvm1725 mysqld:
Oct 10 15:04:09 ngvm1725 mysqld: Thread pointer: 0x29f7af0
Oct 10 15:04:09 ngvm1725 mysqld: Attempting backtrace. You can use the following information to find out
Oct 10 15:04:09 ngvm1725 mysqld: where mysqld died. If you see no messages after this, something went
Oct 10 15:04:09 ngvm1725 mysqld: terribly wrong...
Oct 10 15:04:09 ngvm1725 mysqld: stack_bottom = 7f4832c319b0 thread_stack 0x30000
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8adbfe]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x36c)[0x667e7c]
Oct 10 15:04:09 ngvm1725 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7f4ca76d8030]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(_Z10unpack_rowPK14Relay_log_infoP5TABLEjPKhPK9st_bitmapPS5_PmS5_+0x192)[0x86eb32]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(_ZN14Rows_log_event24do_index_scan_and_updateEPK14Relay_log_info+0x1a7)[0x852337]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x37f)[0x85188f]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x67)[0x84abf7]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x1a3)[0x87dc53]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(handle_slave_sql+0x8e0)[0x880330]
Oct 10 15:04:09 ngvm1725 mysqld: /usr/sbin/mysqld(pfs_spawn_thread+0x140)[0xaf3e00]
Oct 10 15:04:09 ngvm1725 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f4ca76cfb50]
Oct 10 15:04:09 ngvm1725 mysqld...


Changed in percona-server:
status: Incomplete → New
Revision history for this message
Oscar Elfving (sofam84) wrote :

I have checked the replication log for the operation that was in progress when the server segfaulted; it looks like it is row replication on the following table.

#141011 6:17:30 server id 1 end_log_pos 48926394 CRC32 0xaba51a82 Table_map: `magento`.`xxx_tmp_import_prices` mapped to number 1215
# at 48926557
#141011 6:17:30 server id 1 end_log_pos 48934590 CRC32 0xeabdd083 Write_rows: table id 1215
#141011 6:17:30 server id 1 end_log_pos 49048993 CRC32 0x4b1709ad Write_rows: table id 1215
# at 49049156
#141011 6:17:30 server id 1 end_log_pos 49057145 CRC32 0x79e88af1 Write_rows: table id 1215
# at 49057308
#141011 6:17:30 server id 1 end_log_pos 49065322 CRC32 0x309a133b Write_rows: table id 1215
# at 49065485
#141011 6:17:30 server id 1 end_log_pos 49073518 CRC32 0xc0f0bc00 Write_rows: table id 1215
# at 49073681
#141011 6:17:30 server id 1 end_log_pos 49081665 CRC32 0xf1b916d3 Write_rows: table id 1215
# at 49081828
#141011 6:17:30 server id 1 end_log_pos 49089861 CRC32 0x997f2443 Write_rows: table id 1215
# at 49090024
#141011 6:17:30 server id 1 end_log_pos 49098035 CRC32 0x36d469e9 Write_rows: table id 1215
# at 49098198
#141011 6:17:30 server id 1 end_log_pos 49106190 CRC32 0xb4cd15d2 Write_rows: table id 1215
# at 49106353
#141011 6:17:30 server id 1 end_log_pos 49114349 CRC32 0x679b351e Write_rows: table id 1215
# at 49114512
#141011 6:17:30 server id 1 end_log_pos 49122533 CRC32 0xfe031286 Write_rows: table id 1215
# at 49122696
#141011 6:17:30 server id 1 end_log_pos 49130729 CRC32 0x071061ac Write_rows: table id 1215
# at 49130892
#141011 6:17:30 server id 1 end_log_pos 49138918 CRC32 0xc68c1f68 Write_rows: table id 1215
# at 49139081
#141011 6:17:30 server id 1 end_log_pos 49147103 CRC32 0xb598841f Write_rows: table id 1215
# at 49147266
#141011 6:17:30 server id 1 end_log_pos 49155299 CRC32 0x73f7058c Write_rows: table id 1215
# at 49155462
#141011 6:17:30 server id 1 end_log_pos 49163495 CRC32 0x92c7d334 Write_rows: table id 1215
# at 49163658
#141011 6:17:30 server id 1 end_log_pos 49171660 CRC32 0x657ca4a1 Write_rows: table id 1215

The log ends abruptly when the server segfaults, and no rows made it into the table itself.
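
To see the actual row images inside those Write_rows events rather than just the headers, mysqlbinlog can decode them. A sketch, assuming the same log file as the listing above (the file name is a placeholder, and the start position should point at or before the Table_map event for the table, whose exact offset is not shown in the listing):

mysqlbinlog -vv --base64-output=DECODE-ROWS --start-position=<pos-of-Table_map> <log-file> | less

With -vv the column types are printed alongside the values, which helps when a column definition differs between master and slave.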

Revision history for this message
Ben Kochie (bjk) wrote :

I would like to report an additional possible occurrence of this bug.

I ran into this bug after upgrading a testing slave from 5.6.17 to 5.6.21. To isolate the problem, I reinstalled the system from scratch (Debian wheezy) and installed 5.6.21-69.0-675.wheezy. I then restored our dataset from SQL dumps to avoid any binary data problems with 5.6.

About 15 seconds after starting slave replication for the first time, the master crashed.

The master I am replicating from is running 5.5.36-34.2-648.wheezy

I have attached the mysqld error log.

Revision history for this message
Jeremy Bowers (cubiclelord) wrote :

I also ran into the bug when testing an upgrade from Percona Server 5.5 to 5.6. Like the previous commenter, I reinstalled from a clean slate and imported our data from a .sql file. Is it possible that we only trigger this bug when replicating from 5.5 to 5.6?

Revision history for this message
Oscar Elfving (sofam84) wrote :

In my case, we were running mixed-mode replication; when I switched back to statement-based replication, the segfaulting stopped.

Revision history for this message
Jeremy Bowers (cubiclelord) wrote : Re: [Bug 1380010] Re: Replication segfaults slave server

We've been using row replication, but can't really use statement because of
our routines/triggers.

Revision history for this message
Ben Kochie (bjk) wrote :

I had to restore the database from backup, but I have downgraded to 5.6.20-68.0-656.wheezy and the system is now running correctly. We use MIXED replication.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Do you all replicate from 5.5 to 5.6.21? Based on the Google search snippet for the (private) upstream bug:

MySQL Bugs: #74188: 5.7.5+ crashes when receive row ...
bugs.mysql.com/bug.php?id=74188
Oct 2, 2014 - 2 posts - 1 author
Description: Crashes debug/release builds of 5.7.5+ Version: ... Rows_log_event::unpack_current_row()[log_event.h:4448] mysqld-debug.exe! ... sql/rpl_record.cc:265 How to repeat: Replicate from 5.5.40 -> 5.7.5 using RBR.

I'd say you hit the same bug in 5.6.21...
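
For anyone wanting to reproduce that scenario, the setup it describes is a 5.5 master with row-based binary logging feeding a newer (5.6/5.7) slave. A minimal sketch (host name, credentials and binlog coordinates are placeholders):

# master (5.5) my.cnf
[mysqld]
server-id     = 1
log-bin       = master-bin
binlog_format = ROW

# slave (5.6.21), from the mysql client
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='replpass',
  MASTER_LOG_FILE='master-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;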

Revision history for this message
Ben Kochie (bjk) wrote :

Yes. We are testing 5.6 slaves on a production 5.5 cluster before we swap
out the master.

Revision history for this message
Oscar Elfving (sofam84) wrote :

We were replicating 5.6.21-69 to 5.6.21-69 when this issue occurred.

Revision history for this message
markus_albe (markus-albe) wrote :

Customer from issue 46906 reports: "I do want to note that both servers were 5.6.21, both the slave and the master, so I'm not sure a mix of binary logging was the problem." So Oscar's assertion in comment #13 seems valid, and the issue is not restricted to 5.5 => 5.6.

tags: added: i46906
Revision history for this message
Vince Sweeney (vince-j) wrote :

I get the same issue with 5.6.21-rel69; downgrading to 5.6.20-rel68 fixes the problem.

mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=0
max_threads=4098
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 42154990 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7eeb25412000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f2727ceb850 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8bd73c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x650e31]
/lib64/libpthread.so.0(+0xf710)[0x7f272a20c710]
/usr/sbin/mysqld(_Z10unpack_rowPK14Relay_log_infoP5TABLEjPKhPK9st_bitmapPS5_PmS5_+0x228)[0x875de8]
/usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEPK14Relay_log_info+0x93)[0x84c093]
/usr/sbin/mysqld(_ZN14Rows_log_event12do_apply_rowEPK14Relay_log_info+0x26)[0x84a196]
/usr/sbin/mysqld(_ZN14Rows_log_event24do_index_scan_and_updateEPK14Relay_log_info+0x25b)[0x85849b]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x45e)[0x8576ee]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x5e)[0x8512be]
/usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x23e)[0x885f8e]
/usr/sbin/mysqld(handle_slave_sql+0x10eb)[0x8896fb]
/usr/sbin/mysqld(pfs_spawn_thread+0x143)[0xac7c83]
/lib64/libpthread.so.0(+0x79d1)[0x7f272a2049d1]
/lib64/libc.so.6(clone+0x6d)[0x7f272903786d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

Revision history for this message
Dave (akxws32zf6g92mu3-dave) wrote :

We also had this exact situation replicating to the matching version number, 5.6.21-rel69; in our case we were using row-based replication from 5.1, and it had been running for days before it failed. The error log contents match the examples here.

Confirmed: after downgrading to rel68 and restarting the service and replication, things carry on correctly.

Changed in percona-server:
status: New → Incomplete
status: Incomplete → New
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Hi Nil,

Your test-case is wrong. AFAIK ALTER TABLE is not what causes the crash.
The crashes happen during regular replication when the master runs with RBR.

On Thu, Nov 6, 2014 at 3:27 AM, Nilnandan Joshi <<email address hidden>
> wrote:

> Hi,
>
> I have tried with simple test case. Take a table from 5.5 with DATETIME,
> copied to 5.6 replication (Master-Slave) and tried to ALTER TABLE but
> unable to reproduce this issue. Is it possible for you to provide simple
> test case which can be repeatable?

Revision history for this message
Oscar Elfving (sofam84) wrote :

I have a core dump available, if that would help.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

What could help more is the output of:

show create table `magento`.`xxx_tmp_import_prices`\G

and an additional check of whether you ALTER this table on the master.
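
One quick way to perform that check would be to scan the master's binary logs for ALTERs of the table; a sketch (the binlog file glob is a placeholder for the master's actual log files):

mysqlbinlog master-bin.0000* | grep -in "alter table.*xxx_tmp_import_prices"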

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

I see one way to reproduce something very close to (if not the same as) this bug, based on the stack traces. I used an older PS 5.5.36 started like this:

[openxs@centos p5.5]$ bin/mysqld_safe --no-defaults --log-bin --binlog-format=ROW --socket=/tmp/p55.sock --port=3355 &

to generate binary log and then replayed it on 5.6.21-70 to get a crash:

[openxs@centos p5.5]$ bin/mysql --no-defaults --socket=/tmp/p55.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.5.36-34.0-log Source distribution

Copyright (c) 2009-2014 Percona LLC and/or its affiliates
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| centos-bin.000015 |      107 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> drop table if exists ttt;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table ttt (
    -> c1 char(250),
    -> c2 time not null
    -> ) engine=myisam;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into ttt(c1) values (1000*rand()), (1000*rand()), (1000*rand()), (1000*rand());
Query OK, 4 rows affected, 1 warning (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 1

mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1364 | Field 'c2' doesn't have a default value |
+---------+------+-----------------------------------------+
1 row in set (0.00 sec)

mysql> insert into ttt(c1) select 1000*rand() from ttt t1, ttt t2, ttt t3;
Query OK, 64 rows affected, 1 warning (0.01 sec)
Records: 64 Duplicates: 0 Warnings: 1

mysql> delete from ttt order by rand();
Query OK, 68 rows affected (0.01 sec)

mysql> flush logs;
Query OK, 0 rows affected (0.05 sec)

mysql> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| centos-bin.000016 |      107 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> exit
Bye
[openxs@centos p5.5]$ mysqlbinlog data/centos-bin.000015 | mysql -uroot test
ERROR 2013 (HY000) at line 66: Lost connection to MySQL server during query
[openxs@centos p5.5]$ su
Password:
[root@centos p5.5]# tail -120 /var/log/mysqld.log
CONFIG: use_epoll=1
CONFIG: readsize=0
CONFIG: conn_per_thread=1024(default)
CO...
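
Condensed, the reproduction above amounts to the following (same SQL as in the transcript; paths, socket and binlog file name are taken from the transcript and will differ on other systems):

# 1. start a 5.5 server with row-based binary logging
bin/mysqld_safe --no-defaults --log-bin --binlog-format=ROW --socket=/tmp/p55.sock --port=3355 &

# 2. generate row events: a CHAR column plus a NOT NULL TIME column with no default value
bin/mysql --no-defaults --socket=/tmp/p55.sock test <<'SQL'
drop table if exists ttt;
create table ttt (c1 char(250), c2 time not null) engine=myisam;
insert into ttt(c1) values (1000*rand()), (1000*rand()), (1000*rand()), (1000*rand());
insert into ttt(c1) select 1000*rand() from ttt t1, ttt t2, ttt t3;
delete from ttt order by rand();
flush logs;
SQL

# 3. replay the resulting binlog against a 5.6.21 server -- this is where the crash appears
mysqlbinlog data/centos-bin.000015 | mysql -uroot test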

Eddie Stassen (estassen)
description: updated
Revision history for this message
Peiran Song (peiran-song) wrote :

Laurynas,

This is another bug about a slave crashing with the same version, 5.6.21-69: https://bugs.launchpad.net/percona-server/+bug/1381526. The stack trace is not exactly the same. Please see if the issues are related. Thanks.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Peiran, it is possible that they are related, but there is too little information in 1381526 to confirm.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Everybody affected by this bug -

If you are comfortable with building the server from source and if you were not affected by http://bugs.mysql.com/bug.php?id=72610, please set up a test slave built from lp:~laurynas-biveinis/percona-server/bug1380010. Does it still crash?

The change on that branch is a revert of the fix for 72610. I am looking into whether a fix that does not revert it is feasible.
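
A sketch of how such a test slave could be built from that branch (assuming a standard cmake-based build of Percona Server 5.6; exact flags and required dependencies may differ on your system):

bzr branch lp:~laurynas-biveinis/percona-server/bug1380010 ps-bug1380010
cd ps-bug1380010
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=system
make -j"$(nproc)"
# then point the resulting mysqld at a copy of the slave's datadir and restart replication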

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I have pushed a new branch at lp:~laurynas-biveinis/percona-server/bug1380010 that does not revert the fix for 72610.

tags: added: upstream
Revision history for this message
Ben Kochie (bjk) wrote :

Confirmed fixed - deployed the latest percona-server-server-5.6 (5.6.21-70.1-698.wheezy) on a production canary, replicating from Percona Server 5.5. No more crashes.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Upstream fix in 5.6.22:

$ bzr log -r 6230
------------------------------------------------------------
revno: 6230
committer: Venkatesh Duggirala<email address hidden>
branch nick: mysql-5.6
timestamp: Thu 2014-10-30 22:33:43 +0530
message:
  Bug#19704825 TEMPORARY SLAVE TYPE CONVERSION TABLES RELEASED TO EARLY

  Problem: The memory used in preparing slave type conversion temporary table
  is getting released early and causing unexpected results

  Analysis: As part of bug#18770469 fix, We introduced an event
  m_event_mem_root (a special mem root), added to Log_event
  class. While server is creating the temporary table, the memory needed
  is allocated from this special mem root which will be freed in ~Log_event()
  i.e., the scope of this memroot is one event. But it could happen
  that in some cases, server might need to access this
  conversion temporary table for next following event.
  For eg: A nested row event (insert is causing insert in a trigger)
  In this situation, the memory is getting delayed too early
  and causing issues when the server is trying to access the temporary
  table inside the trigger.

  Fix: We cannot use a mem_root whose scope is limited to an event
  execution in this situation. With some further analysis, found out
  that clearing a thd->mem_root at the end of statement (upon applying
  an event which has STMT_END_F flag) will solve out of memory problem
  while running a long transactions (bug#18770469) and will also
  make this reported problem (memory is getting released early) to go away.

tags: added: i48098
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-384
