Slave crash with malformed binlog event

Bug #1464468 reported by markus_albe
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Vlad Lesin
5.5
New
Undecided
Unassigned
5.6
Fix Released
High
Vlad Lesin

Bug Description

The fix for https://bugs.launchpad.net/percona-server/+bug/1438990 seems to have introduced a bug of it's own:

mysqlbinlog attempts on binlog from surviving (master) node show:

"ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 2180319, event_type: 0
ERROR: Could not read entry at offset 216385052: Error in log format or read error.
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output."

Customer added:
If I look in the error log of a crashed node I see that the last applied Xid is:
"2015-06-11 09:51:25 48997 [ERROR] WSREP: applier could not read binlog event, seqno: 12575799, len: 551
2015-06-11 09:51:25 48997 [Warning] WSREP: Failed to apply app buffer: seqno: 12575799, status: 1
at galera/src/trx_handle.cpp:apply():351"

I searched for that Xid in binary logs on the primary and could not find it. The last one to be displayed 12575798.

So, given the fix for lp:1438990 was related to adding a table map, I'm assuming that it could have introduced a bug where it would not add a STMT_END_F under certain conditions.

Customer is working to isolate steps to reproduce, I will update here when we have more details.

Tags: i51914
Changed in percona-server:
assignee: nobody → Vlad Lesin (vlad-lesin)
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :
Download full text (12.2 KiB)

The following analysis is based on the code revision number c250ec9791ed6c7695ae6a191397a3b65db2e994:

Analysis #1.

1) The assertion takes place on debug build here:

(gdb) bt
#0 0x00007ffff67c6cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff67ca0d8 in __GI_abort () at abort.c:89
#2 0x00007ffff67bfb86 in __assert_fail_base (fmt=0x7ffff6910830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x10886c9 "is_binlog_empty()", file=file@entry=0x1088548 "./sql/binlog.cc",
line=line@entry=433,
function=function@entry=0x108cdc0 <binlog_cache_data::reset()::PRETTY_FUNCTION> "virtual void binlog_cache_data::reset()") at assert.c:92
#3 0x00007ffff67bfc32 in GI_assert_fail (assertion=0x10886c9 "is_binlog_empty()",
file=0x1088548 "./sql/binlog.cc", line=433,
function=0x108cdc0 <binlog_cache_data::reset()::PRETTY_FUNCTION> "virtual void binlog_cache_data::reset()") at assert.c:101
#4 0x0000000000a47967 in binlog_cache_data::reset (this=0x7fff98024fe0) at ./sql/binlog.cc:433
#5 0x0000000000a47d2c in binlog_trx_cache_data::reset (this=0x7fff98024fe0) at ./sql/binlog.cc:648
#6 0x0000000000a32a7e in binlog_cache_data::flush (this=0x7fff98024fe0, thd=0x20adb10, bytes_written=0x7fffdc0d14b8, wrote_xid=0x7fffdc0d14eb)
at ./sql/binlog.cc:1270
#7 0x0000000000a4827a in binlog_cache_mngr::flush (this=0x7fff98024e70, thd=0x20adb10, bytes_written=0x7fffdc0d14f0, wrote_xid=0x7fffdc0d14eb)
at ./sql/binlog.cc:778
#8 0x0000000000a40b63 in MYSQL_BIN_LOG::flush_thread_caches (this=0x1859dc0 <mysql_bin_log>, thd=0x20adb10)
at ./sql/binlog.cc:6779
#9 0x0000000000a40cfc in MYSQL_BIN_LOG::process_flush_stage_queue (this=0x1859dc0 <mysql_bin_log>, total_bytes_var=0x7fffdc0d15f0,
rotate_var=0x7fffdc0d15de, out_queue_var=0x7fffdc0d15f8) at ./sql/binlog.cc:6835
#21 0x00007ffff737d182 in start_thread (arg=0x7fffdc0d5700) at pthread_create.c:312
#22 0x00007ffff688a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

binlog_cache_data::is_binlog_empty() returns false because binlog_cache_data::cache_log.pos_in_file have some suspicious value (10344644715844964239).

2) binlog_cache_data::cache_log.pos_in_file is set from here:

Old value = 368
New value = 10344644715844964239
reinit_io_cache (info=0x7fff9404db48, type=WRITE_CACHE, seek_offset=10344644715844964239, use_async_io=0 '\000', clear_cache=0 '\000')
at ./mysys/mf_iocache.c:387
387 info->seek_not_done=1;
(gdb) bt
#0 reinit_io_cache (info=0x7fff9404db48, type=WRITE_CACHE, seek_offset=10344644715844964239, use_async_io=0 '\000', clear_cache=0 '\000')
at ./mysys/mf_iocache.c:387
#1 0x0000000000a47aaa in binlog_cache_data::truncate (this=0x7fff9404db40, pos=10344644715844964239)
at ./sql/binlog.cc:486
#2 0x0000000000a48011 in binlog_trx_cache_data::restore_savepoint (this=0x7fff9404db40, pos=10344644715844964239)
at ./sql/binlog.cc:691
#3 0x0000000000a34368 in binlog_savepoint_rollback (hton=0x188ca40, thd=0x20b38c0, sv=0x7fff94007040)
at ./sql/binlog.cc:1900
#4 0x00000000006391c7 in ha_rollback_to_savepoint (thd=0x20b38c0, sv=0x7fff94007010)
at ./sql/handler.cc:2132
#5 0x00000000008aac33 in trans_rollback_to_savepoint (thd=0x20b38c0, name=...) at ./...

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :
Download full text (13.1 KiB)

I have simplified the test case a little bit more:

DROP TABLE IF EXISTS `t`;
CREATE TABLE `t` (
`f1` int(10) unsigned NOT NULL,
PRIMARY KEY (`f1`)
) ENGINE=InnoDB;

DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
`f1` int(10) unsigned NOT NULL,
PRIMARY KEY (`f1`)
) ENGINE=InnoDB;

DROP TRIGGER IF EXISTS t_insert_trig;

DELIMITER ;;
CREATE TRIGGER t_insert_trig AFTER INSERT ON t
FOR EACH ROW
BEGIN

DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
ROLLBACK TO savepoint_1;
INSERT INTO t1 VALUES(NEW.f1); //<------- Crash Line
END;

SAVEPOINT savepoint_1; //<---------SAVEPOINT Line

INSERT INTO dummy VALUES (1);

END ;;
DELIMITER ;

INSERT INTO `t` VALUES (2);
INSERT INTO `t` VALUES (3); //<-------SECOND INSERT Line

"Crash Line" is the line which absence cause the crash. If this line is commented out we have crash.

Why trans->ha_list does not contain binlog handlerton item to initialize it's per-engine savepoint data.

So there are two cases. The one is without crash (Case_Good) when "Crash Line" presents, the other one is with crash (Case_Crash) when "Crash Line" absents. The question is why thd->transaction.stmt.ha_list does not contain item which corresponds to binlog_handlerton for Case_Crash?

1) Consider Case_Good. The question can be restated as: "Why thd->transaction.stmt.ha_list contains item which corresponds to binlog_handlerton for the Case_Good?"

Consider events after executing first ha_savepoint() and ha_rollback_to_savepoint() calls.

a) binlog handlerton is set in write_locked_table_maps() if thd->binlog_table_maps is 0.
b) thd->binlog_table_maps is set to 0 only if binlog_cache_data::m_pending is not 0

#0 THD::clear_binlog_table_maps (this=0x20adb10)
at ./sql/sql_class.h:2646
#1 0x0000000000a47b30 in binlog_cache_data::flush_pending_event (this=0x7fffa000e660, thd=0x20adb10)
at ./sql/binlog.cc:499
#2 0x0000000000a3280f in binlog_cache_data::finalize (this=0x7fffa000e660, thd=0x20adb10,
end_event=0x7fffdc0d16d0) at ./sql/binlog.cc:1207
#3 0x0000000000a406e3 in MYSQL_BIN_LOG::commit (this=0x1859dc0 <mysql_bin_log>, thd=0x20adb10, all=false)
at ./sql/binlog.cc:6675
#4 0x0000000000637b29 in ha_commit_trans (thd=0x20adb10, all=false, ignore_global_read_lock=false)
at ./sql/handler.cc:1513
#5 0x00000000008aa3d2 in trans_commit_stmt (thd=0x20adb10)
at ./sql/transaction.cc:436
#6 0x00000000007dcace in mysql_execute_command (thd=0x20adb10)
at ./sql/sql_parse.cc:5573
#7 0x00000000007dfe00 in mysql_parse (thd=0x20adb10, rawbuf=0x7fffa0004ff0 "INSERT INTO `t` VALUES (2)",
length=26, parser_state=0x7fffdc0d41d0) at ./sql/sql_parse.cc:6955
#8 0x00000000007d1da0 in dispatch_command (command=COM_QUERY, thd=0x20adb10,
packet=0x2193b61 "INSERT INTO `t` VALUES (2)", packet_length=26)
at ./sql/sql_parse.cc:1442
#9 0x00000000007d0c55 in do_command (thd=0x20adb10)
at ./sql/sql_parse.cc:1054
#10 0x000000000079a5d8 in do_handle_one_connection (thd_arg=0x20adb10)
at ./sql/sql_connect.cc:1532
#11 0x000000000079a0ef in handle_one_connection (arg=0x20adb10)
at ./sql/sql_connect.cc:1443
#12 0x0000000000da1942 in pfs_spawn_thread (arg=0x214f5c0)
at ./storage/perfschema/pfs.cc:1860
#13 0x00007ffff737d182 in start_thread (arg=0x7fffdc0d5700) at pthread_create.c:3...

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-908

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.