Slave crash with malformed binlog event
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| Percona Server moved to https://jira.percona.com/projects/PS |
Fix Released
|
High
|
Vlad Lesin | |
| 5.5 |
New
|
Undecided
|
Unassigned | |
| 5.6 |
Fix Released
|
High
|
Vlad Lesin |
Bug Description
The fix for https:/
mysqlbinlog attempts on binlog from surviving (master) node show:
"ERROR: Error in Log_event:
ERROR: Could not read entry at offset 216385052: Error in log format or read error.
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output."
Customer added:
If I look in the error log of a crashed node I see that the last applied Xid is:
"2015-06-11 09:51:25 48997 [ERROR] WSREP: applier could not read binlog event, seqno: 12575799, len: 551
2015-06-11 09:51:25 48997 [Warning] WSREP: Failed to apply app buffer: seqno: 12575799, status: 1
at galera/
I searched for that Xid in binary logs on the primary and could not find it. The last one to be displayed 12575798.
So, given the fix for lp:1438990 was related to adding a table map, I'm assuming that it could have introduced a bug where it would not add a STMT_END_F under certain conditions.
Customer is working to isolate steps to reproduce, I will update here when we have more details.
Changed in percona-server: | |
assignee: | nobody → Vlad Lesin (vlad-lesin) |
Vlad Lesin (vlad-lesin) wrote : | #1 |
Vlad Lesin (vlad-lesin) wrote : | #2 |
I have simplified the test case a little bit more:
DROP TABLE IF EXISTS `t`;
CREATE TABLE `t` (
`f1` int(10) unsigned NOT NULL,
PRIMARY KEY (`f1`)
) ENGINE=InnoDB;
DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
`f1` int(10) unsigned NOT NULL,
PRIMARY KEY (`f1`)
) ENGINE=InnoDB;
DROP TRIGGER IF EXISTS t_insert_trig;
DELIMITER ;;
CREATE TRIGGER t_insert_trig AFTER INSERT ON t
FOR EACH ROW
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
ROLLBACK TO savepoint_1;
INSERT INTO t1 VALUES(NEW.f1); //<------- Crash Line
END;
SAVEPOINT savepoint_1; //<----
INSERT INTO dummy VALUES (1);
END ;;
DELIMITER ;
INSERT INTO `t` VALUES (2);
INSERT INTO `t` VALUES (3); //<-------SECOND INSERT Line
"Crash Line" is the line which absence cause the crash. If this line is commented out we have crash.
Why trans->ha_list does not contain binlog handlerton item to initialize it's per-engine savepoint data.
So there are two cases. The one is without crash (Case_Good) when "Crash Line" presents, the other one is with crash (Case_Crash) when "Crash Line" absents. The question is why thd->transactio
1) Consider Case_Good. The question can be restated as: "Why thd->transactio
Consider events after executing first ha_savepoint() and ha_rollback_
a) binlog handlerton is set in write_locked_
b) thd->binlog_
#0 THD::clear_
at ./sql/sql_
#1 0x0000000000a47b30 in binlog_
at ./sql/binlog.cc:499
#2 0x0000000000a3280f in binlog_
end_event=
#3 0x0000000000a406e3 in MYSQL_BIN_
at ./sql/binlog.
#4 0x0000000000637b29 in ha_commit_trans (thd=0x20adb10, all=false, ignore_
at ./sql/handler.
#5 0x00000000008aa3d2 in trans_commit_stmt (thd=0x20adb10)
at ./sql/transacti
#6 0x00000000007dcace in mysql_execute_
at ./sql/sql_
#7 0x00000000007dfe00 in mysql_parse (thd=0x20adb10, rawbuf=
length=26, parser_
#8 0x00000000007d1da0 in dispatch_command (command=COM_QUERY, thd=0x20adb10,
packet=0x2193b61 "INSERT INTO `t` VALUES (2)", packet_length=26)
at ./sql/sql_
#9 0x00000000007d0c55 in do_command (thd=0x20adb10)
at ./sql/sql_
#10 0x000000000079a5d8 in do_handle_
at ./sql/sql_
#11 0x000000000079a0ef in handle_
at ./sql/sql_
#12 0x0000000000da1942 in pfs_spawn_thread (arg=0x214f5c0)
at ./storage/
#13 0x00007ffff737d182 in start_thread (arg=0x7fffdc0d
Shahriyar Rzayev (rzayev-sehriyar) wrote : | #3 |
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/
The following analysis is based on the code revision number c250ec9791ed6c7 695ae6a191397a3 b65db2e994:
Analysis #1.
1) The assertion takes place on debug build here:
(gdb) bt sysdeps/ unix/sysv/ linux/raise. c:56 assertion@ entry=0x10886c9 "is_binlog_ empty() ", file=file@ entry=0x1088548 "./sql/binlog.cc", entry=433, function@ entry=0x108cdc0 <binlog_ cache_data: :reset( )::PRETTY_ FUNCTION> "virtual void binlog_ cache_data: :reset( )") at assert.c:92 0x10886c9 "is_binlog_ empty() ", cache_data: :reset( )::PRETTY_ FUNCTION> "virtual void binlog_ cache_data: :reset( )") at assert.c:101 cache_data: :reset (this=0x7fff980 24fe0) at ./sql/binlog.cc:433 trx_cache_ data::reset (this=0x7fff980 24fe0) at ./sql/binlog.cc:648 cache_data: :flush (this=0x7fff980 24fe0, thd=0x20adb10, bytes_written= 0x7fffdc0d14b8, wrote_xid= 0x7fffdc0d14eb) cc:1270 cache_mngr: :flush (this=0x7fff980 24e70, thd=0x20adb10, bytes_written= 0x7fffdc0d14f0, wrote_xid= 0x7fffdc0d14eb) LOG::flush_ thread_ caches (this=0x1859dc0 <mysql_bin_log>, thd=0x20adb10) cc:6779 LOG::process_ flush_stage_ queue (this=0x1859dc0 <mysql_bin_log>, total_bytes_ var=0x7fffdc0d1 5f0, var=0x7fffdc0d1 5de, out_queue_ var=0x7fffdc0d1 5f8) at ./sql/binlog. cc:6835 5700) at pthread_ create. c:312 unix/sysv/ linux/x86_ 64/clone. S:111
#0 0x00007ffff67c6cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/
#1 0x00007ffff67ca0d8 in __GI_abort () at abort.c:89
#2 0x00007ffff67bfb86 in __assert_fail_base (fmt=0x7ffff6910830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=
line=line@
function=
#3 0x00007ffff67bfc32 in GI_assert_fail (assertion=
file=0x1088548 "./sql/binlog.cc", line=433,
function=0x108cdc0 <binlog_
#4 0x0000000000a47967 in binlog_
#5 0x0000000000a47d2c in binlog_
#6 0x0000000000a32a7e in binlog_
at ./sql/binlog.
#7 0x0000000000a4827a in binlog_
at ./sql/binlog.cc:778
#8 0x0000000000a40b63 in MYSQL_BIN_
at ./sql/binlog.
#9 0x0000000000a40cfc in MYSQL_BIN_
rotate_
#21 0x00007ffff737d182 in start_thread (arg=0x7fffdc0d
#22 0x00007ffff688a47d in clone () at ../sysdeps/
binlog_ cache_data: :is_binlog_ empty() returns false because binlog_ cache_data: :cache_ log.pos_ in_file have some suspicious value (10344644715844 964239) .
2) binlog_ cache_data: :cache_ log.pos_ in_file is set from here:
Old value = 368 64239 4db48, type=WRITE_CACHE, seek_offset= 103446447158449 64239, use_async_io=0 '\000', clear_cache=0 '\000') mf_iocache. c:387 not_done= 1; 4db48, type=WRITE_CACHE, seek_offset= 103446447158449 64239, use_async_io=0 '\000', clear_cache=0 '\000') mf_iocache. c:387 cache_data: :truncate (this=0x7fff940 4db40, pos=10344644715 844964239) trx_cache_ data::restore_ savepoint (this=0x7fff940 4db40, pos=10344644715 844964239) savepoint_ rollback (hton=0x188ca40, thd=0x20b38c0, sv=0x7fff94007040) cc:1900 to_savepoint (thd=0x20b38c0, sv=0x7fff94007010) cc:2132 to_savepoint (thd=0x20b38c0, name=...) at ./...
New value = 103446447158449
reinit_io_cache (info=0x7fff940
at ./mysys/
387 info->seek_
(gdb) bt
#0 reinit_io_cache (info=0x7fff940
at ./mysys/
#1 0x0000000000a47aaa in binlog_
at ./sql/binlog.cc:486
#2 0x0000000000a48011 in binlog_
at ./sql/binlog.cc:691
#3 0x0000000000a34368 in binlog_
at ./sql/binlog.
#4 0x00000000006391c7 in ha_rollback_
at ./sql/handler.
#5 0x00000000008aac33 in trans_rollback_