InnoDB: Failing assertion: log_bmp_sys->parse_buf_end == log_bmp_sys->parse_buf in log_online_follow_log_group at log/log0online.cc:1092 | [ERROR] InnoDB: log block checksum mismatch: expected 1214088793, calculated checksum 1214079616

Bug #1318453 reported by Roel Van de Paar
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Hrvoje Matijakovic
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Fix Released
High
Hrvoje Matijakovic

Bug Description

================== Error Log
2014-05-10 15:05:15 16242 [Note] /ssd/Percona-Server-5.6.17-rel65.0-592-debug.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.6.17-65.0-rel65.0-debug-log' socket: '/ssd/863939/current1_6/tmp/master.sock' port: 13100 Percona Server with XtraDB (GPL), Release rel65.0, Revision 592-debug
2014-05-10 22:04:40 16242 [ERROR] InnoDB: log block checksum mismatch: expected 1214088793, calculated checksum 1214079616
2014-05-10 22:04:41 7fd287c4a700 InnoDB: Assertion failure in thread 140542197671680 in file log0online.cc line 1092
InnoDB: Failing assertion: log_bmp_sys->parse_buf_end == log_bmp_sys->parse_buf

================== Backtrace
Thread 1 (Thread 0x7fd287c4a700 (LWP 16255)):
+bt
#0 0x00007fd2b95818ac in pthread_kill () from /lib64/libpthread.so.0
#1 0x000000000068e9ae in handle_fatal_signal (sig=6) at /bzr/percona-server-5.6.17-65.0-tokudb-7.1.6_dbg/sql/signal_handler.cc:254
#2 <signal handler called>
#3 0x00007fd2b7be3925 in raise () from /lib64/libc.so.6
#4 0x00007fd2b7be5105 in abort () from /lib64/libc.so.6
#5 0x0000000000a64152 in log_online_follow_log_group () at /bzr/percona-server-5.6.17-65.0-tokudb-7.1.6_dbg/storage/innobase/log/log0online.cc:1092
#6 log_online_follow_redo_log () at /bzr/percona-server-5.6.17-65.0-tokudb-7.1.6_dbg/storage/innobase/log/log0online.cc:1245
#7 0x0000000000b10925 in srv_redo_log_follow_thread (arg=<value optimized out>) at /bzr/percona-server-5.6.17-65.0-tokudb-7.1.6_dbg/storage/innobase/srv/srv0srv.cc:2311
#8 0x00007fd2b957c9d1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fd2b7c99b6d in clone () from /lib64/libc.so.6

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

RQG: 960 <email address hidden>
PS: percona-server-5.6.17-65.0-tokudb-7.1.6 (with TokuDB Engine added)

Revision history for this message
Roel Van de Paar (roel11) wrote :

Crash seen only one time in about 25 trials, so likely not that big an issue.

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

A log block checksum mismatch in the redo log thread will result in this assert. log_online_follow_log_seg returns early if !log_online_is_valid_log_seg, leaving unparsed data and hitting the assert.

But another question is why are we getting log block checksum mismatches in the first place.

tags: added: bitmap xtradb
Revision history for this message
Ramesh Sivaraman (rameshvs02) wrote :

Found many "log block checksum mismatch" ALARMs ( 18 out of 23 trial) in PS-5.6 optimized RQG run.

===> vi trial100.log # cd vardir1_100; vi ./log/master.err; # vi vardir1_100/log/master.err <===
vardir1_100/log/master.err:2014-08-11 09:35:56 8849 [ERROR] InnoDB: log block checksum mismatch: expected 1869073046, calculated checksum 3773599344
vardir1_100/log/master.err:2014-08-11 09:35:56 8849 [ERROR] InnoDB: current InnoDB log checksum type: crc32, detected log checksum type: innodb
vardir1_100/log/master.err:2014-08-11 09:35:56 8849 [ERROR] InnoDB: STRICT method was specified for innodb_log_checksum, so we intentionally assert here.
===>

Version info
----------------
PS: Percona-Server-5.6.19-rel67.0-648.Linux.x86_64
RQG : 988 <email address hidden>

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

This is likely to be caused by InnoDB log block checksum algorithm variable being dynamic, thus it changes at some LSN. The redo log thread is catching up from some LSN2 < LSN, and attempting to use the new setting for the data written using the new algorithm.

The fix would be to be liberal in what checksum algorithm may apply in the old log block checksum. The exact tracking of LSN ranges and their associated algorithms is impractical.

Finally, on what evidence was this triaged for 5.1 and 5.5?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

OK, I triaged this myself for 5.1/5.5, which was a mistake.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Discussed with Laurynas. Not severe bug due to users not constantly changing log block checksum algorithm var like RQG does + you need log tracker running at same time. Triage ok.

Revision history for this message
Roel Van de Paar (roel11) wrote :

We're seeing this in other ways too in RQG runs (note the completely different code location!)

2014-09-10 09:11:58 31831 [ERROR] InnoDB: log block checksum mismatch: expected 658738216, calculated checksum 3735928559
2014-09-10 09:11:58 31831 [ERROR] InnoDB: current InnoDB log checksum type: none, detected log checksum type: innodb
2014-09-10 09:11:58 31831 [ERROR] InnoDB: STRICT method was specified for innodb_log_checksum, so we intentionally assert here.
2014-09-10 09:11:58 7f6b627ff700 InnoDB: Assertion failure in thread 140099190781696 in file ha_innodb.cc line 18507

And this is causing a lot of trials to fail with STATUS_ALARM (nearly 300!). Adding qablock as this is a serious blocker.

Would be great to see this one fixed or to have good workaround on how to disable this in grammars (provided it does not disable too much test functionality)

tags: added: qablock
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Roel, what is the stacktrace of #15 ?

Revision history for this message
Ramesh Sivaraman (rameshvs02) wrote :

bt of "log block checksum mismatch" error

Revision history for this message
Ramesh Sivaraman (rameshvs02) wrote :

Fulll bundle with all files (core, gdb stacks, error log, vardir and more)

Revision history for this message
Roel Van de Paar (roel11) wrote :

Another testcase in a bug marked as duplicate of this one - https://bugs.launchpad.net/percona-server/+bug/1384578 (3289_bundle.tar.gz)

Revision history for this message
Roel Van de Paar (roel11) wrote :

s/testcase/core dump and co/

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Proposing to make this a doc bug to document the migration procedure to a strict log block checksum as follows:
- note the current LSN;
- set the log block checksum algorithm to the non-strict version of the desired algorithm;
- wait until the current LSN advances to at least the previous LSN + log capacity;
- set the log block checksum algorithm to the strict version of the desired algorithm.

This ensures that any log data written using the previous algorithm is fully overwritten before strictness becomes effective.

Then I propose that no code changes would be done. Any change we'd do, would be equivalent to making the "strict" version less strict, and for that we have the non-strict version in the first place.

tags: added: doc
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Removing qablock. The QA implication is that any test manipulation of the variable must conform to the above

tags: removed: qablock
Revision history for this message
Hrvoje Matijakovic (hrvojem) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

Laurynas, is the documentation now outdated because of https://bugs.mysql.com/bug.php?id=75968 ?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Yes, please file a doc bug

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-787

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.