Sporadic percona_changed_page_bmp testcase warnings

Bug #1235730 reported by Laurynas Biveinis
Affects | Status | Importance | Assigned to | Milestone
Percona Server (moved to https://jira.percona.com/projects/PS) | Fix Released | Low | Laurynas Biveinis |
5.1 | Invalid | Low | Unassigned |
5.5 | Invalid | Low | Unassigned |
5.6 | Fix Released | Low | Laurynas Biveinis |

Bug Description

percona_changed_page_bmp may fail on loaded hosts with the following extra warning in the error log:

2013-10-05 19:17:51 19199 [ERROR] InnoDB: We scanned the log up to 29651456. A checkpoint was at 29651809 and the maximum LSN on a database page was 0. It is possible that the database is now corrupt!
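
For triaging similar failures, the three LSN values in the message above can be pulled out of an error log mechanically. The following is a minimal sketch in Python, not part of MTR or the testcase; the function name `parse_scan_warning` and the dictionary keys are hypothetical, and the pattern assumes the message wording is exactly as quoted in this report.

```python
import re

# Matches the InnoDB startup message quoted in this report.
# Assumption: the wording is exactly as in the log line above.
LOG_RE = re.compile(
    r"We scanned the log up to (\d+)\. "
    r"A checkpoint was at (\d+) and the maximum LSN "
    r"on a database page was (\d+)\."
)


def parse_scan_warning(line):
    """Return the LSNs from the warning, or None if the line does not match.

    Hypothetical helper for log triage; the key names are illustrative.
    """
    match = LOG_RE.search(line)
    if match is None:
        return None
    scanned, checkpoint, max_page = (int(g) for g in match.groups())
    return {
        "scanned_lsn": scanned,
        "checkpoint_lsn": checkpoint,
        "max_page_lsn": max_page,
        # Positive when the checkpoint is ahead of the scanned log position.
        "gap": checkpoint - scanned,
    }


line = (
    "2013-10-05 19:17:51 19199 [ERROR] InnoDB: We scanned the log up to "
    "29651456. A checkpoint was at 29651809 and the maximum LSN on a "
    "database page was 0. It is possible that the database is now corrupt!"
)
info = parse_scan_warning(line)
print(info["gap"])  # 353
```

For the line in this report, the checkpoint LSN is 353 ahead of the position up to which the log was scanned.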

The warning appears during server startup on the 4th restart performed by the testcase. That restart is preceded by a relatively large data write, and the error log shows that the preceding server shutdown did not complete: the server was killed when the MTR shutdown_server timeout expired.

The likely cause is that the buffer pool flush cannot write out all of the modified data within the allotted time.

The fix is to increase the shutdown_server timeout value to allow for a clean shutdown.

Tags: bitmap ci xtradb

Laurynas Biveinis (laurynas-biveinis) wrote :

Increasing timeouts uncovered a shutdown deadlock instead. The 10 second timeout still might be too short for loaded hosts, but this is best kept untouched for now.

Laurynas Biveinis (laurynas-biveinis) wrote :

And fixing the deadlock showed that the 10 second timeout is still too low.

Alexey Kopytov (akopytov) wrote :

This still appears to be a problem in some 5.6-param builds. Examples:

http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/404/
http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/407/

Note that the first one was for a branch without the recent fixes to innodb_log_checksum_algorithm, and the second one was for a branch with those fixes.

Not sure if this is the same underlying issue, feel free to fork into a separate bug if it's not.

Laurynas Biveinis (laurynas-biveinis) wrote :

It might be a coincidence, but I don't think I saw any sporadic percona_changed_page_bmp failures on the branches that have https://code.launchpad.net/~laurynas-biveinis/percona-server/bug1239062/+merge/193885 merged (I have done five or so runs on different branches that have that MP merged).

Thus my plan would be to revisit this issue once the bug 1239062 fix is merged to trunk.

Laurynas Biveinis (laurynas-biveinis) wrote :

Bug 1239062 fix has been merged to trunk. I will keep an eye on trunk Jenkins results. Feel free to let me know of any param branch percona_changed_page_bmp failures, if these occur on branches that have lp:percona-server rev 503.

Laurynas Biveinis (laurynas-biveinis) wrote :

Two param builds and one and a half trunk Jenkins runs later, there has not been a single percona_changed_page_bmp failure. Based on this and on the previous runs on feature branches that had the bug 1239062 fix, I am leaving this bug for the original deadlock issue fixed in the 5.6.13-61.0 release, and re-closing it.

Should this occur again, let's open a new bug for this.

Thanks!

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-2044
