Percona Server moved to https://jira.percona.com/projects/PS

MySQL crashes on ERROR: the age of the last checkpoint is..

Bug #1417810 reported by mig5 on 2015-02-03

This bug affects 2 people

	Status	Importance	Assigned to
Percona Server moved to https://jira.percona.com/projects/PS	Status tracked in 5.7
5.1	Won't Fix	Undecided	Unassigned
5.5	Triaged	High	Unassigned
5.6	New	Undecided	Unassigned
5.7	New	Undecided	Unassigned

Bug Description

I have a server running Debian 6 (Squeeze) and Percona 5.5.

Logs indicate that for a while, there's been a warning that:

150202 22:19:18 InnoDB: ERROR: the age of the last checkpoint is 9448357,
InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

I am aware of what this means and that the innodb_log_buffer_size needs to be increased. Unfortunately no-one had noticed this alert on this server :/

But since the upgrade to 5.5.41-rel37.0-727.squeeze, eventually these error alerts in the log seem to culminate in a giant MySQL crash, which didn't used to happen. These lines, for example, appear right after the last checkpoint alert as above (but there had been many preceding it):

150202 22:19:18 InnoDB: Assertion failure in thread 1420909424 in file sync0rw.c line 569
InnoDB: Failing assertion: !lock->recursive
InnoDB: We intentionally generate a memory trap.

A full output of the log is attached (from the last checkpoint error, and onward). The corruption appeared to fix itself on this occasion (without anyone even noticing), but the next day it occurred again and we ended up with a totally corrupt database system after this. All I know is that I've not seen MySQL fatally crash on something like the checkpoint alert before. So unsure if it's a regression in the latest Percona 5.5, which we'd upgrade to a few days before.

System is 3.18.3-x86_64-linode51 #1 SMP Fri Jan 23 09:57:22 EST 2015 x86_64 GNU/Linux

See original description

Revision history for this message

mig5 (mig5) wrote on 2015-02-03:

mysql.txt Edit (51.8 KiB, text/plain)

description:

updated

Revision history for this message

mig5 (mig5) wrote on 2015-02-04:

mysql2.txt Edit (6.3 KiB, text/plain)

This error has occurred again for us today despite having made the innodb_log_buffer_size change. The error just appears out of the blue in the log (attached).

I have found it occurring on other servers also only since running 5.5.41-rel37.0-727.squeeze

Revision history for this message

David Bennett (dbpercona) wrote on 2015-02-04:

Can you share the output of:

mysql -u root -p -e "show variables like 'innodb_%'" | grep size

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

Here you go:

innodb_additional_mem_pool_size 8388608
innodb_buffer_pool_size 2147483648
innodb_dict_size_limit 0
innodb_ibuf_max_size 1073725440
innodb_log_block_size 512
innodb_log_buffer_size 8388608
innodb_log_file_size 104857600
innodb_max_bitmap_file_size 104857600
innodb_merge_sort_block_size 1048576
innodb_page_size 16384
innodb_purge_batch_size 20

Only the innodb_log_file_size has changed (since the crash), the other InnoDB variables in configuration have not

I should add, the only other change that's happened on the system (other than libc upgrade due to GHOST vuln) was the machine was rebooted into a newer kernel (3.18.3-x86_64-linode51) around the same time that Percona was upgraded to 5.5.41-rel37.0-727.squeeze

I am going to experiment with booting back into an older kernel to see if that's the issue. In the meantime is there any documentation on how to downgrade to older Percona version in Debian - I can't find the .deb's of older releases in the repo

Thanks

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

NM on the last question. I found the deb packages. Will hold off on downgrading though to 5.5.40, to see if older kernel fixes it.

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-02-05:

@mig5 - what steps did you follow to 1) change innodb_log_file_size and 2) upgrade Percona Server? Please list all steps done. What version did you upgrade from?

Note;
Query (4f8fa018): is an invalid pointer

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

@roel11

The upgrade was performed on the 16th January at 05:43 UTC via standard Debian 'apt-get update; apt-get upgrade'. It was an upgrade from 5.5.40-rel36.1-707.squeeze to 5.5.41-rel37.0-727.squeeze

Yesterday I increased the innodb_log_file_size with:

/etc/init.d/mysql stop (no, innodb_fast_shutdown was not set to 2)
mv /var/lib/mysql/ib_logfile* ~/
(edit innodb_log_file_size and increased to 100 from default)
/etc/init.d/mysql start

Please note the crashes began *before* innodb_log_file_size was increased from default. The crashes have occurred to us on:

server1:
150202 22:19:18
150203 15:53
150204 20:47:37

server2:
150122 4:28:39

Both servers were Debian Squeeze, were upgraded to same version of Percona at same time, and were rebooted into kernel 3.18.3-x86_64-linode51

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-02-05:

@mig5 - thanks. What was the last time you ran mysql_upgrade?

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

@roel11 I don't think I've ever run that command in a number of years of doing sysadmin. Does it not run as part of upgrade automatically, at least between certain versions?

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#10

Also I ran mysqlcheck --all-databases and all tables reported OK.

Revision history for this message

David Bennett (dbpercona) wrote on 2015-02-05:

#11

@mig5 - in addition to Roel's question:

innodb_log_buffer_size 8388608
innodb_log_file_size 104857600

These values look low. They may be fine if you have small volume, small transactions and small data rows. It really depends on your usage.

If your transaction volume, transaction size or DML statements are large you would want to increase these.

There is a very good article here:

http://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/

The measurement will take a minute during your peak load time. You may want to take a few measurements and average to come up with good settings. Beyond this, you may want to contact our support for assistance in tuning for your environment.

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#12

@dbpercona do you consider these low values to justify the trigger of an assertion failure, especially one that has not happened until the latest release?

While I agree it may need tuning I am only surprised that such an error would occur now.

I'll investigate the tuning, thanks. I didn't think 100MB for a innodb_log_file_size was small, especially since the assertion error never occured when it was at its default 15M....

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-02-05:

#13

@mig5, yes... 'You should execute mysql_upgrade each time you upgrade MySQL. '
http://dev.mysql.com/doc/refman/5.5/en/mysql-upgrade.html

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-02-05:

#14

And as David mentioned, you may like to take this opportunity to get into contact with our support services @ https://www.percona.com/services/mysql-support

Laurynas Biveinis (laurynas-biveinis) on 2015-02-05

no longer affects:

percona-server/5.7

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-02-05:

#15

(But as your version diff is small, I do not expect mysql_upgrade to make much of a change - yet, if you have never ran it, you may be running with very old tables)

Revision history for this message

David Bennett (dbpercona) wrote on 2015-02-05:

#16

@mig5 - they may be just fine for your environment and you are right, an assertion failure is not ok.

Regarding your first reported error "exceeds the log group capacity", the inno_xxx_size values being too low can cause this.

Not running mysql_upgrade between major version upgrades may cause assertion failures in some cases.

Regarding mysql_upgrade, there is another good article here:

http://www.percona.com/blog/2014/09/19/mysql-upgrade-best-practices/

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#17

I ran it on a test server:

root@server2:/home/miguel# mysql_upgrade -p
Enter password:
Looking for 'mysql' as: mysql
Looking for 'mysqlcheck' as: mysqlcheck
This installation of MySQL is already upgraded to 5.5.41-37.0, use --force if you still need to run mysql_upgrade

Surprised that I would have to use this script with --force after a standard upgrade from Debian packages. Or does the message above imply that (as I thought) there's no need to run mysql_upgrade after an upgrade on Debian/Ubuntu?

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#18

I also didn't think it was a major version upgrade, it was a point release between 5.5.40 and 5.5.41 no?

Revision history for this message

David Bennett (dbpercona) wrote on 2015-02-05:

#19

@mig5 - the packaging upgrade (apt) does not run the data upgrade (mysql_upgrade). It is recommended that you backup your data before running mysql_upgrade.

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#20

I looked at /etc/mysql/debian-start and I think mysql_upgrade is actually executed on this on restart.

Anyway, we will see if the issue is reproduced on the old kernel and look at tuning the innodb_log_buffer_size (though as mentioned, that value's never been changed, the assertion error never occurred until the upgrade to 5.5.41, and the mysql_upgrade/mysqlcheck didn't report any tables needing upgrade.

Thanks.

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#21

Interesting MariaDB bug report with the same assertion error in !lock(recursive), fixed last month in 5.5.41:

https://mariadb.atlassian.net/browse/MDEV-7148

Worth reviewing that ticket/fix and see if it needs applying in Percona?

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#22

A description of what sounds like a race condition between recursive and writer_thread

https://lists.launchpad.net/maria-developers/msg07942.html

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2015-02-05:

#23

MDEV-7148 does not occur on x86_64, it's a PowerPC-only issue.

As for the original issue, your log files are 100 MB (after the change; what was the original size?). This limits BLOB/TEXT in a single row to 10MB, as the error message says. I'd increase that first.

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#24

The original size was 5MB. What confuses me is why only after last upgrade, this would become a problem (MySQL log had been showing the checkpoint error for a long time, but no such assertions til 5.5.41). Maybe it's coincidence..

Revision history for this message

mig5 (mig5) wrote on 2015-02-05:

#25

Also, the assertion has occurred since the log file size was adjusted (and we receive no more checkpoint warnings).

Doesn't the assertion read like a thread/lock issue, not a log file size issue?

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2015-02-06:

#26

It is possible that the small log file size warnings and the actual crash have unrelated causes

Revision history for this message

Teodor Milkov (tm-del) wrote on 2015-02-26:

#27

We encountered the same problem. It looks like the crash is triggered somehow by a large sql query (I still don't know the exact query). See also: http://stackoverflow.com/questions/20004897/mysql-crashing-with-innodb-assertion-failure-in-thread-in-file-sync0rw-c-line-5

Revision history for this message

mig5 (mig5) wrote on 2015-02-26:

#28

We also ended up able to attribute and reproduce the issue specific to a task (big SQL query/batch operation) on a site. Unfortunately also don't know the specific query, but the connecting factor is that it's large I guess (or operating on a large row/table)

Revision history for this message

Teodor Milkov (tm-del) wrote on 2015-02-27:

#29

I think I caught the offending query. It's a 18819980 characters long (18 MB) slingle column update query (UPDATE wp_options SET option_value = ...).

Revision history for this message

Teodor Milkov (tm-del) wrote on 2015-02-27:

#30

I was to reproduce this reliably. Here's how:

CREATE TABLE `wp_options` (
  `option_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `option_name` varchar(64) NOT NULL DEFAULT '',
  `option_value` longtext NOT NULL,
  `autoload` varchar(20) NOT NULL DEFAULT 'yes',
  PRIMARY KEY (`option_id`),
  UNIQUE KEY `option_name` (`option_name`)
) ENGINE=InnoDB CHARSET=utf8;

INSERT INTO wp_options SET option_id=1, option_value='x';

UPDATE wp_options SET option_value=REPEAT('x', 19000000) WHERE option_id=1;

ERROR 2013 (HY000): Lost connection to MySQL server during query

I'm going to try different mysqld / linux kernel combinations and will report my findings soon.

Revision history for this message

Teodor Milkov (tm-del) wrote on 2015-02-27:

#31

Looks like the crash is 32bit + percona. Vanilla mysqld is ok on both 32 & 64 bit installs. Linux kernel doesn't matter. Here's the complete set of tests I did:

# Note: max_allowed_packet must be > 20M

3.14.29-grsec + 5.5.32-percona src (32bit): crash
3.2.59-grsec + 5.5.40-percona src (32bit): crash
3.2.59-grsec + 5.5.41-percona src (32bit): crash
3.14.29-grsec + 5.5.40-percona src (32bit): crash
3.14.29-grsec + 5.5.41-percona src (32bit): crash

3.14.29-grsec + 5.5.41-percona src + innodb_log_file_size = 256M + innodb_log_buffer_size = 1G (32bit): crash

3.16.0-30-generic #40-Ubuntu + percona 5.5.41-rel37.0-727.wheezy (32bit): crash
3.16.0-30-generic #40-Ubuntu + 5.5.41-rel37.0-727.wheezy (64bit): OK

3.16.0-30-generic #40-Ubuntu + 5.5.40-0+wheezy1 + mysql 5.5.40-0+wheezy1 (32bit): OK

As to which statement size crashes it:

UPDATE wp_options SET option_value=REPEAT('x', $SIZE) WHERE option_id=1;

On two different servers, there's slight difference:
- 3.14.29-grsec + 5.5.41-percona src (32bit):
$SIZE=16722688 is OK
$SIZE=167226989 crashes the server

- 3.16.0-30-generic #40-Ubuntu + percona 5.5.41-rel37.0-727.wheezy (32bit):
$SIZE=16722689 is OK
$SIZE=16722690 crashes the server

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#32

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3256