Dynamic innodb_buffer_pool_size resizing causes sig6 (Failing assertion: latches->empty()...)

Bug #1414257 reported by Roel Van de Paar on 2015-01-24
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.6
Invalid
Undecided
Unassigned
5.7
Fix Released
High
Laurynas Biveinis

Bug Description

Version: '5.7.5-m15-debug' socket: '/sda/PS-mysql-5.7.5-m15-linux-x86_64-debug/socket.sock' port: 12723 MySQL Community Server (GPL)
2015-01-24T03:31:53.789823Z 0 [Note] InnoDB: Resizing buffer pool from 2147483648 to 3221225472. (unit=134217728)
2015-01-24T03:31:53.794583Z 0 [Note] InnoDB: disabled adaptive hash index.
2015-01-24 14:31:53 0x7f9d873f7700 InnoDB: Assertion failure in thread 140314555676416 in file sync0debug.cc line 299
InnoDB: Failing assertion: latches->empty() || latch->m_level == SYNC_LEVEL_VARYING || latch->m_level == SYNC_NO_ORDER_CHECK || latches->back()->m_level == SYNC_NO_ORDER_CHECK || latches->back()->m_level == SYNC_LEVEL_VARYING || latches->back()->m_level >= latch->m_level

(gdb) bt
#0 0x00007f9e54f85771 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000e15d58 in my_write_core (sig=6) at /bzr/mysql-5.7-percona-patches_dbg/mysys/stacktrace.c:247
#2 0x0000000000822044 in handle_fatal_signal (sig=6) at /bzr/mysql-5.7-percona-patches_dbg/sql/signal_handler.cc:219
#3 <signal handler called>
#4 0x00007f9e53b895c9 in raise () from /lib64/libc.so.6
#5 0x00007f9e53b8acd8 in abort () from /lib64/libc.so.6
#6 0x00000000010d4ccf in ut_dbg_assertion_failed (expr=0x14dd428 "latches->empty() || latch->m_level == SYNC_LEVEL_VARYING || latch->m_level == SYNC_NO_ORDER_CHECK || latches->back()->m_level == SYNC_NO_ORDER_CHECK || latches->back()->m_level == SYNC_LEVEL_VARYING || latches->back()->m_level >= latch->m_level", file=0x14dd3c8 "/bzr/mysql-5.7-percona-patches_dbg/storage/innobase/sync/sync0debug.cc", line=299) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/ut/ut0dbg.cc:68
#7 0x0000000001092133 in SyncDebug::lock (this=0x1e472c0 <syncDebug>, latch=0x7f9e373e7870) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/sync/sync0debug.cc:294
#8 0x0000000001091cc8 in sync_check_lock (latch=0x7f9e373e7870) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/sync/sync0debug.cc:1382
#9 0x0000000000f50649 in DebugPolicy<TTASEventMutex<DebugPolicy> >::enter (this=0x7f9e373e7828, mutex=..., name=0x14fe2f0 "/bzr/mysql-5.7-percona-patches_dbg/storage/innobase/buf/buf0buf.cc", line=2418) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/include/sync0policy.ic:91
#10 0x0000000000f26495 in PolicyMutex<TTASEventMutex<DebugPolicy> >::enter (this=0x7f9e373e7810, n_spins=30, n_delay=6, name=0x14fe2f0 "/bzr/mysql-5.7-percona-patches_dbg/storage/innobase/buf/buf0buf.cc", line=2418) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/include/ib0mutex.h:1040
#11 0x0000000001130750 in buf_pool_resize () at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/buf/buf0buf.cc:2418
#12 0x0000000001131940 in buf_resize_thread (arg=0x0) at /bzr/mysql-5.7-percona-patches_dbg/storage/innobase/buf/buf0buf.cc:2707
#13 0x00007f9e54f80df3 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f9e53c4a1ad in clone () from /lib64/libc.so.6

Roel Van de Paar (roel11) wrote :

Adding files as per discussion with Laurynas

Roel Van de Paar (roel11) wrote :
no longer affects: percona-server/5.5
Roel Van de Paar (roel11) wrote :

Testcase (execute twice in a row if needbe);

SET @@GLOBAL.innodb_buffer_pool_size=2342177280;
SELECT 1;

Roel Van de Paar (roel11) wrote :

Another testcase found;

Start server with:
--innodb_buffer_pool_size=2589934592 --innodb_buffer_pool_instances=8 --innodb_old_blocks_pct=85 --innodb_old_blocks_time=100 --innodb_buffer_pool_chunk_size=1048577

Then;

mysql> SET @@GLOBAL.innodb_buffer_pool_size=1342177280;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Roel Van de Paar (roel11) wrote :

A new testcase produces (potentially useful) lots of related output in the log.

Start server with:
--innodb_buffer_pool_size=2589934592 --innodb_buffer_pool_instances=8 --innodb_old_blocks_pct=85 --innodb_old_blocks_time=100 --innodb_buffer_pool_chunk_size=1048577

Then;

mysql> DROP DATABASE test;CREATE DATABASE test;USE test;
Query OK, 0 rows affected (0.01 sec)

Query OK, 1 row affected (0.00 sec)

Database changed
mysql> SET @@GLOBAL.innodb_buffer_pool_size=1342177280;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye

[roel@localhost PS-mysql-5.7.5-m15-linux-x86_64-debug]$ ./bin/mysqladmin -uroot -S/sda/PS-mysql-5.7.5-m15-linux-x86_64-debug/socket.sock ping
mysqladmin: connect to server at 'localhost' failed [...]

Roel Van de Paar (roel11) wrote :

Notice the interesting error log write error (seemingly a resize address written where time should go);

withdrawn block at 0x7f392e97ceb8
2015-01-24T04:24:56.304435Z 0 [Note] InnoDB: buffer pool 7 : withdrawn target 6144 blocks.
2015-01-24 15:24:56 0x7f3917ff8700 InnoDB: Assertion failure in thread 139883192485632 in file sync0debug.cc line 299 <--

Roel Van de Paar (roel11) wrote :

It could be that this bug is triggered, or more triggered, by a particular InnoDB buffer pool configuration in mysqld options.

Roel Van de Paar (roel11) wrote :

Confirmed that --innodb_buffer_pool_size=2147483648 is required for the original testcase. And also confirmed that the options mentioned in #5 (and only the SET @@GLOBAL.innodb_buffer_pool_size=1342177280; instruction) are needed to trigger the bug.

Roel Van de Paar (roel11) wrote :

s/trigger the bug/trigger the bug (as a seperate testcase) with the additional info shown in the error log./

Roel Van de Paar (roel11) wrote :

Interestingly, the buffer pool resize info also looks much shorter on matching MS 5.7 (debug) - attached - for the latter testcase which produces a lot more output on PS.

Reproduced by comment #8. The extra output in PS log is because of a forgotten debugging printf, repull the branch.

This has been caused by buffer pool resize locking several mutexes in each instance before advancing to the next instance. This results in a lock order violation and is fixed by locking the first mutex across all instances, then the second, etc.

Fix committed. Roel, please re-pull the branch and confirm

tags: added: 5.7
tags: removed: 5.7
tags: added: qa57

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3251

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers