InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N | abort in mem_heap_check | Using the --innodb-buffer-pool-populate option can cause the server to abort/be killed

Bug #1233128 reported by Roel Van de Paar on 2013-09-30
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server
Status tracked in 5.7
5.1
Undecided
Unassigned
5.5
High
Unassigned
5.6
High
Unassigned
5.7
High
Unassigned

Bug Description

2 Ways this issue can be seen:

1. "Killed"
9s file contains:
/sde/Percona-Server-5.6.13-rc60.5-450-debug.Linux.x86_64/bin/mysqld --basedir=/sde/Percona-Server-5.6.13-rc60.5-450-debug.Linux.x86_64 --datadir=/ssd/1380539109/data --port=39799 --pid=/ssd/1380539109/pid.pid --log-error=/ssd/1380539109/error.log.out --socket=/ssd/1380539109/socket.sock --user=roel --innodb-buffer-pool-populate

$ ./9s
./9s: line 1: 30116 Killed [...]

2. Assert:
2013-09-30 08:05:54 3095 [Note] InnoDB: Initializing buffer pool, size = 40.0G
InnoDB: mmap(44816138240 bytes) failed; errno 12
2013-09-30 08:05:54 7fdb449707e0 InnoDB: Assertion failure in thread 140579725314016 in file mem0dbg.cc line 680
InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N

Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :

command for (2)

[roel@R420 618951]$ cat 9d
/sde/Percona-Server-5.6.13-rc60.5-450-debug.Linux.x86_64/bin/mysqld --basedir=/sde/Percona-Server-5.6.13-rc60.5-450-debug.Linux.x86_64 --datadir=/tmp --port=39799 --pid=/ssd/1380539109/pid.pid --log-error=/ssd/1380539109/error.log.out --socket=/ssd/1380539109/socket.sock --user=roel --innodb-buffer-pool-populate --core 2>&1

/tmp is not a datadir, it's just a dummy option for datadir. Note also that strace above may have been on dummy datadir also (but it matters little).

Roel Van de Paar (roel11) wrote :

[roel@R420 618951]$ free -g
             total used free shared buffers cached
Mem: 47 32 14 0 0 9
-/+ buffers/cache: 22 24
Swap: 0 0 0

summary: - Using the --innodb-buffer-pool-populate option can cause the server to
- abort/be killed
+ InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N | abort in
+ mem_heap_check | Using the --innodb-buffer-pool-populate option can
+ cause the server to abort/be killed
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
tags: added: xtradb
Ramesh Sivaraman (rameshvs02) wrote :

** Got testcase for this issue

DROP DATABASE test;CREATE DATABASE test;USE test;
CREATE TABLE t(a CHAR);
CREATE TABLE t0(visitor_id int unsigned,group_id int unsigned,hits int unsigned,sessions int unsigned,ts timestamp,PRIMARY KEY (visitor_id,group_id))/*!engine=tokudb */;
XA START 'test0';
SET @@global.table_open_cache=TRUE;
insert INTO t0 values(0,0,0,0.0,date'0-0-0');
INSERT INTO t VALUES('vKauCXUmFC0CMr');
EXPLAIN PARTITIONS SELECT * FROM t0;
INSERT INTO t VALUES('VG0v');
SET STATEMENT max_join_size=0 FOR SELECT * FROM t0;
INSERT INTO t VALUES('EJ0U0P0');
select ROW(c,0,0)IN (0) FROM t0;
INSERT INTO t VALUES('VdfjXfKC0xoPzr0tpJuAFWvResHZCnaFI');

** Extra startup

--plugin-load=tokudb=ha_tokudb.so

The attached tarball gives the testcase as an exact match of our system,
  including some handy utilities

  $ vi {epoch}_mybase # Update base path in this file (the only change
  required!). For non-binary (i.e. non-tarball/non-build-binary-distribution) distributions please update SOURCE_DIR location also.
  $ ./{epoch}_init # Initializes the data dir
  $ ./{epoch}_start # Starts mysqld
  $ ./{epoch}_cl # To check mysqld is up
  $ ./{epoch}_run # Run the testcase with pquery binary(produces
  output)
  $ vi /dev/shm/{epoch}/error.log.out # Verify the error log
  $ ./{epoch}_gdb # Brings you to a gdb prompt attached to correct
  mysqld
  & generated core
  $ ./{epoch}_parse_core # Create {epoch}_STD.gdb and {epoch}_FULL.gdb;
  standard and full var gdb stack traces
  etc.

Roel Van de Paar (roel11) wrote :

Great to finally have a testcase for this! Note TokuDB may not be necessary to trigger the bug: early 5.6 version (prior to TokuDB) had the same assertion also. Another idea may be that it is a different bug (part of stack matches)

Krunal Bauskar (krunal-bauskar) wrote :

I tried executing the reproducible TC. Here are my observations:

* This bug is specific about use of --innodb-buffer-pool-populate which is not present in the bug-tcs.

* BugTC is trying to set "SET @@global.table_open_cache=TRUE;". Well table_open_cache is not a boolean but a numeric variable. Setting it to TRUE mean setting it to 1 that in other term means only 1 table can be open at any given time. Not sure if this is indented or miss by TC.

* TC crashes randomly when toku-db is involved. W/O toku-db involved I couldn't get the TC to crash. Infact last 2 crash where in toku-db code hitting the assert. See below for assert.

---------

Example of toku-db assert.

#2 0x00007fffdc2a221f in toku_do_backtrace_abort () at /opt/projects/codebase/5.6/storage/tokudb/PerconaFT/portability/toku_assert.cc:147
#3 0x00007fffdc2a2319 in toku_do_assert_fail (expr_as_string=0x7fffdc2cdb28 "txn->roll_info.num_rollentries == 0", function=0x7fffdc2ce070 <toku_txn_is_read_only(tokutxn*)::__FUNCTION__> "toku_txn_is_read_o
nly", file=0x7fffdc2cd880 "/opt/projects/codebase/5.6/storage/tokudb/PerconaFT/ft/txn/txn.cc", line=664, caller_errno=0) at /opt/projects/codebase/5.6/storage/tokudb/PerconaFT/portability/toku_assert.cc:168
#4 0x00007fffdc22c0ae in toku_txn_is_read_only (txn=0x7fff84026800) at /opt/projects/codebase/5.6/storage/tokudb/PerconaFT/ft/txn/txn.cc:664
#5 0x00007fffdc1b1d11 in locked_txn_commit_with_progress (txn=0x7fff840264d0, flags=0, poll=0x0, poll_extra=0x0) at /opt/projects/codebase/5.6/storage/tokudb/PerconaFT/src/ydb_txn.cc:267
#6 0x00007fffdc1b1e2b in locked_txn_commit (txn=0x7fff840264d0, flags=0) at /opt/projects/codebase/5.6/storage/tokudb/PerconaFT/src/ydb_txn.cc:317
#7 0x00007fffdc157398 in commit_txn (txn=0x7fff840264d0, flags=0) at /opt/projects/codebase/5.6/storage/tokudb/hatoku_defines.h:351
#8 0x00007fffdc15db42 in ha_tokudb::initialize_share (this=0x7fff84029280, name=0x7fff840138e8 "./test/t0", mode=2) at /opt/projects/codebase/5.6/storage/tokudb/ha_tokudb.cc:1664
#9 0x00007fffdc15e247 in ha_tokudb::open (this=0x7fff84029280, name=0x7fff840138e8 "./test/t0", mode=2, test_if_locked=2) at /opt/projects/codebase/5.6/storage/tokudb/ha_tokudb.cc:1765
#10 0x000000000064a9d2 in handler::ha_open (this=0x7fff84029280, table_arg=0x7fff84022c70, name=0x7fff840138e8 "./test/t0", mode=2, test_if_locked=2) at /opt/projects/codebase/5.6/sql/handler.cc:2743
#11 0x00000000008d1d65 in open_table_from_share (thd=0x20a9b40, share=0x7fff84013570, alias=0x7fff840056d0 "t0", db_stat=39, prgflag=44, ha_open_flags=0, outparam=0x7fff84022c70, is_create_table=false) at /o
pt/projects/codebase/5.6/sql/table.cc:2378

------------

Based on all these observations I would suggest you log a separate bug for toku-db using this TC and see if you can get the reproducible TC for existing bug with needed options.

One more observation: All stmt (in the TC above ) against toku-db engine are extremely slow.

tags: added: tokudb

I believe this might be related to or the root cause is https://bugs.launchpad.net/percona-server/+bug/1532042

Roel Van de Paar (roel11) wrote :

https://bugs.mysql.com/bug.php?id=86234 - same assertion, similar stack, also related to buffer pool. May or may not be same issue.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.