Percona Server with XtraDB

handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731 | Writing larger than 4GB bitmap file fails; bitmap write error causes heap corruption

Reported by Roel Van de Paar on 2013-01-31
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server
High
Laurynas Biveinis
5.1
High
Laurynas Biveinis
5.5
High
Laurynas Biveinis

Bug Description

========================= gdb :
Thread 10 (LWP 23954):
+bt
#0 0x0000000004e3969c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000007e2779 in my_write_core (sig=11) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/mysys/stacktrace.c:433
#2 0x00000000006ab0ea in handle_fatal_signal (sig=11) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/sql/signal_handler.cc:249
#3 <signal handler called>
#4 rbt_free_node (node=0x0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#5 0x00000000009935e9 in rbt_free_node (node=0x1040f1e0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#6 0x00000000009935e9 in rbt_free_node (node=0x1017d7d0, nil=0x1040f170) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:731
#7 0x0000000000993693 in rbt_free (tree=0x1017d760) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/ut/ut0rbt.c:746
#8 0x000000000094e1ad in log_online_read_shutdown () at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/log/log0online.c:743
#9 0x0000000000860a76 in srv_redo_log_follow_thread (arg=<optimized out>) at /ssd/QA-16274-5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/srv/srv0srv.c:3093
#10 0x0000000004e34851 in start_thread () from /lib64/libpthread.so.0
#11 0x0000000005bfa11d in clone () from /lib64/libc.so.6

========================= Error log:
130126 13:58:08 InnoDB: Completed initialization of buffer pool
130126 13:58:08 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
130126 13:58:08 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: last tracked LSN in './ib_modified_log_2_8204.xdb' is 8204, but the last checkpoint LSN is 1597945. This might be due to a server crash or a very fast shutdown. Reading the log to advance the last tracked LSN.
InnoDB: continuing tracking changed pages from LSN 1597945
130126 13:58:12 InnoDB: Waiting for the background threads to start
130126 13:58:13 Percona XtraDB (http://www.percona.com) 1.1.8-rel29.3 started; log sequence number 1597945
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-skip-ndbcluster'
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-debug-assert-if-crashed-table'
130126 13:58:14 [Warning] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: unknown option '--loose-debug-assert-on-error'
130126 13:58:14 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13080
130126 13:58:14 [Note] - '0.0.0.0' resolves to '0.0.0.0';
130126 13:58:14 [Note] Server socket created on IP: '0.0.0.0'.
130126 13:58:14 [Warning] 'user' entry '<email address hidden>' ignored in --skip-name-resolve mode.
130126 13:58:14 [Warning] 'proxies_priv' entry '@ <email address hidden>' ignored in --skip-name-resolve mode.
130126 13:58:15 [Note] Event Scheduler: Loaded 0 events
130126 13:58:15 [Note] /ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64/bin/mysqld: ready for connections.
Version: '5.5.29-rel29.3-debug-log' socket: '/ssd/210222/current1_5/tmp/master.sock' port: 13080 Percona Server with XtraDB (GPL), Release rel29.3, Revision 416-debug
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
InnoDB: DEBUG: update_statistics for test/table100_innodb_dynamic.
130126 13:58:20 InnoDB: Error: Write to file ./ib_modified_log_2_8204.xdb failed at offset 35184372088832 8192.
InnoDB: 4096 bytes should have been written, only 0 were written.
InnoDB: Operating system error number 22.
InnoDB: Check that your OS and file system support files of this size.
InnoDB: Check also that the disk is not full or a disk quota exceeded.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
130126 13:58:20 InnoDB: Operating system error number 22 in a file operation.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
InnoDB: Error: failed writing changed page bitmap file './ib_modified_log_2_8204.xdb'
InnoDB: Error: log tracking bitmap write failed, stopping log tracking thread!
==23942== Thread 12:
==23942== Invalid read of size 8
==23942== at 0x9935E0: rbt_free_node (ut0rbt.c:731)
==23942== by 0x9935E8: rbt_free_node (ut0rbt.c:731)
==23942== by 0x9935E8: rbt_free_node (ut0rbt.c:731)
==23942== by 0x993692: rbt_free (ut0rbt.c:746)
==23942== by 0x94E1AC: log_online_read_shutdown (log0online.c:743)
==23942== by 0x860A75: srv_redo_log_follow_thread (srv0srv.c:3093)
==23942== by 0x4E34850: start_thread (in /lib64/libpthread-2.12.so)
==23942== by 0x179C76FF: ???
==23942== Address 0x8 is not stack'd, malloc'd or (recently) free'd
==23942==
10:58:20 UTC - mysqld got signal 11 ;

========================= Run details:
[roel@macpro2 210222]$ cat cmd113
ps -ef | grep 'cmdrun_113' | grep -v grep | awk '{print $2}' | xargs sudo kill -9
rm -Rf /ssd/210222/cmdrun_113
mkdir /ssd/210222/cmdrun_113
cd /ssd/randgen
bash -c "set -o pipefail; perl runall.pl --queries=100000000 --seed=5950 --duration=180 --querytimeout=60 --short_column_names --sqltrace --reporter=Shutdown,Backtrace,QueryTimeout,ErrorLog,ErrorLogAlarm --mysqld=--log-output=none --mysqld=--sql_mode=ONLY_FULL_GROUP_BY --grammar=conf/percona_qa/percona_qa.yy --gendata=conf/percona_qa/percona_qa.zz --basedir=/ssd/Percona-Server-5.5.28-rel29.3-416-debug-valgrind.Linux.x86_64 --valgrind --reporter=ValgrindErrors --validator=MarkErrorLog --threads=25 --views --notnull --mysqld=--innodb_track_changed_pages=1 --mysqld=--innodb_max_bitmap_file_size=1048576 --mysqld=--innodb_log_file_size=1048576 --mysqld=--innodb_log_files_in_group=2 --mysqld=--innodb_log_buffer_size=1048576 --mysqld=--innodb_log_block_size=512 --mysqld=--innodb_fast_shutdown=2 --mysqld=--innodb_adaptive_flushing_method=native --mysqld=--innodb_use_global_flush_log_at_trx_commit=0 --mysqld=--userstat --mysqld=--innodb_flush_method=O_DIRECT --mysqld=--innodb_file_per_table=1 --mtr-build-thread=733 --mask=31450 --vardir1=/ssd/210222/cmdrun_113 > /ssd/210222/cmdrun113.log 2>&1"

========================= Notes:
Note the initial crash recovery. No idea why that is happening, but I suspect it has something to do with the startup options.

Related branches

lp:~laurynas-biveinis/percona-server/BT-16274-bug1111226-5.1
Merged into lp:percona-server/5.1 at revision 527
Stewart Smith (community): Approve on 2013-02-07
George Ormond Lorch III: Approve (g2) on 2013-02-07
Laurynas Biveinis: Pending requested 2013-02-07
lp:~laurynas-biveinis/percona-server/BT-16274-bug1111226-5.5
Merged into lp:percona-server/5.5 at revision 433
Stewart Smith (community): Approve on 2013-02-07
George Ormond Lorch III: Approve (g2) on 2013-02-07
Laurynas Biveinis: Pending requested 2013-02-07
Roel Van de Paar (roel11) wrote :
Roel Van de Paar (roel11) wrote :
description: updated

The first issue here is a a spectacular typo that causes the write to fail:

130126 13:58:20 InnoDB: Error: Write to file ./ib_modified_log_2_8204.xdb failed at offset 35184372088832 8192.

 success = os_file_write(log_bmp_sys->out.name, log_bmp_sys->out.file,
    block,
    (ulint)(log_bmp_sys->out.offset & 0xFFFFFFFF),
    (ulint)(log_bmp_sys->out.offset << 32),
    MODIFIED_PAGE_BLOCK_SIZE);

s/<</>> !

The second issue is that a write error for a non-first bitmap page will result in the bitmap tree with some of its nodes being pointed to from the node freelist, causing a double freeing of them on the thread cleanup.

summary: - handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731
+ Writing larger than 4GB bitmap file fails; bitmap write error causes
+ heap corruption
summary: Writing larger than 4GB bitmap file fails; bitmap write error causes
- heap corruption
+ heap corruption | Was: handle_fatal_signal (sig=11) in rbt_free_node at
+ ut0rbt.c:731
summary: - Writing larger than 4GB bitmap file fails; bitmap write error causes
- heap corruption | Was: handle_fatal_signal (sig=11) in rbt_free_node at
- ut0rbt.c:731
+ handle_fatal_signal (sig=11) in rbt_free_node at ut0rbt.c:731 | Writing
+ larger than 4GB bitmap file fails; bitmap write error causes heap
+ corruption
Roel Van de Paar (roel11) wrote :

Changed title for SEO (google bug search) purposes

tags: added: xtradb
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers