InnoDB: failing assertion: buf_page_in_file(bpage) below buf_flush_page_and_try_neighbors()

Bug #1269352 reported by Laurynas Biveinis on 2014-01-15
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server
Status tracked in 5.6
5.1
Undecided
Unassigned
5.5
High
Laurynas Biveinis
5.6
High
Laurynas Biveinis

Bug Description

Fix for bug 1227581 is incomplete, comments 29-33:

Thread 22 (Thread 0x7ec67baeb700 (LWP 10836)):
+bt
#0 0x00000031e000b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000928147 in os_cond_wait (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0xc9a18870, reset_sig_count=46093) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/os/os0sync.c:609
#3 0x00000000008557d2 in sync_array_wait_event (arr=0x12b5e90, index=2) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0arr.c:458
#4 0x0000000000856872 in mutex_spin_wait (mutex=0x2817c18, file_name=0xa6ccb8 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c", line=613) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/sync/sync0sync.c:653
#5 0x00000000008a43fa in pfs_mutex_enter_func (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/sync0sync.ic:222
#6 buf_flush_relocate_on_flush_list (bpage=0x7ee75cabb450, dpage=0x7eb09bd94a20) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x00000000008ab020 in buf_LRU_free_block (bpage=0x7ee75cabb450, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034
#8 0x00000000008ac217 in buf_LRU_free_from_unzip_LRU_list (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:981
#9 buf_LRU_search_and_free_block (buf_pool=0x2817898, n_iterations=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1085
#10 0x00000000008ac8ff in buf_LRU_get_free_block (buf_pool=0x2817898) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:1326
#11 0x000000000089b74f in buf_page_get_gen (space=1256, zip_size=4096, offset=73393, rw_latch=3, guess=0x0, mode=10, file=0xa6ad00 "/mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/btr/btr0cur.c", line=3615, mtr=0x7ec67bae41f0) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0buf.c:2667
...

Thread 1 (Thread 0x7ec739c9a700 (LWP 42350)):
+bt
#0 0x00000031e000c69c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000006a5aa6 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/sql/signal_handler.cc:250
#2 <signal handler called>
#3 0x00000031df8328e5 in raise () from /lib64/libc.so.6
#4 0x00000031df8340c5 in abort () from /lib64/libc.so.6
#5 0x00000000008a6506 in buf_page_get_space (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/buf0buf.ic:749
#6 buf_flush_page_and_try_neighbors (bpage=0x7ee75cabb450, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7ec739c99568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1634
#7 0x00000000008a69fe in buf_flush_flush_list_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1772
#8 buf_flush_batch (buf_pool=0x2817898, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1854
#9 0x00000000008a6ca0 in buf_flush_list (min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:2071
#10 0x000000000084c3bd in srv_master_thread (arg=<value optimized out>) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/srv/srv0srv.c:3879
#11 0x00000031e0007851 in start_thread () from /lib64/libpthread.so.0
#12 0x00000031df8e894d in clone () from /lib64/libc.so.6

The issue is that the fix for bug 1227581 is not fully correct WRT locking, thus only narrowed the race window in buf_flush_page_and_try_neighbors()instead of closing it completely. buf_LRU_block_remove_hashed_page()/buf_LRU_free_block() will make a page BUF_BLOCK_REMOVE_HASH while holding an LRU list mutex, a page hash X latch, and the block mutex. buf_flush_page_and_try_neighbors() may hold a flush list mutex instead. So, even though it checks for BUF_BLOCK_REMOVE_HASH pages, there is nothing to protect the flushing candidate page from becoming BUF_BLOCK_REMOVE_HASH after that check. The proper fix is making buf_flush_page_and_try_neigbors() oblivious to BUF_BLOCK_REMOVE_HASH pages all the way until the mutex release before the buf_flush_try_neighbors() call.

Related branches

lp:~laurynas-biveinis/percona-server/bug1269352-5.5
Merged into lp:percona-server/5.5 at revision 621
Registry Administrators: Pending requested 2014-01-16
lp:~laurynas-biveinis/percona-server/bug1269352-5.6
Merged into lp:percona-server at revision 538
Registry Administrators: Pending requested 2014-01-16
description: updated
tags: added: bp-split xtradb
Ovais Tariq (ovais-tariq) wrote :

Laurynas, do we have a milestone candidate for this bug fix?

Should be in the next releases.

Chris Calender (chriscalender) wrote :
Download full text (4.3 KiB)

Hello,

I'm seeing a somewhat similar stack trace from a MariaDB 5.5.34 crash using XtraDB. I was wondering if you could tell if this crash is related to this bug or not.

Off the top, I say they look similar because in the below, the last 5 calls leading up to the crash are:

page_zip_calc_checksum
buf_LRU_free_block
buf_LRU_search_and_free_block
buf_LRU_get_free_block
buf_page_get_gen

And these are close to those reported by thread 22 in #7-#11. Close enough that I thought I should ask. :)

Here is the crash output I'm seeing:

140122 4:21:13 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.34-MariaDB-log
key_buffer_size=1073741824
read_buffer_size=8388608
max_used_connections=86
max_threads=1002
thread_count=33
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17483221 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f2d140bf000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f2cdd0d9d58 thread_stack 0x48000
??:0(my_print_stacktrace)[0xa8ab0e]
??:0(handle_fatal_signal)[0x6d21cb]
??:0(??)[0x373660f500]
??:0(??)[0x313ae02244]
??:0(page_zip_calc_checksum)[0x9522c6]
??:0(buf_LRU_free_block)[0x8d0ef8]
??:0(buf_LRU_search_and_free_block)[0x8d1b67]
??:0(buf_LRU_get_free_block)[0x8d20c4]
??:0(buf_page_get_gen)[0x8c605c]
??:0(btr_cur_search_to_nth_level)[0x8b24de]
??:0(row_search_for_mysql)[0x86987f]
??:0(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x83b9ce]
??:0(join_read_next_same(READ_RECORD*))[0x5cadca]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1d6e]
??:0(evaluate_join_record(JOIN*, st_join_table*, int))[0x5ac7ff]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1cfb]
??:0(evaluate_join_record(JOIN*, st_join_table*, int))[0x5ac7ff]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1d93]
??:0(evaluate_join_record(JOIN*, st_join_table*, int))[0x5ac7ff]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1d93]
??:0(evaluate_join_record(JOIN*, st_join_table*, int))[0x5ac7ff]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1cfb]
??:0(evaluate_join_record(JOIN*, st_join_table*, int))[0x5ac7ff]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x5b1d93]
??:0(do_select(JOIN*, List<Item>*, TABLE*, Procedure*))[0x5b9c9d]
??:0(JOIN::exec())[0x5dada9]
??:0(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x5dc738]
??:0(handle_select(THD*, LEX*, select_result*, unsi...

Read more...

Chris -

It's a different crash, please report it to MariaDB.

Yet another instance of this is bug 1305364.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers