InnoDB: Failing assertion: buf_page_in_file(bpage)

Bug #1250762 reported by Ovais Tariq
This bug affects 5 people
Affects                                  Status  Importance  Assigned to  Milestone
Percona Server (moved to
  https://jira.percona.com/projects/PS)  New     Undecided   Unassigned
  5.1                                    New     Undecided   Unassigned
  5.5                                    New     Undecided   Unassigned
  5.6                                    New     Undecided   Unassigned

Bug Description

The crash happened after upgrading to Percona Server 5.5.34-32.0.

Below is the relevant part from the error log:
131111 23:07:37 InnoDB: Assertion failure in thread 140023754589952 in
file buf0flu.c line 1608
InnoDB: Failing assertion: buf_page_in_file(bpage)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB:
http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
05:07:37 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=33
max_threads=3002
thread_count=26
connection_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
6602304 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona-server-5.5.34/bin/mysqld(my_print_stacktrace+0x35)[0x7eb7c5]
/opt/percona-server-5.5.34/bin/mysqld(handle_fatal_signal+0x3f4)[0x6a5ce4]
/lib64/libpthread.so.0[0x31e000f500]
/lib64/libc.so.6(gsignal+0x35)[0x31df8328e5]
/lib64/libc.so.6(abort+0x175)[0x31df8340c5]
/opt/percona-server-5.5.34/bin/mysqld[0x8953b7]
/opt/percona-server-5.5.34/bin/mysqld[0x895b8e]
/opt/percona-server-5.5.34/bin/mysqld[0x895e30]
/opt/percona-server-5.5.34/bin/mysqld[0x83b67d]
/lib64/libpthread.so.0[0x31e0007851]
/lib64/libc.so.6(clone+0x6d)[0x31df8e894d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

The asserting code is inside the function buf_flush_page_and_try_neighbors(). It appears to me that the page being accessed probably belonged to a deleted tablespace?
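
For context, buf_page_in_file() is just a predicate on the buffer block's state: it returns TRUE only while the block is still mapped to a data file page. The following is a standalone paraphrase of the 5.5 check, not the verbatim buf0buf.ic code; the state names are the real InnoDB ones, but the struct is simplified for illustration.

/* Standalone paraphrase of the buf_page_in_file() check (5.5);
   simplified struct, real state names. */
enum buf_page_state {
	BUF_BLOCK_ZIP_FREE,
	BUF_BLOCK_ZIP_PAGE,      /* compressed-only page, clean */
	BUF_BLOCK_ZIP_DIRTY,     /* compressed-only page, dirty */
	BUF_BLOCK_NOT_USED,
	BUF_BLOCK_READY_FOR_USE,
	BUF_BLOCK_FILE_PAGE,     /* uncompressed page mapped to a data file */
	BUF_BLOCK_MEMORY,
	BUF_BLOCK_REMOVE_HASH    /* removed from the page hash while being
	                            relocated or freed */
};

struct buf_page { enum buf_page_state state; };

/* TRUE only while the block is still mapped to a data file page. */
int
buf_page_in_file(const struct buf_page *bpage)
{
	switch (bpage->state) {
	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
	case BUF_BLOCK_FILE_PAGE:
		return 1;
	default:
		return 0;
	}
}

Any state outside those three makes the ut_a(buf_page_in_file(bpage)) fire and abort the server.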

Tags: xtradb i36642
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There was another crash on a different server after it was upgraded to 5.5.34-32.0.

The relevant part from the error log is as follows:
131112 16:53:45 InnoDB: Assertion failure in thread 139575214880512 in
file buf0buf.ic line 749
InnoDB: Failing assertion: buf_page_in_file(bpage)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
22:53:45 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=137
max_threads=3002
thread_count=5
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
6602304 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona-server-5.5.34/bin/mysqld(my_print_stacktrace+0x35)[0x7eb7c5]
/opt/percona-server-5.5.34/bin/mysqld(handle_fatal_signal+0x3f4)[0x6a5ce4]
/lib64/libpthread.so.0[0x322bc0f500]
/lib64/libc.so.6(gsignal+0x35)[0x322b8328e5]
/lib64/libc.so.6(abort+0x175)[0x322b8340c5]
/opt/percona-server-5.5.34/bin/mysqld[0x895680]
/opt/percona-server-5.5.34/bin/mysqld[0x895b8e]
/opt/percona-server-5.5.34/bin/mysqld[0x895e30]
/opt/percona-server-5.5.34/bin/mysqld[0x83b67d]
/lib64/libpthread.so.0[0x322bc07851]
/lib64/libc.so.6(clone+0x6d)[0x322b8e894d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

The assertion is raised within the function buf_page_get_space(). Although it is raised in a different function here, it is similar to the previous crash in that it fails because the page is not mapped to a tablespace.
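
For reference, buf_page_get_space() (in buf0buf.ic) asserts that the block is still file-mapped before reading its space id. Below is a rough, self-contained paraphrase, not the actual header: ut_a() is stood in by assert(), the struct and the in-file check are simplified.

#include <assert.h>

#define ut_a(x) assert(x)   /* stand-in for InnoDB's always-on assertion macro */

enum buf_page_state { BUF_BLOCK_FILE_PAGE, BUF_BLOCK_REMOVE_HASH /* ... */ };

struct buf_page {
	enum buf_page_state state;
	unsigned long       space;   /* tablespace id the page belongs to */
};

int
buf_page_in_file(const struct buf_page *bpage)
{
	return bpage->state == BUF_BLOCK_FILE_PAGE;   /* simplified check */
}

/* Paraphrase of buf_page_get_space(); the ut_a() below corresponds to the
   failing assertion at buf0buf.ic:749 in this crash. */
unsigned long
buf_page_get_space(const struct buf_page *bpage)
{
	ut_a(buf_page_in_file(bpage));
	return bpage->space;
}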

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

The MySQL configuration file of the crashed server.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

I will provide the gdb stack trace here once it becomes available.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Does this happen on an AMD/Opteron CPU? See http://bugs.mysql.com/bug.php?id=26081.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

@Valerii,

Both machines where MySQL crashed are Intel-based. Some more details about the machines are as follows:

CentOS release 6.4 (Final)
Kernel: 2.6.32-358.23.2.el6.x86_64

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Here is the resolved trace for the first crash. This has been resolved using addr2line.

buf_page_get_mutex
buf_flush_page_and_try_neighbors
buf_flush_flush_list_batch
buf_flush_batch
buf_flush_list
srv_master_thread

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Here is the resolved trace for the second crash. This has been resolved using addr2line.

buf_page_get_space
buf_flush_page_and_try_neighbors
buf_flush_flush_list_batch
buf_flush_batch
buf_flush_list
srv_master_thread

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There was another crash as follows:

buf_page_get_mutex
buf_flush_page_and_try_neighbors
buf_flush_flush_list_batch
buf_flush_batch
buf_flush_list
srv_master_thread
start_thread

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Core dumps were set up correctly, but mysqld only produced a zero-sized core file.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Crash excerpt from the error log:

131120 1:10:26 InnoDB: Assertion failure in thread 139764694140672 in file buf0flu.c line 1608
InnoDB: Failing assertion: buf_page_in_file(bpage)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
07:10:26 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=78
max_threads=3002
thread_count=55
connection_count=55
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6602304 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona-server-5.5.34/bin/mysqld(my_print_stacktrace+0x35)[0x7eb7c5]
/opt/percona-server-5.5.34/bin/mysqld(handle_fatal_signal+0x3f4)[0x6a5ce4]
/lib64/libpthread.so.0[0x31e000f500]
/lib64/libc.so.6(gsignal+0x35)[0x31df8328e5]
/lib64/libc.so.6(abort+0x175)[0x31df8340c5]
/opt/percona-server-5.5.34/bin/mysqld[0x8953b7]
/opt/percona-server-5.5.34/bin/mysqld[0x895b8e]
/opt/percona-server-5.5.34/bin/mysqld[0x895e30]
/opt/percona-server-5.5.34/bin/mysqld[0x83b67d]
/lib64/libpthread.so.0[0x31e0007851]
/lib64/libc.so.6(clone+0x6d)[0x31df8e894d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

The code path to the assertion is actually as follows:

buf_flush_page_and_try_neighbors()
1608 ut_a(buf_page_in_file(bpage)); <-- assertion here

buf_flush_flush_list_batch()
1762 while (bpage != NULL
1763 && len > 0
1764 && !buf_flush_page_and_try_neighbors(
1765 bpage, BUF_FLUSH_LIST, min_n, &count)) {

buf_flush_batch()
1843 case BUF_FLUSH_LIST:
1844 count = buf_flush_flush_list_batch(buf_pool, min_n, lsn_limit);

buf_flush_list()
2061 page_count = buf_flush_batch(
2062 buf_pool, BUF_FLUSH_LIST, min_n, lsn_limit);

So the assertion fires while doing a flush_list flush of a page; it has nothing to do directly with neighbor-page flushing. The assertion is raised very early, before the actual flush of the dirty page.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Ovais -

How reproducible is this? Would it be possible to get a core file and then the stack traces for all the threads, since some race condition is likely to be involved? I assume InnoDB compression is used? If yes, this might be related to bug 1220191.

tags: added: xtradb
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Laurynas,

This crash has occurred 3 times in 9 days since the upgrade to PS 5.5.34. Core dumps have been enabled, but the last crash produced a zero-byte core dump file. The zeroed core dump was probably related to memory allocation errors that the kernel had been reporting.

I will provide the gdb backtrace from the core file once it becomes available.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Yes, InnoDB compression is used; almost all the tables are compressed.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

thread apply all bt

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

thread apply all bt full

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There were two additional flush operations apart from the dirty page flushing being carried out by the master thread. However, the race condition appears to have happened between Thread 31 and Thread 1.

Thread 31 needed to read a page into the buffer pool, and for that it needed to free a block. The block it tried to free is as follows:
#6 buf_flush_relocate_on_flush_list (bpage=0x7f410f2c3380, dpage=0x7f1b14035f00) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x000000000089a1b0 in buf_LRU_free_block (bpage=0x7f410f2c3380, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034

This is the same block that the master thread is trying to flush:
#4 0x00000031df8340c5 in abort () from /lib64/libc.so.6
#5 0x00000000008953b7 in buf_flush_page_and_try_neighbors (bpage=0x7f410f2c3380, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7f2138844568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1608

So this appears to be a race condition between the flushing and relocation of the same page from different code paths.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The relevant threads are

...
#4 0x00000031df8340c5 in abort () from /lib64/libc.so.6
#5 0x00000000008953b7 in buf_flush_page_and_try_neighbors (bpage=0x7f410f2c3380, flush_type=BUF_FLUSH_LIST, n_to_flush=300, count=0x7f2138844568) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1608
#6 0x0000000000895b8e in buf_flush_flush_list_batch (buf_pool=0x3a1a178, flush_type=<value optimized out>, min_n=300, lsn_limit=18446744073709551615) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:1762
...

and

...
#5 0x00000000008935aa in pfs_mutex_enter_func (bpage=0x7f410f2c3380, dpage=0x7f1b14035f00) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/include/sync0sync.ic:222
#6 buf_flush_relocate_on_flush_list (bpage=0x7f410f2c3380, dpage=0x7f1b14035f00) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0flu.c:613
#7 0x000000000089a1b0 in buf_LRU_free_block (bpage=0x7f410f2c3380, zip=0, have_LRU_mutex=1) at /mnt/workspace/percona-server-5.5-binaries/label_exp/centos5-64/Percona-Server-5.5.34-rel32.0/storage/innobase/buf/buf0lru.c:2034
...

Both threads operate on the same buffer block. The former thread holds the flush list mutex and asserts that the block is mapped to a data file page. The latter attempts to relocate the same block and, to do so, has removed it from the page hash with buf_LRU_block_remove_hashed_page(), which sets the page state to BUF_BLOCK_REMOVE_HASH; that state is what makes the assert in the first thread fail. A toy model of this interleaving is sketched below.
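
To make the interleaving concrete, here is a minimal, self-contained toy model. This is not InnoDB code: the state names are real, but the block is reduced to a single state field, the in-file check is simplified, and the two threads are played out as a fixed sequence of steps.

/* Toy model of the race between the flush-list batch and LRU relocation. */
#include <assert.h>

#define ut_a(x) assert(x)	/* stand-in for InnoDB's always-on assertion */

enum buf_page_state { BUF_BLOCK_FILE_PAGE, BUF_BLOCK_REMOVE_HASH };

struct buf_page { enum buf_page_state state; };

int
buf_page_in_file(const struct buf_page *bpage)
{
	return bpage->state == BUF_BLOCK_FILE_PAGE;	/* simplified check */
}

int
main(void)
{
	struct buf_page bpage = { BUF_BLOCK_FILE_PAGE };

	/* Thread 1 (master thread): picks bpage off the flush list for a
	   BUF_FLUSH_LIST batch, but has not reached the ut_a() yet. */

	/* Thread 31 (LRU eviction): buf_LRU_block_remove_hashed_page()
	   removes the block from the page hash and marks it for relocation. */
	bpage.state = BUF_BLOCK_REMOVE_HASH;

	/* Thread 1 resumes in buf_flush_page_and_try_neighbors() and hits
	   the equivalent of the ut_a() at buf0flu.c:1608 -- this aborts,
	   mirroring the crash. */
	ut_a(buf_page_in_file(&bpage));

	return 0;
}

Whether the real code can reach this interleaving depends on which mutexes each path holds around the flush list and the page hash, which is what the audit described below is about.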

The root cause is the same as for bug 1227581, although the fix will probably have to be made anew rather than backported, by auditing the BUF_BLOCK_REMOVE_HASH code paths and buf_page_in_file() checks, because of non-trivial differences in flushing between 5.5 and 5.6. Closing as a duplicate of bug 1227581 and continuing the work there.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

@Laurynas,

Thanks for the analysis.
