Percona Server moved to https://jira.percona.com/projects/PS

Assertion failure in file buf0buf.cc line 3687, buf_page_init_for_read() w/ multiple purge threads and compressed tables

Bug #1247021 reported by Ivor B. on 2013-11-01

This bug affects 2 people

	Status	Importance	Assigned to	Milestone
Percona Server moved to https://jira.percona.com/projects/PS	Fix Released	High	Laurynas Biveinis	Percona Server moved to https://jira.percona.com/projects/PS 5.6.16-64.0
5.1	Invalid	Undecided	Unassigned
5.5	Invalid	Undecided	Unassigned
5.6	Fix Released	High	Laurynas Biveinis	Percona Server moved to https://jira.percona.com/projects/PS 5.6.16-64.0

Bug Description

I'm experiencing constant crashes of percona server; Assertion failure in file buf0buf.cc line 3687

/usr/local/mysql/bin/mysqld Ver 5.6.14 for Linux on x86_64 (MySQL Community Server (GPL)) (compiled from source)

Tried downgrading a couple of versions, but that didn't help.
Unable to provide a testcase, because the crashes seem random

2013-11-01 09:59:27 7fb231e20700 InnoDB: Assertion failure in thread 140403317802752 in file buf0buf.cc line 3687
InnoDB: Failing assertion: buf_fix_count > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
08:59:27 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=25
max_threads=1502
thread_count=9
connection_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 606677 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)[0x8db1d5]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x4c4)[0x6628e4]
/lib64/libpthread.so.0[0x3f9420f500]
/lib64/libc.so.6(gsignal+0x35)[0x3f93e328e5]
/lib64/libc.so.6(abort+0x175)[0x3f93e340c5]
/usr/local/mysql/bin/mysqld[0xa5c57a]
/usr/local/mysql/bin/mysqld[0xa6e152]
/usr/local/mysql/bin/mysqld[0xa6ee73]
/usr/local/mysql/bin/mysqld[0xa58fee]
/usr/local/mysql/bin/mysqld[0xa3e816]
/usr/local/mysql/bin/mysqld[0x9e245d]
/usr/local/mysql/bin/mysqld[0x9e0b6c]
/usr/local/mysql/bin/mysqld[0x9e13d7]
/usr/local/mysql/bin/mysqld[0x9e19fa]
/usr/local/mysql/bin/mysqld[0x9b14e3]
/usr/local/mysql/bin/mysqld[0x9fce2a]
/lib64/libpthread.so.0[0x3f94207851]
/lib64/libc.so.6(clone+0x6d)[0x3f93ee894d]

Tags:

Related branches

lp:~laurynas-biveinis/percona-server/bug1247021

Merged into lp:percona-server/5.6 at revision 526

George Ormond Lorch III (community): Approve (g2) on 2013-12-02

Revision history for this message

Ivor B. (ivor.b) wrote on 2013-11-01:

Config file Edit (1.3 KiB, text/plain)

Revision history for this message

Ivor B. (ivor.b) wrote on 2013-11-03:

After reinstalling (compiled from latest source) and restoring the DB from a dump the bug was still there (it took some time, only after the server had been online for a day)

Turning off binlog seems to fix it. Changing innodb_purge_threads to 1 (with active binlog) also seems to work.

Resolved stacktrace;

0x8db7c5 my_print_stacktrace + 53
0x663114 handle_fatal_signal + 1220
0x3f9420f500 _end + -1830327664
0x3f93e328e5 _end + -1834377611
0x3f93e340c5 _end + -1834371499
0xa5cbaa buf_page_init_for_read(dberr_t*, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long) + 2714
0xa6e5e2 buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long, trx_t*) + 162
0xa6f303 buf_read_page(unsigned long, unsigned long, unsigned long, trx_t*) + 83
0xa591fe buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*) + 830
0xa47093 btr_cur_search_to_nth_level(dict_index_t*, unsigned long, dtuple_t const*, unsigned long, unsigned long, btr_cur_t*, unsigned long, char const*, unsigned long, mtr_t*) + 803
0x9e31ad row_search_index_entry(dict_index_t*, dtuple_t const*, unsigned long, btr_pcur_t*, mtr_t*) + 141
0x9e188c row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*) + 204
0x9e29bc row_purge_step(que_thr_t*) + 1900
0x9b20ef que_run_threads(que_thr_t*) + 1919
0x9fb2fd srv_worker_thread + 781
0x3f94207851 _end + -1830359583
0x3f93ee894d _end + -1833632035

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-11-04:

Ivor -

Thank you for the bug report. Would it be possible for you to start a debug server build on the affected instance with the crashing conditions enabled (binlog on; multiple purge threads), and post any resulting crashes?

Revision history for this message

Ivor B. (ivor.b) wrote on 2013-11-05:

Could you please explain how to properly build a debug server using CMake?

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-11-05:

Ivor -

Use the same CMake options you used for building server in comment #2, and add -DCMAKE_BUILD_TYPE=Debug. If you already had -DCMAKE_BUILD_TYPE=RelWithDebInfo or any other value, please replace it. Thanks.

Revision history for this message

Ivor B. (ivor.b) wrote on 2013-11-06:

Hello Laurynas,

After compiling as follows:

cd percona-server/5.6
make
cd Percona-Server-5.6.14-rel61.1
rm CMakeCache.txt
cmake . -DCMAKE_BUILD_TYPE=Debug -DBUILD_CONFIG=mysql_release -DFEATURE_SET=community -DWITH_EMBEDDED_SERVER=OFF && make all install

And starting the server like this:

/usr/local/mysql/bin/mysqld --datadir=/var/lib/mysql --plugin-dir=/usr/local/mysql/lib/mysql/plugin --user=mysql --open-files-limit=65535 --pid-file=/tmp/my.pid --socket=/var/lib/mysql/mysql.sock --basedir=/usr/local/mysql

With binlog enabled and purge thread at 8, i'm unable to reproduce the bug..

On the other hand, running the stock RPM with these settings also doesn't produce the bug anymore.

They only thing recently changed was purging some old data from the DB and adding compression to one large InnoDB table.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-11-07:

Ivor -

I see, thank you. Should the issue come up again, please keep us posted. Meanwhile we still might be able to proceed by code analysis.

Revision history for this message

David O'Dell (daveo-1) wrote on 2013-11-27:

When running 5.6.14-rel62.0-483.squeeze I get this error on every restart.

I had innodb_purge_threads set to 8.

When I changed it to 1 the error no longer appears.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-11-27:

The issue is buf_page_init_for_read() adjusted for the buffer pool
mutex split incorrectly, visible in a combination of compressed
tables and multiple active purge threads, in 5.6 only.

In upstream, decrement of watch page fix count in
buf_pool_watch_unset() is protected by a combination of the relevant
page_hash X latch, buffer pool mutex, and possibly block mutex. In
XtraDB, buf_pool_watch_unset() locks page_hash X latch and then
either a block mutex or the zip mutex.

In upstream, the code path leading to the failing assert temporarily
releases page_hash latch around buf_buddy_alloc() call, which in turn
may release the buffer pool mutex too. If that happens, the watch
page is re-read from the page hash.

In the same code path in XtraDB, the buffer pool mutex was substituted
by the LRU list mutex and the logic was kept the same. Which is
incorrect, because page_hash unlock in itself is enough to necessitate
the page_hash recheck regardless of the LRU list mutex, because of
buf_pool_watch_unset() differences.

tags:

added: bp-split xtradb

Laurynas Biveinis (laurynas-biveinis) on 2013-11-27

summary:

- Assertion failure in file buf0buf.cc line 3687
+ Assertion failure in file buf0buf.cc line 3687, buf_page_init_for_read()
+ w/ multiple purge threads and compressed tables

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#10

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-740