Percona Server moved to https://jira.percona.com/projects/PS

InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex) in file buf0lru.c line 859

Bug #1103850 reported by Roel Van de Paar on 2013-01-24

This bug affects 2 people

	Status	Importance	Assigned to	Milestone
Percona Server moved to https://jira.percona.com/projects/PS	Invalid	Undecided	Unassigned
5.1	Fix Released	High	Laurynas Biveinis	Percona Server moved to https://jira.percona.com/projects/PS 5.1.71-14.9
5.5	Fix Released	High	Laurynas Biveinis	Percona Server moved to https://jira.percona.com/projects/PS 5.5.30-30.2
5.6	Invalid	Undecided	Unassigned

Bug Description

130124 7:34:12 InnoDB: Assertion failure in thread 139816180811520 in file buf0lru.c line 859
InnoDB: Failing assertion: mutex_own(&buf_pool->LRU_list_mutex)

Thread 1 (LWP 15772):
+bt
#0 0x0000003da180c60c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000007de838 in my_write_core (sig=6) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/mysys/stacktrace.c:433
#2 0x00000000006b2e14 in handle_fatal_signal (sig=6) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/sql/signal_handler.cc:249
#3 <signal handler called>
#4 0x0000003da1435935 in raise () from /lib64/libc.so.6
#5 0x0000003da14370e8 in abort () from /lib64/libc.so.6
#6 0x00000000008f339a in buf_LRU_insert_zip_clean (bpage=bpage@entry=0x7f297043d380) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0lru.c:859
#7 0x00000000008eb0c9 in buf_flush_remove (bpage=bpage@entry=0x7f297043d380) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:538
#8 0x00000000008ecb5e in buf_flush_ready_for_flush (flush_type=BUF_FLUSH_LIST, bpage=0x7f297043d380) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:487
#9 buf_flush_page_and_try_neighbors (bpage=bpage@entry=0x7f297043d380, flush_type=flush_type@entry=BUF_FLUSH_LIST, n_to_flush=n_to_flush@entry=200, count=count
@entry=0x7f297dca7668) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:1587
#10 0x00000000008ef41f in buf_flush_flush_list_batch (lsn_limit=18446744073709551615, min_n=200, buf_pool=0x2a2ee48) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:1740
#11 buf_flush_batch (buf_pool=buf_pool@entry=0x2a2ee48, flush_type=flush_type@entry=BUF_FLUSH_LIST, min_n=min_n@entry=200, lsn_limit=lsn_limit@entry=18446744073
709551615) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:1830
#12 0x00000000008effbe in buf_flush_list (min_n=<optimized out>, lsn_limit=18446744073709551615, lsn_limit@entry=1) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/buf/buf0flu.c:2047
#13 0x0000000000876587 in srv_master_thread (arg=<optimized out>) at /ssd/5.5/Percona-Server-5.5.28-rel29.3/storage/innobase/srv/srv0srv.c:3694
#14 0x0000003da1807d14 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003da14f168d in clone () from /lib64/libc.so.6

Found in trunk @ Percona-Server-5.5.28-rel29.3-409-debug.Linux.x86_64

Tags:

Related branches

lp:~laurynas-biveinis/percona-server/buf-mutex-split-fixes-5.1

Merged into lp:percona-server/5.1 at revision 585

Laurynas Biveinis (community): Approve on 2013-08-16

Alexey Kopytov (community): Approve on 2013-08-15

George Ormond Lorch III (community): Approve (g2) on 2013-06-06

lp:~laurynas-biveinis/percona-server/rm-lazy-drop-table-5.5

Merged into lp:percona-server/5.5 at revision 496

Alexey Kopytov (community): Approve on 2013-04-04

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-01-24:

thread apply all bt on core Edit (15.4 KiB, text/plain)

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-01-24:

Large testcase, can be optimized. Use build-binary.sh --debug build. Use start_mtr4 then test_mtr4. Edit (7.7 MiB, application/x-tar)

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-01-24:

A debug-only issue: buf_flush_remove needs to hold the LRU list mutex for the flush list flushes for BUF_BLOCK_ZIP_DIRTY pages. Due to mutex locking order, the way to fix this is to lock LRU in buf_flush_flush_list_batch #if UNIV_DEBUG || UNIV_BUF_DEBUG.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-01-24:

Likewise for 5.1 (with buf_flush_batch instead of buf_flush_flush_list_batch).

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-01-24:

(Bug is single threaded reproducible)

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-01-24:

Great, the single-threadeness fits the analysis.

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-01-27:

See bug 1038225

Revision history for this message

Roel Van de Paar (roel11) wrote on 2013-01-28:

Ignore #7, unrelated to bug 1038225

Laurynas Biveinis (laurynas-biveinis) on 2013-02-05

tags:

added: xtradb

Revision history for this message

George Ormond Lorch III (gl-az) wrote on 2013-03-13:

I hit this today while trying to reproduce 1086700 against PS 5.5 trunk revno 466

Revision history for this message

George Ormond Lorch III (gl-az) wrote on 2013-03-13:

#10

Laurynas, if I understood your suggestion, it doesn't work. I tried a few things and my final changes still caused assertions on startup for a fresh server:

=== modified file 'Percona-Server/storage/innobase/buf/buf0flu.c'
--- Percona-Server/storage/innobase/buf/buf0flu.c 2013-02-18 04:48:10 +0000
+++ Percona-Server/storage/innobase/buf/buf0flu.c 2013-03-13 22:36:14 +0000
@@ -1757,11 +1757,22 @@
   /* The list may change during the flushing and we cannot
   safely preserve within this function a pointer to a
   block in the list! */
- while (bpage != NULL
- && len > 0
- && !buf_flush_page_and_try_neighbors(
+ while (bpage != NULL && len > 0 ) {
+#if UNIV_DEBUG || UNIV_BUF_DEBUG
+ ibool have_LRU_list_mutex =
+ mutex_own(&buf_pool->LRU_list_mutex);
+
+ if (!have_LRU_list_mutex)
+ mutex_enter(&buf_pool->LRU_list_mutex);
+#endif
+ if (buf_flush_page_and_try_neighbors(
     bpage, BUF_FLUSH_LIST, min_n, &count)) {
-
+ break;
+ }
+#if UNIV_DEBUG || UNIV_BUF_DEBUG
+ if (!have_LRU_list_mutex)
+ mutex_exit(&buf_pool->LRU_list_mutex);
+#endif
    buf_flush_list_mutex_enter(buf_pool);

/* If we are here that means that buf_pool->mutex

Initializing server instance: id[1] bindir[/data/bin/ps-5.5-PS-81-debug]
Creating database...done.
Starting server instance 1 .....ERROR: failed to start.
130313 15:36:36 [Note] Plugin 'FEDERATED' is disabled.
130313 15:36:36 InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
130313 15:36:36 InnoDB: The InnoDB memory heap is disabled
130313 15:36:36 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130313 15:36:36 InnoDB: Compressed tables use zlib 1.2.3
130313 15:36:36 InnoDB: Using Linux native AIO
130313 15:36:36 InnoDB: Initializing buffer pool, size = 4.0G
130313 15:36:36 InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
130313 15:36:36 InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
130313 15:36:37 InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
130313 15:36:37 InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
130313 15:36:37 InnoDB: Assertion failure in thread 140222067590944 in file buf0flu.c line 1139
InnoDB: Failing assertion: !mutex_own(&buf_pool->LRU_list_mutex)

Laurynas, if I understood your suggestion, it doesn't work. I tried a few things and my final changes still caused assertions on startup for a fresh server:

=== modified file 'Percona-Server/storage/innobase/buf/buf0flu.c'
--- Percona-Server/storage/innobase/buf/buf0flu.c	2013-02-18 04:48:10 +0000
+++ Percona-Server/storage/innobase/buf/buf0flu.c	2013-03-13 22:36:14 +0000
@@ -1757,11 +1757,22 @@
 		/* The list may change during the flushing and we cannot
 		safely preserve within this function a pointer to a
 		block in the list! */
-		while (bpage != NULL
-		       && len > 0
-		       && !buf_flush_page_and_try_neighbors(
+		while (bpage != NULL && len > 0 ) {
+#if UNIV_DEBUG || UNIV_BUF_DEBUG
+			ibool have_LRU_list_mutex = 
+				mutex_own(&buf_pool->LRU_list_mutex);
+
+			if (!have_LRU_list_mutex)
+				mutex_enter(&buf_pool->LRU_list_mutex);
+#endif
+			if (buf_flush_page_and_try_neighbors(
 				bpage, BUF_FLUSH_LIST, min_n, &count)) {
-
+				break;
+			}
+#if UNIV_DEBUG || UNIV_BUF_DEBUG
+		if (!have_LRU_list_mutex)
+			mutex_exit(&buf_pool->LRU_list_mutex);
+#endif
 			buf_flush_list_mutex_enter(buf_pool);
 
 			/* If we are here that means that buf_pool->mutex

Initializing server instance: id[1] bindir[/data/bin/ps-5.5-PS-81-debug]
Creating database...done.
Starting server instance 1 .....ERROR: failed to start.
130313 15:36:36 [Note] Plugin 'FEDERATED' is disabled.
130313 15:36:36 InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
130313 15:36:36 InnoDB: The InnoDB memory heap is disabled
130313 15:36:36 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130313 15:36:36 InnoDB: Compressed tables use zlib 1.2.3
130313 15:36:36 InnoDB: Using Linux native AIO
130313 15:36:36 InnoDB: Initializing buffer pool, size = 4.0G
130313 15:36:36 InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
130313 15:36:36  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
130313 15:36:37  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
130313 15:36:37  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
130313 15:36:37  InnoDB: Assertion failure in thread 140222067590944 in file buf0flu.c line 1139
InnoDB: Failing assertion: !mutex_own(&buf_pool->LRU_list_mutex)

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-03-14:

#11

George -

For one thing,
+ ibool have_LRU_list_mutex =
+ mutex_own(&buf_pool->LRU_list_mutex);
+
+ if (!have_LRU_list_mutex)
+ mutex_enter(&buf_pool->LRU_list_mutex);

is a race condition.

I worked on this bug couple of weeks ago, and it appears that, while the fix idea is simple, the code reorganization to achieve it is not. If this bug is blocking further work for a more critical bug, I'd disable the failing UNIV_DEBUG site and proceed.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-17:

#12

I see three ways to fix this.

1) Acquire LRU list mutex in advance (i.e. before the flush list mutex) for debug builds where needed in order to satisfy the current code. But this comes close to unsplitting the buffer pool mutex split, meaning that the debug builds would become less useful in diagnosing any issues with the split.

2) In debug builds, buffer-fix the block, release the mutexes before the buf_LRU_insert_zip_clean() call in buf_flush_remove() and reacquire them again, this time with LRU mutex acquired too, undo this after the buf_LRU_insert_zip_clean() call.

3) Do not use zip_clean list in debug builds neither.

I am deciding between 2) and 3).

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-17:

#13

Option 2) becomes quite complicated because buf_flush_remove() may be called with different mutexes being held from different contexts (and even more so after the bug 1086680 fix). Option 3) starts looking more attractive. The zip_clean list maintenance does not seem to enforce anything vital.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-19:

#14

The above reasoning re. option 2) applies to 5.1 more than to 5.5. In 5.5 it may be a feasible option.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2013-05-19:

#15

In 5.5 all the current buf_flush_remove() callers hold the LRU list mutex. This bug only occurred when buf_flush_ready_for_flush() called buf_flush_remove(), and this has been fixed with the lazy drop table feature removal.

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#16

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-2883

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.