Percona Server with XtraDB

Failing assertion: mutex_own(&buf_pool->mutex) in buf0lru.c on ADD PRIMARY KEY

Reported by Elena Stepanova on 2012-08-17
78
This bug affects 26 people
Affects Status Importance Assigned to Milestone
Percona Server
Critical
Laurynas Biveinis
5.1
Undecided
Unassigned
5.5
Critical
Laurynas Biveinis

Bug Description

Percona XtraDB (http://www.percona.com) 1.1.8-27.0
Version: '5.5.27-debug-log'

InnoDB: Assertion failure in thread 140216578594560 in file buf0lru.c line 390
InnoDB: Failing assertion: mutex_own(&buf_pool->mutex)

Query (398ffd0): ALTER TABLE t1 ADD PRIMARY KEY (l_orderkey, l_linenumber)
Connection ID (thread ID): 1
Status: NOT_KILLED

#5 0x00007f86c8d8fb0b in __GI_abort () at abort.c:92
#6 0x00000000009a5e24 in buf_flush_yield (buf_pool=0x35c4fb8, bpage=0x7f86be41eca0) at storage/innobase/buf/buf0lru.c:390
#7 0x00000000009a5f80 in buf_flush_try_yield (buf_pool=0x35c4fb8, bpage=0x7f86be41eca0, processed=1024) at storage/innobase/buf/buf0lru.c:445
#8 0x00000000009a6272 in buf_flush_or_remove_pages (buf_pool=0x35c4fb8, id=10) at storage/innobase/buf/buf0lru.c:569
#9 0x00000000009a62e9 in buf_flush_dirty_pages (buf_pool=0x35c4fb8, id=10) at storage/innobase/buf/buf0lru.c:600
#10 0x00000000009a676a in buf_LRU_flush_or_remove_pages (id=10, buf_remove=BUF_REMOVE_FLUSH_NO_WRITE) at storage/innobase/buf/buf0lru.c:798
#11 0x00000000009e3db0 in fil_delete_tablespace (id=10, evict_all=0) at storage/innobase/fil/fil0fil.c:2385
#12 0x0000000000901f72 in row_drop_table_for_mysql (name=0x39a35d8 "test/t1@00232", trx=0x39a3b88, drop_db=0) at storage/innobase/row/row0mysql.c:3564
#13 0x00000000008f6d85 in row_merge_drop_table (trx=0x39a3b88, table=0x39a1198) at storage/innobase/row/row0merge.c:2672
#14 0x00000000008e2070 in ha_innobase::final_add_index (this=0x3996bb0, add_arg=0x399c0b8, commit=true) at storage/innobase/handler/handler0alter.cc:1026
#15 0x0000000000675e0c in mysql_alter_table (thd=0x38fb310, new_db=0x3990620 "test", new_name=0x39900c0 "t1", create_info=0x7f86b75ba290, table_list=0x39900f8, alter_info=0x7f86b75ba370, order_num=0, order=0x0, ignore=false) at sql/sql_table.cc:6914
#16 0x0000000000894130 in Alter_table_statement::execute (this=0x3990718, thd=0x38fb310) at sql/sql_alter.cc:106
#17 0x00000000005ee1ee in mysql_execute_command (thd=0x38fb310) at sql/sql_parse.cc:4582
#18 0x00000000005f0d3d in mysql_parse (thd=0x38fb310, rawbuf=0x398ffd0 "ALTER TABLE t1 ADD PRIMARY KEY (l_orderkey, l_linenumber)", length=57, parser_state=0x7f86b75bbff0) at sql/sql_parse.cc:5811
#19 0x00000000005e4c01 in dispatch_command (command=COM_QUERY, thd=0x38fb310, packet=0x3987e71 "ALTER TABLE t1 ADD PRIMARY KEY (l_orderkey, l_linenumber)", packet_length=57) at sql/sql_parse.cc:1061
#20 0x00000000005e3edd in do_command (thd=0x38fb310) at sql/sql_parse.cc:788
#21 0x00000000006cb0b8 in do_handle_one_connection (thd_arg=0x38fb310) at sql/sql_connect.cc:1484
#22 0x00000000006caa9e in handle_one_connection (arg=0x38fb310) at sql/sql_connect.cc:1391
#23 0x0000000000aaa0af in pfs_spawn_thread (arg=0x3966020) at storage/perfschema/pfs.cc:1015
#24 0x00007f86c9ac6efc in start_thread (arg=0x7f86b75bd700) at pthread_create.c:304
#25 0x00007f86c8e3959d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

bzr version-info
revision-id: <email address hidden>
date: 2012-08-15 10:46:21 +1000
build-date: 2012-08-17 23:08:42 +0400
revno: 290

build options: -DCMAKE_BUILD_TYPE=Debug

# Test case:
# Run as
# perl mysql-test-run.pl --mysqld=--innodb_log_buffer_size=64M \
# --mysqld=--innodb_log_file_size=64M --mysqld=--innodb_buffer_pool_size=128M <test name>

SET GLOBAL innodb_file_per_table=ON;
SET GLOBAL slow_query_log=OFF;

USE test;
DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
          `l_orderkey` int(11) NOT NULL DEFAULT '0',
          `l_partkey` int(11) DEFAULT NULL,
          `l_suppkey` int(11) DEFAULT NULL,
          `l_linenumber` int(11) NOT NULL DEFAULT '0',
          `l_quantity` double DEFAULT NULL,
          `l_extendedprice` double DEFAULT NULL,
          `l_discount` double DEFAULT NULL,
          `l_tax` double DEFAULT NULL,
          `l_returnflag` char(1) DEFAULT NULL,
          `l_linestatus` char(1) DEFAULT NULL,
          `l_shipDATE` date DEFAULT NULL,
          `l_commitDATE` date DEFAULT NULL,
          `l_receiptDATE` date DEFAULT NULL,
          `l_shipinstruct` char(25) DEFAULT NULL,
          `l_shipmode` char(10) DEFAULT NULL,
          `l_comment` varchar(44) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

LOAD DATA LOCAL INFILE 'mdev-444.data' INTO TABLE t1;

# Server restart can be replaced by waiting for
# Innodb_checkpoint_age to become 0

--enable_reconnect
--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
restart
EOF
--shutdown_server 60
--source include/wait_until_connected_again.inc

ALTER TABLE t1 ADD PRIMARY KEY (l_orderkey, l_linenumber);

# Cleanup
DROP TABLE t1;

# End of the test case

Initially encountered on DBT3 S10 data, the description and initial analysis by Timour is also filed as https://mariadb.atlassian.net/browse/MDEV-444.

Upd: sorry, forgot to mention: data file for the LOAD command, in case you need the test case, can be found at https://mariadb.atlassian.net/secure/attachment/12000/mdev-444.data.gz

Related branches

It looks like a regression from the fix for https://bugs.launchpad.net/percona-server/+bug/1007268

This should fix it.

=== modified file 'Percona-Server/storage/innobase/buf/buf0lru.c'
--- Percona-Server/storage/innobase/buf/buf0lru.c 2012-08-10 05:05:10 +0000
+++ Percona-Server/storage/innobase/buf/buf0lru.c 2012-08-19 17:33:54 +0000
@@ -387,7 +387,7 @@
 {
        mutex_t* block_mutex;

- ut_ad(buf_pool_mutex_own(buf_pool));
+ ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
        ut_ad(buf_page_in_file(bpage));

        block_mutex = buf_page_get_mutex(bpage);

Raghu, you mean fix for 1015109 perhaps?

This looks like not complete fix for bug 1015109 for the incorrectly merged upstream revision 0.16394.47, the one which eliminated the LRU scan for dropping AHI entries.

The InnoDB buf_flush_dirty_pages() acquires the buffer pool mutex, the XtraDB one acquires the LRU list mutex. The whole of 0.16394.47 needs to be reviewed for mutex usage. Raghu's fix is correct but incomplete: as a minimum, buf_flush_yield() should also release/re-acquire the LRU list mutex instead of the buffer pool mutex.

Elena Stepanova (elenst) on 2012-08-20
description: updated

Bug 1007268 is duplicate.

One thing I didn't do in my fix was that I didn't try to create a short MTR regression testcase. It should be possible to make one with DBUG_SYNC(). Should I do this?

Joern Heissler (percona) wrote :
Download full text (6.2 KiB)

Not sure if this is the correct bug report. Please advice if it's not.

I'm running "percona-server-server-5.5 5.5.27-rel28.0-291.precise" on ubuntu 12.04, 64 bit.

I had problems with 5.5.25, and they haven't gone away after the upgrade.

The server runs in slave mode and only replicates, no other load.

InnoDB: Warning: a long semaphore wait:
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
120826 9:20:05 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2355 1_second, 2283 sleeps, 227 10_second, 4171 background, 4170 flush
srv_master_thread log flush and writes: 3317
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 98450, signal count 319060
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
Mutex spin waits 1025363, rounds 2430593, OS waits 40278
RW-shared spins 388510, rounds 4354413, OS waits 28770
RW-excl spins 124035, rounds 2089687, OS waits 20762
Spin rounds per wait: 2.37 mutex, 11.21 RW-shared, 16.85 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
1497638 OS file reads, 1766767 OS file writes, 65253 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/...

Read more...

Joern Heissler (percona) wrote :

Got another slave running, it appears to be stable. Version there is "5.5.24-rel26.0-256.precise"

@Joern,

The issue seems to be related to https://bugs.launchpad.net/percona-server/+bug/1040735 based on the pattern of lock waits and areas where it is being seen.

Joern Heissler (percona) wrote :

cheers @raghavendra-prabhu

Changed in android-user-issue-20128 (Unity Linux):
assignee: nobody → Alejandra Bedolla Diaz (diazalejandra2008)
Joern Heissler (percona) wrote :

I had 5.5.27-28.1 running for a couple of days in a similar setup and the bug did not appear again. Thanks!

Dan Rogers (drogers-l) wrote :

I have also been running 5.5.27-28.1 for several days without a recurrence of this bug.

Thanks for getting it fixed!

Changed in android-user-issue-20128 (Unity Linux):
assignee: Alejandra Bedolla Diaz (diazalejandra2008) → nobody
affects: android-user-issue-20128 (Unity Linux) → percona-xtradb
no longer affects: percona-xtradb
Roel Van de Paar (roel11) wrote :

Bug is present in Percona-Server-5.5.28-rel29.3: see new bug logged @ bug 1103850

Roel Van de Paar (roel11) wrote :

Correction: though the assert is the same, the bug is different.

Bug 1103850 is not the recurrence of the current bug: different cause, affects only debug builds, happens only intermittently if compressed tables are used.

There is no bug 1038225 regression.

Roel Van de Paar (roel11) wrote :

Apologies, was under the impression the assert was the same. It is also different. My bad.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers