Server hang in 5.5.27-rel28

Bug #1040735 reported by Dan Rogers
This bug affects 16 people
Affects: Status / Importance / Assigned to / Milestone
  Percona Server (moved to https://jira.percona.com/projects/PS): Fix Released / Critical / Unassigned
  5.1: New / Undecided / Unassigned
  5.5: Fix Released / Critical / Unassigned

Bug Description

After upgrading one of my database servers to 5.5.27-rel28 (from 5.5.24-rel26) this morning, it ran for about an hour before hanging all connections due to semaphore problems.

This appears to be related to bug #1026926, or to the fix for it. We never had the original problem, but there's definitely something wrong in this release.

I've rolled this server back to 5.5.25a-27.1 for now.

Here's the my.cnf and I've attached the error log.

Dan.

[mysqladmin]
socket=/var/lib/mysql/mysql.sock
[mysql]
socket=/var/lib/mysql/mysql.sock
[mysqld]
#skip-slave-start
datadir=/data/mysql
socket=/var/lib/mysql/mysql.sock
#performance_schema

slow-query-log = 1
slow-query-log-file=/var/log/mysql/mysqld_slow.log
# Log all queries, subject to the rate limit below
long_query_time = 0
# Log queries that are engaging in these kinds of anti-social behaviours
# log_slow_filter = full_scan,full_join,tmp_table,tmp_table_on_disk,filesort,filesort_on_disk
# Enable microsecond level precision for query times
slow_query_log_timestamp_precision = microsecond
# Log as much detail as we can, including query plan, etc.
log_slow_verbosity = full
# Log every nth session (1 = all)
log_slow_rate_limit = 250
# This will log queries that have data that isn't covered by indexes
# log-queries-not-using-indexes

server-id = 14
log-bin=/dblogs/mysql/mysql-bin
relay-log=sugar-prod-db04-lax1-relay-bin
# gt disabled due to disk space issues on 3/4/10
#log_slave_updates
max_connections=768
max_user_connections=768
join_buffer_size=2M
wait_timeout=15
myisam_sort_buffer_size=32M
table_cache=4000
table_definition_cache=4000
thread_cache_size=256
connect_timeout=5
max_allowed_packet=32M
max_connect_errors=99999999
key_buffer_size=16M
skip-name-resolve
innodb_thread_concurrency= 0
innodb_thread_sleep_delay=0
read_buffer_size=512K
read_rnd_buffer_size=4M
innodb_file_per_table

innodb_buffer_pool_size=28G

innodb_adaptive_flushing_method = keep_average
innodb_additional_mem_pool_size=48M
innodb_log_buffer_size=16M
innodb_flush_log_at_trx_commit=2
innodb_read_ahead=none
innodb_flush_neighbor_pages=0
innodb_flush_method = ALL_O_DIRECT

innodb_lock_wait_timeout=50
innodb_log_group_home_dir=/dblogs/mysql
innodb_data_home_dir=/data/mysql
innodb_data_file_path=ibdata1:10M:autoextend
innodb_log_files_in_group=2
innodb_log_file_size=4G
innodb_log_block_size=4096

innodb_io_capacity=20000
innodb_write_io_threads=64
innodb_read_io_threads=64

# Added 8/25/2009 by areitz.
default-storage-engine = innodb

# Added 3/10/2010 by gtoubassi
character-set-server = utf8

#query_cache_limit = 2M
#query_cache_size = 192M
query_cache_type = OFF
#bind-address=127.0.0.1
max_heap_table_size=64M
tmp_table_size=32M

[mysql.server]
user=mysql

[mysqld_safe]
err-log=/var/log/mysql/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
malloc-lib=/usr/local/lib/libtcmalloc_minimal.so

Revision history for this message
Dan Rogers (drogers-l) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Thanks for your report.

There is a workaround: innodb_lazy_drop_table=1. Can you see whether it helps for you?
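A minimal sketch of applying the workaround, assuming innodb_lazy_drop_table is dynamic in this Percona Server build (otherwise set it in my.cnf and restart):

 # Apply at runtime (hypothetical session; variable name taken from the comment above):
 mysql -e "SET GLOBAL innodb_lazy_drop_table = 1;"

 # Or persist it across restarts in my.cnf:
 #   [mysqld]
 #   innodb_lazy_drop_table = 1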

Revision history for this message
Dan Rogers (drogers-l) wrote :

I don't believe that would help in this case. There were no DROP TABLE or TRUNCATE TABLE commands running. There was a single INSERT being applied by the slave thread, then hundreds of hung SELECTs until the server ran out of connections.

Note as well that mine was hung on:

Mutex at 0xc776078 '&buf_pool->mutex', lock var 1

Whereas the other bug was:

S-lock on RW-latch at 0x416f048 '&buf_pool->page_hash_latch'

So perhaps they aren't related?

Revision history for this message
Thomas Babut (thbabut) wrote :

Same problem here. Multiple servers hang on random UPDATE statements since the Percona update to 5.5.27-28.0 (all servers running Debian Squeeze, x86-64). These servers were running fine with 5.5.25a-27.1. There were no DROP TABLE or TRUNCATE TABLE commands running either.

Revision history for this message
Joern Heissler (joernheissler) wrote :

I'm running "percona-server-server-5.5 5.5.27-rel28.0-291.precise" on ubuntu 12.04, 64 bit.

The server runs in slave mode and only replicates, no other load.

The query that is currently being replicated looks like this:
INSERT INTO `sometable` (columns...) VALUES (...),... ON DUPLICATE KEY UPDATE some stuff;

Symptoms: the query got stuck, and other queries hang instead of executing (SHOW SLAVE STATUS and SHOW PROCESSLIST are the only exceptions); eventually all connections are used up. There is only a little CPU load, perhaps from the replication thread downloading logs.

Similar problems happened with 5.5.25; there it was TRUNCATE TABLE or DROP TABLE, IIRC.

Another slave running "5.5.24-rel26.0-256.precise" seems to be stable. I downgraded to see if it helps.

InnoDB: Warning: a long semaphore wait:
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
120826 9:20:05 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2355 1_second, 2283 sleeps, 227 10_second, 4171 background, 4170 flush
srv_master_thread log flush and writes: 3317
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 98450, signal count 319060
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
Mutex spin waits 1025363, rounds 2430593, OS waits 40278
RW-shared spins 388510, rounds 4354413, OS waits 28770
RW-excl spins 124035, rounds 2089687, OS waits 20762
Spin rounds per wait: 2.37 mutex, 11.21 RW-shared, 16.85 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio reque...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Thank you for all the information.

However, it would be better if someone could provide a full backtrace with something like the following (the debug symbols package for mysql/percona-server must be installed):

 sudo gdb --batch --quiet -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' -p $(pidof mysqld)
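If you want to attach the output here, the same command can be redirected to a file (a sketch; the filename is arbitrary):

 sudo gdb --batch --quiet -ex 'set pagination off' -ex 'thread apply all bt full' \
      -ex 'quit' -p $(pidof mysqld) > mysqld-full-backtrace.txt 2>&1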

Revision history for this message
Joern Heissler (joernheissler) wrote :

Okay, here's the gdb output. Hope it helps :)

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Thanks for the gdb output. I presume this backtrace was taken while the server was hung?

Revision history for this message
Joern Heissler (joernheissler) wrote :

Yes, I took it after the server started to spit out logs regarding the problem.

The two servers I downgraded to 5.5.24 are running fine now. Could be coincidence, but I'm pretty sure that it is not.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The lock contention seems to be mainly over the buf_pool mutex in buf_LRU_try_free_flushed_blocks and buf_flush_start. However, I have not been able to reproduce it with the configuration from the description and sysbench (also in master-slave mode). In the backtrace provided there isn't much in buf0buf except for the InnoDB monitor print.

If steps to reproduce this can be provided (along with the cnf), that would be great.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Threads 21-12 are idling I/O threads.
Thread 11 is the lock timeout thread, idle.
Thread 10 is the error monitor thread waiting to acquire srv_innodb_monitor_mutex.
Thread 9 is the monitor thread waiting to acquire the buffer pool mutex, it holds LRU and free list mutexes.
Thread 8 is the LRU dump thread, idle.
Thread 7 is the InnoDB master thread doing buffer flush, waiting to acquire the buffer pool mutex.
Thread 6 is the purge worker thread, waiting to lock the index tree.
Thread 5 is the signal handler thread, idle.
Thread 4 is the slave I/O thread.
Thread 3 is the slave SQL thread, waiting to acquire the buffer pool mutex.
Thread 2 is a cached MySQL connection thread.
Thread 1 is the main server thread.

It is not apparent which of the threads deadlock here, but it's a reasonably small number of threads for further analysis.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

See also comments #10 and #11 on bug 1026926, and
comments #28 and #29 on bug 1007268.

Revision history for this message
Dan Rogers (drogers-l) wrote :

I've updated one of my slave-only databases to 5.5.27-28.0 and installed the debuginfo package to see if I can get it to hang like the other one did. The configurations are nearly identical, so hopefully it'll provide useful information.

As another data point, the server that hung last week and was the basis for this bug report was downgraded to 5.5.25a-27.1 and has been running fine since August 23rd on this release. With 5.5.27-28.0, it hung only an hour after being restarted.

Revision history for this message
Joern Heissler (joernheissler) wrote :

> If steps to reproduce this can be provided(along with cnf) , that would be great.

Not really. I just start my server and it hangs before long. In the debug output from #7, no other connections were made to the server (blocked via iptables), so only replication was running. It hung almost immediately.

Configuration is attached.
The init-file shouldn't be at fault; when I skip the slave start, it seems to work until I start the slave.

Btw, 5.5.24 still hasn't crashed since the downgrade on the two servers.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Bogdar on the #percona channel reported a similar, reproducible hang on his slave.

Here is the backtrace:

http://pastebin.ca/2198485

Monitor output: http://pastebin.ca/2198486 http://pastebin.ca/2198488

It has also been reported/discussed here http://forum.percona.com/index.php?t=msg&th=2522&goto=9240&

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From the previous backtrace, this is the gist (annotated with where it spins):

Thread 8 (Thread 0x7f2317f0e700 (LWP 16676)):
#0 0x00007f233adc61fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00000000008f6285 in os_cond_wait (event=0x35b59d0, reset_sig_count=1) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0x35b59d0, reset_sig_count=1) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/os/os0sync.c:609
#3 0x000000000082dab2 in sync_array_wait_event (arr=0x3037d30, index=2) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/sync/sync0arr.c:458
#4 0x000000000082ecfc in mutex_spin_wait (mutex=0x7f2327c848c0, file_name=<value optimized out>, line=<value optimized out>) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/sync/sync0sync.c:653
#5 0x000000000086f19e in mutex_enter_func (mutex=0x7f2327c848c0, file_name=0xa83d80 "/home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c", line=3667) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/include/sync0sync.ic:222
#6 pfs_mutex_enter_func (mutex=0x7f2327c848c0, file_name=0xa83d80 "/home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c", line=3667) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/include/sync0sync.ic:251
#7 0x00000000008773ae in buf_page_init_for_read (err=<value optimized out>, mode=<value optimized out>, space=108, zip_size=<value optimized out>, unzip=<value optimized out>, tablespace_version=<value optimized out>, offset=1771977) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c:3667

>>>>
 buf_pool_mutex_enter(buf_pool);
 buf_pool->n_pend_reads++;
 buf_pool_mutex_exit(buf_pool);
>>>>

#8 0x0000000000885b14 in buf_read_page_low (err=0x7f2317f0d028, sync=<value optimized out>, mode=<value optimized out>, space=108, zip_size=0, unzip=0, tablespace_version=124, offset=1771977, trx=0x0) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0rea.c:165
#9 0x00000000008865a5 in buf_read_ibuf_merge_pages (sync=<value optimized out>, space_ids=<value optimized out>, space_versions=<value optimized out>, page_nos=0x7f2317f0d6a0, n_stored=<value optimized out>) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0rea.c:816
#10 0x00000000008c9440 in ibuf_contract_ext (n_pages=0x7f2317f0d728, sync=<value optimized out>) at /home/jenkins/workspace/perc...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I have reviewed all of the places in XtraDB where it acquires the buffer pool mutex (the mutex split makes that surface much smaller than in InnoDB) and I don't see any cause for this. We need a way to reproduce this locally.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Hello!
Yesterday I reported the slave hangs (thanks to Raghavendra D Prabhu).

Now I have another set of debug info from another server: Debian 6, amd64.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

The next case is CentOS 5.
It has also hung two or three times since last Friday.

Sadly, there isn't much info here - debuginfo was installed right before the gdb launch.

No processlist, but as I remember, the longest query in the processlist was 'OPTIMIZE TABLE'.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Bogdan -

Thanks. As a note completely unrelated to this bug: based on your deb64 processlist, you might want to look into disabling the query cache. It looks like it's a major bottleneck for writes.

Revision history for this message
Dan Rogers (drogers-l) wrote :

After more than 24 hours, the slave-only server I updated to try to replicate this again didn't hang, so it's been downgraded.

The main difference between it and the one that hung is that the hung one has a Fusion-IO based SSD array for storage, while the one that didn't hang uses a RAID-10 array of hard drives.

Here's a diff of the config differences between the two servers. db02 is the one that didn't hang, db04 is the one that did. Hopefully this will be helpful.

Dan.

--- sugar-db02/etc/my.cnf 2012-08-28 14:26:10.000000000 -0400
+++ sugar-db04/etc/my.cnf 2012-08-23 05:45:16.000000000 -0400
@@ -49,21 +46,26 @@
 read_buffer_size=512K
 read_rnd_buffer_size=4M
 innodb_file_per_table
-innodb_buffer_pool_size=16G
+innodb_buffer_pool_size=28G

+innodb_adaptive_flushing_method = keep_average
 innodb_additional_mem_pool_size=48M
 innodb_log_buffer_size=16M
-innodb_flush_log_at_trx_commit=0
+innodb_flush_log_at_trx_commit=2
+innodb_read_ahead=none
+innodb_flush_neighbor_pages=0
 innodb_flush_method = ALL_O_DIRECT

 innodb_lock_wait_timeout=50
-innodb_log_group_home_dir=/data/logs
+innodb_log_group_home_dir=/dblogs/mysql
 innodb_data_home_dir=/data/mysql
 innodb_data_file_path=ibdata1:10M:autoextend
 innodb_log_files_in_group=2
 innodb_log_file_size=4G
+innodb_log_block_size=4096

-innodb_io_capacity=10000
+innodb_io_capacity=20000
 innodb_write_io_threads=64
 innodb_read_io_threads=64

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Based on the metadata lock in the attached processlist, and

" Sessions could end up deadlocked when executing a combination of SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682, Bug #12636001)"

as per http://dev.mysql.com/doc/refman/5.5/en/news-5-5-27.html

I tested with http://bugs.mysql.com/file.php?id=17153 (attached to http://bugs.mysql.com/bug.php?id=60682) and normal queries to the sbtest table.

I am getting a similar trace.

The server also hard-locks: after killing both, the server doesn't shut down normally and SIGKILL is required.

I also noticed entries like this in the error log:

120829 2:05:24 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress
120829 2:05:25 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress
120829 2:05:25 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From the bug report http://bugs.mysql.com/bug.php?id=60682 (from which the test case was obtained), it looks like it has not been fully fixed yet (see the last two comments).

Also, I was able to repeat Mark's mtr test case.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This is the backtrace (http://sprunge.us/cCYf) taken 10 minutes after stopping all the queries. The server remains hard-stuck in that state.

tags: added: i24252
tags: added: i25775
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Everybody -

If it is possible for you to install the debug binaries of the server (not on production, of course) to see whether the deadlock is replaced with a crash, that would be great.

Raghu -

It's a good find. It might be a separate issue (http://bugs.mysql.com/bug.php?id=60682 is), but some (not all) of your stacktraces are shared with the stacktraces of this bug report.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Well, perhaps I can install a debug build on one of my servers.
Please provide me with instructions.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Laurynas,

Yes, it may not be that issue directly; however, a combination of metadata-type statements (which I got from that test case) and normal DML queries seems to be triggering it.

I saw similar statements in https://bugs.launchpad.net/percona-server/+bug/1040735/+attachment/3280289/+files/mysql.processlist and from what I heard in i25775.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Bogdan, @others,

If you are using RPM and you have debug symbols installed, then the mysqld-debug binary should suffice.

You can start it as follows (see the consolidated shell sketch below):

1. Make sure the server is not already running.
2. sudo mv /usr/sbin/mysqld /usr/sbin/mysqld-release
3. sudo ln -sf /usr/sbin/mysqld-debug /usr/sbin/mysqld
4. Start the MySQL server with your init scripts: service mysql start.
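The same steps as a consolidated shell sketch (RPM layout assumed, paths as above; stop the server first):

 sudo service mysql stop
 sudo mv /usr/sbin/mysqld /usr/sbin/mysqld-release
 sudo ln -sf /usr/sbin/mysqld-debug /usr/sbin/mysqld
 sudo service mysql start
 # To revert to the release binary later, stop the server and
 # move /usr/sbin/mysqld-release back to /usr/sbin/mysqld.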

Revision history for this message
Bogdan (bogdar) wrote :

Sorry, I can't experiment on the CentOS system.
I need some way to reproduce it on Debian.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Bogdan,

We have produced debug binaries (with UNIV_DEBUG on) so that you can test it :)

Here, http://www.percona.com/downloads/TESTING/5.5-debug/

From that, you will want the -server-5.5, -client-5.5, -common, and libmysqlclient18 packages.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I tested with the trunk of PS and haven't hit the bug yet.

I hit two assertions though:

1. http://sprunge.us/HDcK - reported in https://bugs.launchpad.net/percona-server/+bug/1038225; it may also be responsible for this bug (since it is fixed in the trunk I tested).

2. http://sprunge.us/RMKO - unlikely to be the cause of this, but reported separately at https://bugs.launchpad.net/percona-server/+bug/1043620.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Also, for the bug mentioned in #27 --

" Sessions could end up deadlocked when executing a combination of SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682, Bug #12636001)"

Even though this is present in the 5.5.27 changelogs, the fix (commit: http://sprunge.us/AYKP) seems to have been null-merged (i.e. without changes) into the 5.5.x tree. So either it is not fixed in 5.5.x, or it is not required there; I am not sure whether this requires a separate bug report.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Well, I installed the debug build from http://www.percona.com/downloads/TESTING/5.5-debug/
It may take one or two days to reproduce.
I'll get a backtrace and post it here ASAP.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Re: [Bug 1040735] Re: Server hang in 5.5.27-rel28

On 30.08.12 9:27, Raghavendra D Prabhu wrote:
> Also, for bug mentioned in #27 --
>
> " Sessions could end up deadlocked when executing a combination of
> SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682,
> Bug #12636001)"
>
> Even though this is present in 5.5.27 changelogs, it (commit:
> http://sprunge.us/AYKP) seems to have been null merged (so without
> changes) into 5.5.x tree. So, either it is not fixed in 5.5.x or not
> required there; not sure if this requires a separate bug report.
>

Quote from revision comments for the 5.1 version of the fix for
http://bugs.mysql.com/bug.php?id=60682 :

"
      Note: The problem is not found in 5.5. Introduction MDL subsystem
      caused metadata locking responsibility to be moved from TDC/TC to
      MDL subsystem. Due to this, responsibility of LOCK_open is reduced.
      As the use of LOCK_open is removed in open_table() and
      mysql_rm_table() the above mentioned CYCLE does not form.
"

Hence the null merge.

Revision history for this message
Ken Zalewski (ken-zalewski) wrote :

I am confirming that the New York State Senate has experienced this exact problem as well. We have been running Percona Server (compiled from source) for about a year now, and have never had an issue with any release until several days ago, when we built and deployed 5.5.27-28.0. We had run it on our development server for a week with no issues, so we deployed to production yesterday.

Our servers (dev and prod) both experienced semaphore errors in the logs and eventually locked up, to the point where only a kill -9 would terminate the mysqld process. It apparently only happens under heavier load, which is why we didn't see it on our dev server over the last week.

We have reverted to stock MySQL 5.5.27 for now, which is working fine.

This bug caused our Constituent Relationship Management application to fail for all 62 senators and their staffs yesterday. It was fortunate that we had the stock MySQL server ready to spin up in its place, but this has dealt a significant blow to our confidence in utilizing Percona's version of MySQL - especially considering the lack of acknowledgement by Percona. The 5.5.27-28.0 release is still downloadable from http://www.percona.com/downloads/Percona-Server-5.5/ at the time of this posting.

Revision history for this message
Hany Fahim (hany) wrote :

I'd also like to confirm that we've been bitten by this bug as well. We had 5.5.27-28.0 deployed to several of our customers and at least 2 of them ran into this particular bug. This does occur whether the server is under load or not. In one customer's case, they were performing an ALTER statement on a brand new server, and ran into this.

In each case, we downgraded to the previous version which resolved the issue.

Whatever we can do to assist in resolving this bug, please let us know.

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

For the record: I've been running 5.5.27-28.0 under a multi-threaded tpcc-mysql workload on 2 high-end boxes for 3 days, and I did not hit this bug.

So I can't repeat it with tpcc-mysql.

Revision history for this message
Stewart Smith (stewart) wrote :

I tried the C test case from http://bugs.mysql.com/bug.php?id=60682 and couldn't get it to trigger in the 5.5 trunk.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

I have had the debug version of the 5.5.27 build running for 95 hours with no issues. Should I try again with the release version?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Ken -

We are sorry that you have experienced this issue; fixing it is our highest priority at the moment.

Hany -

You could help us by installing a debug version of the 28.0 release on some of your servers (either use the mysqld-debug binary from the release RPMs, or the DEBs at http://www.percona.com/downloads/TESTING/5.5-debug/), installing gdb, and sending us crash messages if it crashes or gdb stacktraces if it deadlocks. For stacktraces, run gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mysqld) at the time of the deadlock. Thank you!

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Bogdan -

Thanks. This is a very important data point for us. The debug build you have been running contains a fix for bug 1038225, which we had previously ruled out as not applying here, because the bug-triggering workloads do not necessarily include DROP TABLE/TRUNCATE TABLE/etc.

We will follow-up shortly.

Revision history for this message
Stewart Smith (stewart) wrote :

We'll do a 5.5.27-28.1 release with the fix for bug 1038225 in it, which should hit the archives within a few days. We'll continue investigating if this doesn't resolve the problem.

Revision history for this message
Ignacio Nin (ignacio-nin) wrote :

All,

We've posted a release candidate for 5.5.27-rel28.1, which contains a fix for this issue.

The binaries can be downloaded either directly from http://www.percona.com/downloads/TESTING/Percona-Server-55/Percona-Server-5.5.27-rel28.1/release-5.5.27-28.1/296/, or via our experimental repos, to which it has been uploaded.

Please give these a try and let us know if it fixes your issue. We appreciate your input!

Regards,

N.

In order to use our experimental repos, install our main repo first (instructions at http://www.percona.com/doc/percona-server/5.5/installation.html#using-percona-software-repositories?id=repositories:start), and then:
- for debs: add "experimental" to your deb and deb-src lines in sources.list (so it should be something like deb http://repo.percona.com/apt VERSION main experimental)
- for rhel5/rhel6: install http://repo.percona.com/testing/centos/6/os/noarch/percona-testing-0.0-1.noarch.rpm
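A rough shell sketch of the two paths above (URLs as given in this comment; VERSION is your distribution codename):

 # Debian/Ubuntu: enable the experimental component in /etc/apt/sources.list,
 # so the Percona lines look like:
 #   deb http://repo.percona.com/apt VERSION main experimental
 #   deb-src http://repo.percona.com/apt VERSION main experimental
 # then refresh the package lists:
 sudo apt-get update

 # RHEL 5 / RHEL 6: install the testing repository package, then upgrade
 # the Percona Server packages through yum as usual:
 sudo rpm -Uvh http://repo.percona.com/testing/centos/6/os/noarch/percona-testing-0.0-1.noarch.rpm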

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

All -

Our testing confirmed that this is a duplicate of bug 1038225. Release 28.1 contains the fix. All 28.0 users are advised to upgrade immediately.

Closing this bug as a duplicate of bug 1038225. Please comment if you experience any further issues.

Thanks.
