Server hang in 5.5.27-rel28

Bug #1040735 reported by Dan Rogers
This bug affects 16 people
Affects: Status / Importance / Assigned to / Milestone
  Percona Server (moved to https://jira.percona.com/projects/PS): Fix Released / Critical / Unassigned
  5.1: New / Undecided / Unassigned
  5.5: Fix Released / Critical / Unassigned

Bug Description

After upgrading one of my database servers to 5.5.27-rel28 (from 5.5.24-rel26) this morning, it ran for about an hour before hanging all connections due to semaphore problems.

This appears to be related to bug #1026926, or to the fix for it. We never had the original problem, but there's definitely something wrong in this release.

I've rolled this server back to 5.5.25a-27.1 for now.

Here's the my.cnf and I've attached the error log.

Dan.

[mysqladmin]
socket=/var/lib/mysql/mysql.sock
[mysql]
socket=/var/lib/mysql/mysql.sock
[mysqld]
#skip-slave-start
datadir=/data/mysql
socket=/var/lib/mysql/mysql.sock
#performance_schema

slow-query-log = 1
slow-query-log-file=/var/log/mysql/mysqld_slow.log
# Log all queries, subject to the rate limit below
long_query_time = 0
# Log queries that are engaging in these kinds of anti-social behaviours
# log_slow_filter = full_scan,full_join,tmp_table,tmp_table_on_disk,filesort,filesort_on_disk
# Enable microsecond level precision for query times
slow_query_log_timestamp_precision = microsecond
# Log as much detail as we can, including query plan, etc.
log_slow_verbosity = full
# Log every nth session (1 = all)
log_slow_rate_limit = 250
# This will log queries that have data that isn't covered by indexes
# log-queries-not-using-indexes

server-id = 14
log-bin=/dblogs/mysql/mysql-bin
relay-log=sugar-prod-db04-lax1-relay-bin
# gt disabled due to disk space issues on 3/4/10
#log_slave_updates
max_connections=768
max_user_connections=768
join_buffer_size=2M
wait_timeout=15
myisam_sort_buffer_size=32M
table_cache=4000
table_definition_cache=4000
thread_cache_size=256
connect_timeout=5
max_allowed_packet=32M
max_connect_errors=99999999
key_buffer_size=16M
skip-name-resolve
innodb_thread_concurrency= 0
innodb_thread_sleep_delay=0
read_buffer_size=512K
read_rnd_buffer_size=4M
innodb_file_per_table

innodb_buffer_pool_size=28G

innodb_adaptive_flushing_method = keep_average
innodb_additional_mem_pool_size=48M
innodb_log_buffer_size=16M
innodb_flush_log_at_trx_commit=2
innodb_read_ahead=none
innodb_flush_neighbor_pages=0
innodb_flush_method = ALL_O_DIRECT

innodb_lock_wait_timeout=50
innodb_log_group_home_dir=/dblogs/mysql
innodb_data_home_dir=/data/mysql
innodb_data_file_path=ibdata1:10M:autoextend
innodb_log_files_in_group=2
innodb_log_file_size=4G
innodb_log_block_size=4096

innodb_io_capacity=20000
innodb_write_io_threads=64
innodb_read_io_threads=64

# Added 8/25/2009 by areitz.
default-storage-engine = innodb

# Added 3/10/2010 by gtoubassi
character-set-server = utf8

#query_cache_limit = 2M
#query_cache_size = 192M
query_cache_type = OFF
#bind-address=127.0.0.1
max_heap_table_size=64M
tmp_table_size=32M

[mysql.server]
user=mysql

[mysqld_safe]
err-log=/var/log/mysql/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
malloc-lib=/usr/local/lib/libtcmalloc_minimal.so

Revision history for this message
Dan Rogers (drogers-l) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Thanks for your report.

There is a workaround: innodb_lazy_drop_table=1. Can you see whether it helps for you?
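A minimal sketch of applying the workaround, assuming innodb_lazy_drop_table is dynamic in this Percona Server build (otherwise set it in my.cnf and restart):

 # Apply at runtime (hypothetical session; variable name taken from the comment above):
 mysql -e "SET GLOBAL innodb_lazy_drop_table = 1;"

 # Or persist it across restarts in my.cnf:
 #   [mysqld]
 #   innodb_lazy_drop_table = 1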

Revision history for this message
Dan Rogers (drogers-l) wrote :

I don't believe that would help in this case. There were no DROP TABLE or TRUNCATE TABLE commands running. There was a single INSERT being applied by the slave thread, then hundreds of hung SELECTs until the server ran out of connections.

Note as well that mine was hung on:

Mutex at 0xc776078 '&buf_pool->mutex', lock var 1

Whereas the other bug was:

S-lock on RW-latch at 0x416f048 '&buf_pool->page_hash_latch'

So perhaps they aren't related?

Revision history for this message
Thomas Babut (thbabut) wrote :

Same problem here. Multiple servers hang on random UPDATE statements since the Percona update to 5.5.27-28.0 (all servers running Debian Squeeze, x86-64). These servers were running fine with 5.5.25a-27.1. There were no DROP TABLE or TRUNCATE TABLE commands running either.

Revision history for this message
Joern Heissler (joernheissler) wrote :

I'm running "percona-server-server-5.5 5.5.27-rel28.0-291.precise" on ubuntu 12.04, 64 bit.

The server runs in slave mode and only replicates, no other load.

The query that is currently being replicated looks like this:
INSERT INTO `sometable` (columns...) VALUES (...),... ON DUPLICATE KEY UPDATE some stuff;

Symptoms: the query got stuck, and other queries hang instead of executing (SHOW SLAVE STATUS and SHOW PROCESSLIST are the only exceptions); eventually all connections are used up. There is only a little CPU load, perhaps from the replication thread downloading logs.

Similar problems happened with 5.5.25; there it was TRUNCATE TABLE or DROP TABLE, IIRC.

Another slave running "5.5.24-rel26.0-256.precise" seems to be stable. I downgraded to see if it helps.

InnoDB: Warning: a long semaphore wait:
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: Warning: a long semaphore wait:
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 241.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
120826 9:20:05 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2355 1_second, 2283 sleeps, 227 10_second, 4171 background, 4170 flush
srv_master_thread log flush and writes: 3317
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 98450, signal count 319060
--Thread 139685069317888 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619390080768 has waited at buf0flu.c line 1887 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
--Thread 139619381688064 has waited at buf0lru.c line 1102 for 248.00 seconds the semaphore:
Mutex at 0x2f0e3f8 '&buf_pool->mutex', lock var 1
waiters flag 1
Mutex spin waits 1025363, rounds 2430593, OS waits 40278
RW-shared spins 388510, rounds 4354413, OS waits 28770
RW-excl spins 124035, rounds 2089687, OS waits 20762
Spin rounds per wait: 2.37 mutex, 11.21 RW-shared, 16.85 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio reque...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Thank you for all the information.

However, it would be better if someone could provide a full backtrace with something like the following (the debug symbols package for mysql/percona-server must be installed):

 sudo gdb --batch --quiet -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' -p $(pidof mysqld)
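If you want to attach the output here, the same command can be redirected to a file (a sketch; the filename is arbitrary):

 sudo gdb --batch --quiet -ex 'set pagination off' -ex 'thread apply all bt full' \
      -ex 'quit' -p $(pidof mysqld) > mysqld-full-backtrace.txt 2>&1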

Revision history for this message
Joern Heissler (joernheissler) wrote :

Okay, here's the gdb output. Hope it helps :)

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Thanks for the gdb output. I presume this backtrace was taken while the server was hung?

Revision history for this message
Joern Heissler (joernheissler) wrote :

Yes, I took it after the server started to spit out logs regarding the problem.

The two servers I downgraded to 5.5.24 are running fine now. Could be coincidence, but I'm pretty sure that it is not.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The lock contention seems to be mainly over the buf_pool mutex in buf_LRU_try_free_flushed_blocks and buf_flush_start. However, I have not been able to reproduce it with the configuration from the description and sysbench (also in master-slave mode). In the backtrace provided there isn't much in buf0buf except for the InnoDB monitor print.

If steps to reproduce this can be provided (along with the cnf), that would be great.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Threads 21-12 are idling I/O threads.
Thread 11 is the lock timeout thread, idle.
Thread 10 is the error monitor thread waiting to acquire srv_innodb_monitor_mutex.
Thread 9 is the monitor thread waiting to acquire the buffer pool mutex, it holds LRU and free list mutexes.
Thread 8 is the LRU dump thread, idle.
Thread 7 is the InnoDB master thread doing buffer flush, waiting to acquire the buffer pool mutex.
Thread 6 is the purge worker thread, waiting to lock the index tree.
Thread 5 is the signal handler thread, idle.
Thread 4 is the slave I/O thread.
Thread 3 is the slave SQL thread, waiting to acquire the buffer pool mutex.
Thread 2 is a cached MySQL connection thread.
Thread 1 is the main server thread.

It is not apparent which of the threads deadlock here, but it's a reasonably small number of threads for further analysis.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

See also comments #10 and #11 on bug 1026926, and
comments #28 and #29 on bug 1007268.

Revision history for this message
Dan Rogers (drogers-l) wrote :

I've updated one of my slave-only databases to 5.5.27-28.0 and installed the debuginfo package to see if I can get it to hang like the other one did. The configurations are nearly identical, so hopefully it'll provide useful information.

As another data point, the server that hung last week and was the basis for this bug report was downgraded to 5.5.25a-27.1 and has been running fine since August 23rd on this release. With 5.5.27-28.0, it hung only an hour after being restarted.

Revision history for this message
Joern Heissler (joernheissler) wrote :

> If steps to reproduce this can be provided(along with cnf) , that would be great.

Not really. I just start my server and it hangs before long. In the debug output from #7, no other connections were made to the server (blocked via iptables), so only replication was running. It hung almost immediately.

Configuration is attached.
The init-file shouldn't be at fault; when I skip the slave start, it seems to work until I start the slave.

Btw, 5.5.24 still hasn't crashed since the downgrade on the two servers.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Bogdar on the #percona channel reported a similar, reproducible hang on his slave.

Here is the backtrace:

http://pastebin.ca/2198485

Monitor output: http://pastebin.ca/2198486 http://pastebin.ca/2198488

It has also been reported/discussed here http://forum.percona.com/index.php?t=msg&th=2522&goto=9240&

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From the previous backtrace, this is the gist (annotated with where it spins):

Thread 8 (Thread 0x7f2317f0e700 (LWP 16676)):
#0 0x00007f233adc61fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00000000008f6285 in os_cond_wait (event=0x35b59d0, reset_sig_count=1) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/os/os0sync.c:207
#2 os_event_wait_low (event=0x35b59d0, reset_sig_count=1) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/os/os0sync.c:609
#3 0x000000000082dab2 in sync_array_wait_event (arr=0x3037d30, index=2) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/sync/sync0arr.c:458
#4 0x000000000082ecfc in mutex_spin_wait (mutex=0x7f2327c848c0, file_name=<value optimized out>, line=<value optimized out>) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/sync/sync0sync.c:653
#5 0x000000000086f19e in mutex_enter_func (mutex=0x7f2327c848c0, file_name=0xa83d80 "/home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c", line=3667) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/include/sync0sync.ic:222
#6 pfs_mutex_enter_func (mutex=0x7f2327c848c0, file_name=0xa83d80 "/home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c", line=3667) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/include/sync0sync.ic:251
#7 0x00000000008773ae in buf_page_init_for_read (err=<value optimized out>, mode=<value optimized out>, space=108, zip_size=<value optimized out>, unzip=<value optimized out>, tablespace_version=<value optimized out>, offset=1771977) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0buf.c:3667

>>>>
 buf_pool_mutex_enter(buf_pool);
 buf_pool->n_pend_reads++;
 buf_pool_mutex_exit(buf_pool);
>>>>

#8 0x0000000000885b14 in buf_read_page_low (err=0x7f2317f0d028, sync=<value optimized out>, mode=<value optimized out>, space=108, zip_size=0, unzip=0, tablespace_version=124, offset=1771977, trx=0x0) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0rea.c:165
#9 0x00000000008865a5 in buf_read_ibuf_merge_pages (sync=<value optimized out>, space_ids=<value optimized out>, space_versions=<value optimized out>, page_nos=0x7f2317f0d6a0, n_stored=<value optimized out>) at /home/jenkins/workspace/percona-server-5.5-debs/label_exp/debian6-64/target/Percona-Server-5.5.27-rel28.0/storage/innobase/buf/buf0rea.c:816
#10 0x00000000008c9440 in ibuf_contract_ext (n_pages=0x7f2317f0d728, sync=<value optimized out>) at /home/jenkins/workspace/perc...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

I have reviewed all of the places in XtraDB where it acquires the buffer pool mutex (the mutex split makes that surface much smaller than in InnoDB) and I don't see any cause for this. We need a way to reproduce this locally.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Hello!
Yesterday I reported the slave hangs (thanks to Raghavendra D Prabhu).

Now I have another set of debug info from another server: Debian 6, amd64.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

The next case is CentOS 5.
It has also hung two or three times since last Friday.

Sadly, there isn't much info here - debuginfo was installed right before the gdb launch.

No processlist, but as I remember, the longest query in the processlist was 'OPTIMIZE TABLE'.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Bogdan Rudas (bogdar-b) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Bogdan -

Thanks. As a note completely unrelated to this bug: based on your deb64 processlist, you might want to look into disabling the query cache. It looks like it's a major bottleneck for writes.

Revision history for this message
Dan Rogers (drogers-l) wrote :

After more than 24 hours, the slave-only server I updated to try to replicate this again didn't hang, so it's been downgraded.

The main difference between it and the one that hung is that the hung one has a Fusion-IO based SSD array for storage, while the one that didn't hang uses a RAID-10 array of hard drives.

Here's a diff of the config differences between the two servers. db02 is the one that didn't hang, db04 is the one that did. Hopefully this will be helpful.

Dan.

--- sugar-db02/etc/my.cnf 2012-08-28 14:26:10.000000000 -0400
+++ sugar-db04/etc/my.cnf 2012-08-23 05:45:16.000000000 -0400
@@ -49,21 +46,26 @@
 read_buffer_size=512K
 read_rnd_buffer_size=4M
 innodb_file_per_table
-innodb_buffer_pool_size=16G
+innodb_buffer_pool_size=28G

+innodb_adaptive_flushing_method = keep_average
 innodb_additional_mem_pool_size=48M
 innodb_log_buffer_size=16M
-innodb_flush_log_at_trx_commit=0
+innodb_flush_log_at_trx_commit=2
+innodb_read_ahead=none
+innodb_flush_neighbor_pages=0
 innodb_flush_method = ALL_O_DIRECT

 innodb_lock_wait_timeout=50
-innodb_log_group_home_dir=/data/logs
+innodb_log_group_home_dir=/dblogs/mysql
 innodb_data_home_dir=/data/mysql
 innodb_data_file_path=ibdata1:10M:autoextend
 innodb_log_files_in_group=2
 innodb_log_file_size=4G
+innodb_log_block_size=4096

-innodb_io_capacity=10000
+innodb_io_capacity=20000
 innodb_write_io_threads=64
 innodb_read_io_threads=64

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Based on the metadata lock in the attached processlist, and

" Sessions could end up deadlocked when executing a combination of SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682, Bug #12636001)"

as per http://dev.mysql.com/doc/refman/5.5/en/news-5-5-27.html

I tested with http://bugs.mysql.com/file.php?id=17153 (attached to http://bugs.mysql.com/bug.php?id=60682) and normal queries to the sbtest table.

I am getting a similar trace.

The server also hard-locks: after killing both, the server doesn't shut down normally and SIGKILL is required.

I also noticed entries like this in the error log:

120829 2:05:24 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress
120829 2:05:25 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress
120829 2:05:25 [ERROR] /usr/sbin/mysqld: Sort aborted: Server shutdown in progress

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

From the bug report http://bugs.mysql.com/bug.php?id=60682 (from which the test case was obtained), it looks like it has not been fully fixed yet (see the last two comments).

Also, I was able to repeat Mark's mtr test case.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This is the backtrace (http://sprunge.us/cCYf) taken 10 minutes after stopping all the queries. The server remains hard-stuck in that state.

tags: added: i24252
tags: added: i25775
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Everybody -

If it is possible for you to install the debug binaries of the server (not on production, of course) to see whether the deadlock is replaced with a crash, that would be great.

Raghu -

It's a good find. It might be a separate issue (http://bugs.mysql.com/bug.php?id=60682 is), but some (not all) of your stacktraces are shared with the stacktraces of this bug report.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Well, perhaps I can install a debug build on one of my servers.
Please provide me with instructions.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Laurynas,

Yes, it may not be that issue directly; however, a combination of metadata-type statements (which I got from that test case) and normal DML queries seems to be triggering it.

I saw similar statements in https://bugs.launchpad.net/percona-server/+bug/1040735/+attachment/3280289/+files/mysql.processlist and from what I heard in i25775.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Bogdan, @others,

If you are using RPM and you have debug symbols installed, then the mysqld-debug binary should suffice.

You can start it as follows (see the consolidated shell sketch below):

1. Make sure the server is not already running.
2. sudo mv /usr/sbin/mysqld /usr/sbin/mysqld-release
3. sudo ln -sf /usr/sbin/mysqld-debug /usr/sbin/mysqld
4. Start the MySQL server with your init scripts: service mysql start.
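The same steps as a consolidated shell sketch (RPM layout assumed, paths as above; stop the server first):

 sudo service mysql stop
 sudo mv /usr/sbin/mysqld /usr/sbin/mysqld-release
 sudo ln -sf /usr/sbin/mysqld-debug /usr/sbin/mysqld
 sudo service mysql start
 # To revert to the release binary later, stop the server and
 # move /usr/sbin/mysqld-release back to /usr/sbin/mysqld.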

Revision history for this message
Bogdan (bogdar) wrote :

Sorry, I can't experiment on the CentOS system.
I need some way to reproduce it on Debian.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Bogdan,

We have produced debug binaries (with UNIV_DEBUG on) so that you can test it :)

Here, http://www.percona.com/downloads/TESTING/5.5-debug/

From that, you will want the -server-5.5, -client-5.5, -common, and libmysqlclient18 packages.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I tested with the trunk of PS and haven't hit the bug yet.

I hit two assertions though:

1. http://sprunge.us/HDcK - reported in https://bugs.launchpad.net/percona-server/+bug/1038225; it may also be responsible for this bug (since it is fixed in the trunk I tested).

2. http://sprunge.us/RMKO - unlikely to be the cause of this, but reported separately at https://bugs.launchpad.net/percona-server/+bug/1043620.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Also, for the bug mentioned in #27 --

" Sessions could end up deadlocked when executing a combination of SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682, Bug #12636001)"

Even though this is present in the 5.5.27 changelogs, the fix (commit: http://sprunge.us/AYKP) seems to have been null-merged (i.e. without changes) into the 5.5.x tree. So either it is not fixed in 5.5.x, or it is not required there; I am not sure whether this requires a separate bug report.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

Well, I installed the debug build from http://www.percona.com/downloads/TESTING/5.5-debug/
It may take one or two days to reproduce.
I'll get a backtrace and post it here ASAP.

Revision history for this message
Alexey Kopytov (akopytov) wrote : Re: [Bug 1040735] Re: Server hang in 5.5.27-rel28

On 30.08.12 9:27, Raghavendra D Prabhu wrote:
> Also, for bug mentioned in #27 --
>
> " Sessions could end up deadlocked when executing a combination of
> SELECT, DROP TABLE, KILL, and SHOW ENGINE INNODB STATUS. (Bug #60682,
> Bug #12636001)"
>
> Even though this is present in 5.5.27 changelogs, it (commit:
> http://sprunge.us/AYKP) seems to have been null merged (so without
> changes) into 5.5.x tree. So, either it is not fixed in 5.5.x or not
> required there; not sure if this requires a separate bug report.
>

Quote from revision comments for the 5.1 version of the fix for
http://bugs.mysql.com/bug.php?id=60682 :

"
      Note: The problem is not found in 5.5. Introduction MDL subsystem
      caused metadata locking responsibility to be moved from TDC/TC to
      MDL subsystem. Due to this, responsibility of LOCK_open is reduced.
      As the use of LOCK_open is removed in open_table() and
      mysql_rm_table() the above mentioned CYCLE does not form.
"

Hence the null merge.

Revision history for this message
Ken Zalewski (ken-zalewski) wrote :

I am confirming that the New York State Senate has experienced this exact problem as well. We have been running Percona Server (compiled from source) for about a year now, and have never had an issue with any release until several days ago, when we built and deployed 5.5.27-28.0. We had run it on our development server for a week with no issues, so we deployed to production yesterday.

Our servers (dev and prod) both experienced semaphore errors in the logs and eventually locked up, to the point where only a kill -9 would terminate the mysqld process. It apparently only happens under heavier load, which is why we didn't see it on our dev server over the last week.

We have reverted to stock MySQL 5.5.27 for now, which is working fine.

This bug caused our Constituent Relationship Management application to fail for all 62 senators and their staffs yesterday. It was fortunate that we had the stock MySQL server ready to spin up in its place, but this has dealt a significant blow to our confidence in utilizing Percona's version of MySQL - especially considering the lack of acknowledgement by Percona. The 5.5.27-28.0 release is still downloadable from http://www.percona.com/downloads/Percona-Server-5.5/ at the time of this posting.

Revision history for this message
Hany Fahim (hany) wrote :

I'd also like to confirm that we've been bitten by this bug as well. We had 5.5.27-28.0 deployed to several of our customers and at least 2 of them ran into this particular bug. This does occur whether the server is under load or not. In one customer's case, they were performing an ALTER statement on a brand new server, and ran into this.

In each case, we downgraded to the previous version which resolved the issue.

Whatever we can do to assist in resolving this bug, please let us know.

Revision history for this message
Vadim Tkachenko (vadim-tk) wrote :

For the record: I've been running 5.5.27-28.0 under a multi-threaded tpcc-mysql workload on 2 high-end boxes for 3 days, and I did not hit this bug.

So I can't repeat it with tpcc-mysql.

Revision history for this message
Stewart Smith (stewart) wrote :

I tried the C test case from http://bugs.mysql.com/bug.php?id=60682 and couldn't get it to trigger in the 5.5 trunk.

Revision history for this message
Bogdan Rudas (bogdar-b) wrote :

I have had the debug version of the 5.5.27 build running for 95 hours with no issues. Should I try again with the release version?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Ken -

We are sorry that you have experienced this issue; fixing it is our highest priority at the moment.

Hany -

You could help us by installing a debug version of the 28.0 release on some of your servers (either use the mysqld-debug binary from the release RPMs, or the DEBs at http://www.percona.com/downloads/TESTING/5.5-debug/), installing gdb, and sending us crash messages if it crashes or gdb stacktraces if it deadlocks. For stacktraces, run gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mysqld) at the time of the deadlock. Thank you!

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Bogdan -

Thanks. This is a very important data point for us. The debug build you have been running contains a fix for bug 1038225, which we had previously ruled out as not applying here, because the bug-triggering workloads do not necessarily include DROP TABLE/TRUNCATE TABLE/etc.

We will follow-up shortly.

Revision history for this message
Stewart Smith (stewart) wrote :

We'll do a 5.5.27-28.1 release with the fix for bug 1038225 in it, which should hit the archives within a few days. We'll continue investigating if this doesn't resolve the problem.

Revision history for this message
Ignacio Nin (ignacio-nin) wrote :

All,

We've posted a release candidate for 5.5.27-rel28.1, which contains a fix for this issue.

The binaries can be downloaded either directly from http://www.percona.com/downloads/TESTING/Percona-Server-55/Percona-Server-5.5.27-rel28.1/release-5.5.27-28.1/296/, or via our experimental repos, to which it has been uploaded.

Please give these a try and let us know if it fixes your issue. We appreciate your input!

Regards,

N.

In order to use our experimental repos, install our main repo first (instructions at http://www.percona.com/doc/percona-server/5.5/installation.html#using-percona-software-repositories?id=repositories:start), and then:
- for debs: add "experimental" to your deb and deb-src lines in sources.list (so it should be something like deb http://repo.percona.com/apt VERSION main experimental)
- for rhel5/rhel6: install http://repo.percona.com/testing/centos/6/os/noarch/percona-testing-0.0-1.noarch.rpm
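A rough shell sketch of the two paths above (URLs as given in this comment; VERSION is your distribution codename):

 # Debian/Ubuntu: enable the experimental component in /etc/apt/sources.list,
 # so the Percona lines look like:
 #   deb http://repo.percona.com/apt VERSION main experimental
 #   deb-src http://repo.percona.com/apt VERSION main experimental
 # then refresh the package lists:
 sudo apt-get update

 # RHEL 5 / RHEL 6: install the testing repository package, then upgrade
 # the Percona Server packages through yum as usual:
 sudo rpm -Uvh http://repo.percona.com/testing/centos/6/os/noarch/percona-testing-0.0-1.noarch.rpm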

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

All -

Our testing confirmed that this is a duplicate of bug 1038225. Release 28.1 contains the fix. All 28.0 users are advised to upgrade immediately.

Closing this bug as a duplicate of bug 1038225. Please comment if you experience any further issues.

Thanks.
