segfault during shutdown

Bug #974093 reported by Teemu Ollakka
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
New
Undecided
Unassigned
5.5
New
Undecided
Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Incomplete
Undecided
Unassigned
5.6
Incomplete
Undecided
Unassigned

Bug Description

Server crashed during graceful shutdown under load:

120405 12:23:06 [Note] WSREP: New cluster view: global state: 49ee59de-7efe-11e1-0800-e1b4a35e6388:144388, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 1
120405 12:23:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120405 12:23:06 [Note] WSREP: applier thread exiting (code:0)
120405 12:23:06 [Note] WSREP: closing applier 3
120405 12:23:06 [Note] WSREP: recv_thread() joined.
120405 12:23:06 [Note] WSREP: Closing slave action queue.
120405 12:23:06 [Note] WSREP: closing connection 788
120405 12:23:06 [Note] WSREP: closing connection 781
120405 12:23:06 [Note] WSREP: Deadlock error for: (null)
120405 12:23:06 [Note] WSREP: BF aborted, thd: 781 is_AC: 0, retry: 0 - 1 SQL: (null)
09:23:06 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=16
max_threads=1024
thread_count=9
connection_count=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2238930 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
120405 12:23:06 [Note] WSREP: Before Lock_thread_count
120405 12:23:06 [Note] WSREP: applier thread exiting (code:5)
120405 12:23:06 [Note] WSREP: applier thread exiting (code:5)
120405 12:23:06 [Note] WSREP: applier thread exiting (code:5)
stack_bottom = 0 thread_stack 0x40000
/run/shm/galera/local3/mysql/sbin/mysqld(my_print_stacktrace+0x2e)[0x7c920e]
/run/shm/galera/local3/mysql/sbin/mysqld(handle_fatal_signal+0x376)[0x698e06]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10060)[0x7f7a8f7f4060]
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_mutex_unlock+0x4)[0x7f7a8f7ef1d4]
/run/shm/galera/local3/mysql/sbin/mysqld[0x515541]
/run/shm/galera/local3/mysql/sbin/mysqld(_Z30wsrep_close_client_connectionsc+0x12a)[0x516dfa]
/run/shm/galera/local3/mysql/sbin/mysqld(_Z22wsrep_stop_replicationP3THD+0x93)[0x653803]
/run/shm/galera/local3/mysql/sbin/mysqld[0x518309]
/run/shm/galera/local3/mysql/sbin/mysqld(kill_server_thread+0xe)[0x5184ce]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc)[0x7f7a8f7ebefc]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7a8f52659d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file

Backtrace:

(gdb) bt
#0 __pthread_kill (threadid=<optimized out>, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63
#1 0x0000000000698dc9 in handle_fatal_signal (sig=11)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/signal_handler.cc:246
#2 <signal handler called>
#3 __pthread_mutex_unlock_usercnt (mutex=<optimized out>, decr=<optimized out>) at pthread_mutex_unlock.c:37
#4 __pthread_mutex_unlock (mutex=0x1) at pthread_mutex_unlock.c:290
#5 0x0000000000515541 in inline_mysql_mutex_unlock (that=0x0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/include/mysql/psi/mysql_thread.h:671
#6 wsrep_close_thread (thd=0x7f7a7c788380)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:4443
#7 0x0000000000516dfa in wsrep_close_client_connections (wait_to_end=1 '\001')
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:4476
#8 0x0000000000653803 in wsrep_stop_replication (thd=0x0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/wsrep_mysqld.cc:566
#9 0x0000000000518309 in kill_server (sig_ptr=0x0)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:1362
#10 0x00000000005184ce in kill_server_thread (arg=<optimized out>)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:1396
#11 0x00007f7a8f7ebefc in start_thread (arg=0x7f7a71693700) at pthread_create.c:304
#12 0x00007f7a8f52659d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()

(gdb) f 6
#6 wsrep_close_thread (thd=0x7f7a7c788380)
    at /home/teemu/codership/galera/bzr/codership-mysql/5.5/sql/mysqld.cc:4443
4443 mysql_mutex_unlock(thd->mysys_var->current_mutex);
(gdb) p thd->mysys_var->current_mutex
$4 = (mysql_mutex_t * volatile) 0x0

Revision history for this message
Alex Yurchenko (ayurchen) wrote :
Download full text (7.6 KiB)

Seems to be still there as of r3840. Reproduced with heavy CPU-contending load (total 127 threads at the moment of crash).

Offending code:
    if (thd->mysys_var->current_cond)
    {
      mysql_mutex_lock(thd->mysys_var->current_mutex);
      mysql_cond_broadcast(thd->mysys_var->current_cond);
      mysql_mutex_unlock(thd->mysys_var->current_mutex);
    }

by the time mysql_mutex_unlock() is called both current_cond and current_mutex are 0x0.

Out of 8 wsrep slave threads 1 is applying writeset and other 7 are waiting fro commit.

Thread 2 (Thread 0x7fde783df700 (LWP 17210)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007fde87b82fd5 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x295ed58, obj=...) at galera/src/monitor.hpp:126
#3 0x00007fde87b7c9d2 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x295e2c0, recv_ctx=recv_ctx@entry=0x7fde40000990, trx=trx@entry=0x7fde40059990) at galera/src/replicator_smm.cpp:471

Thread 85 (Thread 0x7fde78524700 (LWP 17205)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007fde87b82fd5 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x295ed58, obj=...) at galera/src/monitor.hpp:126
#3 0x00007fde87b7c9d2 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x295e2c0, recv_ctx=recv_ctx@entry=0x7fde30000990, trx=trx@entry=0x7fde3002c790) at galera/src/replicator_smm.cpp:471

Thread 95 (Thread 0x7fde78461700 (LWP 17208)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007fde87b82fd5 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x295ed58, obj=...) at galera/src/monitor.hpp:126
#3 0x00007fde87b7c9d2 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x295e2c0, recv_ctx=recv_ctx@entry=0x7fde48000990, trx=trx@entry=0x7fde48224520) at galera/src/replicator_smm.cpp:471

Thread 97 (Thread 0x7fde784a2700 (LWP 17207)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007fde87b82fd5 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x295ed58, obj=...) at galera/src/monitor.hpp:126
#3 0x00007fde87b7c9d2 in galera::ReplicatorSMM::apply_trx (this=this@entry=0x295e2c0, recv_ctx=recv_ctx@entry=0x7fde2c000990, trx=trx@entry=0x7fde2c3d85a0) at galera/src/replicator_smm.cpp:471

Thread 99 (Thread 0x7fde78565700 (LWP 17204)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007fde87b82fd5 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=this@entry=0x295ed58, obj=...) at galera/src/mon...

Read more...

no longer affects: codership-mysql/5.5
Changed in codership-mysql:
milestone: 5.5.33-23.7.6 → none
Revision history for this message
Nilnandan Joshi (nilnandan-joshi) wrote :

Can you check the same with latest version of PXC 5.5.41?

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1213

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.