deadlock between PurgeAndDiscard() and apply_trx()

Bug #1189526 reported by Teemu Ollakka
This bug affects 2 people
Affects                  Status        Importance  Assigned to    Milestone
Galera                   Fix Released  High        Teemu Ollakka
Percona XtraDB Cluster   Fix Released  Undecided   Unassigned
(Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC)

Bug Description

The Galera provider may deadlock if an applier thread is still executing apply_trx() while commit cut processing causes the corresponding trx to be purged from the cert index. The two threads try to lock the cert index and the trx in opposite order, so each can end up holding one lock while waiting for the other.

Thread backtraces:

Thread 28 (Thread 0x7f995d2ef700 (LWP 27500)):
#0 0x00007fb6c447389c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fb6c446f065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fb6c446eeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007fb6c24d6457 in gu::Lock::Lock(gu::Mutex const&) () from /usr/lib/galera/libgalera_smm.so
#4 0x00007fb6c25e7300 in galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandle*) () from /usr/lib/galera/libgalera_smm.so
#5 0x00007fb6c25e77c5 in galera::ReplicatorSMM::process_trx(void*, galera::TrxHandle*) () from /usr/lib/galera/libgalera_smm.so
#6 0x00007fb6c25be124 in galera::GcsActionSource::dispatch(void*, gcs_action const&) () from /usr/lib/galera/libgalera_smm.so
#7 0x00007fb6c25be39a in galera::GcsActionSource::process(void*) () from /usr/lib/galera/libgalera_smm.so
#8 0x00007fb6c25df175 in galera::ReplicatorSMM::async_recv(void*) () from /usr/lib/galera/libgalera_smm.so
#9 0x00007fb6c25f6f93 in galera_recv () from /usr/lib/galera/libgalera_smm.so
#10 0x00007fb6c5ab1ab1 in wsrep_replication_process(THD*) ()
#11 0x00007fb6c5a2d71b in start_wsrep_THD ()
#12 0x00007fb6c446ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fb6c3b9acbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7f99d88eb700 (LWP 27507)):
#0 0x00007fb6c447389c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fb6c446f065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fb6c446eeba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007fb6c25b9c4d in galera::Certification::PurgeAndDiscard::operator()(std::pair<long const, galera::TrxHandle*>&) const () from /usr/lib/galera/libgalera_smm.so
#4 0x00007fb6c25b429c in galera::Certification::purge_trxs_upto_(long) () from /usr/lib/galera/libgalera_smm.so
#5 0x00007fb6c25de292 in galera::ReplicatorSMM::process_commit_cut(long, long) () from /usr/lib/galera/libgalera_smm.so
#6 0x00007fb6c25be161 in galera::GcsActionSource::dispatch(void*, gcs_action const&) () from /usr/lib/galera/libgalera_smm.so
#7 0x00007fb6c25be39a in galera::GcsActionSource::process(void*) () from /usr/lib/galera/libgalera_smm.so
#8 0x00007fb6c25df175 in galera::ReplicatorSMM::async_recv(void*) () from /usr/lib/galera/libgalera_smm.so
#9 0x00007fb6c25f6f93 in galera_recv () from /usr/lib/galera/libgalera_smm.so
#10 0x00007fb6c5ab1ab1 in wsrep_replication_process(THD*) ()
#11 0x00007fb6c5a2d71b in start_wsrep_THD ()
#12 0x00007fb6c446ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fb6c3b9acbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()
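
For readers unfamiliar with this class of bug, the lock-order inversion in the description can be illustrated with a minimal C++ sketch. It is not Galera code and does not claim to show how the bug was actually fixed. CertIndex, Trx, apply_step(), purge_step() and run_rounds() are hypothetical names invented for illustration. The sketch assumes one thread wants the trx lock and then the cert index lock while the other wants them in the reverse order. It shows one generic remedy: acquiring both mutexes through std::lock, which avoids deadlock regardless of argument order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the certification index and a transaction
// handle; these are NOT Galera's real types.
struct CertIndex {
    std::mutex mutex;
    int purged_count = 0;
};

struct Trx {
    std::mutex mutex;
    int applied_count = 0;
};

// Applier-side work. A deadlock-prone version would lock trx.mutex and
// then cert.mutex one at a time. std::lock acquires both using a
// deadlock-avoidance algorithm instead.
void apply_step(CertIndex& cert, Trx& trx) {
    std::lock(trx.mutex, cert.mutex);
    std::lock_guard<std::mutex> trx_guard(trx.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> cert_guard(cert.mutex, std::adopt_lock);
    ++trx.applied_count;
}

// Purge-side work. A deadlock-prone version would lock cert.mutex and
// then trx.mutex, the reverse of the applier's order.
void purge_step(CertIndex& cert, Trx& trx) {
    std::lock(cert.mutex, trx.mutex);
    std::lock_guard<std::mutex> cert_guard(cert.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> trx_guard(trx.mutex, std::adopt_lock);
    ++cert.purged_count;
}

// Runs both steps concurrently and reports whether every round completed.
bool run_rounds(int rounds) {
    assert(rounds >= 0);
    CertIndex cert;
    Trx trx;
    std::thread applier([&] {
        for (int i = 0; i < rounds; ++i) apply_step(cert, trx);
    });
    std::thread purger([&] {
        for (int i = 0; i < rounds; ++i) purge_step(cert, trx);
    });
    applier.join();
    purger.join();
    return trx.applied_count == rounds && cert.purged_count == rounds;
}
```

Calling run_rounds(1000) from a small main() and linking with thread support should return true without hanging. Replacing the std::lock calls with two separate lock_guard acquisitions in opposite orders would reintroduce the kind of hang seen in the backtraces above. An alternative with the same effect is to agree on a single global lock order and follow it on every code path.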

Changed in galera:
milestone: none → 23.2.6
assignee: nobody → Teemu Ollakka (teemu-ollakka)
Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-24.8
Changed in galera:
importance: Undecided → High
status: New → Confirmed
Alex Yurchenko (ayurchen) wrote :

Potential fix committed in r152

Changed in galera:
status: Confirmed → In Progress
status: In Progress → Fix Committed
Changed in percona-xtradb-cluster:
status: New → Fix Released
Changed in galera:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1369
