rbd snap_unprotect deadlock

Bug #1731819 reported by zhengxiang
This bug affects 1 person
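| Affects              | Status       | Importance | Assigned to | Milestone |
|----------------------|--------------|------------|-------------|-----------|
| ceph (Ubuntu)        | Fix Released | Medium     | Unassigned  |           |
| ceph (Ubuntu Xenial) | Triaged      | Medium     | Unassigned  |           |
| ceph (Ubuntu Zesty)  | Triaged      | Medium     | Unassigned  |           |
| ceph (Ubuntu Artful) | Triaged      | Medium     | Unassigned  |           |
| ceph (Ubuntu Bionic) | Fix Released | Medium     | Unassigned  |           |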

Bug Description

Hello everyone:

I'm using OpenStack Mitaka with Ceph Jewel (10.2.10) to perform snapshot operations, and the following deadlock occurs intermittently.

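To capture the stacks, I located the cinder-volume process and attached gdb to it (xx is its process ID):

ps -ef | grep cinder-volume
gdb -q python-dbg -p xx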
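The interactive session above can also be run in one non-interactive pass and then summarized, which helps when the hang is intermittent and stacks need to be collected repeatedly. This is a minimal Python 3.7+ sketch, assuming gdb and python-dbg are installed and the caller has ptrace permission on the cinder-volume process. The helper names `gdb_backtrace_cmd`, `capture_backtraces` and `innermost_frames` are hypothetical and are not part of Ceph, cinder or gdb. Only standard gdb options (`-q`, `-batch`, `-ex`, `-p`) and the `thread apply all bt` command are used.

```python
import re
import subprocess


def gdb_backtrace_cmd(pid, executable="python-dbg"):
    """Build a non-interactive gdb command that dumps every thread's stack."""
    # -batch exits once the -ex commands have run; "thread apply all bt"
    # prints a backtrace for every thread in the attached process.
    return ["gdb", "-q", "-batch", "-ex", "thread apply all bt",
            executable, "-p", str(pid)]


def capture_backtraces(pid):
    """Attach gdb to a live process and return its backtrace output as text."""
    # Requires gdb on PATH and ptrace permission on the target process.
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True,
                            text=True, check=True)
    return result.stdout


_THREAD_RE = re.compile(r"^Thread (\d+) \(")
_FRAME0_RE = re.compile(r"^#0\s+(?:0x[0-9a-f]+ in )?(\S+)")


def innermost_frames(bt_text):
    """Map each gdb thread number to the function at the top of its stack."""
    frames = {}
    current = None
    for line in bt_text.splitlines():
        thread = _THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = _FRAME0_RE.match(line)
        if frame and current is not None:
            # Drop symbol-version suffixes such as "@@GLIBC_2.3.2".
            frames[current] = frame.group(1).split("@")[0]
    return frames
```

Applied to the two threads quoted below, `innermost_frames` reports `pthread_rwlock_wrlock` for Thread 14 and `pthread_cond_wait` for Thread 1. Threads parked in lock or condition-variable primitives across several captures are the ones worth inspecting first.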

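I found two threads that appear to be deadlocked. Thread 14 is blocked acquiring a write lock in RefreshRequest::apply, while Thread 1 is blocked in snap_unprotect waiting on a C_SaferCond: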

Thread 14 (Thread 0x7f510784c700 (LWP 759193)):
#0 0x00007f513272603e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1 0x00007f5112a4a83c in RWLock::get_write (this=0x5db1258, lockdep=<optimized out>) at ./common/RWLock.h:123
#2 0x00007f5112ad77c5 in WLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:183
#3 librbd::image::RefreshRequest<librbd::ImageCtx>::apply (this=this@entry=0x7f507c02bf10) at librbd/image/RefreshRequest.cc:855
#4 0x00007f5112ad87f8 in librbd::image::RefreshRequest<librbd::ImageCtx>::handle_v2_apply (this=0x7f507c02bf10, result=result@entry=0x7f510784bb2c) at librbd/image/RefreshRequest.cc:655
#5 0x00007f5112ad89ab in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshRequest<librbd::ImageCtx>, &librbd::image::RefreshRequest<librbd::ImageCtx>::handle_v2_apply, true>::complete (this=0x7f507c31e7b0, r=0) at ./librbd/Utils.h:66
#6 0x00007f5112a3eb54 in ContextWQ::process (this=0x6e96b20, ctx=0x7f507c31e7b0) at ./common/WorkQueue.h:611
#7 0x00007f5112c37a7e in ThreadPool::worker (this=0x7c222b0, wt=0x60fe290) at common/WorkQueue.cc:128
#8 0x00007f5112c38950 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:448
#9 0x00007f5132722dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f5131d4873d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5132f10740 (LWP 2617826)):
#0 0x00007f51327266d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f5112a14b60 in Wait (mutex=..., this=0x7ffd0e2ba8f0) at ./common/Cond.h:56
#2 C_SaferCond::wait (this=this@entry=0x7ffd0e2ba890) at ./common/Cond.h:202
#3 0x00007f5112ab5b8e in librbd::Operations<librbd::ImageCtx>::snap_unprotect (this=0x562bdb0, snap_name=snap_name@entry=0x58fa894 "snapshot-90259d85-3edc-40e7-b306-cfff1b855cd6")
    at librbd/Operations.cc:1079
#4 0x00007f51129fb0d4 in rbd_snap_unprotect (image=0x5db10c0, snap_name=snap_name@entry=0x58fa894 "snapshot-90259d85-3edc-40e7-b306-cfff1b855cd6") at librbd/librbd.cc:2385
#5 0x00007f511c32f427 in __pyx_pf_3rbd_5Image_50unprotect_snap (__pyx_v_self=0x7017260, __pyx_v_self=0x7017260, __pyx_v_name=0x58fa870) at rbd.c:12928
#6 __pyx_pw_3rbd_5Image_51unprotect_snap (__pyx_v_self=0x7017260, __pyx_v_name=<optimized out>) at rbd.c:12843
#7 0x00007f5132a1ba62 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
......

The full backtrace is in the attachment.
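The two stacks match a common lock-versus-completion pattern. A caller blocks on a completion (`C_SaferCond::wait` in Thread 1) that can only fire after a work-queue thread takes a lock for writing (`RWLock::get_write` in `RefreshRequest::apply`, Thread 14). If the caller holds that same lock in any mode while it waits, neither thread can make progress. The trace does not show which locks Thread 1 holds, so this is an assumption and not the confirmed root cause. Below is a minimal Python sketch of the pattern only. A plain `threading.Lock` stands in for librbd's RWLock, a `threading.Event` stands in for `C_SaferCond`, and timeouts make it terminate instead of hanging. The names `simulate`, `image_lock` and `apply_refresh` are hypothetical.

```python
import queue
import threading


def simulate(caller_holds_lock, timeout=0.5):
    """Return True if the caller's wait completed, False if it would have hung.

    ASSUMPTION: the caller may hold the lock the refresh needs; the real
    librbd lock ordering is not visible in the backtraces.
    """
    image_lock = threading.Lock()   # stand-in for the RWLock Thread 14 wants
    done = threading.Event()        # stand-in for the C_SaferCond Thread 1 waits on
    work = queue.Queue()            # stand-in for the ContextWQ

    def worker():
        # Like ContextWQ::process: pop one queued context and run it.
        ctx = work.get()
        ctx()

    def apply_refresh():
        # Like RefreshRequest::apply: take the lock exclusively, then complete.
        if image_lock.acquire(timeout=timeout):
            try:
                done.set()
            finally:
                image_lock.release()

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    if caller_holds_lock:
        with image_lock:
            work.put(apply_refresh)
            # The refresh cannot get the lock until this wait gives up.
            completed = done.wait(timeout)
    else:
        work.put(apply_refresh)
        completed = done.wait(timeout)

    t.join()
    return completed
```

`simulate(caller_holds_lock=True)` returns False because the wait can only time out, and `simulate(caller_holds_lock=False)` returns True. Without the timeouts, the first case would hang exactly like the two threads above.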

Thanks a lot if anyone can give advice. ^_^

Tags: deadlock rbd
zhengxiang (zhengxiang-chn) wrote :
Jason Dillaman (jdillaman) wrote :

Tracked via upstream ticket: http://tracker.ceph.com/issues/22120

James Page (james-page) wrote :

A fix is in flight upstream and will then be backported to Luminous and Jewel; Ubuntu will pick those up with the next set of point releases.

Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in ceph (Ubuntu Artful):
status: New → Triaged
Changed in ceph (Ubuntu Zesty):
status: New → Triaged
Changed in ceph (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in ceph (Ubuntu Zesty):
importance: Undecided → Medium
Changed in ceph (Ubuntu Artful):
importance: Undecided → Medium
James Page (james-page)
Changed in ceph (Ubuntu Bionic):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 12.2.4-0ubuntu1

---------------
ceph (12.2.4-0ubuntu1) bionic; urgency=medium

  [ James Page ]
  * New upstream point release (LP: #1750826, #1731819, #1718134).
  * d/ceph-osd.install: Add ceph-volume tools (LP: #1750376).
  * d/*: wrap-and-sort -bast.
  * d/control,compat: Bump debhelper compat level to 10.
  * d/control: Switch to using python3-sphinx.
  * d/rules: Switch to using WITH_BOOST_CONTEXT for rgw beast frontend
    enablement.
  * d/rules,control: Switch to using vendored boost as 1.66 is required.
  * d/control: Add python-jinja2 to Depends of ceph-mgr (LP: #1752308).

  [ Tiago Stürmer Daitx ]
  * Update java source and target flags from 1.5 to 1.8. Allows it to run
    using OpenJDK 8 or earlier and to be built with OpenJDK 9, 10, and 11
    (LP: #1756854).

  [ James Page ]
  * d/ceph*.prerm: Drop, no longer needed as only use for removed upstart
    and init.d methods of managing ceph daemons (LP: #1754585).

 -- James Page <email address hidden> Tue, 20 Mar 2018 09:28:22 +0000

Changed in ceph (Ubuntu Bionic):
status: Fix Committed → Fix Released
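To check whether an installed Ubuntu build already carries this fix, compare its version with 12.2.4, the Luminous point release that the Bionic changelog above lists as including LP: #1731819. This is a minimal Python sketch with several assumptions. It assumes the version string comes from something like `dpkg-query -W -f='${Version}' ceph`. It assumes later 12.2.x releases keep the fix. It also compares only the upstream part of the version, ignoring the Debian revision, which simplifies real dpkg ordering. For other series it returns None, because this report names no fixed version for them. The function names are hypothetical.

```python
import re

# Luminous point release that, per the Bionic changelog, includes the fix.
LUMINOUS_FIXED = (12, 2, 4)


def upstream_version(deb_version):
    """Extract (major, minor, patch) from a Debian version like '12.2.4-0ubuntu1'.

    Drops an optional epoch ("1:") and ignores the Debian revision; this is a
    simplification, not full dpkg version comparison.
    """
    if ":" in deb_version:
        deb_version = deb_version.split(":", 1)[1]
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", deb_version)
    if match is None:
        raise ValueError("unrecognised version: %r" % deb_version)
    return tuple(int(part) for part in match.groups())


def has_luminous_fix(deb_version):
    """True/False for Luminous (12.2.x) builds, None for any other series."""
    version = upstream_version(deb_version)
    if version[:2] != LUMINOUS_FIXED[:2]:
        return None  # e.g. Jewel 10.2.x: no fixed version is stated in this report
    return version >= LUMINOUS_FIXED
```

For example, `has_luminous_fix("12.2.4-0ubuntu1")` is True, while a Jewel build such as the reporter's 10.2.10 yields None.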