rbd snap_unprotect deadlock

Bug #1731819 reported by zhengxiang on 2017-11-13
This bug affects 1 person
Affects          Importance  Assigned to
ceph (Ubuntu)    Medium      Unassigned
  Xenial         Medium      Unassigned
  Zesty          Medium      Unassigned
  Artful         Medium      Unassigned
  Bionic         Medium      Unassigned

Bug Description

Hello everyone:

I'm using OpenStack Mitaka and Ceph Jewel 10.2.10 to perform snapshot operations, and sometimes the deadlock below occurs. I attached gdb to the cinder-volume process:

ps -ef | grep cinder-volume
gdb -q python-dbg -p xx

I found two threads that appear to be contending for the lock:

Thread 14 (Thread 0x7f510784c700 (LWP 759193)):
#0 0x00007f513272603e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1 0x00007f5112a4a83c in RWLock::get_write (this=0x5db1258, lockdep=<optimized out>) at ./common/RWLock.h:123
#2 0x00007f5112ad77c5 in WLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:183
#3 librbd::image::RefreshRequest<librbd::ImageCtx>::apply (this=this@entry=0x7f507c02bf10) at librbd/image/RefreshRequest.cc:855
#4 0x00007f5112ad87f8 in librbd::image::RefreshRequest<librbd::ImageCtx>::handle_v2_apply (this=0x7f507c02bf10, result=result@entry=0x7f510784bb2c) at librbd/image/RefreshRequest.cc:655
#5 0x00007f5112ad89ab in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshRequest<librbd::ImageCtx>, &librbd::image::RefreshRequest<librbd::ImageCtx>::handle_v2_apply, true>::complete (this=0x7f507c31e7b0, r=0) at ./librbd/Utils.h:66
#6 0x00007f5112a3eb54 in ContextWQ::process (this=0x6e96b20, ctx=0x7f507c31e7b0) at ./common/WorkQueue.h:611
#7 0x00007f5112c37a7e in ThreadPool::worker (this=0x7c222b0, wt=0x60fe290) at common/WorkQueue.cc:128
#8 0x00007f5112c38950 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:448
#9 0x00007f5132722dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f5131d4873d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5132f10740 (LWP 2617826)):
#0 0x00007f51327266d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f5112a14b60 in Wait (mutex=..., this=0x7ffd0e2ba8f0) at ./common/Cond.h:56
#2 C_SaferCond::wait (this=this@entry=0x7ffd0e2ba890) at ./common/Cond.h:202
#3 0x00007f5112ab5b8e in librbd::Operations<librbd::ImageCtx>::snap_unprotect (this=0x562bdb0, snap_name=snap_name@entry=0x58fa894 "snapshot-90259d85-3edc-40e7-b306-cfff1b855cd6") at librbd/Operations.cc:1079
#4 0x00007f51129fb0d4 in rbd_snap_unprotect (image=0x5db10c0, snap_name=snap_name@entry=0x58fa894 "snapshot-90259d85-3edc-40e7-b306-cfff1b855cd6") at librbd/librbd.cc:2385
#5 0x00007f511c32f427 in __pyx_pf_3rbd_5Image_50unprotect_snap (__pyx_v_self=0x7017260, __pyx_v_self=0x7017260, __pyx_v_name=0x58fa870) at rbd.c:12928
#6 __pyx_pw_3rbd_5Image_51unprotect_snap (__pyx_v_self=0x7017260, __pyx_v_name=<optimized out>) at rbd.c:12843
#7 0x00007f5132a1ba62 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
......

The full backtrace is in the attachment.
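
The shape of the two backtraces (one thread blocked waiting for a completion, a worker thread blocked acquiring a write lock) can be illustrated with a minimal Python sketch. This is not librbd code. It assumes, without confirmation from the backtrace, that the waiting caller is the one holding the lock the worker needs. All names here (`state_lock`, `done`, `worker`, and the two `blocking_call_*` functions) are hypothetical, and timeouts are used only so the example terminates instead of hanging.

```python
import threading

# Hypothetical stand-ins: state_lock plays the role of the RWLock the worker
# wants to write-lock, and done plays the role of the C_SaferCond the caller
# waits on. The mapping to librbd internals is an assumption.
state_lock = threading.Lock()
done = threading.Event()


def worker():
    """Like Thread 14: must take the lock before it can signal completion."""
    # The timeout exists only so this sketch cannot hang forever.
    if state_lock.acquire(timeout=0.5):
        try:
            done.set()
        finally:
            state_lock.release()


def blocking_call_holding_lock():
    """Like Thread 1 (assumed): waits for completion while holding the lock."""
    done.clear()
    with state_lock:
        t = threading.Thread(target=worker)
        t.start()
        # The worker cannot get the lock, so completion never arrives.
        completed = done.wait(timeout=1.0)
    t.join()
    return completed


def blocking_call_releasing_lock():
    """Same call, but the lock is dropped before waiting for completion."""
    done.clear()
    with state_lock:
        t = threading.Thread(target=worker)
        t.start()
    completed = done.wait(timeout=1.0)
    t.join()
    return completed
```

In this sketch `blocking_call_holding_lock()` times out because the worker can never acquire the lock, while `blocking_call_releasing_lock()` completes. Without the timeouts, the first variant would block indefinitely, which matches the symptom seen in cinder-volume.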

Thanks a lot if anyone can give advice. ^_^

zhengxiang (zhengxiang-chn) wrote :
Jason Dillaman (jdillaman) wrote :

Tracked via upstream ticket: http://tracker.ceph.com/issues/22120

James Page (james-page) wrote :

Work is in flight upstream to fix this and then backport it to Luminous and Jewel; Ubuntu will pick those up with the next set of point releases.

Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in ceph (Ubuntu Artful):
status: New → Triaged
Changed in ceph (Ubuntu Zesty):
status: New → Triaged
Changed in ceph (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in ceph (Ubuntu Zesty):
importance: Undecided → Medium
Changed in ceph (Ubuntu Artful):
importance: Undecided → Medium
James Page (james-page) on 2018-03-20
Changed in ceph (Ubuntu Bionic):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 12.2.4-0ubuntu1

---------------
ceph (12.2.4-0ubuntu1) bionic; urgency=medium

  [ James Page ]
  * New upstream point release (LP: #1750826, #1731819, #1718134).
  * d/ceph-osd.install: Add ceph-volume tools (LP: #1750376).
  * d/*: wrap-and-sort -bast.
  * d/control,compat: Bump debhelper compat level to 10.
  * d/control: Switch to using python3-sphinx.
  * d/rules: Switch to using WITH_BOOST_CONTEXT for rgw beast frontend
    enablement.
  * d/rules,control: Switch to using vendored boost as 1.66 is required.
  * d/control: Add python-jinja2 to Depends of ceph-mgr (LP: #1752308).

  [ Tiago Stürmer Daitx ]
  * Update java source and target flags from 1.5 to 1.8. Allows it to run
    using OpenJDK 8 or earlier and to be built with OpenJDK 9, 10, and 11
    (LP: #1756854).

  [ James Page ]
  * d/ceph*.prerm: Drop, no longer needed as only use for removed upstart
    and init.d methods of managing ceph daemons (LP: #1754585).

 -- James Page <email address hidden> Tue, 20 Mar 2018 09:28:22 +0000

Changed in ceph (Ubuntu Bionic):
status: Fix Committed → Fix Released