rbd driver should check for watchers before delete

Bug #1256259 reported by Edward Hope-Morley
This bug affects 1 person
Affects   Status         Importance   Assigned to          Milestone
Cinder    Fix Released   Medium       Edward Hope-Morley
Havana    Fix Released   Medium       Mike Perez

Bug Description

When deleting an rbd image/volume that still has so-called 'watchers' on it (i.e. open client connections, e.g. from kvm), the delete operation fails with a message similar to:

error: image still has watchers. This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

Currently, if this occurs, the cinder volume is left stuck in the 'error_deleting' state. This has now been observed by a number of people, e.g. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003718.html

One way to remedy this could be to check for watchers prior to delete and, if any exist, either retry after a fixed period (30s?) or simply raise an ImageBusy exception so that the user retries at a later time.
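The retry option could look roughly like the sketch below. It is illustrative only and is not the code that was merged: the ImageBusy class is a local stand-in for whatever the rbd bindings raise, and delete_with_retry, FakeBackend and their parameters are hypothetical names invented for this example.

```python
import time


class ImageBusy(Exception):
    """Local stand-in for the error raised when an image still has watchers."""


def delete_with_retry(remove, name, retries=3, interval=30.0, sleep=time.sleep):
    """Call remove(name), retrying while the image is reported busy.

    `remove` is any callable that raises ImageBusy while watchers remain.
    Returns the number of attempts used. If every attempt fails, the last
    ImageBusy propagates so the caller can ask the user to retry later.
    """
    for attempt in range(1, retries + 1):
        try:
            remove(name)
            return attempt
        except ImageBusy:
            if attempt == retries:
                raise
            # Give a crashed client's watch time to expire before retrying.
            sleep(interval)


class FakeBackend:
    """Hypothetical backend that reports busy for the first few calls."""

    def __init__(self, busy_calls):
        self.busy_calls = busy_calls
        self.removed = []

    def remove(self, name):
        if self.busy_calls > 0:
            self.busy_calls -= 1
            raise ImageBusy(name)
        self.removed.append(name)
```

Passing `sleep` as a parameter keeps the 30s wait out of tests; a real driver would also need to decide whether blocking the delete call for that long is acceptable.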

Haomai Wang (haomai) wrote :

I'd prefer raising a Busy exception. Nice job!

tags: added: ceph rbd
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/60105

Changed in cinder:
assignee: nobody → Edward Hope-Morley (hopem)
status: New → In Progress
Changed in cinder:
importance: Undecided → Medium
milestone: none → icehouse-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/60105
Committed: http://github.com/openstack/cinder/commit/f31d62a178a370ae9d736c09a3186ea9a3c92ee3
Submitter: Jenkins
Branch: master

commit f31d62a178a370ae9d736c09a3186ea9a3c92ee3
Author: Edward Hope-Morley <email address hidden>
Date: Wed Dec 4 18:13:06 2013 +0000

    Catch ImageBusy exception when deleting rbd volume

    If we try to delete an rbd volume that has 'watchers' on it
    i.e. client connections that have not yet been closed
    possibly because a client crashed, the remove() will throw an
    ImageBusy exception. We now catch this exception and raise
    VolumeIsBusy with a useful message.

    If the volume delete fails in this way it will now stay as
    'available' instead of going to 'error_deleting' so that the
    delete can be retried (since it is expected to work on a
    retry after waiting for the connection to timeout).

    Change-Id: I5bc9a5f71bdb0f9c5d12b5577e68377e66561f5b
    Closes-bug: 1256259
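
The behaviour described in this commit message can be sketched as follows. This is a simplified illustration, not the merged cinder code: both exception classes are local stand-ins, and delete_image, delete_volume and the dict-based volume record are hypothetical names used only to show the status handling.

```python
class ImageBusy(Exception):
    """Local stand-in for the backend error raised while watchers remain."""


class VolumeIsBusy(Exception):
    """Local stand-in for the volume-level 'busy' error shown to the user."""


def delete_image(remove, volume_name):
    """Driver side: translate the backend error into a volume-level one."""
    try:
        remove(volume_name)
    except ImageBusy:
        raise VolumeIsBusy(
            "ImageBusy error raised while deleting rbd volume %s. This may "
            "be caused by a connection from a client that has crashed; if "
            "so, retry the delete after the connection has timed out."
            % volume_name
        )


def delete_volume(volume, remove):
    """Manager side: a busy volume goes back to 'available' for a retry."""
    volume["status"] = "deleting"
    try:
        delete_image(remove, volume["name"])
    except VolumeIsBusy:
        # Recoverable: leave the volume usable so the delete can be retried.
        volume["status"] = "available"
        return False
    except Exception:
        # Anything else is still treated as a failed delete.
        volume["status"] = "error_deleting"
        raise
    volume["status"] = "deleted"
    return True
```

Keeping the translation in the driver and the status decision in the caller means only the recoverable case avoids 'error_deleting'.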

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/70260

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/77248

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/77248
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a2f2a0e0d2f9516d86ef5988e083f70804c3977c
Submitter: Jenkins
Branch: master

commit a2f2a0e0d2f9516d86ef5988e083f70804c3977c
Author: Mike Perez <email address hidden>
Date: Fri Feb 28 11:09:27 2014 -0800

    Change RBD delete failure log level to warn

    This is a recoverable issue in the backend, so we don't have to provide
    the message on the error level.

    Change-Id: I35711876b2c088ad28f32abd39248dc9a467d00d
    Closes-Bug: #1256259
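
As a rough illustration of the log-level change, the sketch below reports the busy condition with the standard library's logging module at warning level rather than error level. The logger name, the ImageBusy stand-in and the try_remove helper are assumptions made for this example, and returning False is a simplification of what a real driver would do.

```python
import logging

LOG = logging.getLogger("rbd_delete_sketch")


class ImageBusy(Exception):
    """Local stand-in for the backend error raised while watchers remain."""


def try_remove(remove, name):
    """Attempt the delete; log a busy image as a warning, not an error."""
    try:
        remove(name)
        return True
    except ImageBusy:
        # Recoverable condition: a later retry is expected to succeed.
        LOG.warning("Image %s is busy and cannot be deleted yet; "
                    "retry after open connections are closed.", name)
        return False
```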

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/havana)

Reviewed: https://review.openstack.org/70260
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=24a1bd855ed90d51de3b2a458f9c51a0fe6faa58
Submitter: Jenkins
Branch: stable/havana

commit 24a1bd855ed90d51de3b2a458f9c51a0fe6faa58
Author: Edward Hope-Morley <email address hidden>
Date: Wed Dec 4 18:13:06 2013 +0000

    Catch ImageBusy exception when deleting rbd volume

    If we try to delete an rbd volume that has 'watchers' on it
    i.e. client connections that have not yet been closed
    possibly because a client crashed, the remove() will throw an
    ImageBusy exception. We now catch this exception and raise
    VolumeIsBusy with a useful message.

    If the volume delete fails in this way it will now stay as
    'available' instead of going to 'error_deleting' so that the
    delete can be retried (since it is expected to work on a
    retry after waiting for the connection to timeout).

    Change-Id: I5bc9a5f71bdb0f9c5d12b5577e68377e66561f5b
    Closes-bug: 1256259
    (cherry picked from commit f31d62a178a370ae9d736c09a3186ea9a3c92ee3)

tags: added: in-stable-havana
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/81451

Thierry Carrez (ttx)
Changed in cinder:
milestone: icehouse-2 → 2014.1
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/havana)

Reviewed: https://review.openstack.org/81451
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=0b2041fb68cc845f0adb270304352045de6c3754
Submitter: Jenkins
Branch: stable/havana

commit 0b2041fb68cc845f0adb270304352045de6c3754
Author: Mike Perez <email address hidden>
Date: Fri Feb 28 11:09:27 2014 -0800

    Change RBD delete failure log level to warn

    This is a recoverable issue in the backend, so we don't have to provide
    the message on the error level.

    Change-Id: I35711876b2c088ad28f32abd39248dc9a467d00d
    Closes-Bug: #1256259
    (cherry picked from commit a2f2a0e0d2f9516d86ef5988e083f70804c3977c)
