rbd driver should check for watchers before delete

Bug #1256259 reported by Edward Hope-Morley on 2013-11-29
This bug affects 1 person
Affects: Cinder; Importance: Medium; Assigned to: Edward Hope-Morley
Affects: Havana; Importance: Medium; Assigned to: Mike Perez

Bug Description

When deleting an rbd image/volume, if the image still has so-called 'watchers' on it (i.e. open client connections, e.g. from kvm), the delete operation fails with a message similar to:

error: image still has watchers. This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

Currently, if this occurs, the cinder volume is left stuck in the 'error_deleting' state. Several people have now observed this, e.g. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003718.html

One way to remedy this would be to check for watchers before the delete and, if any exist, either retry after a fixed period (30s?) or simply raise an ImageBusy exception so that the user can retry later.
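
The retry option could look roughly like the sketch below. It is only an illustration, not the driver's code: `ImageBusy` here is a local stand-in for the busy error named in this report, and `remove_with_retry`, its `remove` callable, and its parameters are hypothetical names chosen for the example. The 30 second default comes from the timeout mentioned in the error message.

```python
import time


class ImageBusy(Exception):
    """Local stand-in for the 'image still has watchers' error (assumption)."""


def remove_with_retry(remove, name, retries=3, delay=30.0, sleep=time.sleep):
    """Call remove(name); on ImageBusy wait `delay` seconds and try again.

    Returns the number of attempts used. Re-raises ImageBusy if the image
    is still busy after the final attempt, so the caller can report it.
    """
    for attempt in range(1, retries + 1):
        try:
            remove(name)
            return attempt
        except ImageBusy:
            if attempt == retries:
                raise
            # A crashed client's watch is expected to expire after a while.
            sleep(delay)
```

Retrying inside the driver blocks the delete call for up to `(retries - 1) * delay` seconds, which is one argument for simply raising a busy error and letting the user retry instead.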

Haomai Wang (haomai) wrote :

I'd prefer raising a Busy exception. Nice job!

tags: added: ceph rbd

Fix proposed to branch: master
Review: https://review.openstack.org/60105

Changed in cinder:
assignee: nobody → Edward Hope-Morley (hopem)
status: New → In Progress
Changed in cinder:
importance: Undecided → Medium
milestone: none → icehouse-2

Reviewed: https://review.openstack.org/60105
Committed: http://github.com/openstack/cinder/commit/f31d62a178a370ae9d736c09a3186ea9a3c92ee3
Submitter: Jenkins
Branch: master

commit f31d62a178a370ae9d736c09a3186ea9a3c92ee3
Author: Edward Hope-Morley <email address hidden>
Date: Wed Dec 4 18:13:06 2013 +0000

    Catch ImageBusy exception when deleting rbd volume

    If we try to delete an rbd volume that has 'watchers' on it
    i.e. client connections that have not yet been closed
    possibly because a client crashed, the remove() will throw an
    ImageBusy exception. We now catch this exception and raise
    VolumeIsBusy with a useful message.

    If the volume delete fails in this way it will now stay as
    'available' instead of going to 'error_deleting' so that the
    delete can be retried (since it is expected to work on a
    retry after waiting for the connection to timeout).

    Change-Id: I5bc9a5f71bdb0f9c5d12b5577e68377e66561f5b
    Closes-bug: 1256259
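
The behaviour described in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed code: `ImageBusy` and `VolumeIsBusy` are local stand-ins for the exceptions named above, and `driver_delete`, `manager_delete`, the `remove` callable, and the dict-based volume record (including the 'deleting' and 'deleted' states) are hypothetical.

```python
class ImageBusy(Exception):
    """Local stand-in for the error raised when an image has watchers."""


class VolumeIsBusy(Exception):
    """Local stand-in for the busy exception raised to the caller."""


def driver_delete(remove, name):
    """Translate the backend's busy error into a retryable volume error."""
    try:
        remove(name)
    except ImageBusy:
        raise VolumeIsBusy(
            "volume %s still has watchers; retry once the client "
            "connection has closed or timed out" % name)


def manager_delete(remove, volume):
    """Delete a volume record, keeping it 'available' if it is only busy."""
    volume["status"] = "deleting"
    try:
        driver_delete(remove, volume["name"])
    except VolumeIsBusy:
        # Recoverable: leave the volume usable so the delete can be retried.
        volume["status"] = "available"
        raise
    except Exception:
        # Anything else is treated as a real failure.
        volume["status"] = "error_deleting"
        raise
    volume["status"] = "deleted"
```

For example, calling `manager_delete` with a `remove` that raises `ImageBusy` leaves `volume["status"]` as `'available'` and propagates `VolumeIsBusy`.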

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2014-01-22
Changed in cinder:
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/77248
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a2f2a0e0d2f9516d86ef5988e083f70804c3977c
Submitter: Jenkins
Branch: master

commit a2f2a0e0d2f9516d86ef5988e083f70804c3977c
Author: Mike Perez <email address hidden>
Date: Fri Feb 28 11:09:27 2014 -0800

    Change RBD delete failure log level to warn

    This is a recoverable issue in the backend, so we don't have to provide
    the message on the error level.

    Change-Id: I35711876b2c088ad28f32abd39248dc9a467d00d
    Closes-Bug: #1256259

Reviewed: https://review.openstack.org/70260
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=24a1bd855ed90d51de3b2a458f9c51a0fe6faa58
Submitter: Jenkins
Branch: stable/havana

commit 24a1bd855ed90d51de3b2a458f9c51a0fe6faa58
Author: Edward Hope-Morley <email address hidden>
Date: Wed Dec 4 18:13:06 2013 +0000

    Catch ImageBusy exception when deleting rbd volume

    If we try to delete an rbd volume that has 'watchers' on it
    i.e. client connections that have not yet been closed
    possibly because a client crashed, the remove() will throw an
    ImageBusy exception. We now catch this exception and raise
    VolumeIsBusy with a useful message.

    If the volume delete fails in this way it will now stay as
    'available' instead of going to 'error_deleting' so that the
    delete can be retried (since it is expected to work on a
    retry after waiting for the connection to timeout).

    Change-Id: I5bc9a5f71bdb0f9c5d12b5577e68377e66561f5b
    Closes-bug: 1256259
    (cherry picked from commit f31d62a178a370ae9d736c09a3186ea9a3c92ee3)

tags: added: in-stable-havana
Thierry Carrez (ttx) on 2014-04-17
Changed in cinder:
milestone: icehouse-2 → 2014.1

Reviewed: https://review.openstack.org/81451
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=0b2041fb68cc845f0adb270304352045de6c3754
Submitter: Jenkins
Branch: stable/havana

commit 0b2041fb68cc845f0adb270304352045de6c3754
Author: Mike Perez <email address hidden>
Date: Fri Feb 28 11:09:27 2014 -0800

    Change RBD delete failure log level to warn

    This is a recoverable issue in the backend, so we don't have to provide
    the message on the error level.

    Change-Id: I35711876b2c088ad28f32abd39248dc9a467d00d
    Closes-Bug: #1256259
    (cherry picked from commit a2f2a0e0d2f9516d86ef5988e083f70804c3977c)
