failed to delete volume and cloned-volume concurrently

Bug #1641518 reported by suntao
This bug affects 2 people
Affects: Cinder
Status: In Progress
Importance: Medium
Assigned to: renminmin

Bug Description

Release: Liberty
Cinder backend: RBD

Steps to reproduce:
1) Create a volume xxx.
2) Create a new volume yyy from source volume xxx.
3) Delete volumes xxx and yyy concurrently.
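
For reference, a minimal client-side sketch of these steps using python-cinderclient (the authentication setup, API version '3', and the 1 GB size are assumptions, not taken from this report):

# Hedged reproduction sketch: create a volume, clone it, then issue both
# deletes at (nearly) the same time so they can race in the RBD driver.
import os
import threading
import time

from cinderclient import client
from keystoneauth1 import session
from keystoneauth1.identity import v3


def wait_available(cinder, volume_id):
    # Poll until the volume reaches 'available' so the clone/delete can proceed.
    while cinder.volumes.get(volume_id).status != 'available':
        time.sleep(1)


auth = v3.Password(auth_url=os.environ['OS_AUTH_URL'],
                   username=os.environ['OS_USERNAME'],
                   password=os.environ['OS_PASSWORD'],
                   project_name=os.environ['OS_PROJECT_NAME'],
                   user_domain_name='Default',
                   project_domain_name='Default')
cinder = client.Client('3', session=session.Session(auth=auth))

xxx = cinder.volumes.create(size=1, name='xxx')
wait_available(cinder, xxx.id)

yyy = cinder.volumes.create(size=1, name='yyy', source_volid=xxx.id)
wait_available(cinder, yyy.id)

# Step 3: delete xxx and yyy concurrently.
threads = [threading.Thread(target=cinder.volumes.delete, args=(v,))
           for v in (xxx, yyy)]
for t in threads:
    t.start()
for t in threads:
    t.join()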

cinder-volume.log:
Traceback:
... ...
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 729, in delete_volume
    self._delete_clone_parent_refs(client, parent, parent_snap)
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 636, in _delete_clone_parent_refs
    parent_rbd = self.rbd.Image(client.ioctx, parent_name)
  File "rbd.pyx", line 637, in rbd.Image.__init__ (rbd.c:4909)
ImageNotFound: error opening image volume-xxx at snapshot None

Tags: ceph drivers rbd
Eric Harney (eharney)
tags: added: ceph drivers rbd
Revision history for this message
Jon Bernard (jbernard) wrote :

Are you able to reproduce this reliably? It sounds like the deletion of 'yyy' attempts to delete the parent, but the parent is no longer there because its deletion raced and won. This sequence is possible based on what I'm reading. If I make a patch, would you be able to test it and report whether it addresses the problem?

Revision history for this message
Eric Harney (eharney) wrote :

Suggested fix: handle ImageNotFound errors in _delete_clone_parent_refs and treat them as successful, like we do in delete_volume and delete_snapshot.
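
A minimal sketch of that suggestion (a fragment only; the exact log message and placement in the real patch may differ):

# Hedged sketch of the suggested handling, not the merged fix: if the parent
# image was already removed by the racing delete, treat the reference cleanup
# as already done instead of propagating ImageNotFound to the caller.
def _delete_clone_parent_refs(self, client, parent_name, parent_snap):
    try:
        parent_rbd = self.rbd.Image(client.ioctx, parent_name)
    except self.rbd.ImageNotFound:
        LOG.info("parent volume %s no longer exists in backend; "
                 "nothing left to clean up", parent_name)
        return
    # ... rest of the existing method unchanged ...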

Changed in cinder:
status: New → Confirmed
Jon Bernard (jbernard)
Changed in cinder:
assignee: nobody → Jon Bernard (jbernard)
importance: Undecided → Medium
Revision history for this message
Jon Bernard (jbernard) wrote :
Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
suntao (244914362-q) wrote :

I think that if we just return when "parent_rbd = self.rbd.Image(...)" fails, the parent and parent_snap will be left in the RBD pool, like "volume-xxx.deleted" and "volume-xxx.deleted.clone_snap".

Is it possible to look up the parent and parent_snap again when the error happens?

Revision history for this message
suntao (244914362-q) wrote :

I tried to solve the problem like this. Is it right?

def _delete_clone_parent_refs(self, client, parent_name, parent_snap):
    parent_image_found = True
    try:
        parent_rbd = self.rbd.Image(client.ioctx, parent_name)
    except self.rbd.ImageNotFound:
        parent_image_found = False
        LOG.info("parent volume %s no longer exists in backend",
                 parent_name)

    if not parent_image_found:
        # The concurrent delete may have renamed the parent to
        # "<name>.deleted" instead of removing it, so retry under that name.
        parent_name = parent_name + '.deleted'
        try:
            parent_rbd = self.rbd.Image(client.ioctx, parent_name)
        except self.rbd.ImageNotFound:
            LOG.info("parent volume %s no longer exists in backend",
                     parent_name)
            return

    parent_has_snaps = False
    ... ...

Revision history for this message
Jon Bernard (jbernard) wrote :

It's hard to say without more information. Can we go back a bit and get the cinder and ceph state for volumes and snapshots just after the create, and then again just after the failure?

Revision history for this message
suntao (244914362-q) wrote :

1) Create volume xxx.
2) Clone volume xxx three times: xxx-clone-1, xxx-clone-2, xxx-clone-3.
Now the Ceph state is:
# rbd -p volumes ls -l
volume-xxx
volume-xxx@volume-xxx-clone-1.clone_snap
volume-xxx@volume-xxx-clone-2.clone_snap
volume-xxx@volume-xxx-clone-3.clone_snap
volume-xxx-clone-1
volume-xxx-clone-2
volume-xxx-clone-3

3) Delete the volume and the clones concurrently.
Now the Ceph state is:
# rbd -p volumes ls -l
volume-xxx.deleted
volume-xxx.deleted@volume-xxx-clone-1.clone_snap

If I try something like https://review.openstack.org/#/c/397863/, the Ceph state is the same; there is just no error message returned to the user.

Revision history for this message
suntao (244914362-q) wrote :

Can anyone help?

Revision history for this message
Jon Bernard (jbernard) wrote :

Just returned from holidays, it's in my queue to look at.

Revision history for this message
Jon Bernard (jbernard) wrote :

Yes, you're absolutely correct. I'll update my patch to handle this case. Thanks for pointing this out.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote : Bug Assignee Expired

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: Jon Bernard (jbernard) → nobody
Changed in cinder:
status: In Progress → New
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/397863
Reason: Feel free to rebase and open again when you have time. Thank you.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

We are seeing a related issue with the scenario in comment #7. We have a snapshot that volume-xxx was created from, and it will not delete now because volume-xxx.deleted and its clone_snap still exist as children of that snapshot (snapshot-pre-volume-xxx). Is there a cleanup routine for this?
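
For anyone hitting the leftover state shown in comment #7, a rough manual cleanup sketch with the rbd Python bindings might look like the following (this is not a Cinder routine; the pool and image names are illustrative, and flattening assumes the leftover clone itself has no remaining dependents):

# Hedged manual cleanup sketch: flatten the leftover "volume-xxx.deleted"
# clone so it stops referencing the parent snapshot, drop its remaining
# clone_snaps, then remove the image. Try this on a test cluster first.
import rados
import rbd

with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
    with cluster.open_ioctx('volumes') as ioctx:
        leftover = 'volume-xxx.deleted'
        with rbd.Image(ioctx, leftover) as img:
            img.flatten()  # detach from the parent snapshot
            for snap in img.list_snaps():
                # Snapshots must be unprotected (and have no children)
                # before they can be removed.
                if img.is_protected_snap(snap['name']):
                    img.unprotect_snap(snap['name'])
                img.remove_snap(snap['name'])
        rbd.RBD().remove(ioctx, leftover)

Once the leftover image is gone, the original snapshot should no longer report it as a child and can be deleted normally, assuming nothing else still references it.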

Changed in cinder:
status: New → Confirmed
status: Confirmed → New
renminmin (rmm0811)
Changed in cinder:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/709342

Changed in cinder:
assignee: nobody → renminmin (rmm0811)
status: Confirmed → In Progress
Revision history for this message
wang (yunhua) wrote :

stay tuned

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/843309

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Tushar Trambak Gite <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/843309
