Online snapshot delete fails for network disk type

Bug #1477110 reported by Einst Crazy
This bug affects 2 people
Affects                    Status        Importance  Assigned to      Milestone
Cinder                     Invalid       Undecided   Unassigned
OpenStack Compute (nova)   Fix Released  Medium      Deepak C Shetty

Bug Description

I have a test setup where Cinder uses GlusterFS (libgfapi) as the storage backend.
1. Create an instance
2. Create a volume
3. Attach the volume to the instance
4. Create a snapshot of the volume
5. Delete the snapshot

The snapshot delete fails with an error.

OS: CentOS 7

affects: bagpipe-l2 → openstack-community
Changed in openstack-community:
assignee: nobody → Einst Crazy (yu-changcai)
affects: openstack-community → nova
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

@Einst Crazy:

1) Can you add the error you observed, please? The relevant section from the logs would be great too.
2) It sounds like this is related to OpenStack's Cinder project; could that be right?

tags: added: snapshot
summary: - cinder delete snapshot failed
+ cinder delete Gluster snapshot failed
Revision history for this message
Deepak C Shetty (dpkshetty) wrote : Re: cinder delete Gluster snapshot failed

@Einst,
  Agreed with Markus, plus a few more comments...

1) Looking at the patch, it seems to address the case where active_protocol is not None, which means you are using libgfapi to access GlusterFS volumes, right?

2) If yes to #1, please attach cinder.conf and nova.conf so we can understand the configuration. Also add the libgfapi tag to this bug and mention it in the bug title, so it is clear to others.

3) As Markus said, we also need the c-vol and n-cpu logs. In the code I do see that _get_snap_dev uses backing_store only if it is not None, but I presume my_snap_dev will still be formed incorrectly. Attaching the logs will help us understand this better.

4) FWIW, please check the patch at https://review.openstack.org/#/c/202442/7 which fixes a similar issue for the non-libgfapi (FUSE mount) case. I recommend pairing up with that patch's author and collaborating on the same patch, so that one patch fixes the issue for both FUSE and libgfapi based disks.

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

@Markus,
  The fix is in the Nova project, so the bug should be filed against Nova only. The effect of this bug is seen in Cinder because Cinder calls Nova for
online snapshot create/delete operations for the GlusterFS backend.

Changed in cinder:
status: New → Invalid
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Correcting my comment #2 above...

   I spoke too fast... I think my comment in point #3 is wrong. active_disk_object.backing_store can't be None, given the steps
you provided at the start of this bug. Either you are using an older version of libvirt that doesn't support backing_store, or you
are NOT using libgfapi, in which case the patch fix doesn't seem right. In that case you are hitting the same issue as https://review.openstack.org/#/c/202442/7

Attaching the configs as mentioned in point #2 will help.

description: updated
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

@Einst,
  Can you attach the Cinder and Nova error logs and conf files?

tags: added: cinder glusterfs libgfapi nova
Revision history for this message
Bharat Kumar Kobagana (bharat-kobagana) wrote :

Successfully reproduced this bug in the environment below:

Operating System: Fedora 22
Qemu-KVM version: 2.3.0
Using Libgfapi to attach Gluster volume to compute instances.

Configuration Files:

nova.conf: http://paste.openstack.org/show/406707/
cinder.conf: http://paste.openstack.org/show/406708/

Followed the steps below:

1. Created a VM (vm1)
2. Created a Volume vol1 (GlusterFS backend)
3. Attached vol1 to vm1 (vol1 -> vm1)
4. Created snapshot on attached volume
       > cinder snapshot-create vol1 --force True
     The snapshot reached the "available" state, but this error appeared in n-cpu.log: http://paste.openstack.org/show/406709/
5. Tried to delete the snapshot
     The snapshot went into the "error_deleting" state,
     with this error in n-cpu.log: http://paste.openstack.org/show/406710/

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

@Bharat,
  Thanks for helping re-create the issue

I think the error you saw in n-cpu.log as part of #4 in your comment is expected, and is not really an error.

Nova always tries to take the snapshot with quiescing via the QEMU guest agent; if the agent isn't present, it falls back to taking the snapshot without quiescing, which is what happens after the exception (see the sketch below), so #4 can be ignored.
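
To illustrate that quiesce-then-fallback behaviour, here is a minimal sketch using the libvirt-python API directly (the helper name and flag handling are illustrative only; this is not the exact nova code path):

    import libvirt

    def create_disk_snapshot(domain, snapshot_xml, flags):
        """Try a quiesced snapshot first; fall back if the guest agent is absent."""
        try:
            # First attempt: ask the qemu guest agent to freeze guest filesystems.
            return domain.snapshotCreateXML(
                snapshot_xml,
                flags | libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE)
        except libvirt.libvirtError:
            # No (responsive) guest agent: the quiesced attempt raises, which is
            # the traceback seen in step 4, and we retry without quiescing.
            return domain.snapshotCreateXML(snapshot_xml, flags)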

#5 seems to re-create the same issue that @Einst reported

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

I too re-created the bug and then pulled in the patch posted by Einst @

https://review.openstack.org/#/c/204617/

and it _does_not_ fix the issue reported in this bug!

I still got the same exception that originally occurred without Einst's patch.

@Einst,
  Did you test whether your patch fixes the issue? From my testing it does not.

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Dumping the exception I see after using Einst's patch, as mentioned in #8 above:

2015-08-12 07:14:36.530 DEBUG nova.virt.libvirt.driver [req-d2743d90-3470-4c07-a920-c7c22f456b03 nova service] [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'80f5481a-9ddc-4cb6-a9bc-adb2c1824211'} from (pid=29535) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1862
2015-08-12 07:14:36.537 DEBUG nova.virt.libvirt.driver [req-d2743d90-3470-4c07-a920-c7c22f456b03 nova service] [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] found device at vdb from (pid=29535) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1906
2015-08-12 07:14:36.537 ERROR nova.virt.libvirt.driver [req-d2743d90-3470-4c07-a920-c7c22f456b03 nova service] [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] Error occurred during volume_snapshot_delete, sending error status to Cinder.
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] Traceback (most recent call last):
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2049, in volume_snapshot_delete
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] snapshot_id, delete_info=delete_info)
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1957, in _volume_snapshot_delete
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] active_disk_object.backing_store)
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1911, in _get_snap_dev
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] raise exception.NovaException(msg)
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37] NovaException: filename cannot be None
2015-08-12 07:14:36.537 TRACE nova.virt.libvirt.driver [instance: b02b29fb-e873-4a1b-a09a-6947afd58b37]
2015-08-12 07:14:36.540 DEBUG keystoneclient.session [req-d2743d90-3470-4c07-a920-c7c22f456b03 nova service] REQ: curl -g -i -X POST http://192.168.122.182:8776/v2/fc9c9865a02547e6b6e27c7b7561f6f3/snapshots/670080ab-42ee-442a-90a7-97dc351330d6/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}56d4827d39d20a31b656e23a9f0c6b8d601a966b" -d '{"os-update_snapshot_status": {"status": "error_deleting", "progress": "90%"}}' from (pid=29535) _http_log_request /usr/lib/python2.7/site-packages/keystoneclient/session.py:195
2015-08-12 07:14:36.635 DEBUG keystoneclient.session [req-d2743d90-3470-4c07-a920-c7c22f456b03 nova service] RESP: [202] date: Wed, 12 Aug 2015 07:14:36 GMT connection: keep-alive content-type: text/html; c...


Changed in nova:
status: New → Confirmed
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Taking over, as I think I have found the issue and a patch is on its way...

Changed in nova:
assignee: Einst Crazy (yu-changcai) → Deepak C Shetty (dpkshetty)
status: Confirmed → New
status: New → In Progress
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Here is my analysis of the issue...

1) Cinder sends Nova a delete_info dict with:

delete_info['file_to_merge'] = None

for the case where you are deleting the most recent snapshot, as in:

base <-- snap1 (active file)

and you delete snap1. This is what causes this bug report.

2) When that happens, the code below calls _get_snap_dev() with None as the first argument:

        if delete_info['merge_target_file'] is None:
            # pull via blockRebase()

            # Merge the most recent snapshot into the active image

            rebase_disk = my_dev
            rebase_base = delete_info['file_to_merge'] # often None
            if active_protocol is not None:
                rebase_base = _get_snap_dev(delete_info['file_to_merge'],
                                            active_disk_object.backing_store)

which blows up, because the first thing _get_snap_dev() does is:

        def _get_snap_dev(filename, backing_store):
            if filename is None:
                msg = _('filename cannot be None')
                raise exception.NovaException(msg)

3) The solution is to NOT call _get_snap_dev() if delete_info['file_to_merge'] is None.

The reason delete_info['file_to_merge'] is None is that the rebase_base file is None (which is what delete_info['file_to_merge'] represents on the Cinder side). In other words, after the rebase, the snap1 file will exist without any backing file (i.e. rebased to _no_ backing file).
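
For reference, a minimal sketch of what that guard could look like, reusing the variable names from the snippet in point 2 (the patch under review is the authoritative fix and may differ in detail):

    rebase_base = delete_info['file_to_merge']
    if active_protocol is not None and rebase_base is not None:
        # Network disk (e.g. GlusterFS via libgfapi): translate the new
        # backing file into a protocol-specific libvirt disk spec.
        rebase_base = _get_snap_dev(delete_info['file_to_merge'],
                                    active_disk_object.backing_store)
    # When file_to_merge is None (deleting the newest snapshot, as in the
    # delete_info dump in the earlier trace), rebase_base stays None and
    # blockRebase() rebases the active image onto no backing file at all.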

Patch on its way ....

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/212518

summary: - cinder delete Gluster snapshot failed
+ Online snapshot delete fails for network disk type
Changed in nova:
assignee: Deepak C Shetty (dpkshetty) → zhangjinnan (zhang-jinnan)
Changed in nova:
assignee: zhangjinnan (zhang-jinnan) → nobody
Changed in nova:
assignee: nobody → zhangjinnan (zhang-jinnan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Einst Crazy (yu.changcai@99cloud.net) on branch: master
Review: https://review.openstack.org/204617
Reason: This bug fix in https://review.openstack.org/#/c/212518/

Changed in nova:
assignee: zhangjinnan (zhang-jinnan) → Deepak C Shetty (dpkshetty)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/212518
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7d5df265e53915bd9de066373596f12a62a11f0b
Submitter: Jenkins
Branch: master

commit 7d5df265e53915bd9de066373596f12a62a11f0b
Author: Deepak C Shetty <email address hidden>
Date: Thu Aug 13 13:07:53 2015 +0000

    libvirt: Fix snapshot delete for network disk type for blockRebase op

    _volume_snapshot_delete was raising an exception as part of blockRebase
    op when the new backing file happens to be None, which is a valid
    scenario.

    This patch ensures that we don't look for the libvirt disk spec (skip
    calling _get_snap_dev()) if the new backing file for rebase op is
    None.

    Change-Id: I98deda75310f0b44b70257071d282aa50babe06b
    Tested-By: Zhang Jinnan<zhang.jinnan@99cloud.net>
    Closes-Bug: #1477110

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-rc1 → 12.0.0