os-assisted-volume-snapshots:delete doesn't work if instance is SHUTOFF

Bug #1465416 reported by Dmitry Guryanov on 2015-06-15
This bug affects 3 people
Affects: OpenStack Compute (nova); Importance: Medium; Assigned to: Jordan Pittier
Affects: Liberty; Importance: Medium; Assigned to: Deepak C Shetty

Bug Description

If the instance is in the SHUTOFF state, the volume state is still 'in-use', so a volume driver for NAS storage decides to call os-assisted-volume-snapshots:delete.

The only driver that supports this API is libvirt, so we end up in LibvirtDriver.volume_snapshot_delete, which in turn calls

            result = virt_dom.blockRebase(rebase_disk, rebase_base,
                                          rebase_bw, rebase_flags)

This raises an exception if the domain is not running:

  volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'e650a0cb-abbf-4bb3-843e-9fb762953c7e'} from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1826
  found device at vda from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1875
  disk: vda, base: None, bw: 0, flags: 0 from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1947
 Error occurred during volume_snapshot_delete, sending error status to Cinder.
 Traceback (most recent call last):
   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2020, in volume_snapshot_delete
     snapshot_id, delete_info=delete_info)
   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1950, in _volume_snapshot_delete
     rebase_bw, rebase_flags)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
     rv = execute(f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
     six.reraise(c, e, tb)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
     rv = meth(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/libvirt.py", line 865, in blockRebase
     if ret == -1: raise libvirtError ('virDomainBlockRebase() failed', dom=self)
 libvirtError: Requested operation is not valid: domain is not running

I'm using devstack, which checked out the OpenStack repos on 15.06.2015.
I'm experiencing the problem with my new volume driver https://review.openstack.org/#/c/188869/8 , but the glusterfs and quobyte volume drivers surely have the same bug.

description: updated
Sylvain Bauza (sylvain-bauza) wrote :

Just wondering whether the solution has to be part of Nova. Are you thinking of a check in Nova that verifies the instance is running before calling libvirt?

Changed in nova:
status: New → Opinion
tags: added: libvirt
Sylvain Bauza (sylvain-bauza) wrote :

Also, could you please tell us your libvirt version?

Kashyap Chamarthy (kashyapc) wrote :

Just a small note on blockRebase API operation:

libvirt's 'blockRebase' API works only (that's not a bug) when the guest is online -- the main functionality of blockRebase is that it allows the guest to concurrently read/write while the copy is taking place. So the behavior of libvirt (blockRebase API, to be precise) throwing an error when it doesn't see a running domain (guest) is expected.

Sylvain Bauza (sylvain-bauza) wrote :

@Kashyap, fair point; we should then prevent Nova from calling libvirt if the instance is not in a correct state.
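
A minimal sketch of the kind of pre-check suggested in this comment, assuming a libvirt-python virDomain-like object whose isActive() returns 1 for a running domain. The helper name can_block_rebase and the FakeDomain stub are hypothetical and exist only for illustration; this is not the Nova code.

```python
class FakeDomain:
    """Hypothetical stand-in for a libvirt virDomain, for illustration."""

    def __init__(self, active):
        self._active = active

    def isActive(self):
        # libvirt-python reports 1 for a running domain, 0 otherwise.
        return 1 if self._active else 0


def can_block_rebase(domain):
    """Return True if calling blockRebase on this domain is worth attempting.

    blockRebase needs a running guest, so a shut-off domain is rejected
    here before libvirt gets the chance to raise libvirtError.
    """
    return domain.isActive() == 1
```

With a real connection, the same helper could be handed the domain object that the driver already holds; a False result would then be the signal to take another path instead of calling blockRebase.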

Changed in nova:
status: Opinion → Triaged
importance: Undecided → Low
tags: added: low-hanging-fruit

A good way to "fix" this is to switch to the qemu-img tool if the guest is not running, as proposed here:

  https://review.openstack.org/#/c/192736/

Also, as mentioned here [1], we can expect to see libvirt handle blockRebase when the guest is not running.

 1. http://lists.openstack.org/pipermail/openstack-dev/2015-April/061013.htm
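
As a rough illustration of the qemu-img approach mentioned above, the sketch below builds the argument list for an offline "qemu-img rebase". The function name build_qemu_img_rebase_cmd and the paths in the usage note are hypothetical; the only assumption about qemu-img itself is its documented "rebase -b backing_file filename" form, where an empty backing file makes the image independent of any backing file.

```python
def build_qemu_img_rebase_cmd(image_path, backing_file=None):
    """Build the argv list for an offline rebase of image_path.

    backing_file=None mirrors "base: None" in the log above: qemu-img
    treats an empty -b argument as "no backing file", so the image ends
    up standalone.
    """
    backing = "" if backing_file is None else backing_file
    return ["qemu-img", "rebase", "-b", backing, image_path]
```

The resulting list could be passed to something like subprocess.run(cmd, check=True) on a host where the guest is shut off, for example with a hypothetical image path such as /tmp/disk.qcow2.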

Jordan Pittier (jordan-pittier) wrote :

Small typo in the last comment, the correct link to the mailing list is http://lists.openstack.org/pipermail/openstack-dev/2015-April/061013.html (notice .html instead of .htm).

Changed in nova:
assignee: nobody → Jordan Pittier (jordan-pittier)
status: Triaged → In Progress
Silvan Kaiser (2-silvan) wrote :

Hi everyone,
acknowledged, Quobyte is hitting the same issue.
Example log can be found here: http://176.9.127.22:8081/refs-changes-29-198829-6/

Changed in nova:
assignee: Jordan Pittier (jordan-pittier) → Matt Riedemann (mriedem)

Reviewed: https://review.openstack.org/192736
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1cf793df252756615dc01a105953346a8747e755
Submitter: Jenkins
Branch: master

commit 1cf793df252756615dc01a105953346a8747e755
Author: Jordan Pittier <email address hidden>
Date: Wed Jul 15 14:07:29 2015 +0200

    libvirt:on snapshot delete, use qemu-img to blockRebase if VM is stopped

    Libvirt won't do a blockRebase on a domain that is not running. So,
    in that case, use "qemu-img rebase" instead.

    Note: For now, trying to rebase a network disk using qemu-img raises
    a NovaException error because I can't test that it successfully works
    for every protocol (gluster, sheepdog, etc) that executes this code
    path. I successfully tested this with file-based disk.

    Change-Id: I0e6819a6c8dc70b9bd7d1a9c18dc185b4537a3e4
    Closes-Bug: #1444806
    Closes-Bug: #1465416
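
The decision described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: choose_rebase_strategy, RebaseError (standing in for NovaException) and the returned strategy strings are all hypothetical names.

```python
class RebaseError(Exception):
    """Hypothetical error type standing in for NovaException."""


def choose_rebase_strategy(domain_running, source_protocol):
    """Pick how to rebase a disk when deleting a snapshot.

    domain_running: whether the guest is currently running.
    source_protocol: None for a file-based disk, otherwise a network
    protocol name such as "gluster".
    """
    if domain_running:
        # Online path: libvirt can rebase while the guest keeps doing I/O.
        return "libvirt-blockRebase"
    if source_protocol is not None:
        # Offline network disks are refused, as the commit message describes.
        raise RebaseError(
            "offline rebase of %s disks is not supported" % source_protocol)
    # Offline, file-based disk: fall back to the qemu-img tool.
    return "qemu-img-rebase"
```

Keeping the network-disk refusal on the offline branch only matches the note in the commit message, which limits the restriction to rebases done with qemu-img.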

Changed in nova:
status: In Progress → Fix Committed
tags: added: liberty-backport-potential

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released
Matt Riedemann (mriedem) on 2016-01-07
Changed in nova:
assignee: Matt Riedemann (mriedem) → Jordan Pittier (jordan-pittier)
importance: Low → Medium

Reviewed: https://review.openstack.org/243028
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aea87902180bebae93680e156a950f00a5f360fd
Submitter: Jenkins
Branch: stable/liberty

commit aea87902180bebae93680e156a950f00a5f360fd
Author: Jordan Pittier <email address hidden>
Date: Wed Jul 15 14:07:29 2015 +0200

    libvirt:on snapshot delete, use qemu-img to blockRebase if VM is stopped

    Libvirt won't do a blockRebase on a domain that is not running. So,
    in that case, use "qemu-img rebase" instead.

    Note: For now, trying to rebase a network disk using qemu-img raises
    a NovaException error because I can't test that it successfully works
    for every protocol (gluster, sheepdog, etc) that executes this code
    path. I successfully tested this with file-based disk.

    Change-Id: I0e6819a6c8dc70b9bd7d1a9c18dc185b4537a3e4
    Closes-Bug: #1444806
    Closes-Bug: #1465416
    (cherry picked from commit 1cf793df252756615dc01a105953346a8747e755)

This issue was fixed in the openstack/nova 12.0.1 release.

Matt Riedemann (mriedem) on 2016-03-04
tags: added: in-stable-liberty
removed: liberty-backport-potential