os-assisted-volume-snapshots:delete doesn't work if instance is SHUTOFF

Bug #1465416 reported by Dmitry Guryanov
This bug affects 3 people
Affects                    Status         Importance   Assigned to        Milestone
OpenStack Compute (nova)   Fix Released   Medium       Jordan Pittier
Liberty                    Fix Released   Medium       Deepak C Shetty

Bug Description

If the instance is in the SHUTOFF state, the volume state is still 'in-use', so a volume driver for NAS storage decides to call os-assisted-volume-snapshots:delete.

The only driver that supports this API is libvirt, so we end up in LibvirtDriver.volume_snapshot_delete, which in turn calls

            result = virt_dom.blockRebase(rebase_disk, rebase_base,
                                          rebase_bw, rebase_flags)

This raises an exception if the domain is not running:

  volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'e650a0cb-abbf-4bb3-843e-9fb762953c7e'} from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1826
  found device at vda from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1875
  disk: vda, base: None, bw: 0, flags: 0 from (pid=20313) _volume_snapshot_delete /opt/stack/nova/nova/virt/libvirt/driver.py:1947
 Error occurred during volume_snapshot_delete, sending error status to Cinder.
 Traceback (most recent call last):
   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2020, in volume_snapshot_delete
     snapshot_id, delete_info=delete_info)
   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1950, in _volume_snapshot_delete
     rebase_bw, rebase_flags)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
     rv = execute(f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
     six.reraise(c, e, tb)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
     rv = meth(*args, **kwargs)
   File "/usr/lib/python2.7/site-packages/libvirt.py", line 865, in blockRebase
     if ret == -1: raise libvirtError ('virDomainBlockRebase() failed', dom=self)
 libvirtError: Requested operation is not valid: domain is not running

I'm using devstack, which checked out OpenStack's repos on 15.06.2015.
I'm experiencing the problem with my new volume driver https://review.openstack.org/#/c/188869/8 , but the glusterfs and quobyte volume drivers surely have the same bug.

description: updated
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Just wondering whether the solution has to be part of Nova. Are you thinking of a check in Nova that verifies the instance is running before calling libvirt?

Changed in nova:
status: New → Opinion
tags: added: libvirt
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Also, could you please tell us your libvirt version?

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Just a small note on blockRebase API operation:

libvirt's 'blockRebase' API works only (that's not a bug) when the guest is online -- the main functionality of blockRebase is that it allows the guest to concurrently read/write while the copy is taking place. So the behavior of libvirt (blockRebase API, to be precise) throwing an error when it doesn't see a running domain (guest) is expected.
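The constraint described above can be illustrated with a minimal sketch of a guard that checks the domain state before attempting a live block job. `rebase_if_running`, `DomainNotRunning` and `FakeDomain` are hypothetical names used only for illustration and are not part of Nova or libvirt; the sketch assumes only the libvirt-python `virDomain` methods `isActive()`, `name()` and `blockRebase()`, and `FakeDomain` stands in for a real domain so the example runs without a hypervisor.

```python
class DomainNotRunning(Exception):
    """Hypothetical error: a live block job was requested on an inactive guest."""


def rebase_if_running(dom, disk, base, bandwidth=0, flags=0):
    """Call blockRebase only when the domain reports itself as active."""
    # virDomain.isActive() returns 1 for a running domain and 0 otherwise.
    if not dom.isActive():
        raise DomainNotRunning(
            "domain %s is not running; blockRebase needs a live guest"
            % dom.name())
    return dom.blockRebase(disk, base, bandwidth, flags)


class FakeDomain(object):
    """Stand-in for libvirt.virDomain (assumption, for illustration only)."""

    def __init__(self, active):
        self._active = active
        self.calls = []  # records the arguments of every blockRebase call

    def isActive(self):
        return 1 if self._active else 0

    def name(self):
        return "instance-00000001"

    def blockRebase(self, disk, base, bandwidth=0, flags=0):
        self.calls.append((disk, base, bandwidth, flags))
        return 0


if __name__ == "__main__":
    try:
        rebase_if_running(FakeDomain(active=False), "vda", None)
    except DomainNotRunning as exc:
        print("skipped: %s" % exc)
```

With a guard like this, the caller gets an explicit error it can handle instead of the libvirtError shown in the traceback.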

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

@Kashyap, fair point, we should then prevent Nova from calling libvirt if the instance is not in a correct state.

Changed in nova:
status: Opinion → Triaged
importance: Undecided → Low
tags: added: low-hanging-fruit
Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

A good way to "fix" this is to switch to the qemu-img tool if the guest is not running, as proposed here:

  https://review.openstack.org/#/c/192736/

Also, as mentioned here [1], we can expect to see libvirt handle blockRebase when the guest is not running.

 1. http://lists.openstack.org/pipermail/openstack-dev/2015-April/061013.htm

Revision history for this message
Jordan Pittier (jordan-pittier) wrote :

Small typo in the last comment, the correct link to the mailing list is http://lists.openstack.org/pipermail/openstack-dev/2015-April/061013.html (notice .html instead of .htm).

Changed in nova:
assignee: nobody → Jordan Pittier (jordan-pittier)
status: Triaged → In Progress
Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Hi everyone,
acknowledged, Quobyte is hitting the same issue.
Example log can be found here: http://176.9.127.22:8081/refs-changes-29-198829-6/

Changed in nova:
assignee: Jordan Pittier (jordan-pittier) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/192736
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1cf793df252756615dc01a105953346a8747e755
Submitter: Jenkins
Branch: master

commit 1cf793df252756615dc01a105953346a8747e755
Author: Jordan Pittier <email address hidden>
Date: Wed Jul 15 14:07:29 2015 +0200

    libvirt:on snapshot delete, use qemu-img to blockRebase if VM is stopped

    Libvirt won't do a blockRebase on a domain that is not running. So,
    in that case, use "qemu-img rebase" instead.

    Note: For now, trying to rebase a network disk using qemu-img raises
    a NovaException error because I can't test that it successfully works
    for every protocol (gluster, sheepdog, etc) that executes this code
    path. I successfully tested this with file-based disk.

    Change-Id: I0e6819a6c8dc70b9bd7d1a9c18dc185b4537a3e4
    Closes-Bug: #1444806
    Closes-Bug: #1465416
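The approach in the commit message above can be sketched roughly as follows. This is not the merged Nova code: `build_offline_rebase_cmd`, `NetworkDiskNotSupported` and the `source_type` parameter are hypothetical names chosen for illustration. The only external behaviour assumed is that `qemu-img rebase -b <backing_file> <image>` rewrites the backing file of an image, and that an empty backing file name makes the image independent of any backing file.

```python
class NetworkDiskNotSupported(Exception):
    """Hypothetical error for network-backed disks in this sketch."""


def build_offline_rebase_cmd(active_disk_path, rebase_base,
                             source_type="file"):
    """Build the qemu-img command line for rebasing a stopped guest's disk.

    active_disk_path: path of the image whose backing chain is rewritten.
    rebase_base: new backing file, or None to flatten the image.
    source_type: "file" or "network" (assumed labels for the disk source).
    """
    if source_type == "network":
        # The commit message says network disks are rejected for now
        # because not every protocol could be tested.
        raise NetworkDiskNotSupported(
            "offline rebase of network disks is not handled in this sketch")
    # An empty backing file name asks qemu-img to make the image standalone.
    backing_file = rebase_base or ""
    return ["qemu-img", "rebase", "-b", backing_file, active_disk_path]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the resulting command.
    print(build_offline_rebase_cmd("/tmp/volume.snap", "/tmp/volume.base"))
```

Returning the argument list rather than running it keeps the sketch free of side effects; a caller would pass the list to its own process-execution helper only when the domain is not running, and keep using blockRebase otherwise.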

Changed in nova:
status: In Progress → Fix Committed
tags: added: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/243028

Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b1

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Jordan Pittier (jordan-pittier)
importance: Low → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/243028
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aea87902180bebae93680e156a950f00a5f360fd
Submitter: Jenkins
Branch: stable/liberty

commit aea87902180bebae93680e156a950f00a5f360fd
Author: Jordan Pittier <email address hidden>
Date: Wed Jul 15 14:07:29 2015 +0200

    libvirt:on snapshot delete, use qemu-img to blockRebase if VM is stopped

    Libvirt won't do a blockRebase on a domain that is not running. So,
    in that case, use "qemu-img rebase" instead.

    Note: For now, trying to rebase a network disk using qemu-img raises
    a NovaException error because I can't test that it successfully works
    for every protocol (gluster, sheepdog, etc) that executes this code
    path. I successfully tested this with file-based disk.

    Change-Id: I0e6819a6c8dc70b9bd7d1a9c18dc185b4537a3e4
    Closes-Bug: #1444806
    Closes-Bug: #1465416
    (cherry picked from commit 1cf793df252756615dc01a105953346a8747e755)

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 12.0.1

This issue was fixed in the openstack/nova 12.0.1 release.

Matt Riedemann (mriedem)
tags: added: in-stable-liberty
removed: liberty-backport-potential