libvirt: nova's detach_volume silently fails sometimes

Bug #1452840 reported by Nicolas Simonds
This bug affects 2 people
Affects                   Status     Importance  Assigned to  Milestone
Libvirt Python            New        Undecided   Unassigned
OpenStack Compute (nova)  Confirmed  Low         Unassigned

Bug Description

This behavior has been observed on the following platforms:

* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest

Nova's "detach_volume" fires the detach method into libvirt, which claims success, but the device is still attached according to "virsh domblklist". Nova then finishes the teardown and releases the resources, which causes I/O errors in the guest and makes subsequent volume_attach requests from Nova fail spectacularly, because they try to use an in-use resource.

This appears to be a race condition, in that it does occasionally work fine.
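
The mismatch described above (libvirt reports success, yet the disk still shows up in "virsh domblklist") can also be checked programmatically by inspecting the domain XML for the disk target. The following is a minimal sketch, not Nova's code: the function names and the sample XML are hypothetical, and it assumes the XML string would come from something like libvirt-python's `dom.XMLDesc(0)`.

```python
import xml.etree.ElementTree as ET


def attached_disk_targets(domain_xml):
    """Return the set of disk target names (e.g. 'vda') found in a domain XML string."""
    root = ET.fromstring(domain_xml)
    targets = set()
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("dev"):
            targets.add(target.get("dev"))
    return targets


def is_still_attached(domain_xml, dev):
    """True if the given target device (e.g. 'vdb') is still listed in the domain XML."""
    return dev in attached_disk_targets(domain_xml)


# Hypothetical sample; with libvirt-python this string would be dom.XMLDesc(0).
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/hypothetical/path/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/hypothetical-volume'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(sorted(attached_disk_targets(SAMPLE_XML)))
    print(is_still_attached(SAMPLE_XML, "vdb"))
```

Running such a check right after the detach call returns would reveal the stuck state before any backing resources are released.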

Steps to Reproduce:

This script will usually trigger the error condition:

    #!/bin/bash -vx

    : Setup
    img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
    vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    sleep 5

    : Launch
    nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test

    : Measure
    nova show test | grep "volumes_attached.*$vol1_id"

    : Poke the bear
    nova volume-detach test "$vol1_id"
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    sleep 10
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    nova volume-attach test "$vol2_id"
    sleep 1

    : Measure again
    nova show test | grep "volumes_attached.*$vol2_id"

Expected behavior:

The volumes attach, detach, and attach again without error.

Actual behavior:

The second attachment fails, and n-cpu throws the following exception:

    Failed to attach volume at mountpoint: /dev/vdb
    Traceback (most recent call last):
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
        virt_dom.attachDeviceFlags(conf.to_xml(), flags)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
        if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
    libvirtError: operation failed: target vdb already exists

Workaround:

"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the guest to properly detach the device, and also seems to ward off whatever gremlins caused the problem in the first place; i.e., the problem gets much less likely to present itself after firing a virsh command.

Matt Riedemann (mriedem) wrote :

What version of libvirt/qemu was used with master nova?

tags: added: libvirt volumes
Matt Riedemann (mriedem) wrote :

Oh, never mind: libvirt 1.2.2 with nova master on Ubuntu. Have you tried testing against newer/latest libvirt/qemu?

Nicolas Simonds (nicolas.simonds) wrote :

Addendum:

Between runs of the test script, I clean up with:

    nova delete test ; cinder list | awk '/avail/ {print $2}' | xargs -r cinder delete

Nicolas Simonds (nicolas.simonds) wrote :

No, I'm testing with stock Ubuntu Trusty and devstack with no local.conf, i.e., all defaults, all the time.

description: updated
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Nicolas Simonds (nicolas.simonds) wrote :

In an attempt to gain insight, I altered Nova's detach_volume method to recheck+retry+log indefinitely, to see how many tries it would take for the detach to eventually succeed.

The answer is, "never, unless another request comes in on a different greenthread to alter the guest's configuration". The provided test script attaches another volume after ten seconds, so after ten seconds of futile attempts to detach the volume (/dev/vdb), an attach request comes in, succeeds (on /dev/vdc), unsticks libvirt with regard to detaching the volume, and everything gets cleaned up.
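
For anyone who wants to repeat the experiment, the recheck+retry+log idea can be sketched roughly as below. This is not the actual patch to Nova's detach_volume: `wait_for_detach` and its parameters are hypothetical names, and the two callables are assumed to wrap something like libvirt-python's `dom.detachDeviceFlags(...)` and a check against `dom.XMLDesc(0)`. Unlike the indefinite loop described above, this version is bounded, so a stuck detach surfaces as an error instead of spinning forever.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def wait_for_detach(is_attached, detach, attempts=5, delay=1.0, sleep=time.sleep):
    """Issue detach(), then recheck with is_attached(); retry up to `attempts` times.

    Returns the attempt number on which the device was first seen gone.
    Raises RuntimeError if the device is still attached after all attempts.
    """
    for attempt in range(1, attempts + 1):
        detach()
        sleep(delay)
        if not is_attached():
            LOG.info("device detached after %d attempt(s)", attempt)
            return attempt
        LOG.warning("device still attached after attempt %d", attempt)
    raise RuntimeError("device still attached after %d attempts" % attempts)
```

Given the observation above that retries alone never succeeded, a loop like this is mainly useful for detecting the stuck state and refusing to release the backing resources; it should not be read as a fix.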

jimmy.zhao (jimmy-zhao)
Changed in nova:
status: Confirmed → In Progress
Changed in nova:
status: In Progress → Confirmed
Sean Dague (sdague) wrote :

Automatically discovered version icehouse in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.icehouse