libvirt: nova's detach_volume silently fails sometimes

Bug #1452840 reported by Nicolas Simonds
Affects Status Importance Assigned to Milestone
Libvirt Python
New
Undecided
Unassigned
OpenStack Compute (nova)
Confirmed
Low
Unassigned

Bug Description

This behavior has been observed on the following platforms:

* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest

Nova's "detach_volume" fires the detach method into libvirt, which claims success, but the device is still attached according to "virsh domblklist". Nova then finishes the teardown and releases the resources, which causes I/O errors in the guest, and subsequent volume-attach requests from Nova fail spectacularly because they try to reuse an in-use device.

This appears to be a race condition, in that it does occasionally work fine.
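
When the bug strikes, the device stays visible in "virsh domblklist" even though libvirt reported a successful detach. A minimal sketch of how a monitoring script might confirm that, assuming the standard two-column domblklist output (the helper name is hypothetical, not part of Nova or libvirt):

```python
def is_attached(domblklist_output, dev):
    """Return True if `dev` (e.g. 'vdb') appears as a target device
    in the output of `virsh domblklist <domain>`."""
    for line in domblklist_output.splitlines():
        fields = line.split()
        # Skip the header ("Target  Source") and separator lines;
        # real device rows start with the target name.
        if fields and fields[0] == dev:
            return True
    return False
```

Polling this after "nova volume-detach" distinguishes a real detach from the silent failure described above.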

Steps to Reproduce:

This script will usually trigger the error condition:

    #!/bin/bash -vx

    : Setup
    img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
    vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    sleep 5

    : Launch
    nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test

    : Measure
    nova show test | grep "volumes_attached.*$vol1_id"

    : Poke the bear
    nova volume-detach test "$vol1_id"
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    sleep 10
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    nova volume-attach test "$vol2_id"
    sleep 1

    : Measure again
    nova show test | grep "volumes_attached.*$vol2_id"

Expected behavior:

The volumes attach/detach/attach properly

Actual behavior:

The second attachment fails, and n-cpu throws the following exception:

    Failed to attach volume at mountpoint: /dev/vdb
    Traceback (most recent call last):
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
        virt_dom.attachDeviceFlags(conf.to_xml(), flags)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
        if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
    libvirtError: operation failed: target vdb already exists

Workaround:

"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the guest to properly detach the device, and also seems to ward off whatever gremlins caused the problem in the first place; i.e., the problem gets much less likely to present itself after firing a virsh command.
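
The workaround above can be scripted as a retry loop: after Nova's detach, poll domblklist and fall back to "virsh detach-disk" if the device is still listed. A sketch under stated assumptions: `run` defaults to `subprocess.check_output`, the fallback assumes the caller has privileges to run virsh, and the function name is invented for illustration.

```python
import subprocess
import time

def force_detach(domain_uuid, dev, run=subprocess.check_output,
                 attempts=5, delay=2):
    """If `dev` (e.g. 'vdb') is still listed after a Nova-initiated
    detach, fall back to `virsh detach-disk`, retrying up to
    `attempts` times with `delay` seconds between checks."""
    for _ in range(attempts):
        out = run(["virsh", "domblklist", domain_uuid]).decode()
        still_there = any(line.split()[:1] == [dev]
                          for line in out.splitlines())
        if not still_there:
            return True  # detach completed
        # Poke the bear, as described above.
        run(["virsh", "detach-disk", domain_uuid, dev])
        time.sleep(delay)
    return False
```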

Revision history for this message
Matt Riedemann (mriedem) wrote :

What version of libvirt/qemu used with master nova?

tags: added: libvirt volumes
Revision history for this message
Matt Riedemann (mriedem) wrote :

Oh, never mind: libvirt 1.2.2 with nova master on Ubuntu. Have you tried testing against newer/latest libvirt/qemu?

Revision history for this message
Nicolas Simonds (nicolas.simonds) wrote :

Addendum:

Between runs of the test script, I clean up with:

    nova delete test ; cinder list | awk '/avail/ {print $2}' | xargs -r cinder delete

Revision history for this message
Nicolas Simonds (nicolas.simonds) wrote :

No, I'm testing with stock Ubuntu Trusty and devstack with no local.conf, i.e., all defaults, all the time.

description: updated
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Nicolas Simonds (nicolas.simonds) wrote :

In an attempt to gain insight, I altered Nova's detach_volume method to recheck+retry+log indefinitely, to see how many tries it would take for the detach to eventually succeed.

The answer is, "never, unless another request comes in on a different greenthread to alter the guest's configuration". The provided test script attaches another volume after ten seconds, so after futilely trying to detach the volume (/dev/vdb) for ten seconds, an attach request comes in, succeeds (on /dev/vdc), unsticks libvirt with regard to detaching the volume, and everything gets cleaned up.
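
A recheck loop like the one described needs a way to tell whether the device is still present; one option is polling the domain XML for the disk's `<target dev=...>` element. A minimal, illustrative sketch (not the actual Nova change, and the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

def device_in_domain_xml(domain_xml, dev):
    """Return True if the libvirt domain XML still contains a <disk>
    whose <target dev=...> matches `dev` (e.g. 'vdb')."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("dev") == dev:
            return True
    return False
```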

jimmy.zhao (jimmy-zhao)
Changed in nova:
status: Confirmed → In Progress
Changed in nova:
status: In Progress → Confirmed
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version icehouse in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.icehouse