2015-05-07 18:38:20
Nicolas Simonds
Description:
This behavior has been observed on the following platforms:
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest
Nova's "detach_volume" fires the detach method into libvirt, which claims success, but the device is still attached according to "virsh domblklist". Nova then finishes the teardown, releasing the resources, which then causes I/O errors in the guest, and subsequent volume_attach requests from Nova to fail spectacularly due to it trying to use an in-use resource.
This appears to be a race condition, in that it does occasionally work fine.
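One way to watch the race directly is to poll "virsh domblklist" on the compute host after the detach call returns and see whether the target ever actually disappears. The snippet below is only an illustrative check, not part of the original report; it assumes it runs on the compute host, that the instance is the one named "test" created by the reproduction script below, and that the libvirt domain UUID matches the Nova instance UUID reported by "nova show" (as it does with the libvirt driver). It could be dropped into the script right after the "nova volume-detach" call:

# Illustrative check only: poll the domain's block device list after the detach.
dom_uuid=$(nova show test | awk '($2=="id"){print $4}')
for i in $(seq 1 30); do
    # If the detach really completed, vdb drops out of this list.
    sudo virsh domblklist "$dom_uuid" | grep -q vdb || { echo "detached after ${i}s"; break; }
    sleep 1
done
sudo virsh domblklist "$dom_uuid"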
Steps to Reproduce:
This script will usually trigger the error condition:
#!/bin/bash -vx
: Setup
img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
sleep 5
: Launch
nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test
: Measure
nova show test | grep "volumes_attached.*$vol1_id"
: Poke the bear
nova volume-detach test "$vol1_id"
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
sleep 10
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
nova volume-attach test "$vol2_id"
sleep 1
: Measure again
nova show test | grep "volumes_attached.*$vol2_id"
Expected behavior:
The volumes attach/detach/attach properly
Actual behavior:
The second attachment fails, and n-cpu throws the following exception:
Failed to attach volume at mountpoint: /dev/vdb
Traceback (most recent call last):
File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
virt_dom.attachDeviceFlags(conf.to_xml(), flags)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
result = proxy_call(self._autowrap, f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
rv = execute(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
six.reraise(c, e, tb)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
rv = meth(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
libvirtError: operation failed: target vdb already exists
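The "target vdb already exists" error indicates that the libvirt domain definition still contains a <disk> element with target dev vdb left over from the silently failed detach, so libvirt refuses to attach a second disk under the same target name. This can be confirmed on the compute host with the same domain UUID as in the check above (illustrative command, not part of the original report):

sudo virsh dumpxml "$dom_uuid" | grep -A 3 "target dev='vdb'"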
Workaround:
"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the guest to properly detach the device, and also seems to ward off whatever gremlins caused the problem in the first place; i.e., the problem gets much less likely to present itself after firing a virsh command. |