Comment 6 for bug 1536453

Hiroyuki Eguchi (h-eguchi) wrote :

I hit the same error when running 'nova volume-update' on Mitaka with LVMVolumeDriver, and tracked down where it comes from.
To begin with, the current code has a problem: it does not log the exception, so the root cause cannot be found from the logs.
I modified nova/virt/libvirt/driver.py like this:

@@ -1394,6 +1395,9 @@
                 while dev.wait_for_job(wait_for_job_clean=True):
                     time.sleep(0.5)
                 dev.resize(resize_to * units.Gi / units.Ki)
+        except Exception:
+            with excutils.save_and_reraise_exception():
+                LOG.exception(_LE("Failed to swap volume"))
         finally:
             self._host.write_instance_config(xml)
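
The patch above logs the exception and then re-raises it, while the finally block still runs. A minimal standalone sketch of that pattern, using only the Python standard library, is shown here. The names swap_volume_sketch, do_swap and restore_config are hypothetical stand-ins and not nova code, and a bare raise takes the place of oslo's save_and_reraise_exception:

```python
import logging

LOG = logging.getLogger(__name__)


def swap_volume_sketch(do_swap, restore_config):
    """Run do_swap(); log and re-raise any failure; always restore config."""
    try:
        do_swap()
    except Exception:
        # Record the traceback so the root cause shows up in the log,
        # then let the original exception propagate unchanged.
        LOG.exception("Failed to swap volume")
        raise
    finally:
        # Runs on both success and failure, in the same position as
        # write_instance_config(xml) in the patch.
        restore_config()
```

The caller still sees the original exception; the only change in behaviour is the extra log entry.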

<error log after applying the patch above>

 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1389, in _swap_volume
     dev.abort_job(pivot=True)
   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 500, in abort_job
     self._guest._domain.blockJobAbort(self._disk, flags=flags)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
     rv = execute(f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
     six.reraise(c, e, tb)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
     rv = meth(*args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
     if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
 libvirtError: block copy still active: disk 'vdb' not ready for pivot yet

As you can see, the failure occurs in the abort_job method: libvirt reports that disk 'vdb' is not ready for pivot yet.
Before calling abort_job, nova calls the wait_for_job method to wait for the job to complete, but that wait does not seem to work correctly.

As a test, I added the following sleep and confirmed that the 'nova volume-update' command then works correctly.

@@ -1386,6 +1386,7 @@
             while dev.wait_for_job():
                 time.sleep(0.5)

+            time.sleep(10)
             dev.abort_job(pivot=True)
             if resize_to:
                 # NOTE(alex_xu): domain.blockJobAbort isn't sync call. This
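
A fixed 10-second sleep is only a test. One possible alternative is to retry the pivot for a bounded time instead. The following is a rough standalone sketch of that idea, not a proposed nova patch: NotReadyError and pivot_with_retry are hypothetical names, and NotReadyError merely stands in for the libvirtError "not ready for pivot yet" seen in the traceback.

```python
import time


class NotReadyError(Exception):
    """Hypothetical stand-in for libvirtError 'not ready for pivot yet'."""


def pivot_with_retry(abort_job, timeout=10.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call abort_job() until it succeeds or timeout seconds elapse.

    Returns the number of attempts made.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            abort_job()
            return attempts
        except NotReadyError:
            if clock() >= deadline:
                raise  # give up and surface the last error
            sleep(interval)
```

In real code the except clause would have to recognise the specific libvirtError, and abort_job would be something like a wrapper around dev.abort_job(pivot=True). Both of these are assumptions and are not covered by this sketch.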

I don't know why the wait_for_job method doesn't work correctly.
Does anyone know?

<my environment>
libvirt-1.2.17-13.el7_2.5.x86_64
openstack-nova-compute-13.1.1-2.el7ost.noarch
qemu-kvm-common-rhev-2.3.0-31.el7_2.21.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64