swap volume failed because block copy still active

Bug #1536453 reported by jingtao liang
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Expired
Undecided
Unassigned

Bug Description

version : kilo

description:

I migrate a volume that is in the in-use status on LVM to another VG. It fails most of the time, and I don't know why.
When I migrate a volume on LVM that is not in use, the migration works fine.

reproduce steps:

1 create a volume on VG
2 create a vm
3 attach the volume to the vm
4 migrate the volume to another VG.

error logs:

2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5582, in _swap_volume
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] resize_to)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1232, in swap_volume
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] self._swap_volume(virt_dom, disk_dev, conf.source_path, resize_to)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1203, in _swap_volume
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] self._conn.defineXML(xml)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] rv = execute(f, *args, **kwargs)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] six.reraise(c, e, tb)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] rv = meth(*args, **kwargs)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3632, in defineXML
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd] libvirtError: block copy still active: domain has active block job
2016-01-21 11:13:40.918 27863 TRACE nova.compute.manager [instance: 35158835-9b79-49e9-9a92-e6bbfaba50dd]
2016-01-21 11:13:41.686 27863 INFO nova.scheduler.client.report [req-c59264b3-9223-472e-b1c4-bfc3f5643393 9134a1a11e5441c29e37757231f36450 32bfe3124bb2478aad3e6aa1cee09f14 - - -] Compute_service record updated for ('2C5_19_CG2', '2C5_19_CG2')
2016-01-21 11:13:41.687 27863 ERROR oslo_messaging.rpc.dispatcher [req-c59264b3-9223-472e-b1c4-bfc3f5643393 9134a1a11e5441c29e37757231f36450 32bfe3124bb2478aad3e6aa1cee09f14 - - -] Exception during message handling: block copy still active: domain has active block job
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 8262, in swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher new_volume_id)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher payload)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 376, in decorated_function
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance_uuid=instance_uuid)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 347, in decorated_function
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 404, in decorated_function
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 392, in decorated_function
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5634, in swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher new_volume_id)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5601, in _swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher self.volume_api.unreserve_volume(context, new_volume_id)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5582, in _swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher resize_to)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1232, in swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher self._swap_volume(virt_dom, disk_dev, conf.source_path, resize_to)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1203, in _swap_volume
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher self._conn.defineXML(xml)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher rv = execute(f, *args, **kwargs)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher six.reraise(c, e, tb)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher rv = meth(*args, **kwargs)
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3632, in defineXML
2016-01-21 11:13:41.687 27863 TRACE oslo_messaging.rpc.dispatcher if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

Bug skimming result
===================
This report doesn't have enough information to allow us a proper
analysis of the issue you described. Please update the description
with the following information:

* The version of libvirt/kvm/qemu:
    $ virsh -c qemu:///system version --daemon
* The exact version of Nova:
       If this is from a distro please provide
           $ dpkg -l | grep nova
           or
            $ rpm -qa | grep nova
       If this is from git, please provide
           $ git log -1

There is a chance that you hit the libvirt bug [1].

I'm changing the status of this report to "Incomplete". When you have
provided the requested information, please switch it back to "New".

If you have questions, I'm available in the IRC channel #openstack-nova
under the name "markus_z".

References
==========
[1] https://www.redhat.com/archives/libvirt-users/2015-January/msg00042.html

Changed in nova:
status: New → Incomplete
Revision history for this message
jingtao liang (liang-jingtao) wrote :

Thank you. I have checked the versions, as follows:

* The version of libvirt/kvm/qemu:

[root@2C5_10_DELL05 nova(keystone_admin)]# virsh -c qemu:///system version --daemon
Compiled against library: libvirt 1.2.15
Using library: libvirt 1.2.15
Using API: QEMU 1.2.15
Running hypervisor: QEMU 2.2.1
Running against daemon: 1.2.15

The exact version of Nova:

[root@2C5_10_DELL05 nova(keystone_admin)]# rpm -qa | grep nova
openstack-nova-compute-2015.1.0-1.1.393.noarch
openstack-nova-monitor-2015.1.0-1.1.393.noarch
openstack-nova-api-2015.1.0-1.1.393.noarch
python-novaclient-2.23.0-1.1.393.noarch

Please help me check it.

Changed in nova:
status: Incomplete → New
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Hi Jingtao Liang,

A few comments:

- Your test case isn't clear. Can you include the *exact* commands you've
  invoked, so that I (or others) can try to reproduce the issue locally?

- Second, if you're able to consistently reproduce this issue, can you
  add these libvirt log filters (check what values you have currently,
  and replace them with the below) in /etc/libvirt/libvirtd.conf:

    log_filters="3:event 3:file 1:util 1:security 1:qemu 1:libvirt"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

  Restart libvirt daemon, repeat your test, and attach the libvirtd.log
  to this bug.

- As additional information, when you see this error, find the block
  device (from `virsh domblklist $UUID`) that is being migrated (when
  you say "migrate the volume to another VG"), and run the below command
  (assuming the block device name is 'vda') to see if it reports
  anything (it tells you if there really are any block operations in
  progress):

      $ virsh blockjob $UUID-of-NovaInstance vda --info

  Another thing to try, to see what events libvirt is pushing out:
  before you perform the test, run this command in one shell:

    $ virsh qemu-monitor-event $UUID-of-NovaInstance --pretty --loop

  This will spew out some events, e.g. BLOCK_JOB_READY,
  BLOCK_JOB_COMPLETED (unlikely in this bug, as your error says there's
  still an active block copy job), or BLOCK_JOB_CANCELLED (if the copy
  job is aborted with Ctrl+C).

- Lastly, if time permits, can you also try reproducing this with
  Liberty release (it's the latest stable)?

Changed in nova:
status: New → Incomplete
Revision history for this message
jingtao liang (liang-jingtao) wrote :

Thanks. I found that LVM makes it easy to reproduce the bug, and our team is trying to find a solution. When I find the reason, I will tell you.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Revision history for this message
Hiroyuki Eguchi (h-eguchi) wrote :

I faced the same error when executing 'nova volume-update' on the Mitaka version with the LVMVolumeDriver, and I found out what the root cause is.
To begin with, the current code has a problem: there is no exception handling here, so the root cause is never logged and we cannot find out what it is.
I've modified nova/virt/libvirt/driver.py like this:

@@ -1394,6 +1395,9 @@
                 while dev.wait_for_job(wait_for_job_clean=True):
                     time.sleep(0.5)
                 dev.resize(resize_to * units.Gi / units.Ki)
+        except Exception:
+            with excutils.save_and_reraise_exception():
+                LOG.exception(_LE("Failed to swap volume"))
         finally:
             self._host.write_instance_config(xml)

<error logs after patching above>

 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1389, in _swap_volume
     dev.abort_job(pivot=True)
   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 500, in abort_job
     self._guest._domain.blockJobAbort(self._disk, flags=flags)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
     rv = execute(f, *args, **kwargs)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
     six.reraise(c, e, tb)
   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
     rv = meth(*args, **kwargs)
   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 733, in blockJobAbort
     if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
 libvirtError: block copy still active: disk 'vdb' not ready for pivot yet

As you can see, the root cause is the abort_job method.
Before executing this method, nova executes the wait_for_job method to wait for job completion, but it seems that it doesn't work correctly.

As a test, I added the following sleep and confirmed that the 'nova volume-update' command works correctly.

@@ -1386,6 +1386,7 @@
             while dev.wait_for_job():
                 time.sleep(0.5)

+            time.sleep(10)
             dev.abort_job(pivot=True)
             if resize_to:
                 # NOTE(alex_xu): domain.blockJobAbort isn't sync call. This

I don't know why the wait_for_job method doesn't work correctly.
Does anyone know?

<my environment>
libvirt-1.2.17-13.el7_2.5.x86_64
openstack-nova-compute-13.1.1-2.el7ost.noarch
qemu-kvm-common-rhev-2.3.0-31.el7_2.21.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64

Revision history for this message
Hiroyuki Eguchi (h-eguchi) wrote :