Comment 0 for bug 1929710

Revision history for this message
Lee Yarwood (lyarwood) wrote :

Description
===========

The error handling around swap_volume is missing the following failure when calling virDomainGetBlockJobInfo() after the entire device is detached by QEMU (?) after it encounters a job during the block copy job that at first pauses and then somehow resumes the job:

https://8a5fc27780098c5ee1bc-3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate/nova-next/e915ed4/controller/logs/screen-n-cpu.txt

May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete():
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info()
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed')
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver

https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt

2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M

2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-5-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M

2021-05-26 09:49:46.165+0000: 79422: debug : qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format, persistjob=1, device=libvirt-4-format, target=libvirt-5-format, bandwidth=0, granularity=0, buf_size=0, shallow=0

2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'created'(1)

2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2)

2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'paused'(3)

2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2)

2021-05-26 09:49:46.841+0000: 79417: debug : qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from domain 0x7fc4b416b0e0 instance-0000000b

2021-05-26 09:49:47.457+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'aborting'(8)

2021-05-26 09:49:47.458+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'concluded'(9)

2021-05-26 09:49:47.459+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'null'(11)

Steps to reproduce
==================

$ cat queries/virDomainGetBlockJobInfo.yaml
query: >
 message:"virDomainGetBlockJobInfo() failed" AND
 tags:"screen-n-cpu.txt"

$ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml
total hits: 6
build_branch
  100% master
build_change
  50% 786588
  50% 792322
build_hostids
  50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0
  50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54
build_name
  100% nova-next
build_node
  100% ubuntu-focal
build_queue
  100% check
build_status
  100% FAILURE
build_zuul_url
  100% N/A
filename
  100% controller/logs/screen-n-cpu.txt
log_url
  50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt
  50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt
loglevel
  100% ERROR
module
  33% nova.compute.manager
  33% nova.virt.libvirt.driver
  33% oslo_messaging.rpc.server
node_provider
  50% ovh-bhs1
  50% rax-iad
port
  50% 48014
  50% 58238
project
  100% openstack/nova
syslog_pid
  50% 107528
  50% 108261
syslog_program
  50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute
  50% ubuntu-focal-rax-iad-0024745546 nova-compute
tags
  100% screen-n-cpu.txt screen oslofmt
voting
  100% 1
zuul_attempts
  100% 1
zuul_executor
  50% ze01.opendev.org
  50% ze07.opendev.org

Expected result
===============
swap_volume at least fails correctly leaving the original device attached.

Actual result
=============
swap_volume fails and the original device appears detached from the device.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   libvirt + QEMU (no KVM in the gate)

2. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   images_type=default=qcow2

3. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============