Activity log for bug #1929710

Date Who What changed Old value New value Message
2021-05-26 13:36:51 Lee Yarwood bug added bug
2021-05-27 08:19:51 Lee Yarwood description Description =========== The error handling around swap_volume is missing the following failure when calling virDomainGetBlockJobInfo() after the entire device is detached by QEMU (?) after it encounters a job during the block copy job that at first pauses and then somehow resumes the job: https://8a5fc27780098c5ee1bc-3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate/nova-next/e915ed4/controller/logs/screen-n-cpu.txt May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last): May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete(): May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info() May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed') May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt 2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M 2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-5-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M 2021-05-26 09:49:46.165+0000: 79422: debug : qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format, persistjob=1, device=libvirt-4-format, target=libvirt-5-format, bandwidth=0, granularity=0, buf_size=0, shallow=0 2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'created'(1) 2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2) 2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'paused'(3) 2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2) 2021-05-26 09:49:46.841+0000: 79417: debug : qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from domain 0x7fc4b416b0e0 instance-0000000b 2021-05-26 09:49:47.457+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'aborting'(8) 2021-05-26 09:49:47.458+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'concluded'(9) 2021-05-26 09:49:47.459+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'null'(11) Steps to reproduce ================== $ cat queries/virDomainGetBlockJobInfo.yaml query: > message:"virDomainGetBlockJobInfo() failed" AND tags:"screen-n-cpu.txt" $ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml total hits: 6 build_branch 100% master build_change 50% 786588 50% 792322 build_hostids 50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0 50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 build_name 100% nova-next build_node 100% ubuntu-focal build_queue 100% check build_status 100% FAILURE build_zuul_url 100% N/A filename 100% controller/logs/screen-n-cpu.txt log_url 50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt 50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt loglevel 100% ERROR module 33% nova.compute.manager 33% nova.virt.libvirt.driver 33% oslo_messaging.rpc.server node_provider 50% ovh-bhs1 50% rax-iad port 50% 48014 50% 58238 project 100% openstack/nova syslog_pid 50% 107528 50% 108261 syslog_program 50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute 50% ubuntu-focal-rax-iad-0024745546 nova-compute tags 100% screen-n-cpu.txt screen oslofmt voting 100% 1 zuul_attempts 100% 1 zuul_executor 50% ze01.opendev.org 50% ze07.opendev.org Expected result =============== swap_volume at least fails correctly leaving the original device attached. Actual result ============= swap_volume fails and the original device appears detached from the device. Environment =========== 1. Exact version of OpenStack you are running. See the following list for all releases: http://docs.openstack.org/releases/ master 2. Which hypervisor did you use? (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...) What's the version of that? libvirt + QEMU (no KVM in the gate) 2. Which storage type did you use? (For example: Ceph, LVM, GPFS, ...) What's the version of that? images_type=default=qcow2 3. Which networking type did you use? (For example: nova-network, Neutron with OpenVSwitch, ...) N/A Logs & Configs ============== Description =========== The error handling around swap_volume is missing the following failure when calling virDomainGetBlockJobInfo() after the entire device is detached by QEMU (?) after it encounters a failure during the block copy job that at first pauses and then somehow resumes: https://8a5fc27780098c5ee1bc-3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate/nova-next/e915ed4/controller/logs/screen-n-cpu.txt May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last): May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete(): May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info() May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs) May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed') May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt 2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M 2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-5-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M 2021-05-26 09:49:46.165+0000: 79422: debug : qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format, persistjob=1, device=libvirt-4-format, target=libvirt-5-format, bandwidth=0, granularity=0, buf_size=0, shallow=0 2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'created'(1) 2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2) 2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'paused'(3) 2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2) 2021-05-26 09:49:46.841+0000: 79417: debug : qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from domain 0x7fc4b416b0e0 instance-0000000b 2021-05-26 09:49:47.457+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'aborting'(8) 2021-05-26 09:49:47.458+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'concluded'(9) 2021-05-26 09:49:47.459+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'null'(11) Steps to reproduce ================== $ cat queries/virDomainGetBlockJobInfo.yaml query: >  message:"virDomainGetBlockJobInfo() failed" AND  tags:"screen-n-cpu.txt" $ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml total hits: 6 build_branch   100% master build_change   50% 786588   50% 792322 build_hostids   50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0   50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 build_name   100% nova-next build_node   100% ubuntu-focal build_queue   100% check build_status   100% FAILURE build_zuul_url   100% N/A filename   100% controller/logs/screen-n-cpu.txt log_url   50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt   50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt loglevel   100% ERROR module   33% nova.compute.manager   33% nova.virt.libvirt.driver   33% oslo_messaging.rpc.server node_provider   50% ovh-bhs1   50% rax-iad port   50% 48014   50% 58238 project   100% openstack/nova syslog_pid   50% 107528   50% 108261 syslog_program   50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute   50% ubuntu-focal-rax-iad-0024745546 nova-compute tags   100% screen-n-cpu.txt screen oslofmt voting   100% 1 zuul_attempts   100% 1 zuul_executor   50% ze01.opendev.org   50% ze07.opendev.org Expected result =============== swap_volume at least fails correctly leaving the original device attached. Actual result ============= swap_volume fails and the original device appears detached from the device. Environment =========== 1. Exact version of OpenStack you are running. See the following   list for all releases: http://docs.openstack.org/releases/    master 2. Which hypervisor did you use?    (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)    What's the version of that?    libvirt + QEMU (no KVM in the gate) 2. Which storage type did you use?    (For example: Ceph, LVM, GPFS, ...)    What's the version of that?    images_type=default=qcow2 3. Which networking type did you use?    (For example: nova-network, Neutron with OpenVSwitch, ...)    N/A Logs & Configs ==============
2021-05-27 08:20:23 Lee Yarwood bug task added qemu
2021-06-10 13:59:44 Lee Yarwood bug task deleted qemu
2021-06-10 14:01:12 Lee Yarwood nova: importance Undecided Medium
2021-07-09 10:01:25 Lee Yarwood nova: status New Triaged
2021-07-09 10:01:29 Lee Yarwood nova: assignee Lee Yarwood (lyarwood)
2022-05-09 22:23:53 melanie witt bug watch added https://gitlab.com/qemu-project/qemu/-/issues/287
2022-05-13 10:57:59 Lee Yarwood nova: assignee Lee Yarwood (lyarwood)