Activity log for bug #1914282

Date Who What changed Old value New value Message
2021-02-02 18:23:15 melanie witt bug added bug
2021-02-02 18:23:46 melanie witt summary block copy job sometimes hangs on the last block for several minutes block copy job sometimes hangs on the last block for minutes
2021-02-02 18:26:22 melanie witt description In openstack nova we use the block copy API to copy disks for volume swap requests. In our CI gate we observed that sometimes the block copy job progress will reach the last or next to last block and hang there for several minutes at a time, causing CI jobs to fail as jobs timeout. On the client (nova-compute) side, using the python bindings we see the following in the openstack nova logs: --------------- Jan 21 05:31:02.207785 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 0 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} Jan 21 05:31:55.688227 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1049624576 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} [...] Jan 21 05:31:55.706698 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1049624576 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} Jan 21 05:31:56.175248 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741823 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} [...] ~2.5 minutes later, it's still going at current cursor: 1073741823 final cursor: 1073741824 Jan 21 05:34:30.952371 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741823 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} then current cursor == final cursor at 05:34:31.460595 Jan 21 05:34:31.460595 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741824 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} --------------- In this excerpt the cursor reaches the next to last block at Jan 21 05:31:56.175248 and hangs there repeating status at the next to last block until Jan 21 05:34:30.952371 (~2.5 minutes) and then the job shows current cursor == final cursor at Jan 21 05:34:31.460595. In the corresponding qemu log, we see the block copy job report being on the last block for minutes: --------------- 021-01-21 05:31:02.206+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 0, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-501"}] 2021-01-21 05:31:02.206+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 0, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-501"} [...] len == offset at 05:31:56.174 2021-01-21 05:31:56.174+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 1073741824, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-581"}] 2021-01-21 05:31:56.174+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 1073741824, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-581"} [...] ~2.5 minutes later, still len == offset but it's still going 2021-01-21 05:34:31.459+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": false, "len": 1073741824, "offset": 1073741824, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}], "id": "libvirt-855"}] 2021-01-21 05:34:31.459+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": false, "len": 1073741824, "offset": 1073741824, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}], "id": "libvirt-855"} and then the job finishes soon after 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'waiting'(6) 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'pending'(7) 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'concluded'(9) 2021-01-21 05:34:31.468+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'null'(11) 2021-01-21 05:34:31.468+0000: 60634: debug : qemuBlockJobProcessEventConcludedCopyPivot:1221 : copy job 'copy-vdb-libvirt-5-format' on VM 'instance-00000013' pivoted 2021-01-21 05:34:32.292+0000: 60634: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7fd070075720 name=instance-00000013) --------------- Is this normal for a block copy job to hang on the last block like this for minutes at a time? Why doesn't the job close out once offset == len? Thanks for any help you can offer. In openstack nova we use the block copy API to copy disks for volume swap requests. In our CI gate we observed that sometimes the block copy job progress will reach the last or next to last block and hang there for minutes at a time, causing CI jobs to fail as jobs timeout. On the client (nova-compute) side, using the python bindings we see the following in the openstack nova logs: --------------- Jan 21 05:31:02.207785 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 0 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} Jan 21 05:31:55.688227 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1049624576 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} [...] Jan 21 05:31:55.706698 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1049624576 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} Jan 21 05:31:56.175248 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741823 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} [...] ~2.5 minutes later, it's still going at current cursor: 1073741823 final cursor: 1073741824 Jan 21 05:34:30.952371 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741823 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} then current cursor == final cursor at 05:34:31.460595 Jan 21 05:34:31.460595 ubuntu-focal-vexxhost-ca-ymq-1-0022641000 nova-compute[93823]: DEBUG nova.virt.libvirt.guest [None req-d6170fbb-e023-4cdb-93dc-a2e9ae9b0a56 tempest-TestVolumeSwap-1117975117 tempest-TestVolumeSwap-1117975117] COPY block job progress, current cursor: 1073741824 final cursor: 1073741824 {{(pid=93823) is_job_complete /opt/stack/nova/nova/virt/libvirt/guest.py:873}} --------------- In this excerpt the cursor reaches the next to last block at Jan 21 05:31:56.175248 and hangs there repeating status at the next to last block until Jan 21 05:34:30.952371 (~2.5 minutes) and then the job shows current cursor == final cursor at Jan 21 05:34:31.460595. In the corresponding qemu log, we see the block copy job report being on the last block for minutes: --------------- 021-01-21 05:31:02.206+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 0, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-501"}] 2021-01-21 05:31:02.206+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 0, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-501"} [...] len == offset at 05:31:56.174 2021-01-21 05:31:56.174+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 1073741824, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-581"}] 2021-01-21 05:31:56.174+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": true, "len": 1073741824, "offset": 1073741824, "status": "running", "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-581"} [...] ~2.5 minutes later, still len == offset but it's still going 2021-01-21 05:34:31.459+0000: 60630: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": false, "len": 1073741824, "offset": 1073741824, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}], "id": "libvirt-855"}] 2021-01-21 05:34:31.459+0000: 60630: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7fd07813ec80 reply={"return": [{"auto-finalize": true, "io-status": "ok", "device": "copy-vdb-libvirt-5-format", "auto-dismiss": false, "busy": false, "len": 1073741824, "offset": 1073741824, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}], "id": "libvirt-855"} and then the job finishes soon after 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'waiting'(6) 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'pending'(7) 2021-01-21 05:34:31.467+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'concluded'(9) 2021-01-21 05:34:31.468+0000: 60630: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-5-format'(domain: 0x7fd070075720,instance-00000013) state changed to 'null'(11) 2021-01-21 05:34:31.468+0000: 60634: debug : qemuBlockJobProcessEventConcludedCopyPivot:1221 : copy job 'copy-vdb-libvirt-5-format' on VM 'instance-00000013' pivoted 2021-01-21 05:34:32.292+0000: 60634: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7fd070075720 name=instance-00000013) --------------- Is this normal for a block copy job to hang on the last block like this for minutes at a time? Why doesn't the job close out once offset == len? Thanks for any help you can offer. Additional log messages have been pasted here: https://pastebin.com/FQRu76Sn and for completeness, these logs were taken from the following openstack nova gate job: https://zuul.opendev.org/t/openstack/build/a078a17aa9924517b329cafc3f54fed4
2021-02-03 16:18:13 melanie witt attachment added bug1914282_log_excerpts.txt https://bugs.launchpad.net/qemu/+bug/1914282/+attachment/5459433/+files/bug1914282_log_excerpts.txt
2021-02-03 17:13:38 Kashyap Chamarthy attachment added QMP_exchange_libvirtd_22Jan2021_NovaCI_dest.tar.xz https://bugs.launchpad.net/qemu/+bug/1914282/+attachment/5459439/+files/QMP_exchange_libvirtd_22Jan2021_NovaCI_dest.tar.xz
2021-05-12 17:59:54 Thomas Huth qemu: status New Incomplete
2021-05-14 10:07:40 Thomas Huth bug watch added https://gitlab.com/qemu-project/qemu/-/issues/287
2021-05-14 10:07:44 Thomas Huth qemu: status Incomplete Invalid