Failing device detachments on Focal

Bug #1882521 reported by Dr. Jens Harbott on 2020-06-08
This bug affects 2 people
Affects                    Status   Importance   Assigned to
Cinder                              Undecided    Unassigned
OpenStack Compute (nova)            High         Lee Yarwood

Bug Description

The following tests are failing consistently when deploying devstack on Focal in the CI; see https://review.opendev.org/734029 for detailed logs:

tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume
tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached
tearDownClass (tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest)

Sample extract from nova-compute log:

Jun 08 08:48:24.384559 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Exception which is in the suggested list of exceptions occurred while invoking function: nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach. {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:410}}
Jun 08 08:48:24.384862 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo.service.loopingcall [-] Cannot retry nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach upon suggested exception since retry count (7) reached max retry count (7). {{(pid=82495) _func /usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py:416}}
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall.RetryDecorator.__call__.<locals>._func' failed: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 150, in _run_loop
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 428, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall return self._sleep_time
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall self.force_reraise()
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall raise value
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/usr/local/lib/python3.8/dist-packages/oslo_service/loopingcall.py", line 407, in _func
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall result = f(*args, **kwargs)
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 453, in _do_wait_and_retry_detach
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall raise exception.DeviceDetachFailed(
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
Jun 08 08:48:24.388855 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: ERROR oslo.service.loopingcall
Jun 08 08:48:24.390684 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: WARNING nova.virt.block_device [None req-8af75b5f-2587-4ce7-9523-d2902eb45a38 tempest-ServerRescueNegativeTestJSON-1578800383 tempest-ServerRescueNegativeTestJSON-1578800383] [instance: 76f86b1f-8b11-44e6-b718-eda3e7e18937] Guest refused to detach volume 6b0cac03-d6d4-48ae-bf56-06de389c0869: nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config.
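
For context, the retry behaviour in the log above comes from nova wrapping its detach attempt in oslo.service's RetryDecorator. A minimal standalone sketch of that pattern (the exception class is a stand-in and the snippet as a whole is illustrative, not nova's actual code; the retry values mirror the "retry count (7) reached max retry count (7)" line above) would look like:

from oslo_service import loopingcall


class FakeDeviceDetachFailed(Exception):
    """Stand-in for nova.exception.DeviceDetachFailed (illustrative only)."""


# RetryDecorator re-invokes the decorated function each time it raises one of
# the "suggested" exceptions, sleeping between attempts, until max_retry_count
# is reached, at which point the last exception is re-raised to the caller.
@loopingcall.RetryDecorator(max_retry_count=7, inc_sleep_time=2,
                            max_sleep_time=30,
                            exceptions=(FakeDeviceDetachFailed,))
def _do_wait_and_retry_detach():
    # nova's real helper re-reads the domain config and raises while the
    # device is still present; here we always raise to show the retry path.
    raise FakeDeviceDetachFailed('Device detach failed for vdb')


try:
    _do_wait_and_retry_detach()  # sleeps roughly a minute before giving up
except FakeDeviceDetachFailed as exc:
    print('gave up after retries: %s' % exc)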

Dr. Jens Harbott (j-harbott) wrote :

There are also some warnings like this:

Jun 08 08:37:13.280300 ubuntu-focal-rax-dfw-0017012548 nova-compute[82495]: DEBUG oslo_concurrency.processutils [None req-bc4635cd-81f9-435d-b60b-fdd64dffa958 None None] Running cmd (subprocess): /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/001ba6d1-94e8-4108-9b2c-5ee92df425b9/disk --force-share --output=json {{(pid=82495) execute /usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py:371}}
Jun 08 08:37:13.288934 ubuntu-focal-rax-dfw-0017012548 nova-compute[91401]: Exception ignored in: <function _after_fork at 0x7f0fb37e3b80>
Jun 08 08:37:13.288934 ubuntu-focal-rax-dfw-0017012548 nova-compute[91401]: Traceback (most recent call last):
Jun 08 08:37:13.288934 ubuntu-focal-rax-dfw-0017012548 nova-compute[91401]: File "/usr/lib/python3.8/threading.py", line 1454, in _after_fork
Jun 08 08:37:13.288934 ubuntu-focal-rax-dfw-0017012548 nova-compute[91401]: assert len(_active) == 1
Jun 08 08:37:13.288934 ubuntu-focal-rax-dfw-0017012548 nova-compute[91401]: AssertionError:

but I'm not sure whether they are related, a different issue, or purely cosmetic. They also appear for other services like c-vol, so they are not nova-specific. See also https://bugs.launchpad.net/bugs/1863021

Ghanshyam Mann (ghanshyammann) wrote :

I cannot spot a clear issue in the nova or cinder flow for the failed volume. The issue happens during test cleanup, when the volume attached to the rescued server is requested to be detached.

tempest log:

The test performed all operations up to unrescuing the server, and the server became ACTIVE with the same attachment it had before the rescue.

- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/job-output.txt#28627

Now cleanup started and the test requested the volume detach. The volume stayed in 'detaching' status while the test waited, and after a long time the volume status changed back to 'in-use'.

- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/job-output.txt#29157-29168

Tracing the flow of the same volume in the cinder log:

The volume is created fine:

- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-vol.txt#8021

The iSCSI target is set up correctly:

- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-vol.txt#8150

The attachment is updated fine:
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-vol.txt#8268

Detach request

n-api log: req-baec6727-9502-40d0-b4c4-f4e0e56973e0
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-n-api.txt#32946

c-api log:
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-api.txt#22785

Detaching of the volume is completed:
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-api.txt#22798

The iSCSI target is removed:
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-c-vol.txt#11495

In the n-cpu log I see another test trying to detach a volume from mountpoint /dev/vdb and failing, but it is a different server, so I am not sure whether that is related.

Another test req-4542a8aa-6deb-434c-9905-99d3eb44f029 tempest-AttachVolumeMultiAttachTest-1857486939

- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-n-cpu.txt#49802

Failing test: req-baec6727-9502-40d0-b4c4-f4e0e56973e0 tempest-ServerStableDeviceRescueTest-543821642
- https://zuul.opendev.org/t/openstack/build/9290c83e18a741a5bdab4e28de5eedb7/log/controller/logs/screen-n-cpu.txt#49785

Changed in nova:
importance: Undecided → High
Ghanshyam Mann (ghanshyammann) wrote :

Adding cinder as well, in case something on the cinder side is involved during the detachment.

sean mooney (sean-k-mooney) wrote :

We also have a similar bug downstream against Queens, in the other direction:
https://bugzilla.redhat.com/show_bug.cgi?id=1838786

In the downstream bug the failure is on attach:
libvirtError: Requested operation is not valid: target vdf already exists

whereas upstream we have "nova.exception.DeviceDetachFailed: Device detach failed for vdb: Unable to detach the device from the live config."

In both cases it is as if the live domain view and nova's/cinder's view of the domain are getting out of sync with each other.

The upstream traceback for detach looks like this:

 Traceback (most recent call last):
   File "/opt/stack/nova/nova/virt/block_device.py", line 328, in driver_detach
     virt_driver.detach_volume(context, connection_info, instance, mp,
   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2004, in detach_volume
     wait_for_detach = guest.detach_device_with_retry(guest.get_disk,
   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 425, in detach_device_with_retry
     _try_detach_device(conf, persistent, live)
   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 414, in _try_detach_device
     ctx.reraise = True
   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
     self.force_reraise()
   File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
     six.reraise(self.type_, self.value, self.tb)
   File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
     raise value
   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 387, in _try_detach_device
     self.detach_device(conf, persistent=persistent, live=live)
   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 475, in detach_device
     self._domain.detachDeviceFlags(device_xml, flags=flags)
   File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
     result = proxy_call(self._autowrap, f, *args, **kwargs)
   File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
     rv = execute(f, *args, **kwargs)
   File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
     six.reraise(c, e, tb)
   File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
     raise value
   File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
     rv = meth(*args, **kwargs)
   File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 1408, in detachDeviceFlags
     if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
 libvirt.libvirtError: device not found: no target device vdb

and the downstream traceback for attach looks like this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5433, in _attach_volume
    do_driver_attach=True)
  File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in wrapped
    ret_val = method(obj, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 624, in attach
    virt_driver, do_driver_attach)
  File "/usr/lib/python2.7/site-packages...

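Both tracebacks ultimately bottom out in the libvirt-python binding's detachDeviceFlags() call. For reference, a minimal standalone sketch of that call (the domain name and disk XML here are purely illustrative) looks like:

import libvirt

# Illustrative disk XML; nova builds the real XML from the attached volume.
DISK_XML = """
<disk type='block' device='disk'>
  <source dev='/dev/sdb'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open('qemu:///system')
try:
    dom = conn.lookupByName('instance-0000007a')  # illustrative domain name
    flags = (libvirt.VIR_DOMAIN_AFFECT_LIVE |
             libvirt.VIR_DOMAIN_AFFECT_CONFIG)
    # Raises libvirt.libvirtError ("device not found: no target device vdb")
    # when the live domain no longer carries the device, as in the upstream
    # traceback above.
    dom.detachDeviceFlags(DISK_XML, flags=flags)
except libvirt.libvirtError as exc:
    print('detach failed: %s' % exc)
finally:
    conn.close()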

Lee Yarwood (lyarwood) wrote :

I've marked this as confirmed and bumped the importance given the impending move to Focal for all testing in the Victoria release.

Comment #4 is unrelated; I've continued to see detach issues on Bionic while using QEMU 4.0.0 and libvirt v5.0.0, however I've been unable to reproduce this outside of upstream CI.

Each time this has been hit, however, it appears that the guest OS (CirrOS) isn't able to react to the ACPI request to detach the disk device. This could simply be another case of the instances needing to be given more resources so that these requests are served quickly enough to satisfy Nova, or of the timeout within Nova needing to be relaxed.

Changed in nova:
importance: High → Critical
assignee: nobody → Lee Yarwood (lyarwood)
status: New → Confirmed

Related fix proposed to branch: master
Review: https://review.opendev.org/749929

Lee Yarwood (lyarwood) wrote :

So I'm even more convinced this is a host resource (I'd assume vCPU?) issue, as I can reproduce this consistently on a 20.04 virtual machine with the same resources as our CI instances when running `$ tox -e full` in tempest; `$ tox -e full-serial` doesn't reproduce the issue at all. I'm going to try limiting things with --concurrency and also try a slightly larger guest to see if that helps.

Kashyap Chamarthy (kashyapc) wrote :

Some notes based on my understanding (and on a brief chat with libvirt/QEMU developers):

- DEVICE_DELETED is the event that QEMU sends to libvirt *once* the device has been removed by the guest, so that libvirt can clean up. So if we see DEVICE_DELETED, the device was successfully detached from QEMU's point of view (and therefore from the guest's PoV, too).

- The presence of the '/sys/module/pci_hotplug/' directory in the guest confirms that it is capable of handling hotplug/hotunplug events. (And Lee confirmed on IRC that the CirrOS guest _does_ have this directory)

So, if you _can_ see DEVICE_DELETED, then it sounds like the problem is somewhere _else_ than the guest OS.

Kashyap Chamarthy (kashyapc) wrote :

Aside:

Also, a libvirt dev suggested capturing the communication between libvirt and the QEMU monitor. We should have these logs in the CI by default.

(To manually enable: https://kashyapc.fedorapeople.org/virt/openstack/request-nova-libvirt-qemu-debug-logs.txt)
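
(For reference, enabling that debug logging is typically a matter of setting log filters and outputs in /etc/libvirt/libvirtd.conf and restarting libvirtd; the values below are only a common example, see the linked note for the recommended settings.)

log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"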

Kashyap Chamarthy (kashyapc) wrote :

Combing through the libvirtd.log (https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c3a/734029/2/check/devstack-platform-focal/c3ab542/controller/logs/libvirt/libvirtd_log.txt),

(1) libvirt sent the 'device_del' command to QEMU (to detach the device):

---
2020-09-03 20:01:52.711+0000: 65330: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7f021c0f3b30 msg={"execute":"device_del","arguments":{"id":"virtio-disk1"},"id":"libvirt-372"}
---

(2) The reply was 'success'; good:

---
2020-09-03 20:01:52.714+0000: 65328: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7f021c0f3b30 reply={"return": {}, "id": "libvirt-372"}
---

(3) And QEMU even emits the event DEVICE_DELETED:

---
2020-09-03 20:01:53.019+0000: 65328: info : qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7f021c0f3b30 event={"timestamp": {"seconds": 1599163313, "microseconds": 18874},
 "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-disk1/virtio-backend"}}
---

So far so good ...

(4) ... but then we see this "missing device in device deleted event":

---
2020-09-03 20:01:53.019+0000: 65328: debug : qemuMonitorJSONIOProcessEvent:205 : handle DEVICE_DELETED handler=0x7f0230572840 data=0x55d556edf3c0
2020-09-03 20:01:53.019+0000: 65328: debug : qemuMonitorJSONHandleDeviceDeleted:1287 : missing device in device deleted event
---

I'm not entirely sure of the significance (or lack thereof) of the above.

Kashyap Chamarthy (kashyapc) wrote :

Ignore the less interesting comment #11; the more "interesting" bit from the libvirtd.log is here:

libvirt asks QEMU to execute 'device_del' (i.e. to detach the device from the guest):

---
2020-09-03 19:58:35.441+0000: 65331: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7f021c0b9c70 msg={"execute":"device_del","arguments":{"id":"virtio-disk1"},"id":"libvirt-399"}
 fd=-1
---

But the reply from QEMU is failure to detach the device:

---
2020-09-03 19:58:35.443+0000: 65328: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7f021c0b9c70 reply={"id": "libvirt-399", "error": {"class": "DeviceNotFound", "d
esc": "Device 'virtio-disk1' not found"}}

[...]

93e20 name=instance-0000007a)
2020-09-03 19:58:35.443+0000: 65331: debug : qemuDomainDeleteDevice:128 : Detaching of device virtio-disk1 failed and no event arrived
---

I learned that the above "... no event arrived" means QEMU didn't tell libvirt which device was deleted.

I still don't have a robust answer to the root cause yet.

Lee Yarwood (lyarwood) wrote :

After talking to libvirt/QEMU folks yesterday I've raised the following bug:

Second DEVICE_DELETED event missing during virtio-blk disk device detach
https://bugs.launchpad.net/qemu/+bug/1894804

I'm trying to reproduce this on Fedora today to also raise this downstream in bugzilla.

tags: added: gate-failure
tags: added: victoria-rc-potential
Lee Yarwood (lyarwood) on 2020-09-10
Changed in nova:
importance: Critical → High

Related fix proposed to branch: master
Review: https://review.opendev.org/752654

Reviewed: https://review.opendev.org/752654
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=57ac83d4d71c903bbaf88a9b6b88a86916c1767c
Submitter: Zuul
Branch: master

commit 57ac83d4d71c903bbaf88a9b6b88a86916c1767c
Author: Lee Yarwood <email address hidden>
Date: Fri Sep 18 10:45:21 2020 +0100

    releasenote: Add known issue for bug #1894804

    Related-Bug: #1882521
    Change-Id: Ib9059dde41b0a07144ffa26552577308b1ffc9e1

Fix proposed to branch: master
Review: https://review.opendev.org/755799

Changed in nova:
status: Confirmed → In Progress

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.opendev.org/755526

Reviewed: https://review.opendev.org/755799
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Submitter: Zuul
Branch: master

commit dd1e6d4b0cee465fd89744e306fcd25228b3f7cc
Author: Lee Yarwood <email address hidden>
Date: Fri Oct 2 15:11:25 2020 +0100

    libvirt: Increase incremental and max sleep time during device detach

    Bug #1894804 outlines how DEVICE_DELETED events were often missing from
    QEMU on Focal based OpenStack CI hosts as originally seen in bug
     #1882521. This has eventually been tracked down to some undefined QEMU
    behaviour when a new device_del QMP command is received while another is
    still being processed, causing the original attempt to be aborted.

    We hit this race in slower OpenStack CI envs as n-cpu rather crudely
    retries attempts to detach devices using the RetryDecorator from
    oslo.service. The default incremental sleep time currently being tight
    enough to ensure QEMU is still processing the first device_del request
    on these slower CI hosts when n-cpu asks libvirt to retry the detach,
    sending another device_del to QEMU hitting the above behaviour.

    Additionally we have also seen the following check being hit when
    testing with QEMU >= v5.0.0. This check now rejects overlapping
    device_del requests in QEMU rather than aborting the original:

    https://github.com/qemu/qemu/commit/cce8944cc9efab47d4bf29cfffb3470371c3541b

    This change aims to avoid this situation entirely by raising the default
    incremental sleep time between detach requests from 2 seconds to 10,
    leaving enough time for the first attempt to complete. The overall
    maximum sleep time is also increased from 30 to 60 seconds.

    Future work will aim to entirely remove this retry logic with a libvirt
    event driven approach, polling for the the
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.

    Finally, the cleanup of unused arguments in detach_device_with_retry is
    left for a follow up change in order to keep this initial change small
    enough to quickly backport.

    Closes-Bug: #1882521
    Related-Bug: #1894804
    Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
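
To put rough numbers on the change above: with the 7 retries seen in the CI log, and assuming the sleep simply grows linearly by inc_sleep_time per retry and is capped at max_sleep_time (a simplification of oslo.service's actual behaviour), the old and new schedules compare roughly as follows:

# Rough, simplified comparison of the old and new detach retry sleep schedules.
def schedule(inc_sleep_time, max_sleep_time, retries=7):
    return [min(inc_sleep_time * n, max_sleep_time)
            for n in range(1, retries + 1)]

old = schedule(inc_sleep_time=2, max_sleep_time=30)
new = schedule(inc_sleep_time=10, max_sleep_time=60)

print(old, sum(old))  # [2, 4, 6, 8, 10, 12, 14] -> ~56s of waiting in total
print(new, sum(new))  # [10, 20, 30, 40, 50, 60, 60] -> ~270s of waiting in total

Each individual gap also grows from 2 seconds to 10, giving QEMU much more time to finish processing the first device_del before nova asks libvirt to retry.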

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/757305
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4819f694b2e2d5688fdac7e850f1a6c592253d6b
Submitter: Zuul
Branch: stable/victoria

commit 4819f694b2e2d5688fdac7e850f1a6c592253d6b
Author: Lee Yarwood <email address hidden>
Date: Fri Oct 2 15:11:25 2020 +0100

    libvirt: Increase incremental and max sleep time during device detach

    [commit message body identical to the master branch commit above]

    Closes-Bug: #1882521
    Related-Bug: #1894804
    Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
    (cherry picked from commit dd1e6d4b0cee465fd89744e306fcd25228b3f7cc)

tags: added: in-stable-victoria