Volume detach failure in devstack-platform-centos-9-stream job
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | High | Unassigned |
libvirt | Unknown | High | |
tempest | Fix Released | High | Unassigned |
Bug Description
devstack-
traceback-1: {{{
Traceback (most recent call last):
File "/opt/stack/
raise lib_exc.
tempest.
Details: volume 70cedb4b-
}}}
Traceback (most recent call last):
File "/opt/stack/
raise lib_exc.
tempest.
Details: Volume 70cedb4b-
https:/
Ghanshyam Mann (ghanshyammann) wrote (last edit ): | #1 |
Ghanshyam Mann (ghanshyammann) wrote : | #2 |
Can anyone from the centos9-stream experts/maintainers help here?
affects: | devstack → nova |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → High |
Kashyap Chamarthy (kashyapc) wrote : | #3 |
That's one of the affected QEMU instances (as noticed in the libvirtd debug log):
2022-02-08 15:24:23.722+0000: starting up libvirt version: 8.0.0, package: 2.el9 (<email address hidden>, 2022-01-
LC_ALL=C \
PATH=/usr/
HOME=/var/
XDG_DATA_
XDG_CACHE_
XDG_CONFIG_
/usr/libexec/
-name guest=instance-
-S \
-object '{"qom-
-machine pc-i440fx-
-accel tcg \
-cpu Nehalem \
-m 128 \
-object '{"qom-
-overcommit mem-lock=off \
-smp 1,sockets=
-uuid 62898a79-
-smbios 'type=1,
-no-user-config \
-nodefaults \
-chardev socket,
-mon chardev=
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-blockdev '{"driver"
-blockdev '{"node-
-blockdev '{"driver"
-blockdev '{"node-
-device virtio-
-blockdev '{"driver"
-blockdev '{"node-
-device ide-cd,
-netdev tap,fd=
-device virtio-
-add-fd set=2,fd=33 \
-chardev pty,id=
-...
Kashyap Chamarthy (kashyapc) wrote : | #4 |
So DeviceDetachFailed is raised "if libvirt reported error during detaching from the live domain or we timed out waiting for libvirt events and run out of retries"
In this case, it raised the error because it ran out of retries. The default number of retries for CONF.libvirt.
I'm looking at the libvirt <-> QEMU interaction logs here. And I see several failures related to detach (in QEMU parlance "device_del" and "blockdev-del"):
-------
2022-02-08 15:16:14.532+0000: 107878: debug : qemuMonitorJSON
2022-02-08 15:16:14.532+0000: 107878: info : qemuMonitorJSON
2022-02-08 15:16:14.532+0000: 72484: debug : qemuMonitorJSON
ice virtio-disk1 is already in the process of unplug"}}
2022-02-08 15:16:14.532+0000: 72484: error : qemuMonitorJSON
[...]
2022-02-08 15:24:09.134+0000: 111449: debug : qemuMonitorJSON
2022-02-08 15:24:09.134+0000: 111449: info : qemuMonitorJSON
2022-02-08 15:24:09.134+0000: 72483: debug : qemuMonitorJSON
","desc":"Failed to find node with node-name=
2022-02-08 15:24:09.134+0000: 72483: error : qemuMonitorJSON
2022-02-08 15:24:09.134+0000: 72483: debug : qemuMonitorBloc
2022-02-08 15:24:09.134+0000: 72483: debug : qemuMonitorBloc
2022-02-08 15:24:09.134+0000: 72483: info : qemuMonitorSend:914 : QEMU_MONITOR_
fd=-1
2022-02-08 15:24:09.134+0000: 111449: info : qemuMonitorIOWr
len=93 ret=93 errno=0
2022-02-08 15:24:09.136+0000: ...
Balazs Gibizer (balazs-gibizer) wrote (last edit ): | #5 |
This is the filtered nova log from the test_rescued_
Nova boots a VM with a cinder volume, then starts a rescue, which destroys the domain and starts a new one with a modified disk config (booting it from a rescue image). Then it tries to detach the volume.
These are the domain xmls of the nova server during the test case https:/
Balazs Gibizer (balazs-gibizer) wrote : | #6 |
The logs are a bit misleading, as tempest reuses the nova instance for multiple test cases, so the compute log has more actions on this instance than just those connected to this tempest test case.
Here is a full mapping of compute request IDs back to tempest actions:
req-5f5fdb2d-
req-9840039d-
req-63e1e0df-
req-f5ed05e5-
req-8fb1ec02-
req-f3015900-
req-f7fc5f0d-
req-e1add0c3-
req-f89675d8-
req-d02f5915-
req-a1de94e4-
This also shows that the actual tempest test case succeeds. The failure happens during the cleanup phase.
So the actually interesting sequence of events is:
req-f7fc5f0d-
--- test case passes, tempest starts cleaning up ---
req-e1add0c3-
req-f89675d8-
Balazs Gibizer (balazs-gibizer) wrote : | #7 |
Increasing the timeout value in nova from 20 to 60 sec before nova retries the detach did not help. The detach still times out:
Kashyap Chamarthy (kashyapc) wrote : | #8 |
Some more debugging context based on Gibi's analysis on IRC (quoting Gibi generously):
In "rescue" mode we don't allow detach. And that part works -- i.e.
throws an error as expected. After that the test framework tries to
clean up. It does so by doing the actions in reverse to move back to
the starting state.
So as the server is in RESCUE state, it unrescues it. And as a volume was
attached to the server before the rescue, it tries to detach the volume
after the unrescue. That detach should remove the volume from the
domain but fails. So, during this detach:
(1) Nova first detaches the volume from the persistent domain, which
succeeds.
(2) Then nova issues the detach command for the live domain and waits
for the event.
(3) However, the event is not received in 20 seconds so it issues the
command again, which returns the error:
error message: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
(4) Then Nova retries 6 more times, always getting the same error as
above, and then gives up.
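The retry loop in steps (2)-(4) can be sketched in Python. This is a simplified stand-in, not nova's actual API: the callable names, the 8-attempt count, and the 20 s event wait are illustrative (the exact CONF.libvirt option names are truncated in comment #4 above).

```python
class DeviceDetachFailed(Exception):
    """Raised when we run out of retries waiting for libvirt's event."""


def detach_with_retries(issue_detach, wait_for_event,
                        attempts=8, event_timeout=20):
    """Sketch of the detach flow described in steps (2)-(4): issue the
    live-domain detach, wait for the device-removed event, and reissue
    the command until we run out of attempts.  `issue_detach` and
    `wait_for_event` are hypothetical callables standing in for the
    libvirt calls; the defaults are illustrative, not nova's."""
    for _ in range(attempts):
        issue_detach()  # a retry may get "already in the process of
                        # unplug" back from QEMU, as seen in the logs
        if wait_for_event(timeout=event_timeout):
            return True  # event arrived, detach confirmed
    raise DeviceDetachFailed(
        "ran out of retries waiting for the device removal event")
```

In the failure above, `wait_for_event` never succeeds because QEMU never emits DEVICE_DELETED, so every attempt is spent and the exception is raised.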
Balazs Gibizer (balazs-gibizer) wrote : | #9 |
Here are the libvirtd logs for the first and second detach trial: https:/
What I can see is that after the first detach is accepted, there is no DEVICE_DELETED event received from QEMU. Probably this is why libvirt does not send the corresponding event to nova, nova times out waiting for the event, and eventually retries the detach.
Balazs Gibizer (balazs-gibizer) wrote : | #10 |
There is a reproduction where all the other test cases are turned off so the logs are a lot smaller and less noisy https:/
Kashyap Chamarthy (kashyapc) wrote : | #11 |
Based on a chat with libvirt developers, there seem to be two possibilities:
1) the guest OS didn't confirm the detach
2) there was a recent bug in QEMU triggered by using JSON syntax for `-device`
I wonder if we can rule out possibility (2) here. The QEMU and libvirt
versions in the CI seem to be:
qemu-
libvirt version: 8.0.0, package: 2.el9
However, based on the QEMU and libvirt RHEL 9 bugs[1][2], they're
already fixed in:
qemu-
libvirt-
* * *
To _really_ rule out the second possibility above, I wonder if we can
try the workaround for the missing DEVICE_DELETED event:
In /etc/libvirt/
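The details of the suggested workaround are truncated above. One known qemu.conf knob for this class of problem is libvirt's `capability_filters` setting, which masks QEMU's JSON `-device` capability so libvirt falls back to the legacy syntax; that this is the exact setting meant here is my assumption, not confirmed by the truncated text, and it requires a recent enough libvirt:

```ini
# /etc/libvirt/qemu.conf (assumed path and setting; restart libvirtd
# afterwards).  Masks QEMU's JSON -device capability so libvirt uses
# the legacy -device syntax, sidestepping the missing DEVICE_DELETED
# event bug tracked in [1].
capability_filters = [ "device.json" ]
```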
[1] https:/
DEVICE_DELETED event is not delivered for device frontend if -device
is configured via JSON
[2] https:/
removed from the definition after hot-unplug when JSON syntax for
-device is used
Balazs Gibizer (balazs-gibizer) wrote (last edit ): | #12 |
I'm looking into both possibilities:
1) the current test grabs the guest VM console log after the detach fails. You can see it here https:/
I see nothing obviously wrong in it. But the guest is in the middle of getting an IP from DHCP, so maybe the guest was not fully booted yet when the detach was requested.
2) I've pushed PS6 of [1] to test the WA.
[1] https:/
Balazs Gibizer (balazs-gibizer) wrote (last edit ): | #13 |
We can rule out #2): I see the same failure happening even after the WA is applied.
* qemu.conf with the WA: https:/
* nova-compute issuing the detach: https:/
* libvirtd issuing the device_del command: https:/
* nova-compute fails as detach fails: https:/
Balazs Gibizer (balazs-gibizer) wrote : | #14 |
I've pushed PS7 of [1], depending on a tempest change [2], to i) grab the guest console log before tempest issues the detach and ii) add an extra delay before the detach to let the VM fully boot. Let's see if this helps.
If the issue is that the guest is not fully booted and the extra sleep makes the detach work, then we need Lee's tempest change series where the test waits until the VM is accessible via SSH [3].
[1] https:/
[2] https:/
[3] https:/
Balazs Gibizer (balazs-gibizer) wrote : | #15 |
After adding an extra 120 sec of delay in the test before issuing the detach request, the volume is detached successfully. This confirms that the issue is #1). The guest OS did not release the device, probably because it hadn't fully booted yet when the unplug came.
As I mentioned above, Lee has a tempest change series [3] adding extra waiters before these operations that check that the VM can be reached via SSH. That series probably solves this issue as well.
[3] https:/
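Lee's series makes tempest wait until the server is reachable over SSH before issuing the detach. A minimal standalone illustration of such a readiness poll (not tempest's actual implementation; the `probe` callable stands in for an SSH connectivity check, and the injectable clock/sleep exist only to make the sketch testable):

```python
import time


def wait_until_ready(probe, timeout=300, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` (e.g. an SSH connectivity check) until it returns
    True or `timeout` seconds elapse.  Returns True on success, False
    on timeout -- a simplified stand-in for tempest's
    wait_until='SSHABLE' behaviour."""
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return True
        sleep(interval)
    return False
```

In the real test the probe would attempt an SSH login to the guest; only once that succeeds is the volume detach attempted, so the guest kernel is guaranteed to have finished PCI/hotplug initialization.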
Changed in nova: | |
status: | Confirmed → Triaged |
Ghanshyam Mann (ghanshyammann) wrote : | #16 |
adding tempest for https:/
Changed in tempest: | |
status: | New → Triaged |
importance: | Undecided → High |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master) | #17 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master) | #18 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 952fe9c76deaa16
Author: Ghanshyam Mann <email address hidden>
Date: Wed Mar 2 14:43:34 2022 -0600
Add tempest-
centos 9 stream is testing runtime for Yoga so
let's add it in tempest gate. This job is failing
due to bug#1960346 and it will be helpful to know the
job status when we add the fixes with below series:
- https:/
Related-Bug: #1960346
Change-Id: Ib91f67fb9a592e
OpenStack Infra (hudson-openstack) wrote : | #19 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit d548e7a8fb22f4c
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 12 12:59:22 2021 +0000
compute: Move volume attach tests to use wait_until=SSHABLE
Waiting until the test instance is SSH-able before continuing
the test will help us avoid failures to detach a volume from
server, more info in the related bug.
Related-Bug: #1960346
Change-Id: I5ad4aa04f02001
OpenStack Infra (hudson-openstack) wrote : | #20 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 9ba15f64bca77f8
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 12 13:01:11 2021 +0000
compute: Move device tagging tests to use wait_until=SSHABLE
Waiting until the test instance is SSH-able before continuing
the test will help us avoid failures to detach a volume from
server, more info in the related bug.
Related-Bug: #1960346
Change-Id: Id5496572ce6cef
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master) | #21 |
Fix proposed to branch: master
Review: https:/
Changed in tempest: | |
status: | Triaged → In Progress |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master) | #22 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 6fa213cc0fcab74
Author: Ghanshyam Mann <email address hidden>
Date: Thu Mar 17 23:45:41 2022 -0500
Make rescue, volume attachment compute tests to use SSH-able server
As described in bug#1960346, volume detach fails
on centos 9 stream while the server is not fully booted. This
commit makes sure that server creation, as well as the after-unrescue
server test, waits for the server to be SSH-able before the test starts
performing the detach operation in cleanup.
Related-Bug: #1960346
Change-Id: Ib21a764e3cf81d
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master) | #23 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 7304e3ac8973a42
Author: Ghanshyam Mann <email address hidden>
Date: Fri Mar 18 13:58:25 2022 -0500
Move ServerStableDev
ServerStabl
on rescue server and in cleanup detach_volume. As described in
the bug#1960346, we need to wait for server readiness before
detach_volume is called.
Also making centos stream 9 job as voting.
Closes-Bug: #1960346
Change-Id: Ia213297b13f42d
Changed in tempest: | |
status: | In Progress → Fix Released |
Ghanshyam Mann (ghanshyammann) wrote : | #24 |
This issue is fixed by making the server ready and SSH-able before the detach volume operation is performed.
This series: https:/
and now the job is also passing and voting:
- https:/
- https:/
Ghanshyam Mann (ghanshyammann) wrote : | #25 |
Closing it for nova as it is fixed on the tempest side.
Changed in nova: | |
status: | Triaged → Invalid |
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 30.1.0 | #26 |
This issue was fixed in the openstack/tempest 30.1.0 release.
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master) | #27 |
Related fix proposed to branch: master
Review: https:/
Balazs Gibizer (balazs-gibizer) wrote : | #28 |
Note that an upstream libvirt bug has been opened, as the issue was reproduced with pure libvirt / qemu and with an Ubuntu 22.04 guest: https:/
Balazs Gibizer (balazs-gibizer) wrote : | #29 |
https:/
In Red Hat Bugzilla #2087047, bgibizer (bgibizer-redhat-bugs) wrote : | #39 |
This bug report is based on the upstream bug: https:/
Description of problem:
If a disk is detached from a guest while the guest OS is still booting, that disk gets stuck. The detach seems to succeed from virsh's perspective, but the disk is still visible as attached, both from the guest and from virsh. However, when the detach is retried, even after the guest OS has fully booted, it fails with "Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".
This was observed in the OpenStack upstream CI with a cirros 0.5.2 guest OS, but has now been reproduced without OpenStack with a more typical guest (Ubuntu 22.04). The OpenStack bug is being worked around by changing the test in the CI to wait until the guest is fully booted before trying to attach the volume.
Version-Release number of selected component (if applicable):
Host:
* Operating system: Debian sid
* Architecture: x86_64
* kernel version: 5.17.0-1-amd64 #1 (closed) SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux
* libvirt version: 8.2.0-1
* Hypervisor and version: qemu-system-x86_64 1:7.0+dfsg-1
Guest:
* Operating system: Ubuntu 22.04 (cloud image)
How reproducible:
If the guest OS boot is slowed down, it is 100% reproducible.
Steps to Reproduce:
1. Modify the Ubuntu cloud guest image to have the boot_delay=100 added to the kernel args to simulate a slow host
2. Start the Ubuntu domain and connect to the serial console to see it boot
3. Wait until the first messages appear in the console. This is around T+50sec from the virsh start.
4. From a second terminal attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML, the disk is still attached
8. Check the lsblk command from the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "Failed to detach disk error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug".
Actual results:
The disk cannot be detached even after the guest OS is fully booted. Retrying the detach always fails.
Expected results:
Either the disk is eventually detached from the guest after it is fully booted,
or the detach can be successfully retried via libvirt / virsh.
Additional info:
Please see the debug logs and detailed reproduction sequence in the upstream bug https:/
In Red Hat Bugzilla #2087047, mkletzan (mkletzan-redhat-bugs) wrote : | #40 |
This to me looks like a thing that needs some work in QEMU since libvirt is trying to detach the device again, as requested. Looking at the linked issue it confirms my speculations. Therefore I am moving this to QEMU to further triage this.
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master) | #30 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #31 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master) | #32 |
Related fix proposed to branch: master
Review: https:/
In Red Hat Bugzilla #2087047, qinwang (qinwang-redhat-bugs) wrote : | #41 |
Reproduced it on
Red Hat Enterprise Linux release 9.0 (Plow)
5.14.0-
qemu-kvm-
seabios-
edk2-ovmf-
Test steps:
1. Create the image file if needed
qemu-img create -f qcow2 /home/kvm_
2. Boot the VM
/usr/libexec/
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-
-device pcie-root-
-device pcie-pci-
-nodefaults \
-device VGA,bus=
-m 8G \
-object memory-
-smp 2 \
-cpu host,vmx,
-device pcie-root-
-device qemu-xhci,
-device usb-tablet,
-blockdev node-name=
-blockdev node-name=
-device pcie-root-
-device virtio-
\
-blockdev node-name=
-blockdev node-name=
-device pcie-root-
\
-device pcie-root-
-device virtio-
-netdev tap,id=
-vnc :5 \
-monitor stdio \
-qmp tcp:0:5955,
-rtc base=localtime,
-boot menu=off,
-enable-kvm \
-device pcie-root-
-chardev socket,
-device isa-serial,
3. Sleep 3 seconds
4. Execute qmp commands to hot-plug then unplug the disk
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-
{"execute"
No error is returned from the qmp commands.
5. Wait for the guest to finish booting, then log in and check disks:
lsblk
A new disk is found in the guest; the expectation is that the disk does not exist in the guest.
6. Execute the qmp command to unplug the disk again
{"execute"
It gets an error return:
{"error": {"class": "GenericError", "desc": ...
In Red Hat Bugzilla #2087047, yiwei (yiwei-redhat-bugs) wrote : | #42 |
Can reproduce this bug with virtio-net-pci and virtio-blk-pci device on the latest rhel9.1.0 host with the test steps of Comment 2.
host version:
qemu-kvm-
kernel-
seabios-
guest: rhel9.1.0
Test result:
hot-plug/unplug virtio-net-pci device in qmp:
{ "execute": "netdev_
{ "execute": "device_
{ "execute": "device_del", "arguments": { "id": "net1" } }{"return": {}}
{"return": {}}
{"return": {}}
{ "execute": "device_del", "arguments": { "id": "net1" } }
{"error": {"class": "GenericError", "desc": "Device net1 is already in the process of unplug"}}
hot-plug/unplug virtio-blk-pci device in qmp:
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "bus": "pcie-root-
{"execute"
{"return": {}}
{"return": {}}
{"execute"
{"error": {"class": "GenericError", "desc": "Device stg1 is already in the process of unplug"}}
Boot a guest with cmd:
/usr/libexec/
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-
-device pcie-root-
-device pcie-pci-
-nodefaults \
-device VGA,bus=
-m 16G \
-object memory-
-smp 6,maxcpus=
-cpu Icelake-
-device pcie-root-
-device qemu-xhci,
-device usb-tablet,
-device pcie-root-
-device virtio-
-blockdev node-name=
-blockdev node-name=
-device scsi-hd,
-device pcie-root-
-device virtio-
-netdev tap,id=
-vnc :0 \
-rtc base=utc,
-boot menu=off,
-enable-kvm \
-monitor stdio \
-S \
-qmp tcp:0:4444,
-device pcie-root-
-blockdev node-name=
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master) | #33 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit d8bbaba415bc5cc
Author: Balazs Gibizer <email address hidden>
Date: Tue May 17 17:15:40 2022 +0200
Wait for guest after resize
To stabilize test_resize_
wait for the guest OS to fully boot after the resize and before the test
attempts to detach the volume.
Closes-Bug #1960346
Change-Id: I85ee21868c9281
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master) | #34 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 53cd6880d732265
Author: Balazs Gibizer <email address hidden>
Date: Mon May 23 10:16:55 2022 +0200
Wait for guest to boot before attach a volume
This patch ensures tests under the api.compute.admin package wait until
the VM is SSH-able before attaching a volume to it.
Related-Bug: #1960346
Change-Id: I5f93effa280725
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master) | #35 |
Change abandoned by "Balazs Gibizer <email address hidden>" on branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master) | #36 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit db2f561cdf8fb51
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 12 13:03:57 2021 +0000
Create router and dhcp when create_
We are trying to make the server SSH-able before detach volume
is performed (details in the bug mentioned below). Creating the
router and dhcp is needed to set up the proper network path
for the server, otherwise it fails:
- https:/
Related-Bug: #1960346
Change-Id: I18eff5a4216d56
Balazs Gibizer (balazs-gibizer) wrote : | #37 |
Re-opening this for Nova as we expect a fix from QEMU. The QEMU bug is being tracked in [1]. When the QEMU fix is implemented we have to check whether Nova needs to be adapted to it, and we should remove some of the tempest workarounds we applied to test the fix.
Making this Medium for nova now and adding the gate-failure tag.
Changed in nova: | |
status: | Invalid → Triaged |
importance: | High → Medium |
tags: | added: gate-failure |
In Red Hat Bugzilla #2087047, afazekas (afazekas-redhat-bugs) wrote : | #43 |
IMHO the issue is that the guest simply ignores the release request for a device it never learned exists:
the device was added and its removal requested before the guest had initialized all devices and hotplug.
The hotplug we use everywhere just emulates the hotplug devices meant for physical machines, where
people are not expected to plug in and remove a device in the first ms of boot.
If we really want to solve this kind of issue once and for all,
we should probably invent a new "cloud-plug" hotplug device for virtual machines.
However, some mitigation might be possible in some cases:
- the guest OS should acknowledge releasing devices it never initialized (guest kernel modification)
- the guest kernel (requested by the init system?) should do another PCI rescan to catch undetected devices in the blind spot.
The blind spot is between the PCI scan and the hotplug initialization.
The feature expected from the cloud-plug device: if the guest OS is not (yet) booted, it simply allows removing devices; the virtualization layer would know it is safe.
So the guest OS is expected to claim a device from the cloud-plug in order to prevent removal; proper handshaking is needed.
The challenge here is what to do with guests that do not support the new "cloud-plug";
we should probably wait 3+/5+ years before we dare to make it the default expectation.
In Red Hat Bugzilla #2087047, asyedham (asyedham-redhat-bugs) wrote : | #44 |
*** Bug 2080893 has been marked as a duplicate of this bug. ***
In Red Hat Bugzilla #2087047, smooney (smooney-redhat-bugs) wrote : | #45 |
We have been discussing this regression upstream in the virtual OpenStack project team gathering (vPTG).
I just wanted to pass on the feedback that this is still a pain point for us, both upstream and in our downstream product.
Hopefully this is something that can be addressed with a higher priority.
Feel free to reach out to me as the User Advocate for the OpenStack compute team, or to our PM Erwan Gallen <email address hidden>, if you need
additional information; this is still impacting our downstream product and affecting our upstream CI stability.
sean mooney (sean-k-mooney) wrote : | #38 |
Setting this to invalid for nova, as multiple attempts to work around the issue have shown we cannot fix it in nova.
Changed in nova: | |
importance: | Medium → High |
status: | Triaged → Invalid |
Changed in libvirt: | |
importance: | Unknown → Low |
status: | Unknown → Confirmed |
In Red Hat Bugzilla #2087047, imammedo (imammedo-redhat-bugs) wrote : | #47 |
Fix posted upstream:
https://<email address hidden>
it's too late for merging into this release, but it should make into the next one.
In a nutshell, it was a regression introduced in QEMU:
* v5.0
  * 'pc' machine with ACPI hotplug
  * 'q35' native PCIe hotplug
* v6.1
  * + 'q35' with ACPI hotplug (default)
Fixed in:
* 6.2 'q35' native PCIe hotplug
* TBD (8.1?): 'q35' and 'pc' ACPI hotplug
(once it's merged upstream we can backport it)
Need to look into SHPC one, which seems to be broken as well.
In Red Hat Bugzilla #2087047, yfu (yfu-redhat-bugs) wrote : | #48 |
QE bot(pre verify): Set 'Verified:
Alan Pevec (apevec) wrote : | #46 |
Fixed in qemu-kvm CS9 qemu-kvm-
and RHEL 9.2 async update qemu-kvm-
Changed in libvirt: | |
importance: | Low → High |
status: | Confirmed → Unknown |
gibi debugged and found that nova fails to detach the volume from the guest via libvirt even after retries
https://zuul.openstack.org/build/3e24d977991d4536b6279afd7f3b5d56/log/controller/logs/screen-n-cpu.txt?severity=4#49433
<gibi>
about the rescue issue in the centos 9 stream job. In that job I see libvirt 8.0.0 and qemu 6.2.0 that is way newer than what we use in the ubuntu jobs (libvirt 6.0.0 and qemu 4.2.0)
and nova fails to detach the volume from the guest via libvirt even after retries
so if I have to guess, then something changed in libvirt between 6.0.0 and 8.0.0 that makes this faulty