Activity log for bug #2018733

Date Who What changed Old value New value Message
2023-05-08 12:39:36 Christian Ehrhardt  bug added bug
2023-05-08 12:39:45 Christian Ehrhardt  bug added subscriber Kashyap Chamarthy
2023-05-08 12:39:53 Christian Ehrhardt  bug added subscriber Ubuntu Server
2023-05-08 12:40:02 Christian Ehrhardt  nominated for series Ubuntu Lunar
2023-05-08 12:40:02 Christian Ehrhardt  bug task added qemu (Ubuntu Lunar)
2023-05-08 12:40:02 Christian Ehrhardt  nominated for series Ubuntu Jammy
2023-05-08 12:40:02 Christian Ehrhardt  bug task added qemu (Ubuntu Jammy)
2023-05-08 12:40:02 Christian Ehrhardt  nominated for series Ubuntu Kinetic
2023-05-08 12:40:02 Christian Ehrhardt  bug task added qemu (Ubuntu Kinetic)
2023-05-08 12:40:02 Christian Ehrhardt  nominated for series Ubuntu Mantic
2023-05-08 12:40:02 Christian Ehrhardt  bug task added qemu (Ubuntu Mantic)
2023-05-08 12:40:11 Christian Ehrhardt  tags server-todo
2023-05-08 12:40:55 Christian Ehrhardt  description
This was kindly reported by ~kashyapc; I am only converting it into a bug for tracking.

--- Report:

This [1] QEMU patch solves a genuine bug [2] involving disk hot-unplug. More details are in the commit message, and also in the bug linked here [2].

I have also flagged the fix for QEMU 8.0 stable [3], and tested that the fix itself works [4]. Please pick up the fix [1] once it merges.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
    — acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html

--- ^^ report
--- vv extra context

Note:
- [2] + [4] have test steps we can use for SRU verification.
- This has landed upstream by now:
  https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as:
  https://gitlab.com/qemu-project/qemu/-/commit/76326210e439
2023-05-08 12:47:46 Christian Ehrhardt  description
[ Impact ]

 * In the past one could unplug a device, and if that didn't work it could be tried again. The changed q35 hotplug backend now queues only one unplug request; if that request comes very early, the guest will clear GPEx.status and thereby never see the event.

 * The fix makes ACPI PCI hotplug behave the same as PCIe: unplug requests may be re-queued, but under a rate-controlling limit.

[ Test Plan ]

 1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args to simulate a slow host.
 2. Start the Ubuntu domain and connect to the serial console to watch it boot.
 3. Wait until the first messages appear on the console. This is around T+50 sec after the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
 4. From a second terminal, attach an additional disk to the guest. It succeeds.
 5. Wait a second.
 6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
 7. Check the domain XML: the disk is still attached.
 8. Check lsblk output in the guest (after it has fully booted): the disk is still attached.
 9. Check the virsh domblklist output: the disk is still attached.
 10. Try to detach the disk again. It fails with "error: device not found: no target device".

A flow of these steps, with commands and example output, can be seen at (see also the condensed command sketch after this entry):
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt

[ Where problems could occur ]

 * Depending on how far we backport this (at least to v6.2 in Jammy), we need to double-check that the callbacks and settings used work the same that far back. While this can be just "tested", it should also get a review of related changes to be sure.

 * The change, and thereby any regressions, are limited to ACPI PCI hotplug; for software as complex as QEMU it is always good to be able to point clearly to a small subset of the use cases and know what to look out for.

[ Other Info ]

 * n/a

-----------

This was kindly reported by ~kashyapc; I am only converting it into a bug for tracking.

--- Report:

This [1] QEMU patch solves a genuine bug [2] involving disk hot-unplug. More details are in the commit message, and also in the bug linked here [2].

I have also flagged the fix for QEMU 8.0 stable [3], and tested that the fix itself works [4]. Please pick up the fix [1] once it merges.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
    — acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html

--- ^^ report
--- vv extra context

Note:
- [2] + [4] have test steps we can use for SRU verification.
- This has landed upstream by now:
  https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as:
  https://gitlab.com/qemu-project/qemu/-/commit/76326210e439
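For quick reference, steps 4-10 of the plan above condense to a handful of virsh calls. This is a sketch only: the domain name "guest", the target "vdx", and the image path /tmp/disk.img are placeholders, not values taken from this entry (the later description revisions pin down a concrete setup).

host$ virsh attach-disk guest /tmp/disk.img vdx --live   # step 4: attach succeeds
host$ sleep 1                                            # step 5
host$ virsh detach-disk guest vdx --live                 # step 6: hangs briefly, then reports success
host$ virsh dumpxml guest | grep -c disk.img             # step 7: disk still present in the domain XML
host$ virsh domblklist guest                             # step 9: disk still listed
host$ virsh detach-disk guest vdx --live                 # step 10: fails with "error: device not found: no target device"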
2023-05-10 15:13:55 Christian Ehrhardt  qemu (Ubuntu Mantic): assignee Sergio Durigan Junior (sergiodj)
2023-05-17 15:28:06 Christian Ehrhardt  qemu (Ubuntu Lunar): assignee Sergio Durigan Junior (sergiodj)
2023-05-17 15:28:12 Christian Ehrhardt  qemu (Ubuntu Jammy): assignee Sergio Durigan Junior (sergiodj)
2023-05-17 15:28:22 Christian Ehrhardt  qemu (Ubuntu Kinetic): assignee Sergio Durigan Junior (sergiodj)
2023-05-18 23:00:02 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443231
2023-05-19 08:07:03 Launchpad Janitor qemu (Ubuntu): status New Confirmed
2023-05-19 08:07:03 Launchpad Janitor qemu (Ubuntu Jammy): status New Confirmed
2023-05-19 08:07:03 Launchpad Janitor qemu (Ubuntu Kinetic): status New Confirmed
2023-05-19 08:07:03 Launchpad Janitor qemu (Ubuntu Lunar): status New Confirmed
2023-05-25 10:49:00 Launchpad Janitor qemu (Ubuntu Mantic): status Confirmed Fix Released
2023-05-29 21:10:07 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443753
2023-05-30 03:35:36 Sergio Durigan Junior description
[ Impact ]

 * In the past one could unplug a device, and if that didn't work it could be tried again. The changed q35 hotplug backend now queues only one unplug request; if that request comes very early, the guest will clear GPEx.status and thereby never see the event.

 * The fix makes ACPI PCI hotplug behave the same as PCIe: unplug requests may be re-queued, but under a rate-controlling limit.

[ Test Plan ]

First, let's prepare an LXD VM to serve as our testbed. In this Test Plan we'll be using a Jammy VM.

physical-machine$ lxc launch ubuntu:jammy qemu-bug2018733-jammy --vm -c limits.memory=8GB
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# apt update
host-vm# apt install -y libvirt-daemon-system libguestfs-tools
host-vm# usermod -a -G libvirt ubuntu
host-vm# usermod -a -G kvm ubuntu
host-vm# su - ubuntu

In order to reproduce the issue, we need to quickly attach a disk to a VM and detach it again. To do that, let's use an Ubuntu Cloud image and adjust its kernel's "boot_delay" parameter, which gives us time to perform the necessary operations.

host-vm$ wget https://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img
host-vm$ qemu-img create disk.img 1G
host-vm$ sudo chown libvirt-qemu:kvm lunar-server-cloudimg-amd64.img disk.img
host-vm$ sudo chmod +x /home/ubuntu
host-vm$ sudo virt-customize -a lunar-server-cloudimg-amd64.img --root-password password:1234
host-vm$ cat > test-vm.xml << __EOF__
<domain type='kvm' id='3'>
  <name>test-vm</name>
  <memory unit='GiB'>1</memory>
  <currentMemory unit='GiB'>1</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmcoreinfo state='on'/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Nehalem</model>
    <topology sockets='1' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/ubuntu/lunar-server-cloudimg-amd64.img' index='2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'>
      <alias name='usb'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </rng>
  </devices>
</domain>
__EOF__
host-vm$ virsh define test-vm.xml
host-vm$ virsh start test-vm
host-vm$ virsh console test-vm

Wait for the VM to boot, log into it (user is "root", password is "1234"), and execute:

nested-vm# sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' /etc/default/grub.d/50-cloudimg-settings.cfg
nested-vm# update-grub
nested-vm# reboot

Keep this terminal open, quickly open another terminal, and log into the host VM (running on LXD, named qemu-bug2018733-jammy in this Test Plan).

physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# su - ubuntu

Closely monitor the reboot process of the nested VM in the first terminal. When the VM starts booting again, switch to the second terminal (inside the host VM) and issue:

host-vm$ virsh attach-disk test-vm /home/ubuntu/disk.img vdx --live --persistent && sleep 1 && virsh detach-disk test-vm --live vdx

You will notice that the detach operation apparently succeeded, but you can confirm that it did not by doing:

host-vm$ virsh domblklist test-vm

You will notice that the new disk (named "vdx") is still attached to the VM. If you try to detach it again, you will get an error:

host-vm$ virsh detach-disk test-vm --live vdx
error: Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug

[ Previous Test Plan ]

 1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args to simulate a slow host.
 2. Start the Ubuntu domain and connect to the serial console to watch it boot.
 3. Wait until the first messages appear on the console. This is around T+50 sec after the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
 4. From a second terminal, attach an additional disk to the guest. It succeeds.
 5. Wait a second.
 6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
 7. Check the domain XML: the disk is still attached.
 8. Check lsblk output in the guest (after it has fully booted): the disk is still attached.
 9. Check the virsh domblklist output: the disk is still attached.
 10. Try to detach the disk again. It fails with "error: device not found: no target device".

A flow of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt

[ Where problems could occur ]

 * Depending on how far we backport this (at least to v6.2 in Jammy), we need to double-check that the callbacks and settings used work the same that far back. While this can be just "tested", it should also get a review of related changes to be sure.

 * The change, and thereby any regressions, are limited to ACPI PCI hotplug; for software as complex as QEMU it is always good to be able to point clearly to a small subset of the use cases and know what to look out for.

[ Other Info ]

 * n/a

-----------

This was kindly reported by ~kashyapc; I am only converting it into a bug for tracking.

--- Report:

This [1] QEMU patch solves a genuine bug [2] involving disk hot-unplug. More details are in the commit message, and also in the bug linked here [2].

I have also flagged the fix for QEMU 8.0 stable [3], and tested that the fix itself works [4]. Please pick up the fix [1] once it merges.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
    — acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html

--- ^^ report
--- vv extra context

Note:
- [2] + [4] have test steps we can use for SRU verification.
- This has landed upstream by now:
  https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as:
  https://gitlab.com/qemu-project/qemu/-/commit/76326210e439
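The plan above hinges on the sed edit actually landing on the nested VM's kernel command line. A quick sanity check before relying on the slowed boot, assuming nothing beyond the files update-grub and the kernel already provide (this check is an editorial addition, not part of the logged plan):

nested-vm# grep boot_delay /boot/grub/grub.cfg    # after update-grub, the linux lines should carry boot_delay=100
nested-vm# cat /proc/cmdline                      # after reboot, the live command line should include boot_delay=100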
2023-05-30 18:56:23 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443831
2023-05-30 18:58:30 Sergio Durigan Junior description
[ Impact ]

 * In the past one could unplug a device, and if that didn't work it could be tried again. The changed q35 hotplug backend now queues only one unplug request; if that request comes very early, the guest will clear GPEx.status and thereby never see the event.

 * The fix makes ACPI PCI hotplug behave the same as PCIe: unplug requests may be re-queued, but under a rate-controlling limit.

[ Test Plan ]

First, let's prepare an LXD VM to serve as our testbed. In this Test Plan we'll be using a Jammy VM.

physical-machine$ lxc launch ubuntu:jammy qemu-bug2018733-jammy --vm -c limits.memory=8GB
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# apt update
host-vm# apt install -y libvirt-daemon-system libguestfs-tools
host-vm# usermod -a -G libvirt,kvm ubuntu
host-vm# su - ubuntu

In order to reproduce the issue, we need to quickly attach a disk to a VM and detach it again. To do that, let's use an Ubuntu Cloud image and adjust its kernel's "boot_delay" parameter, which gives us time to perform the necessary operations.

host-vm$ wget https://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img
host-vm$ qemu-img create disk.img 1G
host-vm$ sudo chown libvirt-qemu:kvm lunar-server-cloudimg-amd64.img disk.img
host-vm$ sudo chmod +x /home/ubuntu
host-vm$ sudo virt-customize -a lunar-server-cloudimg-amd64.img --root-password password:1234
host-vm$ cat > test-vm.xml << __EOF__
<domain type='kvm' id='3'>
  <name>test-vm</name>
  <memory unit='GiB'>1</memory>
  <currentMemory unit='GiB'>1</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmcoreinfo state='on'/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Nehalem</model>
    <topology sockets='1' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/ubuntu/lunar-server-cloudimg-amd64.img' index='2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'>
      <alias name='usb'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </rng>
  </devices>
</domain>
__EOF__
host-vm$ virsh define test-vm.xml
host-vm$ virsh start test-vm
host-vm$ virsh console test-vm

Wait for the VM to boot, log into it (user is "root", password is "1234"), and execute:

nested-vm# sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' /etc/default/grub.d/50-cloudimg-settings.cfg
nested-vm# update-grub
nested-vm# reboot

Keep this terminal open, quickly open another terminal, and log into the host VM (running on LXD, named qemu-bug2018733-jammy in this Test Plan).

physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# su - ubuntu

Closely monitor the reboot process of the nested VM in the first terminal. When the VM starts booting again, switch to the second terminal (inside the host VM) and issue:

host-vm$ virsh attach-disk test-vm /home/ubuntu/disk.img vdx --live --persistent && sleep 1 && virsh detach-disk test-vm --live vdx

You will notice that the detach operation apparently succeeded, but you can confirm that it did not by doing:

host-vm$ virsh domblklist test-vm

You will notice that the new disk (named "vdx") is still attached to the VM. If you try to detach it again, you will get an error:

host-vm$ virsh detach-disk test-vm --live vdx
error: Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug

[ Previous Test Plan ]

 1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args to simulate a slow host.
 2. Start the Ubuntu domain and connect to the serial console to watch it boot.
 3. Wait until the first messages appear on the console. This is around T+50 sec after the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
 4. From a second terminal, attach an additional disk to the guest. It succeeds.
 5. Wait a second.
 6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
 7. Check the domain XML: the disk is still attached.
 8. Check lsblk output in the guest (after it has fully booted): the disk is still attached.
 9. Check the virsh domblklist output: the disk is still attached.
 10. Try to detach the disk again. It fails with "error: device not found: no target device".

A flow of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt

[ Where problems could occur ]

 * Depending on how far we backport this (at least to v6.2 in Jammy), we need to double-check that the callbacks and settings used work the same that far back. While this can be just "tested", it should also get a review of related changes to be sure.

 * The change, and thereby any regressions, are limited to ACPI PCI hotplug; for software as complex as QEMU it is always good to be able to point clearly to a small subset of the use cases and know what to look out for.

[ Other Info ]

 * n/a

-----------

This was kindly reported by ~kashyapc; I am only converting it into a bug for tracking.

--- Report:

This [1] QEMU patch solves a genuine bug [2] involving disk hot-unplug. More details are in the commit message, and also in the bug linked here [2].

I have also flagged the fix for QEMU 8.0 stable [3], and tested that the fix itself works [4]. Please pick up the fix [1] once it merges.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
    — acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html

--- ^^ report
--- vv extra context

Note:
- [2] + [4] have test steps we can use for SRU verification.
- This has landed upstream by now:
  https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as:
  https://gitlab.com/qemu-project/qemu/-/commit/76326210e439
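Going by the title of the upstream fix ("acpi: pcihp: allow repeating hot-unplug requests"), a fixed QEMU is expected to accept a repeated unplug request instead of returning the device_del error shown above. A hedged sketch of that positive verification, with an arbitrary retry count and delay that are not part of the logged plan:

host-vm$ for i in 1 2 3 4 5; do virsh detach-disk test-vm --live vdx && break; sleep 5; done
host-vm$ virsh domblklist test-vm    # with the fix, vdx should eventually disappear from the list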
2023-05-30 19:12:03 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443832
2023-05-31 18:20:12 Sergio Durigan Junior qemu (Ubuntu Jammy): status Confirmed In Progress
2023-05-31 18:20:14 Sergio Durigan Junior qemu (Ubuntu Kinetic): status Confirmed In Progress
2023-05-31 18:20:15 Sergio Durigan Junior qemu (Ubuntu Lunar): status Confirmed In Progress
2023-06-01 16:04:58 Ubuntu Archive Robot bug added subscriber Sergio Durigan Junior
2023-06-02 17:52:03 Steve Langasek qemu (Ubuntu Lunar): status In Progress Fix Committed
2023-06-02 17:52:05 Steve Langasek bug added subscriber Ubuntu Stable Release Updates Team
2023-06-02 17:52:19 Steve Langasek bug added subscriber SRU Verification
2023-06-02 17:52:23 Steve Langasek tags server-todo server-todo verification-needed verification-needed-lunar
2023-06-05 10:12:49 Łukasz Zemczak qemu (Ubuntu Jammy): status In Progress Fix Committed
2023-06-05 10:12:51 Łukasz Zemczak tags server-todo verification-needed verification-needed-lunar server-todo verification-needed verification-needed-jammy verification-needed-lunar
2023-06-05 10:13:36 Łukasz Zemczak qemu (Ubuntu Kinetic): status In Progress Fix Committed
2023-06-05 10:13:39 Łukasz Zemczak tags server-todo verification-needed verification-needed-jammy verification-needed-lunar server-todo verification-needed verification-needed-jammy verification-needed-kinetic verification-needed-lunar
2023-06-06 02:22:59 Sergio Durigan Junior tags server-todo verification-needed verification-needed-jammy verification-needed-kinetic verification-needed-lunar server-todo verification-done verification-done-jammy verification-done-kinetic verification-done-lunar
2023-06-15 20:15:19 Launchpad Janitor qemu (Ubuntu Lunar): status Fix Committed Fix Released
2023-06-15 20:15:24 Andreas Hasenack removed subscriber Ubuntu Stable Release Updates Team
2023-06-15 20:15:40 Launchpad Janitor qemu (Ubuntu Kinetic): status Fix Committed Fix Released
2023-06-15 20:15:56 Launchpad Janitor qemu (Ubuntu Jammy): status Fix Committed Fix Released
2023-09-07 00:40:25 Launchpad Janitor merge proposal linked https://code.launchpad.net/~mitchdz/ubuntu/+source/qemu/+git/qemu/+merge/450830
2023-09-07 00:41:45 Mitchell Dzurick merge proposal unlinked https://code.launchpad.net/~mitchdz/ubuntu/+source/qemu/+git/qemu/+merge/450830