2023-05-08 12:39:36 |
Christian Ehrhardt |
bug |
|
|
added bug |
2023-05-08 12:39:45 |
Christian Ehrhardt |
bug |
|
|
added subscriber Kashyap Chamarthy |
2023-05-08 12:39:53 |
Christian Ehrhardt |
bug |
|
|
added subscriber Ubuntu Server |
2023-05-08 12:40:02 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Lunar |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
bug task added |
|
qemu (Ubuntu Lunar) |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Jammy |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
bug task added |
|
qemu (Ubuntu Jammy) |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Kinetic |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
bug task added |
|
qemu (Ubuntu Kinetic) |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
nominated for series |
|
Ubuntu Mantic |
|
2023-05-08 12:40:02 |
Christian Ehrhardt |
bug task added |
|
qemu (Ubuntu Mantic) |
|
2023-05-08 12:40:11 |
Christian Ehrhardt |
tags |
|
server-todo |
|
2023-05-08 12:40:55 |
Christian Ehrhardt |
description |
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
|
2023-05-08 12:47:46 |
Christian Ehrhardt |
description |
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
[ Impact ]
* In the past one could unplug a device and, if that didn't
work, try again. Changes in the q35 backend for hotplug will
now queue only one unplug request. But if that request came
very early, the guest will clear GPEx.status and thereby
never see the event.
* The fix makes ACPI PCI hotplug behave the same as PCIe, which
means unplug requests can be requeued, but under a rate-controlling limit.
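The requeue-with-rate-limit semantics can be illustrated with a tiny sketch. This is a hypothetical model of the behavior only, not QEMU's actual C implementation; the function name, sentinel, and one-second interval are invented for illustration:

```shell
# Hypothetical illustration of rate-limited requeueing of unplug
# requests: a repeat request within the interval is ignored, while
# a request after the interval has elapsed is queued again.
min_interval=1     # invented: minimum seconds between queued requests
last_request=-100  # invented sentinel meaning "never requested"

request_unplug() {  # $1 = current time in seconds
    now=$1
    if [ $((now - last_request)) -lt "$min_interval" ]; then
        echo "ignored"   # rate limit hit: drop this repeat request
    else
        last_request=$now
        echo "queued"    # (re)queue the unplug event for the guest
    fi
}

request_unplug 0   # queued
request_unplug 0   # ignored (within the rate limit)
request_unplug 2   # queued again (limit elapsed)
```

Before the fix, the second request would simply be lost; with the fix it is only dropped while the rate limit is active.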
[ Test Plan ]
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args, to simulate a slow host
2. Start the Ubuntu domain and connect to the serial console to watch it boot
3. Wait until the first messages appear in the console. This is around T+50sec from the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML; the disk is still attached.
8. Run lsblk in the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "error: device not found: no target device"
A walkthrough of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt
[ Where problems could occur ]
* Depending on how far we backport this (at least to v6.2 in
Jammy) we need to double-check that the callbacks and settings
used still work the same that far back. While this can simply
be "tested", it should also get a review of the related changes
to be sure.
* The change, and thereby any regressions, is limited to ACPI
PCI hotplug, and for software as complex as QEMU it is always
good to be able to clearly point to a small subset of the use
cases to know what to look out for.
[ Other Info ]
* n/a
-----------
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
|
2023-05-10 15:13:55 |
Christian Ehrhardt |
qemu (Ubuntu Mantic): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2023-05-17 15:28:06 |
Christian Ehrhardt |
qemu (Ubuntu Lunar): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2023-05-17 15:28:12 |
Christian Ehrhardt |
qemu (Ubuntu Jammy): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2023-05-17 15:28:22 |
Christian Ehrhardt |
qemu (Ubuntu Kinetic): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2023-05-18 23:00:02 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443231 |
|
2023-05-19 08:07:03 |
Launchpad Janitor |
qemu (Ubuntu): status |
New |
Confirmed |
|
2023-05-19 08:07:03 |
Launchpad Janitor |
qemu (Ubuntu Jammy): status |
New |
Confirmed |
|
2023-05-19 08:07:03 |
Launchpad Janitor |
qemu (Ubuntu Kinetic): status |
New |
Confirmed |
|
2023-05-19 08:07:03 |
Launchpad Janitor |
qemu (Ubuntu Lunar): status |
New |
Confirmed |
|
2023-05-25 10:49:00 |
Launchpad Janitor |
qemu (Ubuntu Mantic): status |
Confirmed |
Fix Released |
|
2023-05-29 21:10:07 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443753 |
|
2023-05-30 03:35:36 |
Sergio Durigan Junior |
description |
[ Impact ]
* In the past one could unplug a device and, if that didn't
work, try again. Changes in the q35 backend for hotplug will
now queue only one unplug request. But if that request came
very early, the guest will clear GPEx.status and thereby
never see the event.
* The fix makes ACPI PCI hotplug behave the same as PCIe, which
means unplug requests can be requeued, but under a rate-controlling limit.
[ Test Plan ]
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args, to simulate a slow host
2. Start the Ubuntu domain and connect to the serial console to watch it boot
3. Wait until the first messages appear in the console. This is around T+50sec from the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML; the disk is still attached.
8. Run lsblk in the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "error: device not found: no target device"
A walkthrough of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt
[ Where problems could occur ]
* Depending on how far we backport this (at least to v6.2 in
Jammy) we need to double-check that the callbacks and settings
used still work the same that far back. While this can simply
be "tested", it should also get a review of the related changes
to be sure.
* The change, and thereby any regressions, is limited to ACPI
PCI hotplug, and for software as complex as QEMU it is always
good to be able to clearly point to a small subset of the use
cases to know what to look out for.
[ Other Info ]
* n/a
-----------
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
[ Impact ]
* In the past one could unplug a device and, if that didn't
work, try again. Changes in the q35 backend for hotplug will
now queue only one unplug request. But if that request came
very early, the guest will clear GPEx.status and thereby
never see the event.
* The fix makes ACPI PCI hotplug behave the same as PCIe, which
means unplug requests can be requeued, but under a rate-controlling limit.
[ Test Plan ]
First, let's prepare an LXD VM to serve as our testbed. In this Test Plan we'll be using a Jammy VM.
physical-machine$ lxc launch ubuntu:jammy qemu-bug2018733-jammy --vm -c limits.memory=8GB
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# apt update
host-vm# apt install -y libvirt-daemon-system libguestfs-tools
host-vm# usermod -a -G libvirt ubuntu
host-vm# usermod -a -G kvm ubuntu
host-vm# su - ubuntu
In order to reproduce the issue, we will need to quickly attach a disk to a VM and then detach it again. To do that, let's use an Ubuntu Cloud image and adjust its kernel's "boot_delay" parameter to give us time to perform the necessary operations.
host-vm$ wget https://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img
host-vm$ qemu-img create disk.img 1G
host-vm$ sudo chown libvirt-qemu:kvm lunar-server-cloudimg-amd64.img disk.img
host-vm$ sudo chmod +x /home/ubuntu
host-vm$ sudo virt-customize -a lunar-server-cloudimg-amd64.img --root-password password:1234
host-vm$ cat > test-vm.xml << __EOF__
<domain type='kvm' id='3'>
<name>test-vm</name>
<memory unit='GiB'>1</memory>
<currentMemory unit='GiB'>1</currentMemory>
<vcpu placement='static'>1</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmcoreinfo state='on'/>
</features>
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Nehalem</model>
<topology sockets='1' cores='1' threads='1'/>
<feature policy='require' name='vme'/>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/home/ubuntu/lunar-server-cloudimg-amd64.img' index='2'/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<controller type='usb' index='0' model='none'>
<alias name='usb'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<serial type='pty'>
<source path='/dev/pts/0'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
<rng model='virtio'>
<backend model='random'>/dev/urandom</backend>
<alias name='rng0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</rng>
</devices>
</domain>
__EOF__
host-vm$ virsh define test-vm.xml
host-vm$ virsh start test-vm
host-vm$ virsh console test-vm
Wait for the VM to boot, log into it (user is "root", password is "1234"), and execute:
nested-vm# sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' /etc/default/grub.d/50-cloudimg-settings.cfg
nested-vm# update-grub
nested-vm# reboot
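As a sanity check, the sed expression used above can first be tried on a scratch copy; the GRUB_CMDLINE_LINUX_DEFAULT value below is a made-up example, the real cloud-image file carries different defaults:

```shell
# Run the test plan's sed on a scratch file to confirm it appends
# boot_delay=100 inside the existing quotes rather than replacing them.
cfg=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"\n' > "$cfg"
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' "$cfg"
cat "$cfg"   # GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 boot_delay=100"
rm -f "$cfg"
```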
Keep this terminal open, quickly open another terminal, and log into the host VM (running on LXD, named qemu-bug2018733-jammy in this Test Plan).
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# su - ubuntu
Closely monitor the reboot process of the nested VM on the first terminal. When the VM starts booting again, switch to the second terminal (inside the host VM) and issue:
host-vm$ virsh attach-disk test-vm /home/ubuntu/disk.img vdx --live --persistent && sleep 1 && virsh detach-disk test-vm --live vdx
You will notice that the detach operation apparently succeeded, but you can confirm that it did not by doing:
host-vm$ virsh domblklist test-vm
You will notice that the new disk (named "vdx") is still attached to the VM. If you try to detach it again, you will get an error:
host-vm$ virsh detach-disk test-vm --live vdx
error: Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug
[ Previous Test Plan ]
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args, to simulate a slow host
2. Start the Ubuntu domain and connect to the serial console to watch it boot
3. Wait until the first messages appear in the console. This is around T+50sec from the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML; the disk is still attached.
8. Run lsblk in the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "error: device not found: no target device"
A walkthrough of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt
[ Where problems could occur ]
* Depending on how far we backport this (at least to v6.2 in
Jammy) we need to double-check that the callbacks and settings
used still work the same that far back. While this can simply
be "tested", it should also get a review of the related changes
to be sure.
* The change, and thereby any regressions, is limited to ACPI
PCI hotplug, and for software as complex as QEMU it is always
good to be able to clearly point to a small subset of the use
cases to know what to look out for.
[ Other Info ]
* n/a
-----------
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
|
2023-05-30 18:56:23 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443831 |
|
2023-05-30 18:58:30 |
Sergio Durigan Junior |
description |
[ Impact ]
* In the past one could unplug a device and, if that didn't
work, try again. Changes in the q35 backend for hotplug will
now queue only one unplug request. But if that request came
very early, the guest will clear GPEx.status and thereby
never see the event.
* The fix makes ACPI PCI hotplug behave the same as PCIe, which
means unplug requests can be requeued, but under a rate-controlling limit.
[ Test Plan ]
First, let's prepare an LXD VM to serve as our testbed. In this Test Plan we'll be using a Jammy VM.
physical-machine$ lxc launch ubuntu:jammy qemu-bug2018733-jammy --vm -c limits.memory=8GB
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# apt update
host-vm# apt install -y libvirt-daemon-system libguestfs-tools
host-vm# usermod -a -G libvirt ubuntu
host-vm# usermod -a -G kvm ubuntu
host-vm# su - ubuntu
In order to reproduce the issue, we will need to quickly attach a disk to a VM and then detach it again. To do that, let's use an Ubuntu Cloud image and adjust its kernel's "boot_delay" parameter to give us time to perform the necessary operations.
host-vm$ wget https://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img
host-vm$ qemu-img create disk.img 1G
host-vm$ sudo chown libvirt-qemu:kvm lunar-server-cloudimg-amd64.img disk.img
host-vm$ sudo chmod +x /home/ubuntu
host-vm$ sudo virt-customize -a lunar-server-cloudimg-amd64.img --root-password password:1234
host-vm$ cat > test-vm.xml << __EOF__
<domain type='kvm' id='3'>
<name>test-vm</name>
<memory unit='GiB'>1</memory>
<currentMemory unit='GiB'>1</currentMemory>
<vcpu placement='static'>1</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmcoreinfo state='on'/>
</features>
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Nehalem</model>
<topology sockets='1' cores='1' threads='1'/>
<feature policy='require' name='vme'/>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/home/ubuntu/lunar-server-cloudimg-amd64.img' index='2'/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<controller type='usb' index='0' model='none'>
<alias name='usb'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<serial type='pty'>
<source path='/dev/pts/0'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
<rng model='virtio'>
<backend model='random'>/dev/urandom</backend>
<alias name='rng0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</rng>
</devices>
</domain>
__EOF__
host-vm$ virsh define test-vm.xml
host-vm$ virsh start test-vm
host-vm$ virsh console test-vm
Wait for the VM to boot, log into it (user is "root", password is "1234"), and execute:
nested-vm# sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' /etc/default/grub.d/50-cloudimg-settings.cfg
nested-vm# update-grub
nested-vm# reboot
Keep this terminal open, quickly open another terminal, and log into the host VM (running on LXD, named qemu-bug2018733-jammy in this Test Plan).
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# su - ubuntu
Closely monitor the reboot process of the nested VM on the first terminal. When the VM starts booting again, switch to the second terminal (inside the host VM) and issue:
host-vm$ virsh attach-disk test-vm /home/ubuntu/disk.img vdx --live --persistent && sleep 1 && virsh detach-disk test-vm --live vdx
You will notice that the detach operation apparently succeeded, but you can confirm that it did not by doing:
host-vm$ virsh domblklist test-vm
You will notice that the new disk (named "vdx") is still attached to the VM. If you try to detach it again, you will get an error:
host-vm$ virsh detach-disk test-vm --live vdx
error: Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug
[ Previous Test Plan ]
1. Modify the Ubuntu cloud guest image to add boot_delay=100 to the kernel args, to simulate a slow host
2. Start the Ubuntu domain and connect to the serial console to watch it boot
3. Wait until the first messages appear in the console. This is around T+50sec from the virsh start, but note that the guest boot is slowed down by the boot_delay=100 kernel arg.
4. From a second terminal, attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML; the disk is still attached.
8. Run lsblk in the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "error: device not found: no target device"
A walkthrough of these steps, with commands and example output, can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt
[ Where problems could occur ]
* Depending on how far we backport this (at least to v6.2 in
Jammy) we need to double-check that the callbacks and settings
used still work the same that far back. While this can simply
be "tested", it should also get a review of the related changes
to be sure.
* The change, and thereby any regressions, is limited to ACPI
PCI hotplug, and for software as complex as QEMU it is always
good to be able to clearly point to a small subset of the use
cases to know what to look out for.
[ Other Info ]
* n/a
-----------
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-
unplug. More details in the commit message, and also in the bug that is
linked here[2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] and [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
[ Impact ]
* In the past one could unplug a device and, if that didn't
work, try again. Changes in the q35 backend for hotplug will
now queue only one unplug request. But if that request came
very early, the guest will clear GPEx.status and thereby
never see the event.
* The fix makes ACPI PCI hotplug behave the same as PCIe, which
means unplug requests can be requeued, but under a rate-controlling limit.
[ Test Plan ]
First, let's prepare an LXD VM to serve as our testbed. In this Test Plan we'll be using a Jammy VM.
physical-machine$ lxc launch ubuntu:jammy qemu-bug2018733-jammy --vm -c limits.memory=8GB
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# apt update
host-vm# apt install -y libvirt-daemon-system libguestfs-tools
host-vm# usermod -a -G libvirt,kvm ubuntu
host-vm# su - ubuntu
In order to reproduce the issue, we will need to quickly attach a disk to a VM and then detach it again. To do that, let's use an Ubuntu Cloud image and adjust its kernel's "boot_delay" parameter to give us time to perform the necessary operations.
host-vm$ wget https://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img
host-vm$ qemu-img create disk.img 1G
host-vm$ sudo chown libvirt-qemu:kvm lunar-server-cloudimg-amd64.img disk.img
host-vm$ sudo chmod +x /home/ubuntu
host-vm$ sudo virt-customize -a lunar-server-cloudimg-amd64.img --root-password password:1234
host-vm$ cat > test-vm.xml << __EOF__
<domain type='kvm' id='3'>
<name>test-vm</name>
<memory unit='GiB'>1</memory>
<currentMemory unit='GiB'>1</currentMemory>
<vcpu placement='static'>1</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmcoreinfo state='on'/>
</features>
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Nehalem</model>
<topology sockets='1' cores='1' threads='1'/>
<feature policy='require' name='vme'/>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/home/ubuntu/lunar-server-cloudimg-amd64.img' index='2'/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<controller type='usb' index='0' model='none'>
<alias name='usb'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<serial type='pty'>
<source path='/dev/pts/0'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
<rng model='virtio'>
<backend model='random'>/dev/urandom</backend>
<alias name='rng0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</rng>
</devices>
</domain>
__EOF__
host-vm$ virsh define test-vm.xml
host-vm$ virsh start test-vm
host-vm$ virsh console test-vm
Wait for the VM to boot, log into it (user is "root", password is "1234"), and execute:
nested-vm# sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/GRUB_CMDLINE_LINUX_DEFAULT="\1 boot_delay=100"/' /etc/default/grub.d/50-cloudimg-settings.cfg
nested-vm# update-grub
nested-vm# reboot
Keep this terminal open, quickly open another terminal, and log into the host VM (running on LXD, named qemu-bug2018733-jammy in this Test Plan).
physical-machine$ lxc shell qemu-bug2018733-jammy
host-vm# su - ubuntu
Closely monitor the reboot process of the nested VM on the first terminal. When the VM starts booting again, switch to the second terminal (inside the host VM) and issue:
host-vm$ virsh attach-disk test-vm /home/ubuntu/disk.img vdx --live --persistent && sleep 1 && virsh detach-disk test-vm --live vdx
You will notice that the detach operation apparently succeeded, but you can confirm that it did not by doing:
host-vm$ virsh domblklist test-vm
You will notice that the new disk (named "vdx") is still attached to the VM. If you try to detach it again, you will get an error:
host-vm$ virsh detach-disk test-vm --live vdx
error: Failed to detach disk
error: internal error: unable to execute QEMU command 'device_del': Device virtio-disk23 is already in the process of unplug
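The upstream fix ("acpi: pcihp: allow repeating hot-unplug requests") makes it safe to simply reissue the unplug request. When verifying the fixed packages, a small retry helper like the following could be used; the helper name, retry count, and sleep interval are assumptions for illustration, not part of the test plan:

```shell
# Hypothetical helper (not part of the test plan): rerun a command a few
# times, since the fixed QEMU accepts repeated hot-unplug requests.
retry() {
  local tries=$1; shift
  local i
  for i in $(seq 1 "$tries"); do
    "$@" && return 0   # stop as soon as the command succeeds
    sleep 1
  done
  return 1             # all attempts failed
}

# With the fixed QEMU, retrying the detach is expected to succeed:
#   retry 5 virsh detach-disk test-vm --live vdx
```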
[ Previous Test Plan ]
1. Modify the Ubuntu cloud guest image to have boot_delay=100 added to the kernel args to simulate a slow-booting guest
2. Start the Ubuntu domain and connect to the serial console to see it boot
3. Wait until the first messages appear in the console. This happens around T+50sec after the virsh start, since the guest boot is slowed down by the boot_delay=100 kernel arg.
4. From a second terminal attach an additional disk to the guest. It succeeds.
5. Wait a second
6. Detach the additional disk from the guest. The virsh command hangs for a couple of seconds, but then succeeds.
7. Check the domain XML, the disk is still attached
8. Check the lsblk command from the guest (after it is fully booted). The disk is still attached.
9. Check the virsh domblklist output. The disk is still attached.
10. Try to detach the disk again. It fails with "error: device not found: no target device"
A walkthrough of these steps with commands and example output can be seen at:
https://gitlab.com/libvirt/libvirt/uploads/651523ecd79419a6dac574504fb4d531/reproduction-logs.txt
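Steps 4-10 above can be wrapped into a small script. This is a hypothetical sketch, not part of the original test plan; the domain name, disk path, and target device come from the steps above, and setting VIRSH=echo allows a dry run without a live libvirt host:

```shell
# Hypothetical wrapper around steps 4-10 of the test plan above.
# Assumptions: domain "test-vm", disk image /home/ubuntu/disk.img,
# target "vdx". Set VIRSH=echo for a dry run.
reproduce_unplug_race() {
  local virsh=${VIRSH:-virsh}
  local domain=${1:-test-vm}
  local disk=${2:-/home/ubuntu/disk.img}

  # Step 4: attach an extra disk while the guest is still booting.
  $virsh attach-disk "$domain" "$disk" vdx --live --persistent
  # Step 5: wait a second.
  sleep 1
  # Step 6: detach; on unpatched QEMU this only appears to succeed.
  $virsh detach-disk "$domain" --live vdx
  # Steps 7-9: on unpatched QEMU the disk is still listed here.
  $virsh domblklist "$domain"
  # Step 10: a second detach fails before the fix, succeeds after it.
  $virsh detach-disk "$domain" --live vdx
}

# Usage against a live libvirt host:
#   reproduce_unplug_race test-vm /home/ubuntu/disk.img
```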
[ Where problems could occur ]
* Depending on how far back we backport this (at least to v6.2 in Jammy), we need to double-check that the callbacks and settings involved behave the same in those older versions. While this can simply be tested, it should also get a review of related changes to be sure.
* The change, and thereby any regressions, are limited to ACPI PCI hotplug. For software as complex as QEMU it is always good to be able to point to a small subset of use cases to know what to look out for.
[ Other Info ]
* n/a
-----------
This was kindly reported by ~kashyapc
I am only converting this into a bug for tracking.
---
Report:
This [1] QEMU patch solves a genuine bug [2] involving disk hot-unplug. More details are in the commit message and in the bug linked here [2].
I have also flagged the fix for QEMU 8.0 stable[3], and tested that the
fix itself works[4].
Please pick up the fix[1] once it merges.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02473.html
— acpi: pcihp: allow repeating hot-unplug requests
[2] https://gitlab.com/libvirt/libvirt/-/issues/309
[3] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg04994.html
[4] https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg01070.html
--- ^^ report
--- vv extra context
Note:
- [2] + [4] have test steps we can use for SRU verification.
- This has landed upstream by now
https://gitlab.com/qemu-project/qemu/-/commit/0f689cf5ada4
- Also landed in 8.0 stable staging as
https://gitlab.com/qemu-project/qemu/-/commit/76326210e439 |
|
2023-05-30 19:12:03 | Launchpad Janitor | merge proposal linked | | https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/443832 | |
2023-05-31 18:20:12 | Sergio Durigan Junior | qemu (Ubuntu Jammy): status | Confirmed | In Progress | |
2023-05-31 18:20:14 | Sergio Durigan Junior | qemu (Ubuntu Kinetic): status | Confirmed | In Progress | |
2023-05-31 18:20:15 | Sergio Durigan Junior | qemu (Ubuntu Lunar): status | Confirmed | In Progress | |
2023-06-01 16:04:58 | Ubuntu Archive Robot | bug | | | added subscriber Sergio Durigan Junior |
2023-06-02 17:52:03 | Steve Langasek | qemu (Ubuntu Lunar): status | In Progress | Fix Committed | |
2023-06-02 17:52:05 | Steve Langasek | bug | | | added subscriber Ubuntu Stable Release Updates Team |
2023-06-02 17:52:19 | Steve Langasek | bug | | | added subscriber SRU Verification |
2023-06-02 17:52:23 | Steve Langasek | tags | server-todo | server-todo verification-needed verification-needed-lunar | |
2023-06-05 10:12:49 | Łukasz Zemczak | qemu (Ubuntu Jammy): status | In Progress | Fix Committed | |
2023-06-05 10:12:51 | Łukasz Zemczak | tags | server-todo verification-needed verification-needed-lunar | server-todo verification-needed verification-needed-jammy verification-needed-lunar | |
2023-06-05 10:13:36 | Łukasz Zemczak | qemu (Ubuntu Kinetic): status | In Progress | Fix Committed | |
2023-06-05 10:13:39 | Łukasz Zemczak | tags | server-todo verification-needed verification-needed-jammy verification-needed-lunar | server-todo verification-needed verification-needed-jammy verification-needed-kinetic verification-needed-lunar | |
2023-06-06 02:22:59 | Sergio Durigan Junior | tags | server-todo verification-needed verification-needed-jammy verification-needed-kinetic verification-needed-lunar | server-todo verification-done verification-done-jammy verification-done-kinetic verification-done-lunar | |
2023-06-15 20:15:19 | Launchpad Janitor | qemu (Ubuntu Lunar): status | Fix Committed | Fix Released | |
2023-06-15 20:15:24 | Andreas Hasenack | removed subscriber Ubuntu Stable Release Updates Team | | | |
2023-06-15 20:15:40 | Launchpad Janitor | qemu (Ubuntu Kinetic): status | Fix Committed | Fix Released | |
2023-06-15 20:15:56 | Launchpad Janitor | qemu (Ubuntu Jammy): status | Fix Committed | Fix Released | |
2023-09-07 00:40:25 | Launchpad Janitor | merge proposal linked | | https://code.launchpad.net/~mitchdz/ubuntu/+source/qemu/+git/qemu/+merge/450830 | |
2023-09-07 00:41:45 | Mitchell Dzurick | merge proposal unlinked | https://code.launchpad.net/~mitchdz/ubuntu/+source/qemu/+git/qemu/+merge/450830 | | |