[UBUNTU 21.04] s390x/s390-virtio-ccw: Reset PCI devices during subsystem reset
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | | High | Skipper Bug Screeners |
qemu (Ubuntu) | Status tracked in Hirsute | | |
Focal | | Undecided | Unassigned |
Groovy | | Undecided | Unassigned |
Hirsute | | Undecided | Canonical Server Team |
Bug Description
[Impact]
Symptom: PCI devices are unavailable after a subsystem reset
Problem: When a subsystem reset event occurs (e.g. via kexec) PCI
[Test Case]
# Prep a guest and wait until it booted
$ apt install uvtool-libvirt
$ uvt-simplestrea
$ uvt-kvm create --disk 5 --password=ubuntu testguest release=focal arch=s390x label=daily
$ virsh console testguest
# lspci in guest shows nothing yet (expected)
# Add virtio device
$ cat > virtio-pci.xml << EOF
<interface type='network'>
<source network='default'/>
<model type='virtio'/>
<address type='pci'/>
<rom bar='off' file=''/>
</interface>
EOF
$ virsh attach-device testguest virtio-pci.xml
# lspci in guest now shows the device
ubuntu@testguest:~$ lspci
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
# Verify that a "normal" reboot does not lose the device
ubuntu@testguest:~$ sudo reboot
...
ubuntu@testguest:~$ lspci
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
# Kexec into a kernel (can be the same)
ubuntu@testguest:~$ sudo apt install kexec-tools
ubuntu@testguest:~$ sudo kexec --load /boot/vmlinuz --initrd=
ubuntu@testguest:~$ sudo kexec --exec
# Log in and recheck lspci - without the fix it will be empty (wrong)
# With the fix it will show the PCI device again
ubuntu@testguest:~$ lspci
# Note: a reboot will bring the device back (with or without the fix)
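The before/after comparison in the test case above can also be scripted. The following is only a minimal sketch, assuming the `lspci` output was saved to a file before and after the kexec; the function name `pci_devices_preserved` and the file names are hypothetical and not part of the original test case.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every PCI address listed in the
# "before" lspci capture is still present in the "after" capture.
# Both arguments are files holding plain `lspci` output.
pci_devices_preserved() {
    before=$1
    after=$2
    # The first whitespace-separated field of each lspci line is the address.
    for addr in $(awk 'NF { print $1 }' "$before"); do
        if ! awk -v a="$addr" '$1 == a { found = 1 } END { exit !found }' "$after"; then
            echo "missing after reset: $addr"
            return 1
        fi
    done
    return 0
}
```

In the guest this could be used as `lspci > before.txt` ahead of the kexec and `lspci > after.txt; pci_devices_preserved before.txt after.txt` once the new kernel is up. Without the fix the check is expected to fail for the virtio device.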
[Where problems could occur]
* The patch is pleasantly small - it affects the list of devices that will
be reset. By extending this list, obviously more devices will be
reset - therefore the activity of a "subsystem_reset" will cover more
devices.
Regressions (let us hope not) would happen there. For example, suppose
there is a buggy PCI device that no one cared about before. Formerly it
would not have been reset, but now it is. If that reset fails badly you
have a regression.
Fortunately PCI devices are still uncommon on s390x, so even if (I
doubt it) there is a regression, it would affect only a small fraction of
users.
These kinds of resets happen on load (kexec, reboot, start), and that is
the place to look out for regressions.
[Other Info]
* n/a
---
Description: s390x/s390-
Symptom: PCI devices are unavailable after a subsystem reset
Problem: When a subsystem reset event occurs (e.g. via kexec) PCI
Solution: Add the s390 PCI host bridge to the list of devices to be
Reproduction: kexec on an s390x guest with PCI devices
db08244a3a7e s390x/s390-
This fix needs to be applied to qemu for focal (20.04) and groovy (20.10).
Related branches
- Sergio Durigan Junior: Approve on 2021-01-05
- Canonical Server Team: Pending requested 2021-01-05
- Ubuntu Server Dev import team: Pending requested 2021-01-05
- Diff: 73 lines (+51/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu/lp-1907656-s390x-s390-virtio-ccw-Reset-PCI-devices-during-subsy.patch (+43/-0)
- Sergio Durigan Junior: Needs Fixing on 2021-01-05
- Canonical Server Team: Pending requested 2021-01-05
- Ubuntu Server Dev import team: Pending requested 2021-01-05
- Diff: 115 lines (+57/-15), 4 files modified
debian/changelog (+8/-0)
debian/patches/series (+1/-0)
debian/patches/ubuntu/lp-1907656-s390x-s390-virtio-ccw-Reset-PCI-devices-during-subsy.patch (+43/-0)
debian/rules (+5/-15)
tags: | added: architecture-s39064 bugnameltc-190224 severity-high targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → qemu (Ubuntu) |
Changed in qemu (Ubuntu): | |
assignee: | Skipper Bug Screeners (skipper-screen-team) → Canonical Server Team (canonical-server) |
Changed in ubuntu-z-systems: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
importance: | Undecided → High |
status: | New → Triaged |
tags: | added: qemu-21.04 |
Christian Ehrhardt (paelzer) wrote : | #1 |
Changed in qemu (Ubuntu Hirsute): | |
status: | New → In Progress |
------- Comment From <email address hidden> 2020-12-10 10:05 EDT-------
(In reply to comment #10)
> This is in qemu 5.2 which I'm already working on for hirsute.
> So -devel should be fixed soon (although testing on 5.2 will consume a few
> days).
>
> Three questions for the following SRU as this was flagged for Focal and
> Groovy as well.
>
> 1. How urgent/severe is it, do we need to move heaven and earth to get this
> completed before the Christmas downtime or can this be SRU released in
> January?
No, this can be released in January.
>
> 2. I see you said for repro "kexec on an s390x guest with PCI devices". But
> I'm sure you already have a script and/or guest XMLs and whatever else that
> is related. Anything I don't have to come up with from scratch will make
> handling this faster.
>
> 3. Do I need any special HW and/or configuration to achieve "s390x guest
> with PCI devices" like real PCI ?!? - or is it enough to try to force e.g.
> virtio-net-pci in? Again sample XMLs and commands will help.
The issue was originally hit with vfio-pci passthrough (which would indeed require special hardware), but can also be recreated using emulated devices such as virtio-net-pci. I just did so on focal using an XML entry that looks like this for a guest:
<interface type='network'>
<source network='default'/>
<model type='virtio'/>
<address type='pci'/>
<rom bar='off'/>
</interface>
I will also attach the minimal guest XML that I used; you should be able to reuse it as-is with a different boot disk.
I don't have a script, but the process to reproduce is short and straightforward:
1) start the guest with the virtio-net-pci device
2) In the guest, 'lspci' to view the available PCI devices, you should see:
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
3) In the guest, kexec to a different kernel, for example:
kexec -l /path/to/image --initrd=
kexec -e
4) Run 'lspci' again to view the available PCI devices - this time, none will be listed; a reboot of the guest is required to restore the PCI devices. With the fix applied, lspci shows the same result as in step 2 above.
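Step 3 can be wrapped in a small dry-run helper that prints the kexec commands instead of running them. This is a sketch only: the function name `print_kexec_commands` is hypothetical, and the `/boot/vmlinuz-<release>` and `/boot/initrd.img-<release>` paths are an assumption about the guest's layout, not something stated in this report. Only the `-l`, `--initrd=` and `-e` options shown in the steps above are used.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the kexec commands for a given kernel
# release instead of executing them.
# Assumption: kernel and initrd live at /boot/vmlinuz-<release> and
# /boot/initrd.img-<release>; adjust the paths for your guest.
print_kexec_commands() {
    release=$1
    echo "kexec -l /boot/vmlinuz-$release --initrd=/boot/initrd.img-$release"
    echo "kexec -e"
}

# Show what would be run for the currently booted kernel.
print_kexec_commands "$(uname -r)"
```

Printing the commands first makes it easy to confirm that both files exist before the guest is actually switched to the new kernel.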
bugproxy (bugproxy) wrote : Sample Guest XML | #3 |
------- Comment (attachment only) From <email address hidden> 2020-12-10 10:07 EDT-------
Changed in ubuntu-z-systems: | |
status: | Triaged → In Progress |
Christian Ehrhardt (paelzer) wrote : | #4 |
Thank you, that contains all I need for the SRU later on!
The work on qemu 5.2 for hirsute will take a bit more time, but I've added the bug to the changelog.
So once it completes this bug will be auto-updated.
Changed in qemu (Ubuntu Groovy): | |
status: | New → Triaged |
Changed in qemu (Ubuntu Focal): | |
status: | New → Triaged |
Christian Ehrhardt (paelzer) wrote : | #5 |
FYI proper migration into 21.04 of qemu 5.2 is held back by systemd bug 1908259
Launchpad Janitor (janitor) wrote : | #6 |
This bug was fixed in the package qemu - 1:5.2+dfsg-2ubuntu1
---------------
qemu (1:5.2+
* Merge with Debian unstable
- includes fix for CVE-2020-17380
- includes a fix for s390x PCI device reset (LP: #1907656)
Remaining changes:
- qemu-kvm to systemd unit
- d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
hugepages and architecture specifics
- d/qemu-
- d/qemu-
- d/qemu-
- d/rules: call dh_installinit and dh_installsystemd for qemu-kvm
- Distribution specific machine type (LP: 1304107 1621042)
- d/p/ubuntu/
- d/qemu-
for host-phys-bits=true (LP: 1776189)
- add an info about -hpb machine type in debian/
- provide pseries-
- ubuntu-q35 alias added to auto-select the most recent q35 ubuntu type
- Enable nesting by default
- d/p/ubuntu/
in qemu64 on amd
[ No more strictly needed, but required for backward compatibility ]
- improved dependencies
- Make qemu-system-common depend on qemu-block-extra
- Make qemu-utils depend on qemu-block-extra
- let qemu-utils recommend sharutils
- tolerate ipxe size change on migrations to >=18.04 (LP: 1713490)
- d/p/ubuntu/
reference 256k path
- d/control-in: depend on ipxe-qemu-
handle incoming migrations from former releases.
- d/control-in: Disable capstone disassembler library support (universe)
- d/qemu-
- d/control*, d/rules: disable xen by default, but provide universe
package qemu-system-x86-xen as alternative
[includes compat links changes of 5.0-5ubuntu4]
- allow qemu to load old modules post upgrade (LP 1847361)
- Drop d/qemu-
- d/rules: Drop generating package version into maintainer scripts
- d/qemu-
the bad old prerm (LP 1906245 1905377)
* Dropped Changes:
- d/control, d/rules: build with gcc-9 on armhf as workaround until
resolved in gcc-10 (LP: 1890435) [it is flaky still, but no more 100%
fails]
* Added Changes:
- Refreshed ubuntu machine types for hirsute@5.2
- d/control: regenerated from d/control-in
- d/p/ubuntu/
ld usage of -no-pie (LP: #1907789)
qemu (1:5.2+dfsg-2) unstable; urgency=medium
* move ui-opengl.so module from qemu-system-gui to qemu-system-common,
as other ...
Changed in qemu (Ubuntu Hirsute): | |
status: | In Progress → Fix Released |
Christian Ehrhardt (paelzer) wrote : | #7 |
SRU prepared (template in bug description) and tested against the PPA (https:/
I will upload that to -unapproved soon
description: | updated |
Christian Ehrhardt (paelzer) wrote : | #8 |
Uploaded and ready for the review by the SRU Team
Changed in qemu (Ubuntu Groovy): | |
status: | Triaged → In Progress |
Changed in qemu (Ubuntu Focal): | |
status: | Triaged → In Progress |
Hello bugproxy, or anyone else affected,
Accepted qemu into groovy-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in qemu (Ubuntu Groovy): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed verification-needed-groovy |
Changed in qemu (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed-focal |
Timo Aaltonen (tjaalton) wrote : | #10 |
Hello bugproxy, or anyone else affected,
Accepted qemu into focal-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Committed |
------- Comment From <email address hidden> 2021-01-08 15:26 EDT-------
I installed the focal qemu-system-s390x package (1:4.2-3ubuntu6.11) as well as the groovy package (1:5.0-5ubuntu9.3) via the -proposed repositories. In both cases, I tested using a vfio-pci passthrough device on a single guest and the same method described above (kexec in the guest) to trigger the subsystem reset event in QEMU -- With both of these QEMU versions, I verified that the PCI device is now appropriately available after the reset event.
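When repeating this verification, it can help to confirm that the host really runs a fixed package. The sketch below maps each release to the qemu version that this report names as containing the fix; the function name `fixed_qemu_version` is hypothetical, and the commented comparison assumes a host where `dpkg` and the `qemu-system-s390x` package are available.

```shell
#!/bin/sh
# Map an Ubuntu release codename to the qemu package version that this
# bug report lists as carrying the fix.
fixed_qemu_version() {
    case $1 in
        focal)   echo "1:4.2-3ubuntu6.11" ;;
        groovy)  echo "1:5.0-5ubuntu9.3" ;;
        hirsute) echo "1:5.2+dfsg-2ubuntu1" ;;
        *)       echo "unknown release: $1" >&2; return 1 ;;
    esac
}

# Possible use on a focal host (not executed here):
#   installed=$(dpkg-query -W -f='${Version}' qemu-system-s390x)
#   dpkg --compare-versions "$installed" ge "$(fixed_qemu_version focal)" \
#       && echo "fix present"
```

Running guests keep using the old QEMU binary until they are restarted, so the guest has to be stopped and started again after the package upgrade before retesting.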
Frank Heimes (fheimes) wrote : | #12 |
Many thx for the verification on both, focal and groovy!
I'm adjusting the tags accordingly.
tags: |
added: verification-done verification-done-focal verification-done-groovy removed: verification-needed verification-needed-focal verification-needed-groovy |
Launchpad Janitor (janitor) wrote : | #13 |
This bug was fixed in the package qemu - 1:5.0-5ubuntu9.3
---------------
qemu (1:5.0-5ubuntu9.3) groovy; urgency=medium
* d/p/ubuntu/
avoid PCI devices to become unavailable on reset (LP: #1907656)
* d/rules: fix qemu-user-static to really be static (LP: #1908331)
-- Christian Ehrhardt <email address hidden> Tue, 05 Jan 2021 15:46:16 +0100
Changed in qemu (Ubuntu Groovy): | |
status: | Fix Committed → Fix Released |
The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #15 |
This bug was fixed in the package qemu - 1:4.2-3ubuntu6.11
---------------
qemu (1:4.2-3ubuntu6.11) focal; urgency=medium
* d/p/ubuntu/
avoid PCI devices to become unavailable on reset (LP: #1907656)
-- Christian Ehrhardt <email address hidden> Tue, 05 Jan 2021 15:52:00 +0100
Changed in qemu (Ubuntu Focal): | |
status: | Fix Committed → Fix Released |
Changed in ubuntu-z-systems: | |
status: | Fix Committed → Fix Released |
------- Comment From <email address hidden> 2021-01-19 04:52 EDT-------
IBM Bugzilla status-> closed, Fix Released with all requested distros
tags: |
added: targetmilestone-inin2104 removed: targetmilestone-inin--- |
This is in qemu 5.2 which I'm already working on for hirsute.
So -devel should be fixed soon (although testing on 5.2 will consume a few days).
Three questions for the following SRU as this was flagged for Focal and Groovy as well.
1. How urgent/severe is it, do we need to move heaven and earth to get this completed before the Christmas downtime or can this be SRU released in January?
2. I see you said for repro "kexec on an s390x guest with PCI devices". But I'm sure you already have a script and/or guest XMLs and whatever else that is related. Anything I don't have to come up with from scratch will make handling this faster.
3. Do I need any special HW and/or configuration to achieve "s390x guest with PCI devices" like real PCI ?!? - or is it enough to try to force e.g. virtio-net-pci in? Again sample XMLs and commands will help.