[UBUNTU 21.04] s390x/s390-virtio-ccw: Reset PCI devices during subsystem reset

Bug #1907656 reported by bugproxy on 2020-12-10
This bug affects 1 person
Affects                   Importance  Assigned to
Ubuntu on IBM z Systems   High        Skipper Bug Screeners
qemu (Ubuntu)             (status tracked in Hirsute)
  Focal                   Undecided   Unassigned
  Groovy                  Undecided   Unassigned
  Hirsute                 Undecided   Canonical Server Team

Bug Description

[Impact]

Symptom: PCI devices are unavailable after a subsystem reset
Problem: When a subsystem reset event occurs (e.g. via kexec) PCI
               devices are not being reset and are therefore in an
               unexpected state when the guest attempts to enable them
               again after the subsystem reset. This results in the devices
               being unavailable to the guest until a reboot.

[Test Case]

# Prep a guest and wait until it booted
$ apt install uvtool-libvirt
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily "arch=s390x" label=daily "release=focal"
$ uvt-kvm create --disk 5 --password=ubuntu testguest release=focal arch=s390x label=daily
$ virsh console testguest
# lspci in guest shows nothing yet (expected)

# Add virtio device
$ cat > virtio-pci.xml << EOF
<interface type='network'>
<source network='default'/>
<model type='virtio'/>
<address type='pci'/>
<rom bar='off' file=''/>
</interface>
EOF
$ virsh attach-device testguest virtio-pci.xml

# lspci in guest now shows the device
ubuntu@testguest:~$ lspci
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device

# verify that a "normal" reboot does not lose the device
ubuntu@testguest:~$ sudo reboot
...
ubuntu@testguest:~$ lspci
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device

# Kexec into a kernel (can be the same)
ubuntu@testguest:~$ sudo apt install kexec-tools
ubuntu@testguest:~$ sudo kexec --load /boot/vmlinuz --initrd=/boot/initrd.img --append="$(cat /proc/cmdline)"
ubuntu@testguest:~$ sudo kexec --exec

# Log in and recheck lspci - without the fix it will be empty (wrong)
# With the fix, lspci will show the PCI device again
ubuntu@testguest:~$ lspci

# Note: A reboot will get the device back (both with and without the fix)
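
The before/after comparison in this test case can also be scripted. The following is only a minimal sketch: the function name pci_devices_preserved and the file names in the usage note are hypothetical and not part of the original test case. It assumes the lspci output was saved to a file once before and once after the kexec.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original test case):
# compare two saved `lspci` listings.
# Returns 0 when every device line in the "before" file is also present
# in the "after" file, and 1 otherwise.
pci_devices_preserved() {
    before_file=$1
    after_file=$2
    missing=0
    while IFS= read -r device; do
        # Skip empty lines in the saved listing
        [ -n "$device" ] || continue
        # -x: match the whole line, -F: treat the line as a fixed string
        if ! grep -qxF -- "$device" "$after_file"; then
            echo "missing after kexec: $device"
            missing=1
        fi
    done < "$before_file"
    return "$missing"
}
```

In the guest this could be used by running "lspci > ~/lspci.before" ahead of the kexec, "lspci > ~/lspci.after" once logged in again, and then "pci_devices_preserved ~/lspci.before ~/lspci.after". On an unfixed qemu the helper would be expected to report the virtio network device as missing.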

[Where problems could occur]

 * The patch is pleasantly small - it only affects the list of devices
   that will be reset. Extending this list obviously means more devices
   are reset - therefore the activity of a "subsystem_reset" will cover
   more devices.
   Regressions (let us hope not) would happen there. For example, imagine
   a buggy PCI device that no one cared about before. Formerly it
   would not have been reset, but now it is. If that reset fails badly you
   have a regression.
   Fortunately PCI devices are still uncommon on s390x, so even if (I
   doubt it) there is a regression, it would affect only a small fraction
   of users.
   These kinds of resets happen on load (kexec, reboot, start) and that is
   the place to look out for regressions.

[Other Info]

 * n/a

---

Description: s390x/s390-virtio-ccw: Reset PCI devices during subsystem reset
Symptom: PCI devices are unavailable after a subsystem reset
Problem: When a subsystem reset event occurs (e.g. via kexec) PCI
               devices are not being reset and are therefore in an
               unexpected state when the guest attempts to enable them
               again after the subsystem reset. This results in the devices
               being unavailable to the guest until a reboot.
Solution: Add the s390 PCI host bridge to the list of devices to be
               reset during a subsystem reset event
Reproduction: kexec on an s390x guest with PCI devices

db08244a3a7e s390x/s390-virtio-ccw: Reset PCI devices during subsystem reset

This fix needs to be applied to qemu for focal (20.04) and groovy (20.10).


bugproxy (bugproxy) on 2020-12-10
tags: added: architecture-s39064 bugnameltc-190224 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → qemu (Ubuntu)
Frank Heimes (fheimes) on 2020-12-10
Changed in qemu (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Server Team (canonical-server)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
importance: Undecided → High
status: New → Triaged
tags: added: qemu-21.04

This is in qemu 5.2 which I'm already working on for hirsute.
So -devel should be fixed soon (although testing on 5.2 will consume a few days).

Three questions for the following SRU as this was flagged for Focal and Groovy as well.

1. How urgent/severe is it, do we need to move heaven and earth to get this completed before the Christmas downtime or can this be SRU released in January?

2. I see you said for repro "kexec on an s390x guest with PCI devices". But I'm sure you already have a script and/or guest XMLs and whatever else is related. Anything I don't have to come up with from scratch will make handling this faster.

3. Do I need any special HW and/or configuration to achieve "s390x guest with PCI devices", like real PCI? Or is it enough to try to force e.g. virtio-net-pci in? Again, sample XMLs and commands will help.

Changed in qemu (Ubuntu Hirsute):
status: New → In Progress

------- Comment From <email address hidden> 2020-12-10 10:05 EDT-------
(In reply to comment #10)
> This is in qemu 5.2 which I'm already working on for hirsute.
> So -devel should be fixed soon (although testing on 5.2 will consume a few
> days).
>
> Three questions for the following SRU as this was flagged for Focal and
> Groovy as well.
>
> 1. How urgent/severe is it, do we need to move heaven and earth to get this
> completed before the Christmas downtime or can this be SRU released in
> January?

No, this can be released in January.

>
> 2. I see you said for repro "kexec on an s390x guest with PCI devices". But
> I'm sure you already have a script and/or guest XMLs and whatever else
> is related. Anything I don't have to come up with from scratch will make
> handling this faster.
>
> 3. Do I need any special HW and/or configuration to achieve "s390x guest
> with PCI devices" like real PCI ?!? - or is it enough to try to force e.g.
> virtio-net-pci in? Again sample XMLs and commands will help.

The issue was originally hit with vfio-pci passthrough (which would indeed require special hardware), but can also be recreated using emulated devices such as virtio-net-pci. I just did so on focal using an XML entry that looks like this for a guest:

<interface type='network'>
<source network='default'/>
<model type='virtio'/>
<address type='pci'/>
<rom bar='off'/>
</interface>

I will also attach a minimal guest XML entry that I used; you should be able to re-use it as is with a different boot disk.

I don't have a script, but the process to reproduce is short and straightforward:

1) start the guest with the virtio-net-pci device
2) In the guest, run 'lspci' to view the available PCI devices, you should see:
0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
3) In the guest, kexec to a different kernel, example:
kexec -l /path/to/image --initrd=/path/to/initrd --append="$(cat /proc/cmdline)"
kexec -e
4) Run lspci afterwards to view the available PCI devices. This time none will be listed; a reboot of the guest is required to restore the PCI devices. With the fix applied, lspci shows the same results as in step 2 above.
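
For anyone scripting the check in steps 2 and 4, a small helper can count the devices in the lspci output. This is only a sketch under assumptions: the function name count_pci_devices is hypothetical, and it assumes each device line starts with a PCI address such as 0001:00:00.0 (the leading domain part is treated as optional).

```shell
#!/bin/sh
# Hypothetical helper (not from the original report):
# count PCI functions in `lspci` output read from stdin.
# A device line is assumed to begin with [domain:]bus:device.function.
count_pci_devices() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that case from looking like a failure.
    grep -Ec '^([0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] ' || true
}
```

Running "lspci | count_pci_devices" in the guest before and after the kexec gives two numbers to compare. According to the description above, the second one drops to 0 on an affected qemu.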

------- Comment (attachment only) From <email address hidden> 2020-12-10 10:07 EDT-------

Frank Heimes (fheimes) on 2020-12-10
Changed in ubuntu-z-systems:
status: Triaged → In Progress

Thank you, that contains all I need for the SRU later on!

The work on qemu 5.2 for hirsute will take a bit more time, but I've added the bug to the changelog.
So once it completes this bug will be auto-updated.

Changed in qemu (Ubuntu Groovy):
status: New → Triaged
Changed in qemu (Ubuntu Focal):
status: New → Triaged

FYI proper migration into 21.04 of qemu 5.2 is held back by systemd bug 1908259

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:5.2+dfsg-2ubuntu1

---------------
qemu (1:5.2+dfsg-2ubuntu1) hirsute; urgency=medium

  * Merge with Debian unstable
    - includes fix for CVE-2020-17380
    - includes a fix for s390x PCI device reset (LP: #1907656)
    Remaining changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-system-common.qemu-kvm.service: systemd unit to call
        qemu-kvm-init
      - d/qemu-system-common.install: install helper script
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: call dh_installinit and dh_installsystemd for qemu-kvm
    - Distribution specific machine type (LP: 1304107 1621042)
      - d/p/ubuntu/define-ubuntu-machine-types.patch: distro machine types
      - d/qemu-system-x86.NEWS Info on fixed machine type definitions
        for host-phys-bits=true (LP: 1776189)
      - add an info about -hpb machine type in debian/qemu-system-x86.NEWS
      - provide pseries-bionic-2.11-sxxm type as convenience with all
        meltdown/spectre workarounds enabled by default. (LP: 1761372).
      - ubuntu-q35 alias added to auto-select the most recent q35 ubuntu type
    - Enable nesting by default
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
        [ No more strictly needed, but required for backward compatibility ]
    - improved dependencies
      - Make qemu-system-common depend on qemu-block-extra
      - Make qemu-utils depend on qemu-block-extra
      - let qemu-utils recommend sharutils
    - tolerate ipxe size change on migrations to >=18.04 (LP: 1713490)
      - d/p/ubuntu/pre-bionic-256k-ipxe-efi-roms.patch: old machine types
        reference 256k path
      - d/control-in: depend on ipxe-qemu-256k-compat-efi-roms to be able to
        handle incoming migrations from former releases.
    - d/control-in: Disable capstone disassembler library support (universe)
    - d/qemu-system-x86.README.Debian: add info about updated nesting changes
    - d/control*, d/rules: disable xen by default, but provide universe
      package qemu-system-x86-xen as alternative
      [includes compat links changes of 5.0-5ubuntu4]
    - allow qemu to load old modules post upgrade (LP 1847361)
      - Drop d/qemu-block-extra.*.in, d/qemu-system-gui.*.in
      - d/rules: Drop generating package version into maintainer scripts
      - d/qemu-system-gui.prerm: add no-op prerm to overcome upgrade issues on
        the bad old prerm (LP 1906245 1905377)
  * Dropped Changes:
    - d/control, d/rules: build with gcc-9 on armhf as workaround until
      resolved in gcc-10 (LP: 1890435) [it is flaky still, but no more 100%
      fails]
  * Added Changes:
    - Refreshed ubuntu machine types for hirsute@5.2
    - d/control: regenerated from d/control-in
    - d/p/ubuntu/lp-1907789-build-no-pie-is-no-functional-liker-flag.patch: fix
      ld usage of -no-pie (LP: #1907789)

qemu (1:5.2+dfsg-2) unstable; urgency=medium

  * move ui-opengl.so module from qemu-system-gui to qemu-system-common,
    as other ...


Changed in qemu (Ubuntu Hirsute):
status: In Progress → Fix Released
description: updated

Uploaded and ready for review by the SRU Team

Frank Heimes (fheimes) on 2021-01-06
Changed in qemu (Ubuntu Groovy):
status: Triaged → In Progress
Changed in qemu (Ubuntu Focal):
status: Triaged → In Progress

Hello bugproxy, or anyone else affected,

Accepted qemu into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:5.0-5ubuntu9.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Changed in qemu (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Timo Aaltonen (tjaalton) wrote :

Hello bugproxy, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.11 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Frank Heimes (fheimes) on 2021-01-08
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed

------- Comment From <email address hidden> 2021-01-08 15:26 EDT-------
I installed the focal qemu-system-s390x package (1:4.2-3ubuntu6.11) as well as the groovy package (1:5.0-5ubuntu9.3) via the -proposed repositories. In both cases, I tested using a vfio-pci passthrough device on a single guest and the same method described above (kexec in the guest) to trigger the subsystem reset event in QEMU. With both of these QEMU versions, I verified that the PCI device is now appropriately available after the reset event.
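
When repeating this verification, it can help to confirm first that the host really runs a package version containing the fix. The sketch below is an assumption-laden illustration, not something used in this report: the function name version_has_fix is hypothetical, and it relies on dpkg being available on the host for the version comparison.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report):
# succeed when the installed version is at least the fixed version.
# Uses dpkg's own comparison so that epochs and Ubuntu suffixes are
# ordered the way the package manager orders them.
version_has_fix() {
    installed=$1
    fixed=$2
    dpkg --compare-versions "$installed" ge "$fixed"
}

# Example for a focal host (fixed version taken from this report):
#   installed=$(dpkg-query -W -f='${Version}' qemu-system-s390x)
#   version_has_fix "$installed" "1:4.2-3ubuntu6.11" && echo "fix present"
```

For groovy the fixed version named in this report is 1:5.0-5ubuntu9.3. Note that a guest started before the package upgrade keeps running the old qemu binary until it is restarted.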

Frank Heimes (fheimes) wrote :

Many thx for the verification on both, focal and groovy!
I'm adjusting the tags accordingly.

tags: added: verification-done verification-done-focal verification-done-groovy
removed: verification-needed verification-needed-focal verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:5.0-5ubuntu9.3

---------------
qemu (1:5.0-5ubuntu9.3) groovy; urgency=medium

  * d/p/ubuntu/lp-1907656-s390x-s390-virtio-ccw-Reset-PCI-devices-during-subsy:
    avoid PCI devices to become unavailable on reset (LP: #1907656)
  * d/rules: fix qemu-user-static to really be static (LP: #1908331)

 -- Christian Ehrhardt <email address hidden> Tue, 05 Jan 2021 15:46:16 +0100

Changed in qemu (Ubuntu Groovy):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.11

---------------
qemu (1:4.2-3ubuntu6.11) focal; urgency=medium

  * d/p/ubuntu/lp-1907656-s390x-s390-virtio-ccw-Reset-PCI-devices-during-subsy:
    avoid PCI devices to become unavailable on reset (LP: #1907656)

 -- Christian Ehrhardt <email address hidden> Tue, 05 Jan 2021 15:52:00 +0100

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) on 2021-01-18
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released

------- Comment From <email address hidden> 2021-01-19 04:52 EDT-------
IBM Bugzilla status-> closed, Fix Released with all requested distros

tags: added: targetmilestone-inin2104
removed: targetmilestone-inin---