[UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support

Bug #1892849 reported by bugproxy on 2020-08-25
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Skipper Bug Screeners
linux (Ubuntu)

Bug Description

SRU Justification:


* There is a zPCI attach/detach issue in combination with PF/VF linking support.

* On IBM Z with zPCI, VFs can be enabled/disabled individually with /sys/bus/pci/slots/<vf_fid>/power.

* If this was done with a VF that is linked to a parent PF, the PF symlink would become stale while the VF is disabled, and when the VF was turned back on, it would not be properly linked back to the PF.

* The PF link is removed together with the whole VF directory.

* Hence for example qemu cannot be used since it relies on such links.

* Additionally, there is a missing pci_dev_put() when searching for the parent PF. This can leave the reference count of the parent PF too high.


* 3cddb79afc60bcdb5fd9dd7a1c64a8d03bdd460f 3cddb79afc60 "s390/pci: fix zpci_bus_link_virtfn()"

* 2f0230b2f2d5fd287a85583eefb5aed35b6fe510 2f0230b2f2d5 "s390/pci: re-introduce zpci_remove_device()"

* b97bf44f99155e57088e16974afb1f2d7b5287aa b97bf44f9915 "s390/pci: fix PF/VF linking on hot plug"

[Test Case]

* Assign a zPCI device, that is capable of handling PFs/VFs (like a RoCE adapter / Connect-X5) to a z15 or LinuxONE III LPAR (usually using the HMC).

* Enable/disable a VF with /sys/bus/pci/slots/<vf_fid>/power

* Try to pass it through to qemu.

* The test needs to be done at IBM due to the special hardware requirements.
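
The enable/disable step of the test case could be scripted roughly as follows. This is only a sketch and not part of the original report: the function name vf_power_cycle and the SYSFS_PCI override are hypothetical, and it assumes that writing 1 to the slot's power file turns the VF back on.

```shell
#!/bin/sh
# Hypothetical helper: disable and re-enable one VF through its hotplug slot.
# SYSFS_PCI is an assumed override so the function can be tried on a scratch
# directory; on a real system it defaults to /sys/bus/pci.
vf_power_cycle() {
    fid=$1
    power="${SYSFS_PCI:-/sys/bus/pci}/slots/$fid/power"
    if [ ! -w "$power" ]; then
        echo "no writable power file for slot $fid" >&2
        return 1
    fi
    echo 0 > "$power" || return 1   # disable the VF
    echo 1 > "$power" || return 1   # enable it again
    cat "$power"                    # print the resulting power state
}
```

On a real LPAR this would be called as root with the function ID of the VF, for example vf_power_cycle <vf_fid>, before trying the passthrough to qemu.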

[Regression Potential]

* A larger subset of the zPCI files in arch/s390/pci is touched (pci_bus.{h,c}, pci.c, s390_pci_hpc.c and pci_event.c), hence there is a certain risk for regressions.

* zPCI is the s390x-specific PCI implementation and (aot) less widespread compared to the traditional CCW hardware on s390x - no code is touched outside of arch/s390/.

* The modifications are mainly in the IOV and hotplug area of zPCI.

* So SR-IOV devices, such as RoCE adapters, may be harmed by bugs or issues with hot-plugging PCI hardware on s390x.

* On the other hand, that is exactly the area where these patches fix existing problems.

* In the worst case PCI events can be impacted as well, which may harm control and communication, or changes in the base pci_bus/pci code may even break zPCI entirely.

* Regular RoCE adapters (like Connect-X5) are currently not handled as real SR-IOV VFs in zPCI, but are treated as normal PCI devices.

* Hence these zPCI SR-IOV setup changes currently apply only to PFs with SR-IOV capability, and those are not yet available to customers outside IBM.

* The modifications were tested in house at IBM, and a patched Ubuntu kernel was created, shared for further testing and successfully verified (LP 1892849, comment #3).


* This SRU depends on the SRU submitted for LP 1891437, which has already been ACKed. So LP 1891437 is a prerequisite and needs to be applied before this one!

* The patches of the depending SRU and the ones here were successfully tested based on a patched Ubuntu test kernel (LP 1892849, comment #3).

* Since the above three patches were accepted upstream with 5.9-rc2, this SRU request is for Focal and Groovy.

Problem description:

When an NVMe drive is assigned/hotplugged to a Linux LPAR,
a BUG is hit in lib/list_debug.c. The device is not accessible: there is no /dev/ file
and lspci does not report it either.

[ 1681.564462] list_add double add: new=00000000eed0f808, prev=00000000eed0f808, next=000000004070a300.
[ 1681.564489] ------------[ cut here ]------------
[ 1681.564490] kernel BUG at lib/list_debug.c:31!
[ 1681.564504] monitor event: 0040 ilc:2 [#1] SMP
[ 1681.564507] Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter s390_trng ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 vfio_ccw sha1_s390 vfio_mdev mdev chsc_sch vfio_iommu_type1 eadm_sch vfio ip_tables dm_service_time nvme crc32_vx_s390 sha256_s390 sha_common nvme_core qeth_l2 zfcp qeth scsi_transport_fc qdio ccwgroup dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt
[ 1681.564534] CPU: 6 PID: 139 Comm: kmcheck Not tainted 5.8.0-rc1+ #2
[ 1681.564535] Hardware name: IBM 8561 T01 701 (LPAR)
[ 1681.564536] Krnl PSW : 0704c00180000000 000000003ffcadb8 (__list_add_valid+0x70/0xa8)
[ 1681.564544] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 1681.564545] Krnl GPRS: 0000000000000040 0000000000000027 0000000000000058 0000000000000007
[ 1681.564546] 000000003ffcadb4 0000000000000000 0000000000000000 000003e0051a7ce0
[ 1681.564547] 000000004070a300 00000000eed0f808 00000000eed0f808 000000004070a300
[ 1681.564548] 00000000f56a2000 0000000040c2c788 000000003ffcadb4 000003e0051a7bc8
[ 1681.564583] Krnl Code: 000000003ffcada8: c02000302b09 larl %r2,00000000405d03ba
                          000000003ffcadae: c0e5ffdd30b1 brasl %r14,000000003fb70f10
                         #000000003ffcadb4: af000000 mc 0,0
                         >000000003ffcadb8: b9040054 lgr %r5,%r4
                          000000003ffcadbc: c02000302aad larl %r2,00000000405d0316
                          000000003ffcadc2: b9040041 lgr %r4,%r1
                          000000003ffcadc6: c0e5ffdd30a5 brasl %r14,000000003fb70f10
                          000000003ffcadcc: af000000 mc 0,0
[ 1681.564592] Call Trace:
[ 1681.564594] [<000000003ffcadb8>] __list_add_valid+0x70/0xa8
[ 1681.564596] ([<000000003ffcadb4>] __list_add_valid+0x6c/0xa8)
[ 1681.564599] [<000000003faf2920>] zpci_create_device+0x60/0x1b0
[ 1681.564601] [<000000003faf704a>] zpci_event_availability+0x282/0x2f0
[ 1681.564605] [<0000000040367848>] chsc_process_crw+0x2b8/0xa18
[ 1681.564607] [<000000004036f35c>] crw_collect_info+0x254/0x348
[ 1681.564610] [<000000003fb2a6ea>] kthread+0x14a/0x168
[ 1681.564613] [<00000000403a55c0>] ret_from_fork+0x24/0x2c
[ 1681.564614] Last Breaking-Event-Address:
[ 1681.564618] [<000000003fb70f62>] printk+0x52/0x58
[ 1681.564620] ---[ end trace 7ea67c348aa67e14 ]---

Linux t83lp49.lnxne.boe 5.8.0-rc1+ #2 SMP Thu Jun 18 12:38:02 CEST 2020 s390x s390x s390x GNU/Linux

How to reproduce:
1. Unassign an NVMe drive in HMC from your LPAR
2. Reassign it to your LPAR again
3. dmesg
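
Step 3 can be narrowed down to the two signature lines from the trace above. A minimal sketch, assuming the kernel log is piped in on stdin; the function name has_list_add_bug is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed if it
# contains one of the list corruption signatures quoted in this report.
has_list_add_bug() {
    grep -E -q 'list_add double add|kernel BUG at lib/list_debug\.c'
}
```

Typical use would be: dmesg | has_list_add_bug && echo "list_add BUG was hit"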

========================== Solution

The issue with VF attach/detach is with the fact that
on IBM Z VFs can be enabled/disabled individually using

echo 0 > /sys/bus/pci/slots/<vf_fid>/power

If this was done with a VF linked to a parent PF, the
symlink in the parent (/sys/bus/pci/devices/<pf>/virtfnX)
would become stale while the VF is disabled, and
when turned back on the VF would not get linked to the PF
again and so could not be used e.g. with QEMU, which
relies on the links.
Similarly, stale virtfn links could remain after
removing VFs through:

echo 0 > /sys/bus/pci/devices/<pf>/sriov_numvfs

Furthermore, there was a missing pci_dev_put() when
searching for the parent PF, potentially resulting
in too high a reference count on the parent PFs.
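
The stale virtfnX links described above can be spotted by looking for symlinks whose target no longer exists. The following is a sketch under assumptions, not something from the original report: the function name stale_virtfn_links and the SYSFS_PCI override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every virtfnN symlink under a PCI device that
# no longer resolves to an existing device directory.
# SYSFS_PCI is an assumed override for trying this on a scratch directory;
# on a real system it defaults to /sys/bus/pci.
stale_virtfn_links() {
    for link in "${SYSFS_PCI:-/sys/bus/pci}"/devices/*/virtfn*; do
        # -L: the entry is a symlink; ! -e: its target does not exist
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}
```

On a kernel with the fixes applied, this should print nothing after a VF has been disabled and re-enabled, or after sriov_numvfs has been set to 0.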

This has been fixed upstream and in 5.8 stable
with the following 3 upstream commits:

3cddb79afc60bcdb5fd9dd7a1c64a8d03bdd460f s390/pci: fix zpci_bus_link_virtfn()
2f0230b2f2d5fd287a85583eefb5aed35b6fe510 s390/pci: re-introduce zpci_remove_device()
b97bf44f99155e57088e16974afb1f2d7b5287aa s390/pci: fix PF/VF linking on hot plug

These should apply cleanly after applying

b76fee1bc56c31a9d2a49592810eba30cc06d61a s390/pci: ignore stale configuration request event

from Bug 1891437.


bugproxy (bugproxy) on 2020-08-25
tags: added: architecture-s39064 bugnameltc-186885 severity-medium targetmilestone-inin20041
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)

------- Comment From <email address hidden> 2020-08-25 07:26 EDT-------
Please ignore the Stacktrace and entire first part of the Bug Description.
We originally had this as one Bug together with what is now
LaunchPad 1891437 and this was erroneously mirrored.

The Bug is entirely described in my previous comment.

Frank Heimes (fheimes) wrote :

I made builds of patched focal (master-next) kernel packages available here for further testing:
https://people.canonical.com/~fheimes/lp1891437+lp1892849/

Changed in ubuntu-z-systems:
importance: Undecided → Medium
Frank Heimes (fheimes) on 2020-08-25
Changed in ubuntu-z-systems:
status: New → Triaged
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-26 04:59 EDT-------
(In reply to comment #8)
> I made builds of patched focal (master-next) kernel packages available here
> for further testing:
> https://people.canonical.com/~fheimes/lp1891437+lp1892849/

I've tested this build as KVM host (LPAR), KVM guest and z/VM guest
with the usual PCI handling tests and of course also that the
behavior around disabling and reenabling a VF works as expected now.

Frank Heimes (fheimes) on 2020-08-26
description: updated
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
Updating status to 'In Progress'.

Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-01 04:30 EDT-------
Ok I've tested the proposed kernel and also looked at the sources in the focal Kernel repository. Sadly it looks like we missed adding the following upstream commit.

b76fee1bc56c31a9d2a49592810eba30cc06d61a s390/pci: ignore stale configuration request event

I mentioned this commit in my original request but erroneously stated that it was part of Bug 1891437. Sadly that is not correct: while that commit fixes a related issue, it actually came a bit later. Would it be possible to still add it as part of this Bug, or should I create a new one?

The problem described in this bug is actually fixed without that commit, and I verified this with the 5.4.0-46-generic kernel from proposed. The same is true for the problem described in 1891437, but the commit is closely related to both of them and I actually didn't expect the other commits to even apply without it.
As with the others, the missed commit is also CC stable of course.

Frank Heimes (fheimes) wrote :

Hi Niklas,
since the problem description in this bug is actually fixed with the kernel currently proposed, I will first of all adjust the tag accordingly and mark this as successfully verified (as you stated in #6).

For the additional commit, please open a separate ticket, since this one here was already addressed in a kernel SRU submission that belongs to an SRU cycle that is now closed. Therefore any additional commit needs to be handled in the next/upcoming SRU cycle. Thx

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-48.52

linux (5.4.0-48.52) focal; urgency=medium

  * focal/linux: 5.4.0-48.52 -proposed tracker (LP: #1894654)

  * mm/slub kernel oops on focal kernel 5.4.0-45 (LP: #1895109)
    - SAUCE: Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * [UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support
    (LP: #1892849)
    - s390/pci: fix zpci_bus_link_virtfn()
    - s390/pci: re-introduce zpci_remove_device()
    - s390/pci: fix PF/VF linking on hot plug

  * [UBUNTU 20.04] kernel: s390/cpum_cf,perf: change DFLT_CCERROR counter name
    (LP: #1891454)
    - s390/cpum_cf, perf: change DFLT_CCERROR counter name

  * [UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression
    introduced by multi-function support (LP: #1891437)
    - s390/pci: fix enabling a reserved PCI function

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * alsa/hdmi: support nvidia mst hdmi/dp audio (LP: #1867704)
    - ALSA: hda - Rename snd_hda_pin_sense to snd_hda_jack_pin_sense
    - ALSA: hda - Add DP-MST jack support
    - ALSA: hda - Add DP-MST support for non-acomp codecs
    - ALSA: hda - Add DP-MST support for NVIDIA codecs
    - ALSA: hda: hdmi - fix regression in connect list handling
    - ALSA: hda: hdmi - fix kernel oops caused by invalid PCM idx
    - ALSA: hda: hdmi - preserve non-MST PCM routing for Intel platforms
    - ALSA: hda: hdmi - Keep old slot assignment behavior for Intel platforms
    - ALSA: hda - Fix DP-MST support for NVIDIA codecs

  * Focal update: v5.4.60 upstream stable release (LP: #1892899)
    - smb3: warn on confusing error scenario with sec=krb5
    - genirq/affinity: Make affinity setting if activated opt-in
    - genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
    - PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
    - PCI: Add device even if driver attach failed
    - PCI: qcom: Define some PARF params needed for ipq8064 SoC
    - PCI: qcom: Add support for tx term offset for rev 2.1.0
    - btrfs: allow use of global block reserve for balance item deletion
    - btrfs: free anon block device right after subvolume deletion
    - btrfs: don't allocate anonymous block device for user invisible roots
    - btrfs: ref-verify: fix memory leak in add_block_entry
    - btrfs: stop incremening log_batch for the log root tree when syncing log
    - btrfs: remove no longer needed use of log_writers for the log root tree
    - btrfs: don't traverse into the seed devices in show_devname
    - btrfs: open device...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
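
To check whether a Focal system already runs a kernel at or above the released package version (5.4.0-48), the version strings can be compared. This is a rough sketch: the function name kernel_at_least is hypothetical, it relies on the version sort option (-V) of sort, and it only looks at the version number, not at the actual patch content.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version string $2 is at least $1.
# Uses version sort: the smaller of the two strings sorts first, so the
# check passes when the wanted version comes out on top.
kernel_at_least() {
    want=$1
    have=$2
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For example: kernel_at_least 5.4.0-48 "$(uname -r)" && echo "released fix level reached"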
Frank Heimes (fheimes) wrote :

I just verified that the patches/commits are in groovy master-next.
Hence updating the groovy entry to Fix Released as well - and with that the entire case.

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Released
Changed in ubuntu-z-systems:
status: In Progress → Fix Released
Changed in linux (Ubuntu Groovy):
assignee: Frank Heimes (fheimes) → nobody