[UBUNTU 20.04] s390x/pci: fix linking between PF and VF for multifunction devices

Bug #1879704 reported by bugproxy
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Skipper Bug Screeners
linux (Ubuntu)           Fix Released  High        Canonical Kernel Team
Focal                    Fix Released  Undecided   Unassigned
Groovy                   Fix Released  High        Canonical Kernel Team

Bug Description

SRU Justification:
==================

[Impact]

* It's currently not possible on s390x to verify the relationships between PFs and VFs of network interfaces (neither natively nor in libvirt).

* So s390x currently behaves differently here compared to other architectures, but shouldn't, since this is needed for proper management.

* The creation of not only the sysfs, but also the in-kernel link (struct pci_dev->physfn), solves this and on top allows the use of a common code path for disabling/shutdown of PFs.

* This code path is right now fenced off by the struct pci_dev->no_vf_scan flag of which s390x is currently the only user.

* This allows a graceful and orderly shutdown of the VFs associated with a PF, as triggered by writing 0 to '/sys/bus/pci/devices/<some_pf>/sriov_numvfs'.

* Previously this could leave the card in an unresponsive error state.

[Fix]

* a1ceea67f2e5b73cebd456e7fb463b3052bc6344 a1ceea67f2e5 "PCI/IOV: Introduce pci_iov_sysfs_link() function"

* e5794cf1a270d813a5b9373a6876487d4d154195 e5794cf1a270 "s390/pci: create links between PFs and VFs"

[Test Case]

* Set up an s390x LPAR with at least one SR-IOV card and assign PF and VFs to that system.

* Determine if a device is a virtual function: for other architectures this is currently available in the file 'physfn' which is a link to the parent PF's device.

* Determine virtual functions of a physical function: for other architectures this is currently available as 'virtfn{index}' links under the PF device's directory.

* Determine the physical function of a virtual function: on x86 this is currently available in the file 'physfn' which is a link to the parent PF.

* This verification needs to be done by IBM on a system with SR-IOV (PCI-based) hardware.
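
The three checks above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (is_vf, pf_of, vfs_of) and the overridable SYSFS_PCI variable are made up for illustration and are not part of the patches; only the 'physfn' and 'virtfn{index}' link names come from the sysfs ABI referenced in this bug.

```shell
#!/bin/sh
# Sketch only. SYSFS_PCI can be overridden to try the helpers on a mock tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# is_vf ADDR: succeeds if the device has a 'physfn' link, i.e. it is a VF.
is_vf() {
    [ -L "$SYSFS_PCI/$1/physfn" ]
}

# pf_of ADDR: print the PCI address of the parent PF of a VF.
pf_of() {
    is_vf "$1" || return 1
    basename "$(readlink "$SYSFS_PCI/$1/physfn")"
}

# vfs_of ADDR: print the PCI addresses of all VFs of a PF, one per line.
vfs_of() {
    for link in "$SYSFS_PCI/$1"/virtfn*; do
        # Skip the unexpanded glob pattern when the PF has no VFs.
        [ -L "$link" ] || continue
        basename "$(readlink "$link")"
    done
}
```

On a patched kernel, pf_of called with a VF's address and vfs_of called with the PF's address should point at each other; on an unpatched s390x kernel the links are expected to be missing.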

[Regression Potential]

* There is a certain regression risk with having code changes in the PCI/IOV space,
  even if they are limited, especially if the patches touch common code.

* The changes in pci.h are very minimal, and the iov.c changes are traceable, too. All other modifications are s390x specific.

* Nevertheless, it could be that PCI hardware gets harmed, here especially (SR-)IOV hardware.

* The patches got cross-company verified (IBM and Google).

* They were brought upstream and are currently tagged with 20200521, and are planned to be included in 5.8.

* A patched kernel was created based on a LP PPA and successfully tested by IBM.

[Other]

* Since the fix/patch is planned to be included in kernel v5.8, it will later automatically land in groovy.

* But because groovy is not there yet (5.8 is not yet out), this SRU got requested for focal and groovy.

* This SRU depends on the SRU from LP 1874056, which already has two ACKs.
  So LP 1874056 needs to be applied before this one!
__________

As with other architectures, we must be able on s390x to verify the following relationships between PFs and VFs for proper management (including by libvirt) of network interfaces:

1. Determine if a device is a virtual function: for other architectures this is currently available in the file `physfn` which is a link to the parent PF's device.
2. Determine virtual functions of a physical function: for other architectures this is currently available as `virtfn{index}` links under the PF device's directory.
3. Determine the physical function of a virtual function: on x86 this is currently available in the file `physfn` which is a link to the parent PF

More details for the already existing parameters mentioned above can be found here: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci

Moreover, creating not just the sysfs link but also the in-kernel link
(struct pci_dev->physfn) allows us to use the common code path
for disabling/shutdown of PFs.
This code path is currently fenced off by the
struct pci_dev->no_vf_scan flag of which s390 is currently the only user.

This in turn allows for a graceful and orderly shutdown of VFs
associated with a PF as triggered by:

echo 0 > /sys/bus/pci/devices/<some_pf>/sriov_numvfs

Previously this could leave the card in an unresponsive error state.
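
For illustration, the shutdown trigger could be wrapped in a small helper like the one below. This is a sketch only: the function name disable_vfs and the overridable SYSFS_PCI variable are assumptions, and the helper does nothing beyond the 'echo 0' shown above plus basic sanity checks.

```shell
#!/bin/sh
# Sketch only. SYSFS_PCI can be overridden to try the helper on a mock tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# disable_vfs PF_ADDR: ask the kernel to tear down all VFs of the given PF.
disable_vfs() {
    numvfs="$SYSFS_PCI/$1/sriov_numvfs"
    # A device without a writable sriov_numvfs is not an SR-IOV PF we can manage.
    [ -w "$numvfs" ] || { echo "$1: no writable sriov_numvfs" >&2; return 1; }
    current=$(cat "$numvfs")
    # Nothing to do if no VFs are currently enabled.
    [ "$current" -eq 0 ] && return 0
    echo 0 > "$numvfs"
}
```

With the in-kernel link in place, the 'virtfn{index}' links under the PF should disappear once the write returns.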

The patches for this have been discussed and Acked-by the
responsible upstream maintainer here:

[RFC 0/2] Enable PF-VF linking with pdev->no_vf_scan (s390)
https://<email address hidden>/

[RFC 1/2] PCI/IOV: Introduce pci_iov_sysfs_link() function
https://<email address hidden>/

[RFC 2/2] s390/pci: create links between PFs and VFs
https://<email address hidden>/

They are currently queued to be posted to the public s390 Kernel
repository and linux-next / 5.8.
These depend on the previous multi-function/enumeration rework.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-185929 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Frank Heimes (fheimes) wrote :

This is a spin off of LP 1874056 - see comment #32 there.
More details will follow here soon.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in ubuntu-z-systems:
status: New → Incomplete
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-05-20 09:57 EDT-------
As with other architectures, we must be able to verify the following relationships between PFs and VFs for proper management (including by libvirt) of network interfaces:

1. Determine if a device is a virtual function: for other architectures this is currently available in the file `physfn` which is a link to the parent PF's device.
2. Determine virtual functions of a physical function: for other architectures this is currently available as `virtfn{index}` links under the PF device's directory.
3. Determine the physical function of a virtual function: on x86 this is currently available in the file `physfn` which is a link to the parent PF

More details for the already existing parameters mentioned above can be found here: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci

Moreover, creating not just the sysfs link but also the in-kernel link
(struct pci_dev->physfn) allows us to use the common code path
for disabling/shutdown of PFs.
This code path is currently fenced off by the
struct pci_dev->no_vf_scan flag of which s390 is currently the only user.

This in turn allows for a graceful and orderly shutdown of VFs
associated with a PF as triggered by:

echo 0 > /sys/bus/pci/devices/<some_pf>/sriov_numvfs

Previously this could leave the card in an unresponsive error state.

The patches for this have been discussed and Acked-by the
responsible upstream maintainer here:
https://<email address hidden>/

They are currently queued to be posted to the public s390 Kernel
repository and linux-next / 5.8.
These depend on the previous multi-function/enumeration rework.

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [UBUNTU 20.04] s390x/pci: implement linking between PF and VF for multifunction devices

The content of comment #2 was copied over as bug description.

The patch set in preparation is:

[RFC 0/2] Enable PF-VF linking with pdev->no_vf_scan (s390)
https://<email address hidden>/

[RFC 1/2] PCI/IOV: Introduce pci_iov_sysfs_link() function
https://<email address hidden>/

[RFC 2/2] s390/pci: create links between PFs and VFs
https://<email address hidden>/

As already mentioned, the target for upstream acceptance is kernel 5.8.

description: updated
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-05-20 11:37 EDT-------
And in the meantime this has also been added to
the public s390 repository on kernel.org which also
includes the later Reviewed-by from Pierre Morel.

PCI/IOV: Introduce pci_iov_sysfs_link() function

https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=a1ceea67f2e5b73cebd456e7fb463b3052bc6344

s390/pci: create links between PFs and VFs

https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=e5794cf1a270d813a5b9373a6876487d4d154195

I've also tested that the patches apply cleanly on
focal master-next + the patches from LP 1874056.

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [UBUNTU 20.04] s390x/pci: implement linking between PF and VF for multifunction devices

Thx for the update - yepp, found them in linux-next.
So should be possible to address them with the upcoming kernel SRU ...

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in ubuntu-z-systems:
status: Incomplete → Triaged
Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes)
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote :

Test package(s) for s390x available at this PPA: https://launchpad.net/~fheimes/+archive/ubuntu/lp1879704 | ppa:fheimes/lp1879704

Frank Heimes (fheimes)
summary: - [UBUNTU 20.04] s390x/pci: implement linking between PF and VF for
+ [UBUNTU 20.04] s390x/pci: fix linking between PF and VF for
multifunction devices
description: updated
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-06-02 05:06 EDT-------
@Frank I can confirm that the kernel package from your PPA works as expected.
On a system with Mellanox PFs, I could successfully create the VFs and also see all the expected sysfs links and all playing nicely with the multi-function enumeration.

Thank you for the quick turn around!

Revision history for this message
Frank Heimes (fheimes) wrote :

Many thx Niklas for the quick PPA testing!

description: updated
Frank Heimes (fheimes)
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-June/thread.html#110753
Updating status to 'In Progress'.

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in ubuntu-z-systems:
importance: Undecided → High
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Revision history for this message
Frank Heimes (fheimes) wrote :

For further assessment of potential regression, here is a reference to the upstream discussion with upstream maintainer Bjorn: https://lkml.org/lkml/2020/5/6/837

and especially his statement (https://lkml.org/lkml/2020/5/6/1252):
>>This whole thing is not "introducing" any new functionality; it's "refactoring" to move existing functionality around and make it callable separately.<<

Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-12 03:27 EDT-------
I just verified that this is working as expected with 5.4.0-38-generic from Focal Proposed. Thank you for the quick turn around and great cooperation!

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Many thanks for verifying. Adjusting tags.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-oracle-5.4/5.4.0-1019.19~18.04.1)

All autopkgtests for the newly accepted linux-oracle-5.4 (5.4.0-1019.19~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

zfs-linux/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-oracle-5.4

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-40.44

---------------
linux (5.4.0-40.44) focal; urgency=medium

  * linux-oem-5.6-tools-common and -tools-host should be dropped (LP: #1881120)
    - [Packaging] Add Conflicts/Replaces to remove linux-oem-5.6-tools-common and
      -tools-host

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * Slow send speed with Intel I219-V on Ubuntu 18.04.1 (LP: #1802691)
    - e1000e: Disable TSO for buffer overrun workaround

  * CVE-2020-0543
    - UBUNTU/SAUCE: x86/speculation/srbds: do not try to turn mitigation off when
      not supported

  * Realtek 8723DE [10ec:d723] subsystem [10ec:d738] disconnects unsolicitedly
    when Bluetooth is paired: Reason: 23=IEEE8021X_FAILED (LP: #1878147)
    - SAUCE: Revert "UBUNTU: SAUCE: rtw88: Move driver IQK to set channel before
      association for 11N chip"
    - SAUCE: Revert "UBUNTU: SAUCE: rtw88: fix rate for a while after being
      connected"
    - SAUCE: Revert "UBUNTU: SAUCE: rtw88: No retry and report for auth and assoc"
    - SAUCE: Revert "UBUNTU: SAUCE: rtw88: 8723d: Add coex support"
    - rtw88: add a debugfs entry to dump coex's info
    - rtw88: add a debugfs entry to enable/disable coex mechanism
    - rtw88: 8723d: Add coex support
    - SAUCE: rtw88: coex: 8723d: set antanna control owner
    - SAUCE: rtw88: coex: 8723d: handle BT inquiry cases
    - SAUCE: rtw88: fix EAPOL 4-way failure by finish IQK earlier

  * CPU stress test fails with focal kernel (LP: #1867900)
    - [Config] Disable hisi_sec2 temporarily

  * Enforce all config annotations (LP: #1879327)
    - [Config]: do not enforce CONFIG_VERSION_SIGNATURE
    - [Config]: prepare to enforce all
    - [Config]: enforce all config options

  * Focal update: v5.4.44 upstream stable release (LP: #1881927)
    - ax25: fix setsockopt(SO_BINDTODEVICE)
    - dpaa_eth: fix usage as DSA master, try 3
    - net: don't return invalid table id error when we fall back to PF_UNSPEC
    - net: dsa: mt7530: fix roaming from DSA user ports
    - net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
    - __netif_receive_skb_core: pass skb by reference
    - net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*
    - net: ipip: fix wrong address family in init error path
    - net/mlx5: Add command entry handling completion
    - net: mvpp2: fix RX hashing for non-10G ports
    - net: nlmsg_cancel() if put fails for nhmsg
    - net: qrtr: Fix passing invalid reference to qrtr_local_enqueue()
    - net: revert "net: get rid of an signed integer overflow in
      ip_idents_reserve()"
    - net sched: fix reporting the first-time use timestamp
    - net/tls: fix race condition causing kernel panic
    - nexthop: Fix attribute checking for groups
    - r8152: support additional Microsoft Surface Ethernet Adapter variant
    - sctp: Don't add the shutdown timer if its already been added
    - sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and
      socket is closed
    - tipc: block BH before using dst_cache
    - net/mlx5e: kTLS, Destroy key object after destroying the TIS
    - net/mlx5e: Fix inner tirs handling
    - net/m...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-42.46

---------------
linux (5.4.0-42.46) focal; urgency=medium

  * focal/linux: 5.4.0-42.46 -proposed tracker (LP: #1887069)

  * linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
    - SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

linux (5.4.0-41.45) focal; urgency=medium

  * focal/linux: 5.4.0-41.45 -proposed tracker (LP: #1885855)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2019-19642
    - kernel/relay.c: handle alloc_percpu returning NULL in relay_open

  * CVE-2019-16089
    - SAUCE: nbd_genl_status: null check for nla_nest_start

  * CVE-2020-11935
    - aufs: do not call i_readcount_inc()

  * ip_defrag.sh in net from ubuntu_kernel_selftests failed with 5.0 / 5.3 / 5.4
    kernel (LP: #1826848)
    - selftests: net: ip_defrag: ignore EPERM

  * Update lockdown patches (LP: #1884159)
    - SAUCE: acpi: disallow loading configfs acpi tables when locked down

  * seccomp_bpf fails on powerpc (LP: #1885757)
    - SAUCE: selftests/seccomp: fix ptrace tests on powerpc

  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [packaging] add signed modules for the 418-server and the 440-server
      flavours

 -- Khalid Elmously <email address hidden> Thu, 09 Jul 2020 19:50:26 -0400

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-07-28 06:05 EDT-------
IBM Bugzilla status-> closed. Fix Released with 20.04
