Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | Low | Skipper Bug Screeners |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | Canonical Kernel Team |
Hirsute | Fix Released | Undecided | Unassigned |
Impish | Fix Released | Undecided | Unassigned |
Jammy | Fix Released | Undecided | Unassigned |
Bug Description
SRU Justification:
[Impact]
* After reassigning a PCHID of a ConnectX-5 based RoCE Adapter
from one physical LPAR to another,
running Ubuntu 20.04 with kernel 5.4 (latest),
a lifetime issue occurs.
* Subsequent testing on newer kernels now shows that a
NULL pointer dereference in the zPCI code happens (causing a hard crash)
that was previously hidden by leaking the struct pci_dev.
* For a more detailed root cause analysis, see the original bug description below.
[Fix]
The following three commits fix this issue in focal:
* upstream (since v5.12-rc4):
0b13525c20feb
backport: https:/
* upstream (since v5.14-rc7):
2a671f77ee49f
backport: https:/
* upstream (since v5.15-rc5):
a46044a92add6
backport: https:/
* Commit 0b13525c20fe fixes a lifetime issue of the struct pci_dev that was not released on removal,
commit 2a671f77ee49 fixes the 'NULL pointer dereference' (causing the hard crash) itself,
and commit a46044a92add fixes the handling of multiple events for a single reserve state transition of the device.
Without the latter, the NULL dereference can still be triggered, as Reassign I/O Path causes a redundant second removal event.
* Since none of the three upstream commits applies cleanly to focal master-next by just cherry-picking
(mainly due to changes in the context), the above backports are needed.
[Test Case]
* Two z15 or LinuxONE III LPARs, one with a ConnectX-5 based RoCE adapter attached.
* The LPARs need to run Ubuntu 20.04 with kernel 5.4 to hit the lifetime issue
(which hides the potentially also present 'NULL pointer dereference');
with Hirsute and kernel 5.11 the 'NULL pointer dereference' crash occurs.
* Now change the PCHID (physical channel identifier)
to a different one from the 2nd LPAR (at the HMC?).
* Verify that the reassignment worked properly (by checking the PCHID) and
monitor the kernel ring buffer (dmesg, diagnostic messages) for a
"Krnl PSW" crash (caused by the NULL pointer)
(for more error details, please see the original bug description below).
* Due to hardware availability reasons (the ConnectX-5 cards are only used in special cases),
the testing needs to be done by IBM.
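The monitoring step above could be scripted roughly as follows. This is only a minimal sketch: the function names `find_crash_lines` and `read_dmesg` are made up for illustration, and the marker strings are simply copied from the kernel output quoted in this report, so they are an assumption about what a failing run prints rather than a complete list.

```python
import subprocess

# Marker strings copied from the kernel output quoted in this report
# (assumption: a failing run prints at least one of them).
CRASH_MARKERS = ("Krnl PSW", "cut here", "WARNING: CPU:")


def find_crash_lines(lines):
    """Return the log lines that contain one of the crash markers."""
    return [line for line in lines if any(m in line for m in CRASH_MARKERS)]


def read_dmesg():
    """Read the kernel ring buffer via the dmesg command (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

On the test LPAR, `find_crash_lines(read_dmesg())` would be called after the PCHID reassignment; an empty list means none of the markers were seen.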
[Regression Potential / What can go wrong]
* What can go wrong with: 2a671f77ee49 "s390/pci: fix use after free of zpci_dev"
* The reference count to the struct zpci_dev got increased
while it is used by the PCI core.
This could cause a leak if not properly released.
* Hot-plug of these ConnectX-5 devices could be broken on s390x entirely,
in case the new pointer handling is erroneous.
* This may even have an impact on "cold plug", too.
* Fortunately the modifications are quite minimal and thereby traceable,
* and affect /arch/s390/
hence are specific to the s390x platform only
and there again to "plugging" of zPCI devices.
* What can go wrong with: 0b13525c20fe "s390/pci: fix leak of PCI device structure"
* The function zpci_remove_device got expanded with an additional set_error argument,
and the internal flow got significantly changed.
In case handled in a wrong way, this may harm the entire remove/release logic.
* The calls of zpci_remove_device need to be adjusted (as part of the new arg),
failures here will most likely be identified at compile time.
* The initialization of the pci_dev struct got improved,
* and the flow in __zpci_
to reflect the device slot/bus remove characteristics.
However, issues here may lead again to general zpci hotplug removal issues.
* Fortunately all modifications are limited to s390x only (/arch/s390/*
and /drivers/
(and no ccw devices).
[Other]
* jammy, the current release in development, has all three commits included.
* impish and hirsute already include "s390/pci: fix leak of PCI device structure"
and "s390/pci: fix use after free of zpci_dev";
"s390/pci: fix zpci_zdev_put() on reserve" is tagged for upstream stable v5.14.x / v5.10.x
(see https:/
and since we pick up v5.14.x / v5.10.x for the Ubuntu hirsute and impish kernels,
it will arrive there via upstream stable.
__________
After reassigning the PCHID of a RoCE ConnectX-5 card to another LPAR, dmesg on Ubuntu shows the following error message:
Ubuntu 20.04.01 with updates
root@t35lp02:~# uname -a
Linux t35lp02.lnxne.boe 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 17:41:33 UTC 2021 s390x s390x s390x GNU/Linux
root@t35lp02:~#
DMESG Output
[ 761.778422] mlx5_core 0018:00:00.1: poll_health:
[ 761.778432] mlx5_core 0018:00:00.1: print_health_
[ 761.778435] mlx5_core 0018:00:00.1: print_health_
[ 761.778437] mlx5_core 0018:00:00.1: print_health_
[ 761.778439] mlx5_core 0018:00:00.1: print_health_
[ 761.778442] mlx5_core 0018:00:00.1: print_health_
[ 761.778444] mlx5_core 0018:00:00.1: print_health_
[ 761.778447] mlx5_core 0018:00:00.1: print_health_
[ 761.778451] mlx5_core 0018:00:00.1: print_health_
[ 761.778454] mlx5_core 0018:00:00.1: print_health_
[ 761.778456] mlx5_core 0018:00:00.1: print_health_
[ 761.778460] mlx5_core 0018:00:00.1: print_health_
[ 761.778462] mlx5_core 0018:00:00.1: print_health_
[ 761.778465] mlx5_core 0018:00:00.1: print_health_
[ 761.778467] mlx5_core 0018:00:00.1: mlx5_trigger_
[ 763.179016] mlx5_core 0018:00:00.1: E-Switch: cleanup
[ 768.348431] mlx5_core 0018:00:00.1: mlx5_reclaim_
[ 768.348433] ------------[ cut here ]------------
[ 768.348434] FW pages counter is 43318 after reclaiming all pages
[ 768.348562] WARNING: CPU: 0 PID: 123 at drivers/
[ 768.348563] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm drm_panel_
[ 768.348586] CPU: 0 PID: 123 Comm: kmcheck Tainted: G W 5.4.0-80-generic #90-Ubuntu
[ 768.348586] Hardware name: IBM 8561 T01 703 (LPAR)
[ 768.348587] Krnl PSW : 0704c00180000000 000003ff808d33ac (mlx5_reclaim_
[ 768.348607] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 768.348608] Krnl GPRS: 0000000000000004 0000000000000006 0000000000000034 0000000000000007
[ 768.348608] 0000000000000007 00000000fcb4fa00 000000000000007b 000003e00458fafc
[ 768.348609] 00000000b7d406f0 000000004d849c00 00000000b7d00120 000000010000b6c0
[ 768.348610] 00000000f4cb1100 000003e00458fe70 000003ff808d33a8 000003e00458fa50
[ 768.348615] Krnl Code: 000003ff808d339c: c02000041043 larl %r2,000003ff809
[ 768.348622] Call Trace:
[ 768.348641] ([<000003ff808d
[ 768.348661] [<000003ff808c8
[ 768.348680] [<000003ff808c9
[ 768.348699] [<000003ff808c9
[ 768.348702] [<000000004d270
[ 768.348706] [<000000004d2f7
[ 768.348707] [<000000004d267
[ 768.348708] [<000000004d267
[ 768.348710] [<000000004cd36
[ 768.348713] [<000000004d382
[ 768.348714] [<000000004d389
[ 768.348716] [<000000004cd68
[ 768.348719] [<000000004d5a5
[ 768.348720] [<000000004d5a5
[ 768.348720] Last Breaking-
[ 768.348739] [<000003ff808d3
[ 768.348740] ---[ end trace 1056779ff3084977 ]---
[ 768.354255] pci 0018:00:00.1: Removing from iommu group 2
[ 768.359097] pci_bus 0018:00: busn_res: [bus 00] is released
[ 768.359122] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
root@t35lp02:~#
== Comment: #2 - <email address hidden>> - 2021-07-27 08:43:37 ==
Updated to Ubuntu 21.04 as discussed with Niklas:
root@t35lp02:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 21.04
Release: 21.04
Codename: hirsute
root@t35lp02:~# ls -la
total 36
drwx------ 5 root root 4096 Jul 27 13:06 .
drwxr-xr-x 20 root root 4096 Jul 27 12:49 ..
-rw------- 1 root root 174 Jul 27 13:15 .bash_history
-rw-r--r-- 1 root root 3106 Dec 5 2019 .bashrc
drwx------ 2 root root 4096 Jul 27 12:58 .cache
-rw-r--r-- 1 root root 161 Dec 5 2019 .profile
drwxr-xr-x 3 root root 4096 Jul 27 12:58 snap
drwx------ 2 root root 4096 Jul 27 12:58 .ssh
-rw------- 1 root root 979 Jul 27 13:06 .viminfo
root@t35lp02:~# uname -a
Linux t35lp02.lnxne.boe 5.11.0-25-generic #27-Ubuntu SMP Fri Jul 9 18:40:37 UTC 2021 s390x s390x s390x GNU/Linux
root@t35lp02:~#
dmesg shows the following call trace after reassigning the ConnectX-5 ports to another LPAR:
[ 232.218778] mlx5_core 0008:00:00.1: mlx5_wait_
[ 234.108700] mlx5_core 0008:00:00.1: E-Switch: cleanup
[ 234.281483] pci 0008:00:00.1: Removing from iommu group 1
[ 234.281510] ------------[ cut here ]------------
[ 234.281511] WARNING: CPU: 6 PID: 140 at arch/s390/
[ 234.281522] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_
[ 234.281573] CPU: 6 PID: 140 Comm: kmcheck Not tainted 5.11.0-25-generic #27-Ubuntu
[ 234.281575] Hardware name: IBM 8561 T01 703 (LPAR)
[ 234.281576] Krnl PSW : 0704c00180000000 00000000d2af2e92 (pcibios_
[ 234.281581] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 234.281582] Krnl GPRS: 000000000000001f 00000000ffffffff 00000000839dcc00 0000000000000000
[ 234.281584] 0000000000000000 0038008000000000 0000000000000000 0000038000000000
[ 234.281585] 00000000d43efef0 0000000000000006 0000000000000c00 00000000839f5298
[ 234.281586] 0000000083563300 0000000000000000 000003800475fad8 000003800475fa88
[ 234.281596] Krnl Code: 00000000d2af2e84: c0e5003c400e brasl %r14,00000000d3
[ 234.281605] Call Trace:
[ 234.281607] [<00000000d2af2
[ 234.281612] [<00000000d328f
[ 234.281620] [<00000000d335f
[ 234.281626] [<00000000d3260
[ 234.281632] [<00000000d32cf
[ 234.281637] [<00000000d328f
[ 234.281638] [<00000000d335f
[ 234.281640] [<00000000d3260
[ 234.281642] [<00000000d2af8
[ 234.281644] [<00000000d36c5
[ 234.281650] [<00000000d36cf
[ 234.281653] [<00000000d2b3c
[ 234.281657] [<00000000d3728
[ 234.281662] Last Breaking-
[ 234.281662] [<00000000d2af2
[ 234.281664] ---[ end trace c37123f53d0bbb72 ]---
[ 236.519017] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
root@t35lp02:~#
== Comment: #3 - <email address hidden>> - 2021-08-04 05:31:00 ==
The first dmesg on Ubuntu 20.04 looks like a Mellanox internal driver problem which, if my memory serves me correctly, has since been fixed in the Mellanox driver. As far as I can tell, in the worst case this leaks a few pages.
The output on Ubuntu 21.04, however, turns out to be a NULL pointer dereference in zPCI code that was previously hidden by us leaking the struct pci_dev.
I have analyzed the issue and can reproduce it on current development kernels, here is what I believe happens:
The backtrace shows a warning in
pcibios_
zpci_
pci_
which is
WARN_ON(
That, however, is a red herring: on this z15 machine with a ConnectX-5 we have
MIO support, so we should never even enter pci_iounmap_fh().
Adding a debug print in pcibios_
struct zpci_dev * we get to to_zpci(pdev) is NULL.
Digging a bit, the problem is that during the detach PCI availability event
we call zpci_zdev_put() as the zdev went away. We already performed the
pci_stop_
after that the struct pci_dev refcount reaches 0 and will not be accessed anymore.
This is usually true, however here the problem is that we first removed
the PF for Port 1 while keeping the PF for Port 2.
Now the "struct pci_sriov" in pdev->sriov, where pdev is the PF of Port 2, has a field struct pci_sriov::dev with the comment "/* Lowest numbered PF */". This field holds a reference to the struct pci_dev of the PF for Port 1, thus preventing
the refcount of that reaching 0 until the PF for Port 2 is released.
When the PF for Port 2 is released the refcount for the PF of Port 1 reaches 0
and only then do we get the call pci_release_dev() -> pcibios_
but at this point the struct zpci_dev was already released and zbus->functions
pointer NULLed when it was unregistered from the zbus.
Here is /sys/kernel/
root@t35lp47 ~ # cat /sys/kernel/
00 01627041292:161310 3 - 0007 0000000b3ffa6560 wb bit: 1
...
00 01627041292:165772 3 - 0007 0000000b3ffa2952 add fid:280, fh:2f80, c:1
00 01627041292:166083 3 - 0007 0000000b3ffa2952 add fid:2c0, fh:3002, c:1
...
00 01627041292:181187 3 - 0007 0000000b3ffa66aa ena fid:280, fh:a3002f80, rc:0
00 01627041292:181194 3 - 0007 0000000b3ffa6728 ena mio fid:280, fh:a3002f80, rc:0 <-- MIO enabled for PF of Port 1
00 01627041292:182176 3 - 0007 0000000b3ffa66aa ena fid:2c0, fh:a7003002, rc:0
00 01627041292:182181 3 - 0007 0000000b3ffa6728 ena mio fid:2c0, fh:a7003002, rc:0 <-- MIO enabled for PF of Port 2
....
00 01627041423:815382 3 - 0014 0000000b3ffa2c74 rem fid:280 <-- struct zpci_dev for Port 1 released but no zpci_unmap_
00 01627041566:352720 3 - 0012 0000000b3ffa2362 zunmap: zdev:0000000000
00 01627041566:353274 3 - 0012 0000000b3ffa2362 zunmap: zdev:0000000088
00 01627041567:548896 3 - 0012 0000000b3ffa2c74 rem fid:2c0
Thus we have a definite bug in the coordination between the lifetimes of
struct zpci_dev and struct pci_dev where the former can outlive the latter.
I think the problem is that for the struct zpci_dev we only keep exactly one
reference owned by the zPCI core that gets released once the underlying zPCI
device goes away from the view of the zPCI core.
At the same time the struct pci_dev has its own reference counting and via
pdev holds its own reference to the struct zpci_dev (indirect via
pdev->sysdata which is a struct zpci_bus which holds a struct zpci_dev* for
all functions on the bus via zbus->functions
This reference is unaccounted for and can outlive the zPCI core's reference
as seen in the above scenario.
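The ordering problem described above can be illustrated with a small toy model. This is not kernel code: the `RefCounted` class and the `simulate` function are hypothetical names, and the model reduces the real objects to plain counters. It only assumes what the analysis states, namely that the PF of Port 2 holds a reference to the pci_dev of Port 1, and that the zPCI core drops its single zdev reference as soon as Port 1 goes away. The `pdev_holds_zdev_ref` switch models, in a simplified way, the idea of letting the pci_dev account for its own reference to the zdev.

```python
class RefCounted:
    """Minimal reference-counted object with an optional release callback."""

    def __init__(self, name, on_release=None):
        self.name = name
        self.refs = 1              # the creator owns the first reference
        self.on_release = on_release
        self.released = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.released = True
            if self.on_release:
                self.on_release(self)


def simulate(pdev_holds_zdev_ref):
    """Remove the PF of Port 1 first, then the PF of Port 2.

    Returns True if the zdev of Port 1 was still alive at the moment
    the pci_dev of Port 1 was finally released.
    """
    result = {}
    zdev1 = RefCounted("zdev port 1")   # reference owned by the zPCI core

    def release_pdev1(pdev):
        # Stand-in for the release hook that looks up the zdev.
        result["zdev_alive"] = not zdev1.released
        if pdev_holds_zdev_ref:
            zdev1.put()                 # drop the pci_dev's own reference

    pdev1 = RefCounted("pci_dev port 1", on_release=release_pdev1)
    if pdev_holds_zdev_ref:
        zdev1.get()                     # accounted reference held by pci_dev
    pdev1.get()                         # PF of Port 2: sriov->dev reference

    pdev1.put()                         # Port 1 removed: PCI core drops its ref
    zdev1.put()                         # zPCI core drops its zdev reference
    pdev1.put()                         # Port 2 removed later: last ref dropped
    return result["zdev_alive"]
```

With `simulate(False)` the release hook runs after the zdev is already gone, which corresponds to the NULL lookup seen in the trace; with `simulate(True)` the zdev stays alive until the pci_dev has been released.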
== Comment: #4 - <email address hidden>> - 2021-08-24 09:00:03 ==
A fix for this has now landed upstream with the following commit:
2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")
Note that this references an earlier commit that previously hid the issue
and has not yet been merged into Ubuntu 20.04's kernel but is included in 21.04.
Only with both fixes does the freeing of the struct pci_dev work correctly
in the tested case.
0b13525c20fe ("s390/pci: fix leak of PCI device structure")
tags: | added: architecture-s39064 bugnameltc-193748 severity-low targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
Changed in ubuntu-z-systems: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
Changed in linux (Ubuntu): | |
importance: | Undecided → Low |
Changed in ubuntu-z-systems: | |
importance: | Undecided → Low |
summary: |
- Reassign I/O Path of Mojave Port 1 before Port 2 causes NULL dereference
+ Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference |
description: | updated |
Changed in linux (Ubuntu Impish): | |
status: | Fix Released → In Progress |
Changed in linux (Ubuntu Focal): | |
status: | Incomplete → In Progress |
description: | updated |
Changed in linux (Ubuntu Impish): | |
assignee: | Skipper Bug Screeners (skipper-screen-team) → nobody |
Changed in linux (Ubuntu Jammy): | |
assignee: | Skipper Bug Screeners (skipper-screen-team) → nobody |
importance: | Low → Undecided |
Changed in linux (Ubuntu Impish): | |
importance: | Low → Undecided |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Hirsute): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Focal): | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
Changed in linux (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Committed |
tags: |
added: targetmilestone-inin20045 removed: targetmilestone-inin--- |
A patched kernel 5.11.0-35 for Hirsute / 21.04 was built and is available via the following PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1943464/