Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference

Bug #1943464 reported by bugproxy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Low
Skipper Bug Screeners
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Focal
Fix Released
Undecided
Canonical Kernel Team
Hirsute
Fix Released
Undecided
Unassigned
Impish
Fix Released
Undecided
Unassigned
Jammy
Fix Released
Undecided
Unassigned

Bug Description

SRU Justification:

[Impact]

* After reassigning a PCHID of a ConnectX-5 based RoCE Adapter
  from one physical LPAR to another,
  running Ubuntu 20.04 with kernel 5.4 (latest),
  a lifetime issue occurs.

* Subsequent testing on newer kernels now shows that a
  NULL pointer dereference in the zPCI code happens (causing a hard crash)
  that was previously hidden by leaking the struct pci_dev.

* For a more detailed root cause analysis, see the below original bug description.

[Fix]

The following three commits fix this issue in focal:

* upstream (since v5.12-rc4):
  0b13525c20febcfecccf6fc1db5969727401317d 0b13525c20fe "s390/pci: fix leak of PCI device structure"
  backport: https://launchpadlibrarian.net/566161494/0001-s390-pci-fix-leak-of-PCI-device-structure.patch

* upstream (since v5.14-rc7):
  2a671f77ee49f3e78997b77fdee139467ff6a598 2a671f77ee49 "s390/pci: fix use after free of zpci_dev"
  backport: https://launchpadlibrarian.net/566161496/0002-s390-pci-fix-use-after-free-of-zpci_dev.patch

* upstream (since v5.15-rc5):
  a46044a92add6a400f4dada7b943b30221f7cc80 a46044a92add "s390/pci: fix zpci_zdev_put() on reserve"
  backport: https://launchpadlibrarian.net/566161497/0003-s390-pci-fix-zpci_zdev_put-on-reserve.patch

* Commit 0b13525c20fe fixes a lifetime issue of the struct pci_dev that was not released on removal,
  commit 2a671f77ee49 fixes the 'NULL pointer dereference' (causing the hard crash) itself.
  and commit a46044a92add fixes the handling of multiple events for a single reserve state transition of the device.
  Without this, the NULL dereference can still be triggered as Reassign I/O Path causes a redudant second removal event.

* Since none of the three upstream commits does apply cleanly to focal master-next by just cherry-picking them
  (mainly due to changes in the context), the above backports are needed.

[Test Case]

* Two z15 or LinuxONE III LPARs, one with a Connect-X5 based RoCE adapter attached.

* LPARs need to run Ubuntu 20.04 with kernel 5.4 to hit the lifetime issue
  (that hides the also potential existing 'NULL pointer dereference') -
  with Hirsute and kernel 5.11 the 'NULL pointer dereference' crash occurs.

* Now change the PCHID (physical channel identifier)
  to a different one from the 2nd LPAR (at the HMC?).

* Verify if the reassignment worked properly (by checking the PCHID) and
  monitor the kernel ring buffer dmesg (diagnostic messages) for
  "Krnl PSW" crash (caused by NULL pointer)
  (for more error details, please see below original bug description).

* Due hardware availability reasons (the ConnectX-5 cards are only used in special cases),
  the testing needs to be done by IBM.

[Regression Potential / What can go wrong]

* What can go wrong with: 2a671f77ee49 "s390/pci: fix use after free of zpci_dev"

* The reference count to the struct zpci_dev got increased
  while it is used by the PCI core.
  This could cause a leak if not properly released.

* Hot-plug of there Connect-X5 devices could be broken on s390x entirely,
  in case the new pointer handing is erroneous.

* This may even have an impact on "cold plug", too.

* Fortunately the modifications are quite minimal and thereby traceable,

* and affect /arch/s390/pci/pci.c and arch/s390/pci/pci_bus.h only,
  hence are specific to the s390x platform only
  and there again to "plugging" of zPCI devices.

* What can go wrong with: 0b13525c20fe "s390/pci: fix leak of PCI device structure"

* The function zpci_remove_device got expanded with an additional set_error argument,
  and the internal flow got significantly changed.
  In case handled in a wrong way, this may harm the entire remove/release logic.

* The calls of zpci_remove_device need to be adjusted (as part of the new arg),
  failures here will most likely be identified at compile time.

* The initialization of the pci_dev struct got improved,

* and the flow in __zpci_event_availability carefully changed
  to reflect the device slot/bus remove characteristics.
  However, issues here may lead again to general zpci hotplug removal issues.

* Fortunately all modifications are limited to s390x only (/arch/s390/*
  and /drivers/pci/hotplug/s390*) obviously affect zpci devices only
  (and no ccw devices).

[Other]

* jammy, the current release in development, has all three commits included.

* impish and hirsute already incl. "s390/pci: fix leak of PCI device structure"
  and "s390/pci: fix use after free of zpci_dev";
  "s390/pci: fix zpci_zdev_put() on reserve" is tagged for upstream stable v5.14.x / v5.10.x
  (see https://lore.<email address hidden>/)
  and since we pick up v5.14.x / v5.10.x for the Ubuntu hirsute and impish kernels,
  it will arrive there via upstream stable.
__________

After reassign RoCE ConnectX-5 Card Pchid to another LPAR dmesg show under Ubuntu the following Error message

Ubuntu 20.04.01 with updates

oot@t35lp02:~# uname -a
Linux t35lp02.lnxne.boe 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 17:41:33 UTC 2021 s390x s390x s390x GNU/Linux
root@t35lp02:~#

DMESG Output

  761.778422] mlx5_core 0018:00:00.1: poll_health:715:(pid 0): Fatal error 1 detected
[ 761.778432] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): assert_var[0] 0xffffffff
[ 761.778435] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): assert_var[1] 0xffffffff
[ 761.778437] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): assert_var[2] 0xffffffff
[ 761.778439] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): assert_var[3] 0xffffffff
[ 761.778442] mlx5_core 0018:00:00.1: print_health_info:381:(pid 0): assert_var[4] 0xffffffff
[ 761.778444] mlx5_core 0018:00:00.1: print_health_info:384:(pid 0): assert_exit_ptr 0xffffffff
[ 761.778447] mlx5_core 0018:00:00.1: print_health_info:386:(pid 0): assert_callra 0xffffffff
[ 761.778451] mlx5_core 0018:00:00.1: print_health_info:389:(pid 0): fw_ver 65535.65535.65535
[ 761.778454] mlx5_core 0018:00:00.1: print_health_info:390:(pid 0): hw_id 0xffffffff
[ 761.778456] mlx5_core 0018:00:00.1: print_health_info:391:(pid 0): irisc_index 255
[ 761.778460] mlx5_core 0018:00:00.1: print_health_info:392:(pid 0): synd 0xff: unrecognized error
[ 761.778462] mlx5_core 0018:00:00.1: print_health_info:394:(pid 0): ext_synd 0xffff
[ 761.778465] mlx5_core 0018:00:00.1: print_health_info:396:(pid 0): raw fw_ver 0xffffffff
[ 761.778467] mlx5_core 0018:00:00.1: mlx5_trigger_health_work:696:(pid 0): new health works are not permitted at this stage
[ 763.179016] mlx5_core 0018:00:00.1: E-Switch: cleanup
[ 768.348431] mlx5_core 0018:00:00.1: mlx5_reclaim_startup_pages:562:(pid 123): FW did not return all pages. giving up...
[ 768.348433] ------------[ cut here ]------------
[ 768.348434] FW pages counter is 43318 after reclaiming all pages
[ 768.348562] WARNING: CPU: 0 PID: 123 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:567 mlx5_reclaim_startup_pages+0x12c/0x1c0 [mlx5_core]
[ 768.348563] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time mlx5_ib ib_uverbs ib_core pkey qeth_l2 zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 mlx5_core sha256_s390 sha1_s390 sha_common tls mlxfw ptp pps_core zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 768.348586] CPU: 0 PID: 123 Comm: kmcheck Tainted: G W 5.4.0-80-generic #90-Ubuntu
[ 768.348586] Hardware name: IBM 8561 T01 703 (LPAR)
[ 768.348587] Krnl PSW : 0704c00180000000 000003ff808d33ac (mlx5_reclaim_startup_pages+0x12c/0x1c0 [mlx5_core])
[ 768.348607] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 768.348608] Krnl GPRS: 0000000000000004 0000000000000006 0000000000000034 0000000000000007
[ 768.348608] 0000000000000007 00000000fcb4fa00 000000000000007b 000003e00458fafc
[ 768.348609] 00000000b7d406f0 000000004d849c00 00000000b7d00120 000000010000b6c0
[ 768.348610] 00000000f4cb1100 000003e00458fe70 000003ff808d33a8 000003e00458fa50
[ 768.348615] Krnl Code: 000003ff808d339c: c02000041043 larl %r2,000003ff80955422
                          000003ff808d33a2: c0e5ffff70c7 brasl %r14,000003ff808c1530
                         #000003ff808d33a8: a7f40001 brc 15,000003ff808d33aa
                         >000003ff808d33ac: e330a5f04012 lt %r3,263664(%r10)
                          000003ff808d33b2: a784ffd7 brc 8,000003ff808d3360
                          000003ff808d33b6: b9140033 lgfr %r3,%r3
                          000003ff808d33ba: c0200004104e larl %r2,000003ff80955456
                          000003ff808d33c0: c0e5ffff70b8 brasl %r14,000003ff808c1530
[ 768.348622] Call Trace:
[ 768.348641] ([<000003ff808d33a8>] mlx5_reclaim_startup_pages+0x128/0x1c0 [mlx5_core])
[ 768.348661] [<000003ff808c8e14>] mlx5_function_teardown+0x44/0xa0 [mlx5_core]
[ 768.348680] [<000003ff808c95b0>] mlx5_unload_one+0x80/0x160 [mlx5_core]
[ 768.348699] [<000003ff808c9720>] remove_one+0x50/0xd0 [mlx5_core]
[ 768.348702] [<000000004d2704c0>] pci_device_remove+0x40/0xa0
[ 768.348706] [<000000004d2f724e>] device_release_driver_internal+0xee/0x1c0
[ 768.348707] [<000000004d267054>] pci_stop_bus_device+0x94/0xc0
[ 768.348708] [<000000004d267210>] pci_stop_and_remove_bus_device_locked+0x30/0x50
[ 768.348710] [<000000004cd36cbe>] __zpci_event_availability+0x26e/0x340
[ 768.348713] [<000000004d382794>] chsc_process_crw+0x2e4/0x300
[ 768.348714] [<000000004d389fd6>] crw_collect_info+0x276/0x340
[ 768.348716] [<000000004cd681e6>] kthread+0x126/0x160
[ 768.348719] [<000000004d5a568c>] ret_from_fork+0x28/0x30
[ 768.348720] [<000000004d5a5694>] kernel_thread_starter+0x0/0x10
[ 768.348720] Last Breaking-Event-Address:
[ 768.348739] [<000003ff808d33a8>] mlx5_reclaim_startup_pages+0x128/0x1c0 [mlx5_core]
[ 768.348740] ---[ end trace 1056779ff3084977 ]---
[ 768.354255] pci 0018:00:00.1: Removing from iommu group 2
[ 768.359097] pci_bus 0018:00: busn_res: [bus 00] is released
[ 768.359122] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
root@t35lp02:~#

== Comment: #2 - <email address hidden>> - 2021-07-27 08:43:37 ==
Make an Update to Ubuntu 21.04 as mentioned with Niklas:
root@t35lp02:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 21.04
Release: 21.04
Codename: hirsute
root@t35lp02:~# ls -la
total 36
drwx------ 5 root root 4096 Jul 27 13:06 .
drwxr-xr-x 20 root root 4096 Jul 27 12:49 ..
-rw------- 1 root root 174 Jul 27 13:15 .bash_history
-rw-r--r-- 1 root root 3106 Dec 5 2019 .bashrc
drwx------ 2 root root 4096 Jul 27 12:58 .cache
-rw-r--r-- 1 root root 161 Dec 5 2019 .profile
drwxr-xr-x 3 root root 4096 Jul 27 12:58 snap
drwx------ 2 root root 4096 Jul 27 12:58 .ssh
-rw------- 1 root root 979 Jul 27 13:06 .viminfo
root@t35lp02:~# uname -a
Linux t35lp02.lnxne.boe 5.11.0-25-generic #27-Ubuntu SMP Fri Jul 9 18:40:37 UTC 2021 s390x s390x s390x GNU/Linux
root@t35lp02:~#

dmesg show the following Call Trace after reasign ConnectX-5 Ports to another LPAR

[ 232.218778] mlx5_core 0008:00:00.1: mlx5_wait_for_pages:735:(pid 140): Skipping wait for vf pages stage
[ 234.108700] mlx5_core 0008:00:00.1: E-Switch: cleanup
[ 234.281483] pci 0008:00:00.1: Removing from iommu group 1
[ 234.281510] ------------[ cut here ]------------
[ 234.281511] WARNING: CPU: 6 PID: 140 at arch/s390/pci/pci.c:374 pcibios_release_device+0xfe/0x110
[ 234.281522] Modules linked in: s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_orientation_quirks ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_service_time mlx5_ib ib_uverbs ib_core qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 mlx5_core prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common tls mlxfw ptp pps_core zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 234.281573] CPU: 6 PID: 140 Comm: kmcheck Not tainted 5.11.0-25-generic #27-Ubuntu
[ 234.281575] Hardware name: IBM 8561 T01 703 (LPAR)
[ 234.281576] Krnl PSW : 0704c00180000000 00000000d2af2e92 (pcibios_release_device+0x102/0x110)
[ 234.281581] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 234.281582] Krnl GPRS: 000000000000001f 00000000ffffffff 00000000839dcc00 0000000000000000
[ 234.281584] 0000000000000000 0038008000000000 0000000000000000 0000038000000000
[ 234.281585] 00000000d43efef0 0000000000000006 0000000000000c00 00000000839f5298
[ 234.281586] 0000000083563300 0000000000000000 000003800475fad8 000003800475fa88
[ 234.281596] Krnl Code: 00000000d2af2e84: c0e5003c400e brasl %r14,00000000d327aea0
                          00000000d2af2e8a: a7f4ffc3 brc 15,00000000d2af2e10
                         #00000000d2af2e8e: af000000 mc 0,0
                         >00000000d2af2e92: e5442006ffff mvhhi 6(%r2),-1
                          00000000d2af2e98: a7f4ffe2 brc 15,00000000d2af2e5c
                          00000000d2af2e9c: 0707 bcr 0,%r7
                          00000000d2af2e9e: 0707 bcr 0,%r7
                          00000000d2af2ea0: c00400000000 brcl 0,00000000d2af2ea0
[ 234.281605] Call Trace:
[ 234.281607] [<00000000d2af2e92>] pcibios_release_device+0x102/0x110
[ 234.281612] [<00000000d328fa42>] pci_release_dev+0x62/0xa0
[ 234.281620] [<00000000d335f918>] device_release+0x48/0xb0
[ 234.281626] [<00000000d32604d4>] kobject_put+0x174/0x2e0
[ 234.281632] [<00000000d32cfeee>] pci_iov_release+0x4e/0x70
[ 234.281637] [<00000000d328fa2e>] pci_release_dev+0x4e/0xa0
[ 234.281638] [<00000000d335f918>] device_release+0x48/0xb0
[ 234.281640] [<00000000d32604d4>] kobject_put+0x174/0x2e0
[ 234.281642] [<00000000d2af870a>] __zpci_event_availability+0x20a/0x350
[ 234.281644] [<00000000d36c5d60>] chsc_process_crw+0x2d0/0x2f0
[ 234.281650] [<00000000d36cff36>] crw_collect_info+0x276/0x340
[ 234.281653] [<00000000d2b3c4fa>] kthread+0x14a/0x170
[ 234.281657] [<00000000d3728028>] ret_from_fork+0x24/0x2c
[ 234.281662] Last Breaking-Event-Address:
[ 234.281662] [<00000000d2af2e2e>] pcibios_release_device+0x9e/0x110
[ 234.281664] ---[ end trace c37123f53d0bbb72 ]---
[ 236.519017] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
root@t35lp02:~#

== Comment: #3 - <email address hidden>> - 2021-08-04 05:31:00 ==
The first dmesg on Ubuntu 20.04 looks like a Mellanox internal driver problem which if my memory serves me correctly has since been fixed in the Mellanox driver. As far as I can tell in the worst case this leaks a few pages.

The output on Ubuntu 21.04 turns out to be a NULL pointer dereference in zPCI code however that was previously hidden by us leaking the struct pci_dev.

I have analyzed the issue and can reproduce it on current development kernels, here is what I believe happens:

The backtrace shows a warning in

pcibios_release_device()
   zpci_unmap_resources()
      pci_iounmap_fh()

which is

WARN_ON(!zpci_iomap_start[idx].count);

That however is a red herring, on this z15 machine with a ConnectX-5 we have
MIO support so should never even enter pci_iounmap_fh().

Adding a debug print in pcibios_release_device() it turns out that the
struct zpci_dev * we get to to_zpci(pdev) is NULL.

Digging a bit the problem is that during the detach PCI availability event
we call zpci_zdev_put() as the zdev went away. We already performed the
pci_stop_and_remove_bus_device_locked(pdev) before that point and assumed that
after that the struct pci_dev refcount reaches 0 and will not be accessed anymore.

This is usually true, however here the problem is that we first removed
the PF for Port 1 while keeping the PF for Port 2.

Now the "struct pci_sriov" in pdev->sriov where pdev is the PF of the Port 2 has a field sturct pci_sriov::dev with the comment "/* Lowest numbered PF */". This field holds a reference to the struct pci_dev of the PF for Port 1 thus preventing
the refcount of that reaching 0 until the PF for Port 2 is released.

When the PF for Port 2 is released the refcount for the PF of Port 1 reaches 0
and only then do we get the call pci_release_dev() -> pcibios_release_device()
but at this point the struct zpci_dev was already released and zbus->functions[devfn]
pointer NULLed when it was unregistered from the zbus.

Here is /sys/kernel/debug/s390dbf/pci_msg/sprintf output with the added debug print for my reproduction:
root@t35lp47 ~ # cat /sys/kernel/debug/s390dbf/pci_msg/sprintf
00 01627041292:161310 3 - 0007 0000000b3ffa6560 wb bit: 1
...
00 01627041292:165772 3 - 0007 0000000b3ffa2952 add fid:280, fh:2f80, c:1
00 01627041292:166083 3 - 0007 0000000b3ffa2952 add fid:2c0, fh:3002, c:1
...
00 01627041292:181187 3 - 0007 0000000b3ffa66aa ena fid:280, fh:a3002f80, rc:0
00 01627041292:181194 3 - 0007 0000000b3ffa6728 ena mio fid:280, fh:a3002f80, rc:0 <-- MIO enabled for PF of Port 1
00 01627041292:182176 3 - 0007 0000000b3ffa66aa ena fid:2c0, fh:a7003002, rc:0
00 01627041292:182181 3 - 0007 0000000b3ffa6728 ena mio fid:2c0, fh:a7003002, rc:0 <-- MIO enabled for PF of Port 2
....
00 01627041423:815382 3 - 0014 0000000b3ffa2c74 rem fid:280 <-- sturct zpci_dev for Port 1 released but no zpci_unmap_resources() called for it!
00 01627041566:352720 3 - 0012 0000000b3ffa2362 zunmap: zdev:0000000000000000 FID:0, mio:0 <-- zdev is NULL in pcibios_release_device() for Port 1 and we read the FH/MIO from somewhere in lowcore BAD!
00 01627041566:353274 3 - 0012 0000000b3ffa2362 zunmap: zdev:000000008803c000 FID:2c0, mio:1
00 01627041567:548896 3 - 0012 0000000b3ffa2c74 rem fid:2c0

Thus we have a definite bug in the coordination between the lifetimes of
struct zpci_dev and struct pci_dev where the former can outlive the latter.

I think the problem is that for the struct zpci_dev we only keep exactly one
reference owned by the zPCI core that gets released once the underlying zPCI
device goes away from the view of the zPCI core.

At the same time the struct pci_dev has its own reference counting and via
pdev holds its own reference to the struct zpci_dev (indirect via
pdev->sysdata which is a strict zpci_bus which holds a struct zpci_dev* for
all functions on the bus via zbus->functions[devfn]).

This reference is unaccounted for and can outlive the zPCI core's refrence
as seen in the above scenario.

== Comment: #4 - <email address hidden>> - 2021-08-24 09:00:03 ==
A fix for this has now landed upstream with the following commit:

2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")

Note that this references an earlier commit that previously hid the issue
and has not yet been merged to Ubuntu 20.04's kernel but is included in 21.04
only with both fixes does the freeing of the struct pci_dev for correctly
in the tested case.

0b13525c20fe ("s390/pci: fix leak of PCI device structure")

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-193748 severity-low targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
importance: Undecided → Low
Changed in ubuntu-z-systems:
importance: Undecided → Low
Frank Heimes (fheimes)
summary: - Reassign I/O Path of Mojave Port 1 before Port 2 causes NULL dereference
+ Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL
+ dereference
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote (last edit ):

A patched kernel 5.11.0-35 for Hirsute / 21.04 was build and is available via the following PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1943464/

Changed in linux (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Impish):
status: New → Fix Released
Revision history for this message
Frank Heimes (fheimes) wrote :

Impish (5.13) includes 0b13525c20fe from (regular) upstream and got 2a671f77ee49 as upstream stable update (via LP#1941789), hence is not affected.
Setting Impish to Fix Released.

Changed in ubuntu-z-systems:
status: New → In Progress
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-10-27 04:49 EDT-------
There a new patch for this issue. The previous fixes still had an issue around the handling of multiple events for a single reserve state transition of the device.

The fix is the following upstream commit:

a46044a92add6a400f4dada7b943b30221f7cc80 s390/pci: fix zpci_zdev_put() on reserve

This fix has also been added to the v5.14.15 stable kernel where
we needed to pull in an additional trivial prerequisite commit:

Prerequisite: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.14.y&id=e27170d5f2fca8a42d610e0c87e52c41cfae7a21
Fix:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.14.y&id=4e5d794a2743f5a1e13ee1066ce37711f2696044

It has also been added to the v5.10.76 stable kernel, here the prerequisite
commit is *NOT* necessary since we don't have the refactoring of introducing
the zdev->has_resources attribute.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=188907c252188ac7a5ec3efcfa31dae1f47b8a20

I can of course provide separate backports if neither of the above backport approaches apply cleanly on the relevant Ubuntu kernels.

Revision history for this message
Frank Heimes (fheimes) wrote :

Thanks for the update, Niklas.
Ok, we need to address all this for all Ubuntu releases that are in service, up to the oldest that needs to be patched - and these are in this case impish, hirsute and focal

Since we pick the upstream stables v5.10 and v5.14 (since 5.13 reached EOL) for Ubuntu hirsute's 5.11 and for impish's 5.13, it looks like we don't need to SRU there and just wait until the kernel team has picked up the relevant upstream stable updates v5.14.15 respectively v5.10.76 - which is great. Only focal is left.

So I started over with a pristine focal master-next tree (git clone git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal --branch master-next --single-branch)
but wasn't able to cherry-pick a46044a92add cleanly and get merge conflicts in
 both modified: arch/s390/include/asm/pci.h
 both modified: arch/s390/pci/pci.c
 both modified: arch/s390/pci/pci_event.c
I assumed that the pre-req. commit (that is needed in case the refactoring was done), is not part of the 5.4.x tree, hence not needed in this case.
So I (initially) didn't applied it (well, later I even tried with applying that pre-req. commit first, and the pre-req. commit itself applied cleanly, but no changes with the above merge conflicts while trying to cherry-pick the actual fix a46044a92add).

I think either some more pre-req. are needed for getting a46044a92add into focal's kernel 5.4x - or a backport of upstream a46044a92add is needed?!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-10-28 09:16 EDT-------
(In reply to comment #11)
> Thanks for the update, Niklas.
> Ok, we need to address all this for all Ubuntu releases that are in service,
> up to the oldest that needs to be patched - and these are in this case
> impish, hirsute and focal
>
> Since we pick the upstream stables v5.10 and v5.14 (since 5.13 reached EOL)
> for Ubuntu hirsute's 5.11 and for impish's 5.13, it looks like we don't need
> to SRU there and just wait until the kernel team has picked up the relevant
> upstream stable updates v5.14.15 respectively v5.10.76 - which is great.
> Only focal is left.
>
> So I started over with a pristine focal master-next tree (git clone
> git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal
> --branch master-next --single-branch)
> but wasn't able to cherry-pick a46044a92add cleanly and get merge conflicts
> in
> both modified: arch/s390/include/asm/pci.h
> both modified: arch/s390/pci/pci.c
> both modified: arch/s390/pci/pci_event.c
> I assumed that the pre-req. commit (that is needed in case the refactoring
> was done), is not part of the 5.4.x tree, hence not needed in this case.
> So I (initially) didn't applied it (well, later I even tried with applying
> that pre-req. commit first, and the pre-req. commit itself applied cleanly,
> but no changes with the above merge conflicts while trying to cherry-pick
> the actual fix a46044a92add).
>
> I think either some more pre-req. are needed for getting a46044a92add into
> focal's kernel 5.4x - or a backport of upstream a46044a92add is needed?!

I think it's simplest if I provide a focal backport directly. Then I can also already do some testing for it.

Revision history for this message
Frank Heimes (fheimes) wrote :

That'll be great, thx!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-10-29 06:01 EDT-------
Ok, so for focal master-next we didn't yet have the earlier commits

2a671f77ee49 ("s390/pci: fix use after free of zpci_dev")
0b13525c20fe ("s390/pci: fix leak of PCI device structure")

so I had to backport those in addition to

a46044a92add s390/pci: fix zpci_zdev_put() on reserve

That said I tested first with the current focal kernel and similar to what I mentioned on the stable list for v5.10.x[0] the original scenario that caused the crash does not cause a crash there. I think this is due to the common code not
keeping a reference to function 0 around after it is removed. I guess this is
also why this was never seen during the initial multi-function support development. However even if I don't know how to trigger the problem on focal it is still there. Should some code keep a reference to the PCI device after removal we would run
into the same crash trying to access it. We also do still leak the PCI device
structure on removal without these patches.

[0] https://lore.<email address hidden>/

Revision history for this message
bugproxy (bugproxy) wrote : Focal: 0001-s390-pci-fix-leak-of-PCI-device-structure.patch

------- Comment (attachment only) From <email address hidden> 2021-10-29 06:02 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Focal: 0002-s390-pci-fix-use-after-free-of-zpci_dev.patch

------- Comment (attachment only) From <email address hidden> 2021-10-29 06:03 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Focal: 0003-s390-pci-fix-zpci_zdev_put-on-reserve.patch

------- Comment (attachment only) From <email address hidden> 2021-10-29 06:03 EDT-------

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Niklas, thanks for the backports.
I applied them on a focal kernel and triggered some build here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1943464/

Regarding the test case, well, I think we at least need to think about a regression test case or so - since a test plan will be needed as part of the SRU Justification.

But anyway, I work on the SRU submission (and the justification) next week...

Frank Heimes (fheimes)
Changed in linux (Ubuntu Impish):
status: Fix Released → In Progress
Changed in linux (Ubuntu Focal):
status: Incomplete → In Progress
Frank Heimes (fheimes)
description: updated
Changed in linux (Ubuntu Impish):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
Changed in linux (Ubuntu Jammy):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
importance: Low → Undecided
Changed in linux (Ubuntu Impish):
importance: Low → Undecided
status: In Progress → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

SRU request submitted to the Ubuntu kernel team mailing list for focal:
https://lists.ubuntu.com/archives/kernel-team/2021-November/thread.html#125405
Changing status to 'In Progress' for and focal.

As mentioned in the SRU Justification, IBM is working on upstream stable,
that will eventually cover hirsute and impish.

Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-91.102 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-11-10 03:24 EDT-------
Somehow this hasn't made it to focal-proposed yet as far as I can tell. With proposed enabled the version of linux-generic I see is still:

linux-generic/focal-proposed,now 5.4.0.91.95 s390x [installed]

Revision history for this message
Frank Heimes (fheimes) wrote (last edit ):

Hi Niklas, well, I agree that this is a little bit confusing (and hit me, too).
There is a kernel meta package, and there are binary packages (e.g. linux-generic), and both may differ slightly in their last digit (build number), like in this case.

$ rmadison -a source linux | grep focal-proposed
 linux | 5.4.0-91.102 | focal-proposed | source
$ rmadison --arch=s390x linux-generic | grep focal-proposed
 linux-generic | 5.4.0.91.95 | focal-proposed | s390x

But you can safely ignore that last/fifth digit (here '102' and '95') - it is just important that the fourth digit is the same (here '91') - if that's the case, it's ensured that you picked the correct kernel.

In other words: having focal-proposed enabled, and installing linux-generic 5.4.0.91.95 will install the kernel that needs the verification.

Revision history for this message
bugproxy (bugproxy) wrote :

In other words: having /focal-proposed enabled, and installing linux-generic 5.4.0.91.95 will install the kernel that needs the verification.

------- Comment From <email address hidden> 2021-11-10 12:30 EDT-------
Ah ok didn't know that, thanks for the explanation.

I tried the test scenario with this kernel and actually saw the
behavior of the common PCI core keeping a reference to the removed
function 0 that I didn't see in some previous kernels. This could
also be due to other circumstances in testing but it does mean that
without this fix the kernel would likely have crashed in my test.
It did not and the /sys/kernel/debug/s390dbf/pci_msg/sprintf prints
added with the fix showed that it worked as designed.

In summary the fix is working in the proposed kernel and we do
definitely need it to prevent NULL pointer dereference and
subsequent kernel crash in the test scenario.

The "FW pages counter is 43318 after reclaiming all pages" warning is
still there as expected though. I did a quick search but I'm not sure
which upstream commit in the Mellanox driver would get rid of that too
but at least this doesn't lead to a panic.

Revision history for this message
Frank Heimes (fheimes) wrote :

Thank you Niklas for the verification on focal - I'm updating the tag accordingly...

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (12.1 KiB)

This bug was fixed in the package linux - 5.4.0-91.102

---------------
linux (5.4.0-91.102) focal; urgency=medium

  * focal/linux: 5.4.0-91.102 -proposed tracker (LP: #1949840)

  * Packaging resync (LP: #1786013)
    - [Packaging] update Ubuntu.md
    - debian/dkms-versions -- update from kernel-versions (main/2021.11.08)

  * KVM emulation failure when booting into VM crash kernel with multiple CPUs
    (LP: #1948862)
    - KVM: x86: Properly reset MMU context at vCPU RESET/INIT

  * aufs: kernel bug with apparmor and fuseblk (LP: #1948470)
    - SAUCE: aufs: bugfix, stop omitting path->mnt

  * ebpf: bpf_redirect fails with ip6 gre interfaces (LP: #1947164)
    - net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()

  * require CAP_NET_ADMIN to attach N_HCI ldisc (LP: #1949516)
    - Bluetooth: hci_ldisc: require CAP_NET_ADMIN to attach N_HCI ldisc

  * ACL updates on OCFS2 are not revalidated (LP: #1947161)
    - ocfs2: fix remounting needed after setfacl command

  * ppc64 BPF JIT mod by 1 will not return 0 (LP: #1948351)
    - powerpc/bpf: Fix BPF_MOD when imm == 1

  * Drop "UBUNTU: SAUCE: cachefiles: Page leaking in
    cachefiles_read_backing_file while vmscan is active" (LP: #1947709)
    - Revert "UBUNTU: SAUCE: cachefiles: Page leaking in
      cachefiles_read_backing_file while vmscan is active"

  * Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference
    (LP: #1943464)
    - s390/pci: fix leak of PCI device structure
    - s390/pci: fix use after free of zpci_dev
    - s390/pci: fix zpci_zdev_put() on reserve

  * [SRU][F] USB: serial: pl2303: add support for PL2303HXN (LP: #1948377)
    - USB: serial: pl2303: add support for PL2303HXN
    - USB: serial: pl2303: fix line-speed handling on newer chips

  * Focal update: v5.4.151 upstream stable release (LP: #1947888)
    - tty: Fix out-of-bound vmalloc access in imageblit
    - cpufreq: schedutil: Use kobject release() method to free sugov_tunables
    - cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
    - usb: cdns3: fix race condition before setting doorbell
    - fs-verity: fix signed integer overflow with i_size near S64_MAX
    - hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - scsi: ufs: Fix illegal offset in UPIU event trace
    - mac80211: fix use-after-free in CCMP/GCMP RX
    - x86/kvmclock: Move this_cpu_pvti into kvmclock.h
    - drm/amd/display: Pass PCI deviceid into DC
    - ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
    - hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced
      from sysfs
    - mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug
    - mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap
    - mac80211: mesh: fix potentially unaligned access
    - mac80211-hwsim: fix late beacon hrtimer handling
    - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
    - hwmon: (tmp421) report /P...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin20045
removed: targetmilestone-inin---
Revision history for this message
Frank Heimes (fheimes) wrote :

Meanwhile:

commit "s390/pci: fix zpci_zdev_put() on reserve" landed as ce2a0d2c8fb5 in impish via upstream stable (it's in Ubuntu-5.13.0-23 and newer), hence all patches needed are included now and the impish entry can be updated to Fix Released.

and

commit "s390/pci: fix zpci_zdev_put() on reserve" landed as 7476a573c040 in hirsute via upstream stable (it's in Ubuntu-5.11.0-42.46 and newer), hence all patches needed are included now and the hirsute entry can be updated to Fix Released.

Finally the ticket is closed.

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.