[UBUNTU 18.04] zpcictl --reset - contribution for kernel

Bug #1870320 reported by bugproxy on 2020-04-02
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Skipper Bug Screeners
linux (Ubuntu)
Status tracked in Groovy

Bug Description

SRU Justification:


* With zpci_disable() working, 'lockdep' detected a potential deadlock (in the s390x zPCI subsystem).

* The deadlock is between recovering a PCI function via the /sys/bus/pci/devices/<dev>/recover attribute vs powering it off via /sys/bus/pci/slots/<slot>/power.


* Backport 1: https://launchpadlibrarian.net/479554961/0001-s390-pci-Recover-handle-in-clp_set_pci_fn.patch

* Backport 2: https://launchpadlibrarian.net/478714295/0001-s390-pci-Fix-possible-deadlock-in-recover_store.patch

[Test Case]

* It's best to (re-)test using the kernel's locking validator, also known as 'lockdep'.

* Since this potential deadlock was identified by lockdep.

[Regression Potential]

* The regression potential can be considered as moderate, since:

* It is purely s390x specific code (arch/s390/include/asm/pci.h and arch/s390/pci/{pci.c,pci_clp.c,pci_sysfs.c}).

* It only affects the zPCI, the s390x specific PCI code layer.

* PCI cards available for s390x are optional cards (RoCE and zEDC) and not very wide-spread.

* The states between such a deadlock can happen (recover and power off) are non standard and usually undesired states.

* The patches are upstream accepted since 5.6 and already landed in eoan and focal.

[Other Info]

* Patches 17cdec960cf7 "s390/pci: Recover handle in clp_set_pci_fn()" and 576c75e36c68 "s390/pci: Fix possible deadlock in recover_store()" are upstream accepted since kernel 5.6, but they don't apply cleanly on bionic master-next, hence backports are needed:

* 0001-s390-pci-Recover-handle-in-clp_set_pci_fn.patch is a backport of 17cdec960cf776b20b1fb08c622221babe591d51 17cdec960cf7 "s390/pci: Recover handle in clp_set_pci_fn()"

* 0001-s390-pci-Fix-possible-deadlock-in-recover_store.patch is a backport of 576c75e36c689bec6a940e807bae27291ab0c0de 576c75e36c68 "s390/pci: Fix possible deadlock in recover_store()"

* Both patches/commits already landed in focal (with LP 1863768) and in eoan (with LP 1868324).

This Bug tracks the necessary backport for the Linux Kernel
to enable proper reset/recovery of PCI Functions in the error state.
There is a related fix to s390-tools but the relevant zpcictl command
is not part of Ubuntu 18.04

Upstream this includes the following commits:

In the Kernel:

17cdec960cf776b20b1fb08c622221babe591d51 s390/pci: Recover handle in clp_set_pci_fn()
Backport patch attached.

576c75e36c689bec6a940e807bae27291ab0c0de s390/pci: Fix possible deadlock in recover_store()
applies cleanly but for the second a small backport change is necessary.

These fixes are already in 20.04 but need also be applied to 18.04.

CVE References

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-184736 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) on 2020-04-02
Changed in ubuntu-z-systems:
status: New → Triaged
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Po-Hsu Lin (cypressyew) on 2020-04-09
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Frank Heimes (fheimes) on 2020-04-09
Changed in ubuntu-z-systems:
status: Triaged → Confirmed
Frank Heimes (fheimes) on 2020-05-06
Changed in linux (Ubuntu Eoan):
status: New → Fix Released
Changed in linux (Ubuntu Focal):
status: New → Fix Released
Frank Heimes (fheimes) on 2020-05-06
Changed in linux (Ubuntu Eoan):
status: Fix Released → New
Frank Heimes (fheimes) wrote :

This got fixed based on LP 1863768 for focal, and is with that also fixed in groovy.

And it also landed in eoan via the stable release update processed in LP 1868324.

Hence only bionic is open.

Changed in linux (Ubuntu Groovy):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Frank Heimes (fheimes) wrote :

I was able to apply the patches, but only in the order:
1) apply 0001-s390-pci-Recover-handle-in-clp_set_pci_fn.patch (based on 17cdec960cf7)
2) cherry-pick 576c75e36c68

However, the test compile (on focal, but that shouldn't matter) failed:

make[2]: Leaving directory '/home/ubuntu/bionic-lp1870320/debian/build/build-generic'
make[1]: *** [Makefile:146: sub-make] Error 2
make[1]: Leaving directory '/home/ubuntu/bionic-lp1870320'
make: *** [debian/rules.d/2-binary-arch.mk:50: /home/ubuntu/bionic-lp1870320/debian/stamps/stamp-build-generic] Error 2

due to:

  CC [M] fs/afs/inode.o
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_clp.c: In function ‘clp_disable_fh’:
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_clp.c:295:6: warning: unused variable ‘fh’ [-Wunused-variable]
  295 | u32 fh = zdev->fh;
      | ^~
  CC crypto/lzo.o
  CC security/keys/request_key.o
  CC arch/s390/pci/pci_sysfs.o
  CC security/smack/smack_lsm.o
  CC kernel/power/wakelock.o
  CC [M] fs/afs/main.o
  CC security/selinux/hooks.o
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_sysfs.c: In function ‘recover_store’:
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_sysfs.c:73:6: error: implicit declaration of function ‘pci_dev_is_added’; did you mean ‘pci_device_add’? [-Werror=implicit-function-declaration]
   73 | if (pci_dev_is_added(pdev)) {
      | ^~~~~~~~~~~~~~~~
      | pci_device_add
  CC crypto/rng.o
cc1: some warnings being treated as errors

Frank Heimes (fheimes) on 2020-05-07
Changed in linux (Ubuntu Eoan):
status: New → Fix Released

------- Comment From <email address hidden> 2020-05-07 09:31 EDT-------
Ok I can provide a backport.
pci_dev_is_added() was added in

44bda4b7d26e9 PCI: Fix is_added/is_busmaster race condition

It looks to me like we can just replace the call with "dev-is_added" but I'll have to test this.

Indeed, the first patch alone fixes the original problem, the second only fixes a racing deadlock that could happen when /sys/bus/pci/devices/<dev>/recover (which previously didn't work) collides with /sys/bus/pci/slots/<slot>/power.
This should be quite hard to hit but nevertheless I'd like this to get fixed as well.

------- Comment From <email address hidden> 2020-05-07 09:32 EDT-------
"dev->is_added" of course.

Frank Heimes (fheimes) wrote :

Compiling the patched bionic kernel on bionic spits out less warnings, but this error stays:

  CC kernel/power/snapshot.o
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_clp.c: In function ‘clp_disable_fh’:
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_clp.c:295:6: warning: unused variable ‘fh’ [-Wunused-variable]
  u32 fh = zdev->fh;
  CC arch/s390/pci/pci_sysfs.o
  CC mm/vmpressure.o
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_sysfs.c: In function ‘recover_store’:
/home/ubuntu/bionic-lp1870320/arch/s390/pci/pci_sysfs.c:73:6: error: implicit declaration of function ‘pci_dev_is_added’; did you mean ‘pci_device_add’? [-Werror=implicit-function-declaration]
  if (pci_dev_is_added(pdev)) {
  CC kernel/power/swap.o

Frank Heimes (fheimes) wrote :

Hi Niklas, according to comment #4, yes a backport is needed - simple cherry-pick wasn't possible.

------- Comment on attachment From <email address hidden> 2020-05-07 11:53 EDT-------

Yes the pci_dev_is_added() function was definitely introduced after the bionic kernel was cut so that is expected. I've attached a backport patch and it compiles on top of bionic master-next, but I would still like to do the testing (need to test with a debug kernel too to see a possible deadlock detection). I'll be on holiday until wednesday next week but will then test this.

------- Comment on attachment From <email address hidden> 2020-05-13 08:45 EDT-------

I've removed an unused variable from the "s390/pci: Recover handle in clp_set_pci_fn" patch fixing a compiler warning.
Also I've tested both backported patches together now. There is no zpcictl on Ubuntu 18.04 but a manual
echo 1 > /sys/bus/pci/devices/<dev>/recover
works and will now also work after a deconfigure e.g. when using zpcictl compiled from source.

Frank Heimes (fheimes) on 2020-05-13
description: updated
Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in ubuntu-z-systems:
status: Confirmed → In Progress
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
Updating status to 'In Progress'.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Frank Heimes (fheimes) on 2020-05-15
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

------- Comment From <email address hidden> 2020-05-20 09:09 EDT-------
Verified this is working as intended. Thank you!

Frank Heimes (fheimes) wrote :

Thx for the verification - updating the tags...

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :
Download full text (18.8 KiB)

This bug was fixed in the package linux - 4.15.0-106.107

linux (4.15.0-106.107) bionic; urgency=medium

  * CVE-2020-0543
    - SAUCE: x86/cpu: Add a steppings field to struct x86_cpu_id
    - SAUCE: x86/cpu: Add 'table' argument to cpu_matches()
    - SAUCE: x86/speculation: Add Special Register Buffer Data Sampling (SRBDS)
    - SAUCE: x86/speculation: Add SRBDS vulnerability and mitigation documentation
    - SAUCE: x86/speculation: Add Ivy Bridge to affected list

linux (4.15.0-103.104) bionic; urgency=medium

  * bionic/linux: 4.15.0-103.104 -proposed tracker (LP: #1881272)

  * "BUG: unable to handle kernel paging request" when testing
    ubuntu_kvm_smoke_test.kvm_smoke_test with B-KVM in proposed (LP: #1881072)
    - KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
    - KVM: VMX: Mark RCX, RDX and RSI as clobbered in vmx_vcpu_run()'s asm blob

linux (4.15.0-102.103) bionic; urgency=medium

  * bionic/linux: 4.15.0-102.103 -proposed tracker (LP: #1878856)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  * Kernel log flood "ceph: Failed to find inode for 1" (LP: #1875884)
    - ceph: don't check quota for snap inode
    - ceph: quota: cache inode pointer in ceph_snap_realm

  * [UBUNTU 18.04] zpcictl --reset - contribution for kernel (LP: #1870320)
    - s390/pci: Recover handle in clp_set_pci_fn()
    - s390/pci: Fix possible deadlock in recover_store()

  * Bionic update: upstream stable patchset 2020-05-12 (LP: #1878256)
    - drm/edid: Fix off-by-one in DispID DTD pixel clock
    - drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
    - drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
    - drm/qxl: qxl_release use after free
    - btrfs: fix block group leak when removing fails
    - btrfs: fix partial loss of prealloc extent past i_size after fsync
    - mmc: sdhci-xenon: fix annoying 1.8V regulator warning
    - mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers
    - ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter
    - ALSA: hda/hdmi: fix without unlocked before return
    - ALSA: pcm: oss: Place the plugin buffer overflow checks correctly
    - PM: ACPI: Output correct message on target power state
    - PM: hibernate: Freeze kernel threads in software_resume()
    - dm verity fec: fix hash block number in verity_fec_decode
    - RDMA/mlx5: Set GRH fields in query QP on RoCE
    - RDMA/mlx4: Initialize ib_spec on the stack
    - vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
    - vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
    - iommu/qcom: Fix local_base status check
    - scsi: target/iblock: fix WRITE SAME zeroing
    - iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
    - ALSA: opti9xx: shut up gcc-10 range warning
    - nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
    - dmaengine: dmatest: Fix iteration non-stop logic
    - selinux: properly handle multiple messages in ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) on 2020-06-10
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-10 03:21 EDT-------
IBM Bugzilla closed, Fix Released with requested Distro

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers