[UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel - kernel part

Bug #1953334 reported by bugproxy
This bug affects 1 person
Affects                   Status         Importance   Assigned to             Milestone
Ubuntu on IBM z Systems   Fix Released   High         Skipper Bug Screeners
linux (Ubuntu)            Fix Released   Undecided    Canonical Kernel Team
  Focal                   Fix Released   Undecided    Canonical Kernel Team
  Hirsute                 Invalid        Undecided    Canonical Kernel Team
  Impish                  Fix Released   Undecided    Canonical Kernel Team
  Jammy                   Fix Released   Undecided    Canonical Kernel Team

Bug Description

SRU Justification:
==================

[Impact]

* Hardware diagnose data (diag 318) of KVM guest kernel cannot be handled.

* A fix is needed to enhance problem determination of guest kernel under KVM using DIAG 0x318 instruction execution.

* The s390x diagnose 318 instruction sets the control program name code (CPNC) and control program version code (CPVC) to provide useful information regarding the OS during debugging.

* The CPNC is explicitly set to 4 to indicate a Linux/KVM environment.
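
The relationship between the two codes can be sketched as follows. This is an illustrative model only, not kernel code: it assumes the 64-bit diag 318 value carries the CPNC in its top 8 bits and the CPVC in the remaining 56 bits, which matches the upstream `union diag318_info` layout on s390x as far as can be told from the refactoring commit. The helper names are hypothetical.

```python
from typing import Tuple

# Illustrative model of the 64-bit diag 318 value (not kernel code).
# Assumption: CPNC occupies the top 8 bits, CPVC the low 56 bits.
CPNC_BITS = 8
CPVC_BITS = 56
CPNC_LINUX = 4  # value this bug description gives for a Linux/KVM environment


def pack_diag318(cpnc: int, cpvc: int) -> int:
    """Combine CPNC and CPVC into one 64-bit value."""
    if not 0 <= cpnc < (1 << CPNC_BITS):
        raise ValueError("cpnc must fit in 8 bits")
    if not 0 <= cpvc < (1 << CPVC_BITS):
        raise ValueError("cpvc must fit in 56 bits")
    return (cpnc << CPVC_BITS) | cpvc


def unpack_diag318(value: int) -> Tuple[int, int]:
    """Split a 64-bit value back into (cpnc, cpvc)."""
    return value >> CPVC_BITS, value & ((1 << CPVC_BITS) - 1)
```

Under these assumptions a guest reporting CPNC 4 with an all-zero version code would yield the value `4 << 56`.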

[Fix]

* In general the following 4 commits are needed:

* 3fd8417f2c728d810a3b26d7e2008012ffb7fd01 3fd8417f2c72 "KVM: s390: add debug statement for diag 318 CPNC data"
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1953334/+attachment/5545726/+files/0004-KVM-s390-add-debug-statement-for-diag-318-CPNC-data.patch

* 6cbf1e960fa52e4c63a6dfa4cda8736375b34ccc 6cbf1e960fa5 "KVM: s390: remove diag318 reset code"
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1953334/+attachment/5545725/+files/0003-KVM-s390-remove-diag318-reset-code.patch

* 23a60f834406c8e3805328b630d09d5546b460c1 23a60f834406 "s390/kvm: diagnose 0x318 sync and reset"
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1953334/+attachment/5545724/+files/0002-s390-kvm-diagnose-0x318-sync-and-reset.patch

* a23816f3cdcbffe5dc6e8c331914b3f51b87c2f3 a23816f3cdcb "s390/setup: diag 318: refactor struct"
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1953334/+attachment/5545723/+files/0001-s390-setup-diag-318-refactor-struct.patch

* For jammy, hirsute and impish only the first commit (3fd8417f2c72) is needed; the others are already included.

* For focal all 4 commits are needed, but since they do not apply cleanly on focal, the attached backports need to be used instead.
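
The per-series reasoning can be expressed as a small lookup. This is a minimal sketch that uses only the upstream versions and base kernel versions stated in this bug (focal 5.4, hirsute 5.11, impish 5.13, jammy 5.15). It ignores any patches a series may already carry through stable updates, and the function name is hypothetical.

```python
# Upstream release in which each commit landed, per the analysis in this bug.
# Release candidates are mapped to their final release (v5.10-rc1 -> 5.10).
COMMITS = [
    ("a23816f3cdcb", "s390/setup: diag 318: refactor struct", (5, 10)),
    ("23a60f834406", "s390/kvm: diagnose 0x318 sync and reset", (5, 10)),
    ("6cbf1e960fa5", "KVM: s390: remove diag318 reset code", (5, 10)),
    ("3fd8417f2c72", "KVM: s390: add debug statement for diag 318 CPNC data", (5, 16)),
]

# Base kernel version of each Ubuntu series named in this bug.
SERIES = {"focal": (5, 4), "hirsute": (5, 11), "impish": (5, 13), "jammy": (5, 15)}


def missing_commits(series: str) -> list:
    """Return the short hashes a series still needs, in apply order."""
    base = SERIES[series]
    return [sha for sha, _title, landed in COMMITS if base < landed]


for name in SERIES:
    print(name, missing_commits(name))
```

For focal this lists all four commits; for the newer series it lists only 3fd8417f2c72.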

[Test Case]

* Set up an IBM Z or LinuxONE LPAR with Ubuntu Server as KVM host.

* Set up an Ubuntu KVM virtual machine on top.

* It can then be observed if the CPNC (diag318 data) has been successfully set by looking at the s390dbf messages for the KVM guest.

* The CPNC will always be 4 (denotes Linux environment).

* Another way to test this is by running the sync_regs_test under tools/testing/selftests/kvm/s390x/sync_regs_test. Just running the kernel self test suite can trigger this.
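
Checking the s390dbf messages can be scripted. The sketch below assumes the debug statement added by 3fd8417f2c72 logs text of the form "setting cpnc to <n>" and that the host exposes the guest's log under a path such as /sys/kernel/debug/s390dbf/kvm-<pid>/sprintf; both are assumptions to confirm on the actual system. The sample lines are made up for illustration and do not reproduce the real s390dbf line prefix.

```python
import re
from typing import Iterable, List

# Assumption: the new debug statement logs "setting cpnc to <n>".
CPNC_RE = re.compile(r"setting cpnc to (\d+)")


def cpnc_values(lines: Iterable[str]) -> List[int]:
    """Return every CPNC value found in s390dbf-style log lines."""
    values = []
    for line in lines:
        match = CPNC_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


# Hypothetical sample lines; real s390dbf output carries a different prefix.
sample = [
    "vcpu 0: setting cpnc to 4",
    "vcpu 0: some unrelated event",
    "vcpu 0: setting cpnc to 0",
]
print(cpnc_values(sample))
```

On a real host the same function could be fed the lines of the guest's s390dbf file, which typically requires root and a mounted debugfs.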

[Where problems could occur]

* The approach here is to provide additional debug and diagnose information on top.

* Hence even if the diag318 changes are broken, the existing functionality shouldn't be harmed.

* The changes themselves are relatively easy to follow and mostly introduce new structures.

* However, the functional changes could introduce broken code (e.g. due to erroneous pointer arithmetic) that does not compile or causes crashes. But this is what the test builds are for (https://launchpad.net/~fheimes/+archive/ubuntu/lp1953334).

* On top, the diag318 diagnose data might not be properly provided - it may be empty or wrong. Again, that is what the test builds and the later verification are targeted at.

* Since diag318 is s390x specific, all the modifications touch s390x code only (in arch/s390/kvm/ kvm-s390.c and vsie.c, arch/s390/kernel/setup.c, arch/s390/include/asm/ kvm_host.h, kvm.h and diag.h). At least no other architecture will be affected.

* Well, there is one tiny bit of a common code change, but it's just a new define statement in include/uapi/linux/kvm.h ('#define KVM_CAP_S390_DIAG318 186').
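
Userspace can probe for the new capability with the generic KVM_CHECK_EXTENSION ioctl. The following is a hedged sketch rather than a supported tool: the capability number 186 is taken from the define quoted above, the ioctl number is derived from `_IO(KVMIO, 0x03)`, and the function name is hypothetical. It returns None when /dev/kvm cannot be opened.

```python
import fcntl
import os
from typing import Optional

KVMIO = 0xAE
KVM_CHECK_EXTENSION = (KVMIO << 8) | 0x03  # _IO(KVMIO, 0x03)
KVM_CAP_S390_DIAG318 = 186  # from include/uapi/linux/kvm.h, as quoted above


def diag318_supported(dev: str = "/dev/kvm") -> Optional[bool]:
    """Return True/False for the capability, or None if KVM is unavailable."""
    try:
        fd = os.open(dev, os.O_RDWR)
    except OSError:
        return None  # no KVM device or insufficient permissions
    try:
        # A positive return value means the capability is present.
        return fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_S390_DIAG318) > 0
    finally:
        os.close(fd)


print(diag318_supported())
```

A result of True only says the host kernel advertises the capability; the userspace side tracked in LP#1953338 is still needed for the guest to use it.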

[Other]

* Request was to add the patches to focal / 20.04, but to avoid potential regressions on upgrades, the patches need to be added to jammy, impish and hirsute, too.

* As mentioned above, Jammy, Hirsute and Impish include almost everything needed, except 3fd8417f2c72 "KVM: s390: add debug statement for diag 318 CPNC data".

* Hence the SRU is for Focal, Jammy, Hirsute and Impish, but less invasive for Jammy, Hirsute and Impish, also because commit 3fd8417f2c72 can be cleanly cherry-picked onto them.

* LP#1953338 is related to this bug and covers the qemu/KVM bits.
__________

Hardware diagnose data (diag 318) of KVM guest kernel cannot be handled.
Fix needed to enhance problem determination of guest kernel under KVM

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-195465 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
importance: Undecided → High
bugproxy (bugproxy) wrote : 0001-s390-setup-diag-318-refactor-struct Applied directly from upstream

------- Comment on attachment From <email address hidden> 2021-12-06 10:42 EDT-------

Re-attaching Collin's Patch as 'external' for sharing with Canonical

bugproxy (bugproxy) wrote : 0002-s390-kvm-diagnose-0x318-sync-and-reset backported from commit 23a60f834406c8e3805328b630d09d5546b460c1

------- Comment on attachment From <email address hidden> 2021-12-06 10:49 EDT-------

Comment: Re-attaching Collin's Patch (#2 of 4) as 'external' for sharing with Canonical

bugproxy (bugproxy) wrote : 0003-KVM-s390-remove-diag318-reset-code Applied directly from upstream

------- Comment on attachment From <email address hidden> 2021-12-06 10:51 EDT-------

Comment: Re-attaching Collin's Patch (#3 of 4) as 'external' for sharing with Canonical

bugproxy (bugproxy) wrote : 0004-KVM-s390-add-debug-statement-for-diag-318-CPNC-data Applied directly from upstream

------- Comment on attachment From <email address hidden> 2021-12-06 10:53 EDT-------

Comment: Re-attaching Collin's Patch (#4 of 4) as 'external' for sharing with Canonical

Frank Heimes (fheimes) wrote :

I kicked-off a test kernel build with the above patches:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1953334
First of all the test kernel compile completed successfully.

But reading LP#1953338 one question is now whether any Extended-Length SCCB patches are required for full functionality on top of kernel 5.4 as well or not?

Frank Heimes (fheimes) wrote :

For focal SRUs we must ensure that all requested patches are also included in all versions newer than focal - this is to avoid any regressions that may occur after upgrades.

Hence I had to look up the situation of the patches not only in focal (where the four backports apply cleanly), but also on hirsute (5.11), impish (5.13) and jammy (5.15) - and when the commits were accepted upstream.
1) a23816f3cdcb "s390/setup: diag 318: refactor struct" is upstream with v5.10-rc1 and newer
2) 23a60f834406 "s390/kvm: diagnose 0x318 sync and reset" is upstream with v5.10-rc1 and newer
3) 6cbf1e960fa5 "KVM: s390: remove diag318 reset code" is upstream with v5.10-rc6 and newer
4) but 3fd8417f2c72 "KVM: s390: add debug statement for diag 318 CPNC data" was only recently accepted upstream, with v5.16-rc1 and newer.

Hence the first three commits are already in hirsute, impish and jammy, BUT the last one is not, and needs to be applied on hirsute, impish and jammy on top of the work on focal.
I'm glad to see that 3fd8417f2c72 can be cleanly cherry-picked onto hirsute, impish and jammy.

Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes)
summary: [UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel
- - kernel part
+ (diag 318)
summary: [UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel
- (diag 318)
+ - kernel part
Frank Heimes (fheimes) wrote :

SRU request submitted to the Ubuntu kernel team mailing list for impish, hirsute and focal:
https://lists.ubuntu.com/archives/kernel-team/2021-December/thread.html#126349
Changing status to 'In Progress' for impish, hirsute and focal - jammy patch requested on top.

Changed in ubuntu-z-systems:
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Impish):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
status: New → In Progress
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Impish):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Hirsute):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Focal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Impish):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-24.24 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.11.0-47.52 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-97.110 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Kelsey Skunberg (kelsey-skunberg) wrote :

Hi Frank, may you please verify this bug is resolved on the focal/hirsute/impish kernels in -proposed? Appreciate your help!

Frank Heimes (fheimes) wrote :

Hello Kelsey, sorry for the delay. I noticed this verification request, but verifying it requires another package based on LP#1953338, which only became available yesterday.
Now that both are available, a verification will be done soon-ish ...

Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes) wrote :
Frank Heimes (fheimes)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes) wrote :

for jammy it's incl. in kernel 5.15.0-18.18

Frank Heimes (fheimes) wrote :

Collin just confirmed that my results show the proper verification (and that the 'cpnc to 0' can happen on top). Hence adjusting the tags accordingly ...

tags: added: verification-done-focal verification-done-hirsute verification-done-impish
removed: verification-needed-focal verification-needed-hirsute verification-needed-impish
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-18.18

---------------
linux (5.15.0-18.18) jammy; urgency=medium

  * jammy/linux: 5.15.0-18.18 -proposed tracker (LP: #1958638)

  * CVE-2021-4155
    - xfs: map unwritten blocks in XFS_IOC_{ALLOC, FREE}SP just like fallocate

  * CVE-2022-0185
    - SAUCE: vfs: test that one given mount param is not larger than PAGE_SIZE

  * [UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel -
    kernel part (LP: #1953334)
    - KVM: s390: add debug statement for diag 318 CPNC data

  * OOB write on BPF_RINGBUF (LP: #1956585)
    - SAUCE: bpf: prevent helper argument PTR_TO_ALLOC_MEM to have offset other
      than 0

  * Miscellaneous Ubuntu changes
    - [Config] re-enable shiftfs
    - [SAUCE] shiftfs: support kernel 5.15
    - [Config] update toolchain versions

  * Miscellaneous upstream changes
    - vfs: fs_context: fix up param length parsing in legacy_parse_param

 -- Andrea Righi <email address hidden> Fri, 21 Jan 2022 13:32:27 +0100

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-97.110

---------------
linux (5.4.0-97.110) focal; urgency=medium

  * icmp_redirect from selftests fails on F/kvm (unary operator expected)
    (LP: #1938964)
    - selftests: icmp_redirect: pass xfail=0 to log_test()

  * Focal: CIFS stable updates (LP: #1954926)
    - cifs: use the expiry output of dns_query to schedule next resolution
    - cifs: set a minimum of 120s for next dns resolution
    - cifs: To match file servers, make sure the server hostname matches

  * seccomp_bpf in seccomp from ubuntu_kernel_selftests failed to build on B-5.4
    (LP: #1896420)
    - SAUCE: selftests/seccomp: fix "storage size of 'md' isn't known" build issue
    - SAUCE: selftests/seccomp: Fix s390x regs not defined issue

  * system crash when removing ipmi_msghandler module (LP: #1950666)
    - ipmi: Move remove_work to dedicated workqueue
    - ipmi: msghandler: Make symbol 'remove_work_wq' static

  * zcrypt DD: Toleration for new IBM Z Crypto Hardware - (Backport to Ubuntu
    20.04) (LP: #1954680)
    - s390/AP: support new dynamic AP bus size limit

  * [UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel -
    kernel part (LP: #1953334)
    - s390/setup: diag 318: refactor struct
    - s390/kvm: diagnose 0x318 sync and reset
    - KVM: s390: remove diag318 reset code
    - KVM: s390: add debug statement for diag 318 CPNC data

  * Updates to ib_peer_memory requested by Nvidia (LP: #1947206)
    - SAUCE: RDMA/core: Updated ib_peer_memory

  * Include Infiniband Peer Memory interface (LP: #1923104)
    - IB: Allow calls to ib_umem_get from kernel ULPs
    - SAUCE: RDMA/core: Introduce peer memory interface

  * Focal update: v5.4.162 upstream stable release (LP: #1954834)
    - arm64: zynqmp: Do not duplicate flash partition label property
    - arm64: zynqmp: Fix serial compatible string
    - ARM: dts: NSP: Fix mpcore, mmc node names
    - scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
    - arm64: dts: hisilicon: fix arm,sp805 compatible string
    - RDMA/bnxt_re: Check if the vlan is valid before reporting
    - usb: musb: tusb6010: check return value after calling
      platform_get_resource()
    - usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
    - arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
    - arm64: dts: freescale: fix arm,sp805 compatible string
    - ASoC: SOF: Intel: hda-dai: fix potential locking issue
    - clk: imx: imx6ul: Move csi_sel mux to correct base register
    - ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
    - scsi: advansys: Fix kernel pointer leak
    - firmware_loader: fix pre-allocated buf built-in firmware use
    - ARM: dts: omap: fix gpmc,mux-add-data type
    - usb: host: ohci-tmio: check return value after calling
      platform_get_resource()
    - ARM: dts: ls1021a: move thermal-zones node out of soc/
    - ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
    - ALSA: ISA: not for M68K
    - tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
    - MIPS: sni: Fix the build
    - scsi: target: Fix ordered tag handling
    - scsi: target: Fix al...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-28.31

---------------
linux (5.13.0-28.31) impish; urgency=medium

  * amd_sfh: Null pointer dereference on early device init causes early panic
    and fails to boot (LP: #1956519)
    - HID: amd_sfh: Fix potential NULL pointer dereference

  * impish: ddebs build take too long and times out (LP: #1957810)
    - [Packaging] enforce xz compression for ddebs

  * audio mute/ mic mute are not working on a HP machine (LP: #1955691)
    - ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook

  * rtw88_8821ce causes freeze (LP: #1927808)
    - rtw88: Disable PCIe ASPM while doing NAPI poll on 8821CE

  * alsa/sdw: fix the audio sdw codec parsing logic in the acpi table
    (LP: #1955686)
    - ALSA: hda: intel-sdw-acpi: harden detection of controller
    - ALSA: hda: intel-sdw-acpi: go through HDAS ACPI at max depth of 2

  * icmp_redirect from selftests fails on F/kvm (unary operator expected)
    (LP: #1938964)
    - selftests: icmp_redirect: pass xfail=0 to log_test()

  * Impish update: upstream stable patchset 2021-12-17 (LP: #1955180)
    - arm64: zynqmp: Do not duplicate flash partition label property
    - arm64: zynqmp: Fix serial compatible string
    - ARM: dts: sunxi: Fix OPPs node name
    - arm64: dts: allwinner: h5: Fix GPU thermal zone node name
    - arm64: dts: allwinner: a100: Fix thermal zone node name
    - staging: wfx: ensure IRQ is ready before enabling it
    - ARM: dts: NSP: Fix mpcore, mmc node names
    - scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
    - arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
    - arm64: dts: hisilicon: fix arm,sp805 compatible string
    - RDMA/bnxt_re: Check if the vlan is valid before reporting
    - bus: ti-sysc: Add quirk handling for reinit on context lost
    - bus: ti-sysc: Use context lost quirk for otg
    - usb: musb: tusb6010: check return value after calling
      platform_get_resource()
    - usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
    - ARM: dts: ux500: Skomer regulator fixes
    - staging: rtl8723bs: remove possible deadlock when disconnect (v2)
    - ARM: BCM53016: Specify switch ports for Meraki MR32
    - arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
    - arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
    - arm64: dts: freescale: fix arm,sp805 compatible string
    - ASoC: SOF: Intel: hda-dai: fix potential locking issue
    - clk: imx: imx6ul: Move csi_sel mux to correct base register
    - ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
    - scsi: advansys: Fix kernel pointer leak
    - ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336
      codec
    - firmware_loader: fix pre-allocated buf built-in firmware use
    - ARM: dts: omap: fix gpmc,mux-add-data type
    - usb: host: ohci-tmio: check return value after calling
      platform_get_resource()
    - ARM: dts: ls1021a: move thermal-zones node out of soc/
    - ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
    - ALSA: ISA: not for M68K
    - tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
    - MIPS: sni:...

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) wrote :

Changing the "affects hirsute" entry to Invalid, since hirsute reached its end of life on January the 20th.
With that all other releases in service are Fix Released and with that the project entry itself, hence closing as Fix Released.

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Invalid
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm-5.4/5.4.0-1014.15~18.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Frank Heimes (fheimes) wrote :

This bug was for 20.04 GA kernel plain Ubuntu - hence verification on linux-ibm-5.4/5.4.0-1014 does not apply here.
However, I'm updating the tags to verification-done-bionic, just to unblock the process.

tags: added: verification-done-bionic
removed: verification-needed-bionic