[UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel - qemu part

Bug #1953338 reported by bugproxy
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Skipper Bug Screeners
qemu (Ubuntu)            Fix Released  Undecided   Canonical Server
  Focal                  Fix Released  Undecided   Unassigned

Bug Description

SRU Justification:
==================

[Impact]

* Hardware diagnose data (diag 318) of KVM guest kernel cannot be handled.

* A fix is needed to enhance problem determination of guest kernel under KVM using DIAG 0x318 instruction execution.

* The s390x diagnose 318 instruction sets the control program name code (CPNC) and control program version code (CPVC) to provide useful information regarding the OS during debugging.

* The CPNC is explicitly set to 4 to indicate a Linux/KVM environment.

* The user story here is that s390x has had virtualization for ages and,
  as part of that, established diag calls that allow you to add data to
  guests. This helps live management and/or guest debugging in case of
  problems. For KVM guests this data has so far been wrong or incomplete,
  and this is the fix for it.
  You might want to see [1] for the base feature that this fixes. And do
  not say it is ugly, I did not send the PoP page about diags :-)

[1]: https://git.mentality.rip/OpenE2K/qemu-e2k/commit/fabdada9357b
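
To make the CPNC/CPVC split more concrete, here is a minimal sketch of how the two fields could be packed into the single 64-bit diag 318 value. It assumes the layout of the Linux kernel's union diag318_info (CPNC in the most significant byte, CPVC in the remaining 56 bits); the function names are made up for illustration and are not part of qemu or the kernel.

```shell
# Sketch only. Assumption: CPNC is the top 8 bits and CPVC the low 56 bits
# of the 64-bit diag 318 value. Function names are hypothetical.

diag318_pack() {
    # $1: CPNC (0-255), $2: CPVC (up to 56 bits)
    printf '0x%016x\n' $(( (($1 & 0xff) << 56) | ($2 & 0x00ffffffffffffff) ))
}

diag318_cpnc() {
    # $1: packed 64-bit value; prints the CPNC byte as a decimal number
    echo $(( ($1 >> 56) & 0xff ))
}

diag318_pack 4 0                  # CPNC 4 = Linux/KVM, empty CPVC
diag318_cpnc 0x0400000000000000   # extracts the CPNC again
```

With CPNC 4 and an empty CPVC the packed value is 0x0400000000000000, and extracting the CPNC from it gives 4 again.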

[Fix]

* In general the following 9 patches (backports) are needed:

* "[PATCH 1/9] s390/sclp: get machine once during read scp/cpu info"
  https://launchpadlibrarian.net/581388471/0001-s390-sclp-get-machine-once-during-read-scp-cpu-info.patch

* "[PATCH 2/9] s390/sclp: rework sclp boundary checks"
  https://launchpadlibrarian.net/581388472/0002-s390-sclp-rework-sclp-boundary-checks.patch

* "[PATCH 3/9] s390/sclp: read sccb from mem based on provided length"
  https://launchpadlibrarian.net/581388474/0003-s390-sclp-read-sccb-from-mem-based-on-provided-lengt.patch

* "[PATCH 4/9] s390/sclp: check sccb len before filling in data"
  https://launchpadlibrarian.net/581388476/0004-s390-sclp-check-sccb-len-before-filling-in-data.patch

* "[PATCH 5/9] s390/sclp: use cpu offset to locate cpu entries"
  https://launchpadlibrarian.net/581389965/0005-s390-sclp-use-cpu-offset-to-locate-cpu-entries.patch

* "[PATCH 6/9] s390/sclp: add extended-length sccb support for kvm guest"
  https://launchpadlibrarian.net/581389970/0006-s390-sclp-add-extended-length-sccb-support-for-kvm-g.patch

* "[PATCH 7/9] s390: guest support for diagnose 0x318"
  https://launchpadlibrarian.net/581389974/0007-s390-guest-support-for-diagnose-0x318.patch

* "[PATCH 8/9] s390x: pv: Remove sclp boundary checks"
  https://launchpadlibrarian.net/581389981/0008-s390x-pv-Remove-sclp-boundary-checks.patch

* "[PATCH 9/9] s390x: pv: Fix diag318 PV fencing"
  https://launchpadlibrarian.net/581389982/0009-s390x-pv-Fix-diag318-PV-fencing.patch

[Test Case]

* Set up an IBM Z or LinuxONE LPAR with Ubuntu Server as KVM host.

* Then set up an Ubuntu KVM virtual machine on top.

* It can then be observed if the CPNC (diag318 data) has been successfully set by looking at the s390dbf messages for the KVM guest.

* The CPNC will always be 4 (denotes Linux environment).

* Another way to test this is by running the sync_regs_test under tools/testing/selftests/kvm/s390x/sync_regs_test. Just running the kernel self test suite can trigger this.

* It is important that the patched qemu is tested with the correspondingly patched kernel, since each requires the other. They can be found here:
  qemu: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4767
  kernel: https://launchpad.net/~fheimes/+archive/ubuntu/lp1953334
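
The s390dbf check described above could be scripted roughly as follows. This is a sketch under assumptions: it expects the per-guest debug logs under /sys/kernel/debug/s390dbf/kvm-<pid>/sprintf and a log message of the form "setting cpnc to <n>", both taken from the verification output later in this report. The function name find_cpnc is hypothetical.

```shell
# Sketch only. Assumptions: debugfs is mounted at /sys/kernel/debug and the
# KVM debug feature logs "setting cpnc to <n>" per guest. find_cpnc is a
# made-up helper name.

find_cpnc() {
    # $1: s390dbf root directory (defaults to the usual debugfs location)
    root=${1:-/sys/kernel/debug/s390dbf}
    # Print every matching line, prefixed with the log file it came from.
    # The exit status is non-zero when no guest has logged a CPNC.
    grep -H "setting cpnc to" "$root"/kvm-*/sprintf 2>/dev/null
}

# Usage, as root on the KVM host while the guest is running:
#   find_cpnc
```

Taking the root directory as an optional argument keeps the helper usable against a copied log tree, for example one attached to a bug report.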

[Where problems could occur]

* The approach here is to provide additional debug and diagnose information on top.

* Hence even if the diag318 changes are broken, the existing functionality shouldn't be harmed.

* However, the functional changes could introduce broken code (e.g. due to erroneous pointer arithmetic) that does not compile or causes crashes. But this is what the test builds are for:
qemu: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4767
(kernel: https://launchpad.net/~fheimes/+archive/ubuntu/lp1953334)

* On top of that, the diag318 diagnose data might not be provided properly; it may be empty or wrong. Again, that is what the test builds and the later verification are targeted at.

[Other]

* LP#1953334 is related to this bug and covers the Kernel part.
__________

Hardware diagnose data (diag 318) of KVM guest kernel cannot be handled.
Fix needed to enhance problem determination of guest kernel under KVM

Solution provided by Collin:
All patches are provided to enable the DIAGNOSE 0x318 problem determination aid for a QEMU guest. Analogous KVM patches are required.

This solution required the prerequisite Extended-Length SCCB patches as well.

I've applied a bugfix related to resetting the diag318 to one of the patches (one line fix -- upstream conversation here: https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg03618.html)

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-195467 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
affects: linux (Ubuntu) → qemu (Ubuntu)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in qemu (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Server Team (canonical-server)
Changed in ubuntu-z-systems:
importance: Undecided → High
Christian Ehrhardt  (paelzer) wrote :

Thanks for the heads up,
Please let us know when the upstream discussion has settled and there is a commit ID we should import.

Furthermore, as usual, if this is to go to older active releases: what would your dev/testing teams consider the best way to trigger or fake diag 318 for testing?

Changed in qemu (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-12-06 10:43 EDT-------
The DIAG 318 invocation has been present in the Linux kernel for some time, so it's likely you can pick up any modern kernel release and run with it for testing. The only way we can easily observe if the CPNC (diag318 data) has been successfully set is by looking at the s390dbf messages for the KVM guest. The CPNC will always be 4 (denotes Linux environment).

Another way to check is by running the sync_regs_test under tools/testing/selftests/kvm/s390x/sync_regs_test. Just running the kernel self test suite can trigger this.

Lastly, I suppose if you were to create a userspace program that simply ran the instruction using whatever values you wanted, that could work. I have not done this myself, so I cannot offer much guidance with this route other than to suggest looking at function "setup_control_program_code" in arch/s390/kernel/setup.c

bugproxy (bugproxy) wrote : 0001-s390-sclp-get-machine-once-during-read-scp-cpu-info

------- Comment on attachment From <email address hidden> 2021-12-06 11:08 EDT-------

backported from commit: 912d70d2755cb9b3144eeed4014580ebc5485ce6

Comment: Re-attaching Collin's Patch (#1 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0002-s390-sclp-rework-sclp-boundary-checks

------- Comment on attachment From <email address hidden> 2021-12-06 11:10 EDT-------

backported from commit: db13387ca01a69d870cc16dd232375c2603596f2

Comment: Re-attaching Collin's Patch (#2 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0003-s390-sclp-read-sccb-from-mem-based-on-provided-length

------- Comment on attachment From <email address hidden> 2021-12-06 11:11 EDT-------

backported from commit: c1db53a5910f988eeb32f031c53a50f3373fd824

Comment: Re-attaching Collin's Patch (#3 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0004-s390-sclp-check-sccb-len-before-filling-in-data

------- Comment on attachment From <email address hidden> 2021-12-06 11:12 EDT-------

Applied directly from upstream

Comment: Re-attaching Collin's Patch (#4 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0005-s390-sclp-use-cpu-offset-to-locate-cpu-entries

------- Comment on attachment From <email address hidden> 2021-12-06 11:14 EDT-------

Applied directly from upstream

Comment: Re-attaching Collin's Patch (#5 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0006-s390-sclp-add-extended-length-sccb-support-for-kvm-guest

------- Comment on attachment From <email address hidden> 2021-12-06 11:15 EDT-------

backported from commit: 1ecd6078f587cfadda8edc93d45b5072e35f2d17

Comment: Re-attaching Collin's Patch (#6 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0007-s390-guest-support-for-diagnose-0x318

------- Comment on attachment From <email address hidden> 2021-12-06 11:16 EDT-------

backported from commit: fabdada9357b9cfd980c7744ddce47e34600bbef

Comment: Re-attaching Collin's Patch (#7 of 8) as 'external' for sharing with Canonical LP

bugproxy (bugproxy) wrote : 0008-s390-kvm-fix-diag318-propagation-and-reset-functiona

------- Comment on attachment From <email address hidden> 2021-12-06 11:17 EDT-------

backported from commit: fabdada9357b9cfd980c7744ddce47e34600bbef

Comment: Re-attaching Collin's Patch (#8 of 8) as 'external' for sharing with Canonical LP

Frank Heimes (fheimes)
Changed in qemu (Ubuntu):
status: Incomplete → New
Frank Heimes (fheimes) wrote :

I've made some patched test kernel available here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1953334

Christian Ehrhardt  (paelzer) wrote :

Thanks for the provided backports.
So far I was assuming this is a new feature for Jammy, but no longer.
Maybe we need to talk about what the target of this bug is: is it Focal, and was the backport target here therefore 4.2? :-)

backport/origin upstream-hash patch
B 912d70d2 0001-s390-sclp-get-machine-once-during-read-scp-cpu-info.patch
B db13387c 0002-s390-sclp-rework-sclp-boundary-checks.patch
B c1db53a5 0003-s390-sclp-read-sccb-from-mem-based-on-provided-lengt.patch
O 0260b978 0004-s390-sclp-check-sccb-len-before-filling-in-data.patch
O 1a7a5688 0005-s390-sclp-use-cpu-offset-to-locate-cpu-entries.patch
B 1ecd6078 0006-s390-sclp-add-extended-length-sccb-support-for-kvm-g.patch
B fabdada9 0007-s390-guest-support-for-diagnose-0x318.patch
O e2c6cd56 0008-s390-kvm-fix-diag318-propagation-and-reset-functiona.patch

Of these 1,2,3,6,7 are in qemu 5.2 already.
4,5 were part of the same PR and also in 5.2, just didn't need backporting
Finally patch 8 came in a later PR, but still is part of 5.2.
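
One way to double-check which upstream releases already contain a given commit is to ask git for the tags that include it. The sketch below assumes a local clone of upstream QEMU; the helper name tags_containing and the clone path in the usage line are hypothetical.

```shell
# Sketch only. Assumption: $1 is a local clone of the upstream repository
# with release tags fetched. tags_containing is a made-up helper name.

tags_containing() {
    # $1: path to a git clone; $2: commit hash (abbreviated hashes work too)
    git -C "$1" tag --contains "$2"
}

# Usage with a hypothetical clone location:
#   tags_containing ~/src/qemu 912d70d2
```

If the 5.2 release tag appears in the output for each hash in the table, the corresponding patch is already part of that release.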

Therefore I set this to fix released, but add a Focal task assuming (please confirm) that you wanted to target that.

Changed in qemu (Ubuntu):
status: New → Fix Released
Changed in qemu (Ubuntu Focal):
status: New → Triaged
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-12-07 09:52 EDT-------
Backport target is focal.
I just complemented the title of this bugzilla / LP entry to make it more obvious

Christian Ehrhardt  (paelzer) wrote :

They might be backported to plain 4.2, but they do not apply cleanly to 1:4.2-3ubuntu6.19.

0001 & 0002:
The patch directly from upstream git applies, but your backport does not - so I just used cherry picks

0003:
Neither your backport nor a cherry-pick worked as-is.

Here things start to differ considerably, and instead of producing a bad backport, and potentially even missing the point of what the target was, I think I'll use this opportunity to ask.

1. What was this meant for: Focal?
2. For a backport to exactly what Focal has right now, see below:

I guess you do not want to think about Debian packaging too much. You can get that base as follows (there are many ways, but this is at least one):

git clone -b ubuntu/focal-devel https://git.launchpad.net/ubuntu/+source/qemu
cd qemu
QUILT_PATCHES="debian/patches" quilt push --fuzz=0 -a
git add .
git commit -m "Current Focal base"

This will be 4.2 + all patches in Ubuntu for you to work with.
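
Before attaching a reworked series, a dry run against that tree shows which patches still apply cleanly. The following is a rough sketch using GNU patch; the helper name check_patches is hypothetical, and it tests each patch on its own against the tree as it is, so a patch that depends on an earlier one in the series will show as FAIL until the earlier one has really been applied.

```shell
# Sketch only. Assumption: GNU patch is installed. check_patches is a
# made-up helper name. Nothing in the tree is modified (--dry-run).

check_patches() {
    # $1: source tree; remaining arguments: patch files to test
    tree=$1
    shift
    for p in "$@"; do
        # -f avoids interactive prompts, --fuzz=0 demands an exact match
        if patch -d "$tree" -p1 --dry-run --fuzz=0 -f -s < "$p" >/dev/null 2>&1; then
            echo "OK   $(basename "$p")"
        else
            echo "FAIL $(basename "$p")"
        fi
    done
}

# Usage, from the directory that holds the patch files:
#   check_patches ./qemu 0001-*.patch 0002-*.patch
```

Using --fuzz=0 mirrors the strictness of the quilt invocation above, so a patch reported as OK here should not need fuzz when it is added to debian/patches.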

Christian Ehrhardt  (paelzer) wrote :

Note: this is related to bug https://bugs.launchpad.net/bugs/1953334
It shares the motivation and the test steps.
Both fixes can exist independently (no hard version requirement is needed in the uploads), but the feature only fully works when both are present.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
Christian Ehrhardt  (paelzer) wrote :

Setting this to Incomplete since we are waiting for the proper rebase of the backports, as I asked in early December.

Changed in qemu (Ubuntu Focal):
status: Triaged → Incomplete
bugproxy (bugproxy) wrote : 0001-s390-sclp-get-machine-once-during-read-scp-cpu-info

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:56 EDT-------

bugproxy (bugproxy) wrote : 0002-s390-sclp-rework-sclp-boundary-checks

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:57 EDT-------

bugproxy (bugproxy) wrote : 0003-s390-sclp-read-sccb-from-mem-based-on-provided-lengt

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:58 EDT-------

bugproxy (bugproxy) wrote : 0004-s390-sclp-check-sccb-len-before-filling-in-data

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:58 EDT-------

bugproxy (bugproxy) wrote : 0005-s390-sclp-use-cpu-offset-to-locate-cpu-entries

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:59 EDT-------

bugproxy (bugproxy) wrote : 0006-s390-sclp-add-extended-length-sccb-support-for-kvm-g

------- Comment (attachment only) From <email address hidden> 2022-01-20 22:59 EDT-------

bugproxy (bugproxy) wrote : 0007-s390-guest-support-for-diagnose-0x318

------- Comment (attachment only) From <email address hidden> 2022-01-20 23:00 EDT-------

bugproxy (bugproxy) wrote : 0008-s390x-pv-Remove-sclp-boundary-checks

------- Comment (attachment only) From <email address hidden> 2022-01-20 23:00 EDT-------

bugproxy (bugproxy) wrote : 0009-s390x-pv-Fix-diag318-PV-fencing

------- Comment (attachment only) From <email address hidden> 2022-01-20 23:00 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-01-20 23:16 EDT-------
New patches attached. All patches were based on top of the code pointed out by paelzer. Old patches were marked obsolete.

This series includes 9 patches, as opposed to the 8 from the previous series. This is due to the presence of the Protected Virtualization (PV) code that was not present during my previous attempt. The additional patches fix problems that my DIAGNOSE 318 patches introduced and that conflicted with the PV design.

Please note that the patch pointed out by https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg03618.html has been rolled into patch #7.

Looking back at previous comments, I'd like to emphasize that these patches were based on top of focal. If testing / backports are required for other versions, please let me know.

Christian Ehrhardt  (paelzer) wrote :

Thank you for the backported patches,
With those I've created a test PPA for Focal at [1] (still building atm).

@Frank AFAIK you need that to test the currently proposed kernel.
Once you can confirm to me that this looks good, we can prepare and kick off the SRU process for this upload here as well.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4767

Christian Ehrhardt  (paelzer) wrote :

FYI - right now I have nothing else in the SRU queue for qemu.
The plan for handling this upload without it being a "useless" download for non-s390x users is to drive the SRU normally but set block-proposed. It will then be picked up by a subsequent qemu security update before too long.

@fheimes - could you add your SRU template content for this case here?

Frank Heimes (fheimes)
description: updated
Christian Ehrhardt  (paelzer) wrote :

Uploaded to focal-unapproved

tags: added: block-proposed
Changed in qemu (Ubuntu Focal):
status: Incomplete → In Progress
description: updated
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Brian Murray (brian-murray) wrote :

The block-proposed tags are release specific so I've added block-proposed-focal.

tags: added: block-proposed-focal
Changed in qemu (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.20 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:4.2-3ubuntu6.20)

All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.20) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

initramfs-tools/0.136ubuntu6.6 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-02-02 03:21 EDT-------
(In reply to comment #48)

It seems that the build has failed: https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.20
I wasn't able to spot a build log; where can it be found?

Frank Heimes (fheimes) wrote :

There was an issue with our builders yesterday,
restarting the same build today worked again - package is now there (at the PPA).

I did a quick test (since I still had the environment from the LP#1953334 verification),
and from what I understand it seems to be fine:
$ sudo grep "cpnc to 4" /sys/kernel/debug/s390dbf/kvm-1479/sprintf
00 01643802097:936371 3 - 10 000000013fa231c4 00[0000200180000000-0000000031852d02]: setting cpnc to 4

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-02-02 07:03 EDT-------
(In reply to comment #50)

Perfect, thanks Frank!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.21

---------------
qemu (1:4.2-3ubuntu6.21) focal-security; urgency=medium

  * SECURITY UPDATE: crash or code exec in USB redirector device emulation
    - debian/patches/CVE-2021-3682.patch: fix free call in
      hw/usb/redirect.c.
    - CVE-2021-3682
  * SECURITY UPDATE: heap use-after-free in virtio_net_receive_rcu
    - debian/patches/CVE-2021-3748.patch: fix use after unmap/free for sg
      in hw/net/virtio-net.c.
    - CVE-2021-3748
  * SECURITY UPDATE: off-by-one error in mode_sense_page()
    - debian/patches/CVE-2021-3930.patch: MODE_PAGE_ALLS not allowed in
      MODE SELECT commands in hw/scsi/scsi-disk.c.
    - CVE-2021-3930
  * SECURITY UPDATE: NULL dereference in floppy disk emulator
    - debian/patches/CVE-2021-20196-1.patch: Extract
      blk_create_empty_drive() in hw/block/fdc.c.
    - debian/patches/CVE-2021-20196-2.patch: kludge missing floppy drive in
      hw/block/fdc.c.
    - CVE-2021-20196
  * SECURITY UPDATE: integer overflow in vmxnet3 NIC emulator
    - debian/patches/CVE-2021-20203.patch: validate configuration values
      during activate in hw/net/vmxnet3.c.
    - CVE-2021-20203

 -- Marc Deslauriers <email address hidden> Tue, 22 Feb 2022 12:44:44 -0500

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released