[UBUNTU 22.04] s390x system emulation of QEMU has random hangs

Bug #1981339 reported by bugproxy
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  Undecided   Skipper Bug Screeners
qemu (Ubuntu)            Fix Released  Undecided   Canonical Server
Jammy                    Fix Released  Undecided   Christian Ehrhardt

Bug Description

[Impact]

 * In s390x TCG emulation, an interrupt taken on an EX (EXECUTE)
   instruction could result in an endless loop, hanging the guest

 * Backport a set of upstream fixes to Kinetic (as even 7.0 doesn't
   have them) and Jammy (6.2)

[Test Plan]

 * Get kernel build dependencies installed
   $ apt build-dep linux
 * Fetch a 5.19 kernel - just a tarball from kernel.org is enough
 * Build it for debug_defconfig
   $ make debug_defconfig
   $ make -j24
   # use arch/s390/boot/bzImage then

 * Then boot that in qemu tcg, which will run a bunch of self tests
   and eventually fail as we won't give it a proper root disk.
   Without the fix ~50% of the boots get stuck. With the fix
   applied every iteration of the following loop is expected to
   complete.

   $ for i in $(seq 120); do sudo qemu-system-s390x -machine s390-ccw-virtio,accel=tcg -nographic -kernel bzImage; echo "Loop $i passed"; done
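
The loop above prints "Loop $i passed" whenever qemu exits, but a hung boot simply blocks it forever. As a minimal sketch (not part of the original test plan), each run can be wrapped in coreutils `timeout` so that a stuck iteration is reported instead. The helper name `run_with_hang_check` and the time limit are assumptions; pick a limit well above the ~4 minutes a working TCG boot took in the tests reported in this bug.

```shell
# Hypothetical helper, not part of the SRU template:
#   run_with_hang_check LOOPS LIMIT CMD [ARGS...]
# Runs CMD up to LOOPS times and stops at the first run that does not
# exit within LIMIT (any duration accepted by coreutils timeout).
run_with_hang_check() {
    loops=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$loops" ]; do
        # --foreground keeps CMD attached to the terminal, which
        # qemu -nographic uses for the guest console.
        timeout --foreground "$limit" "$@"
        rc=$?
        if [ "$rc" -eq 124 ]; then
            # timeout exits with 124 when it had to stop CMD.
            echo "Loop $i: no exit within $limit, treating as hang"
            return 1
        fi
        echo "Loop $i finished (exit status $rc)"
        i=$((i + 1))
    done
    return 0
}
```

For the test above this could be invoked as `run_with_hang_check 120 15m sudo qemu-system-s390x -machine s390-ccw-virtio,accel=tcg -nographic -kernel bzImage`, where 15m is an assumed limit. With --foreground only the direct child is signalled, so this relies on sudo relaying the signal to qemu.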

[Where problems could occur]

 * Changes are limited to s390x TCG, so that is what we should watch out
   for. Fortunately this is a very isolated use case, rather separate from
   most other things qemu provides, and therefore regressions (if any)
   should be easy to map back to this change.

[Other Info]

 * Bug 1980896 is still in -proposed. I'm uploading this one
   now, but expect the SRU team to accept it into -proposed only
   once the other one has moved to -updates.

--- original Problem Description---

QEMU system emulation of s390x sometimes hangs when running Linux. It turns out that interrupts on an EX instruction can result in endless loops.

Contact Information = <email address hidden>

These 4 patches are missing from TCG

https://git.qemu.org/?p=qemu.git;a=commit;h=b67b6c7ce4d56bb76a523eb63feb4a1978b05351
https://git.qemu.org/?p=qemu.git;a=commit;h=8ec2edac5f32117b523620a216638704d80bbed9
https://git.qemu.org/?p=qemu.git;a=commit;h=872e13796f732cfd65c4dc62bd2e4bbdbb4fa848
https://git.qemu.org/?p=qemu.git;a=commit;h=3d8111fd3bf7298486bcf1a72013b44c9044104e

Richard Henderson (4):
  target/s390x: Remove DISAS_GOTO_TB
  target/s390x: Remove DISAS_PC_STALE
  target/s390x: Remove DISAS_PC_STALE_NOCHAIN
  target/s390x: Exit tb after executing ex_value


bugproxy (bugproxy)
tags: added: architecture-all bugnameltc-198914 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Skipper Bug Screeners (skipper-screen-team)
affects: linux (Ubuntu) → qemu (Ubuntu)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Christian Ehrhardt  (paelzer) wrote :

Thank you for the report, this isn't even in release v7.0 which I'm working on already.
I'll add them as backports to that.

Regarding s390x emulation as a feature: do we want/need this also backported to an LTS?
If so, do you expect (I haven't checked) it to apply to and work with 6.2 or even earlier versions?

Changed in qemu (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Server (canonical-server)
status: New → Triaged
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-07-11 13:34 EDT-------
(In reply to comment #7)
> Thank you for the report, this isn't even in release v7.0 which I'm working
> on already.
> I'll add them as backports to that.
>
> Regarding s390x emulation as a feature: do we want/need this also backported
> to an LTS?
> If so, do you expect (I haven't checked) it to apply to and work with 6.2
> or even earlier versions?

22.04 would be good (as we have seen the issue here)

Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu):
status: Triaged → In Progress
tags: added: qemu-22.10
Christian Ehrhardt  (paelzer) wrote :

Hi Christian o/,
thank you for the clarification - targeted for Kinetic -> Jammy now.

Added the changes to the preliminary 7.0 build in [1] on which I iterate my tests.

The bugs refer to a corrupted value and various DISAS_* changes. If you happen to have some test cases for this bug it would be great if you could try them out and report them here. Additionally - as you know - for the SRU we will need such test steps documented - so for the SRU this is even more important.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/qemu-7.0-kinetic

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-12 03:33 EDT-------
(In reply to comment #9)
> Hi Christian o/,
> thank you for the clarification - targeted for Kinetic -> Jammy now.
>
> Added the changes to the preliminary 7.0 build in [1] on which I iterate my
> tests.
>
> The bugs refer to a corrupted value and various DISAS_* changes. If you
> happen to have some test cases for this bug it would be great if you could
> try them out and report them here. Additionally - as you know - for the SRU
> we will need such test steps documented - so for the SRU this is even more
> important.
>
> [1]: https://launchpad.net/~paelzer/+archive/ubuntu/qemu-7.0-kinetic

The test is basically booting a recent kernel (5.19-rc) compiled debug_defconfig. Very often this hangs during boot.

Christian Ehrhardt  (paelzer) wrote :

I've compared booting a normal Ubuntu 5.15.0-41-generic and a 5.19-rc6 with debug_defconfig under qemu 1:6.2+dfsg-2ubuntu6.3 on Jammy, as reported.

sudo qemu-system-s390x -machine s390-ccw-virtio,accel=tcg -nographic -kernel bzImage

Just booting the normal Ubuntu kernel works until it fails (expected) on "VFS: Cannot open root device". All tries were fine in TCG and KVM mode on all versions.

From there I stepped up as recommended to do the same with the 5.19+debug_defconfig.
I've run that with accel=kvm and accel=tcg in qemu 6.2/7.0/7.0+fix.

As expected accel=kvm worked just fine and fast (usually completed in <7.5 seconds).
As expected accel=tcg was slower, taking ~4 minutes even when it worked.

Qemu 6.2 and 7.0 indeed seemed stuck with no further activity at some point (2/2 tests each - no response for >=60 seconds).
Trying the same with the fixed PPA build, which has the referenced patches applied, worked fine - it wasn't much faster, but it did complete the boot.

The most interesting rate here is how often it got stuck.
In my manual testing with the non-fixed qemu I had 2/4 runs getting stuck.
With the fix it was good, but how much is enough to call it good?
Fortunately the good case can run in a loop overnight (it never gets stuck).

The following is a simple ~8h test which should fully pass if fixed (and get stuck sooner or later if not fixed):
$ for i in $(seq 120); do sudo qemu-system-s390x -machine s390-ccw-virtio,accel=tcg -nographic -kernel bzImage; echo "Loop $i passed"; done

That will be fine as a testcase for the SRU and for now I can go on expecting that this upload will indeed fix the issue.

Note: all tests, in both kvm and tcg mode, can be done on the same s390x system.

Christian Ehrhardt  (paelzer) wrote :

Pre-tests completed - SRU template ready.
Will be uploaded for Kinetic as part of 7.0 and then later as SRU for Jammy.

description: updated
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:7.0+dfsg-7ubuntu1

---------------
qemu (1:7.0+dfsg-7ubuntu1) kinetic; urgency=medium

  * Merge with Debian unstable (LP: #1971315)(LP: #1980896), remaining changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-system-common.qemu-kvm.service: systemd unit to call
        qemu-kvm-init
      - d/qemu-system-common.install: install helper script
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: call dh_installinit and dh_installsystemd for qemu-kvm
    - Distribution specific machine type
      (LP: 1304107 1621042 1776189 1761372 1761372 1776189)
      - d/p/ubuntu/define-ubuntu-machine-types.patch: define distro machine
        types containing release versioned machine attributes
      - d/qemu-system-x86.NEWS Info on fixed machine type definitions
        for host-phys-bits=true
      - Add an info about -hpb machine type in debian/qemu-system-x86.NEWS
      - ubuntu-q35 alias added to auto-select the most recent q35 ubuntu type
    - Enable nesting by default
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
        [ No more strictly needed, but required for backward compatibility ]
    - tolerate ipxe size change on migrations to >=18.04 (LP: 1713490)
      - d/p/ubuntu/pre-bionic-256k-ipxe-efi-roms.patch: old machine types
        reference 256k path
      - d/control-in: depend on ipxe-qemu-256k-compat-efi-roms to be able to
        handle incoming migrations from former releases.
    - d/qemu-system-x86.README.Debian: add info about updated nesting changes
    - Ease the use of module retention on upgrades (LP 1913421)
      - debian/qemu-block-extra.postinst: enable mount unit on install/upgrade
    - Fix I/O stalls when using NVMe storage (LP 1970737).
      - d/p/lp1970737-linux-aio-*.patch: Fix unbalanced plugged counter
        in laio_io_unplug.
    - SECURITY UPDATE: heap overflow in floppy disk emulator
      - debian/patches/CVE-2021-3507.patch: prevent end-of-track overrun in
        hw/block/fdc.c.
      - CVE-2021-3507
  * Dropped Changes [now part of 1:7.0+dfsg-7]:
    - d/rules: xen libexec dir is no more versioned
    - d/rules: ensure xen is built on x86
    - d/kvm-spice: fix when acceleration is already defined on the commandline
    - debian/control[-in]: no more disable glusterfs in Ubuntu (LP 1246924)
  * Dropped Changes [now part of upstream v7.0.0]
    - d/p/u/lp-1959984-s390x-ipl-support-extended-kernel-command-line-size.patch
      Allow long kernel command lines for QEMU (LP 1959984)
    - d/p/u/fix-virtiofsd-for-glibc2.35.patch: add rseq to seccomp allow list
    - d/p/u/tcg-Remove-dh_alias-indirection-for-dh_typecode.patch: fix 32bit
      tcg on s390x.
    - Fix diff handling on ceph that can cause data corruption (LP 1968258)
      - d/p/u/lp-1968258-block-rbd-fix-handling-of-holes-in-.bdrv_co.patch
      - d/p/u/lp-1968258-block-rbd-workaround-for-ceph-issue-53784.patch
    - d/p/u/lp-1970563-ui-vnc.c-Fixed-a-deadlock-bug.patch: avoid deadl...

Changed in qemu (Ubuntu):
status: In Progress → Fix Released
tags: added: server-todo
Changed in qemu (Ubuntu Jammy):
assignee: nobody → Christian Ehrhardt  (paelzer)
Christian Ehrhardt  (paelzer) wrote :

FYI this is now unblocked by being in Kinetic. It was delayed because I was out for a while; in addition, we still need to wait for bug 1980896 to complete before this one can enter the SRU queue.

Prepared the jammy changes:
- MP: https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/429809
- PPA: https://launchpad.net/~paelzer/+archive/ubuntu/lp-1981339-s390x-emulation

description: updated
Christian Ehrhardt  (paelzer) wrote :

Thanks, uploaded as outlined in both SRU templates.

tags: added: cetest
tags: removed: cetest
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted qemu into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Jammy):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-jammy
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.5)

All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.5) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

livecd-rootfs/2.765.10 (amd64, arm64)
systemd/249.11-0ubuntu3.6 (s390x)
ubuntu-image/2.2+22.04ubuntu3 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-10-11 03:38 EDT-------
with the qemu from proposed the testcase works fine now.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Christian Ehrhardt  (paelzer) wrote :

I didn't realize Frank was on that as well :-/
Well now it was tested two times...

In addition, all the autopkgtests that got reported are known to be flaky.
I restarted them to see if anything would be reproducible (the logs do not look like it so far). By now three recovered, one is left rerunning again.

For the case itself, I ran the testcase and it hung (as expected) when trying with jammy as-is.

Then I upgraded to -proposed which worked fine:

root@j:~/linux-5.19.14# apt install qemu-system-s390x qemu-utils qemu-system-data qemu-system-common qemu-block-extra
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  debootstrap
The following packages will be upgraded:
  qemu-block-extra qemu-system-common qemu-system-data qemu-system-s390x qemu-utils
5 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.
Need to get 7665 kB of archives.
After this operation, 5120 B of additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports jammy-proposed/main s390x qemu-block-extra s390x 1:6.2+dfsg-2ubuntu6.5 [65.6 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports jammy-proposed/main s390x qemu-system-data all 1:6.2+dfsg-2ubuntu6.5 [1431 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports jammy-proposed/main s390x qemu-system-s390x s390x 1:6.2+dfsg-2ubuntu6.5 [2777 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports jammy-proposed/main s390x qemu-system-common s390x 1:6.2+dfsg-2ubuntu6.5 [1930 kB]
Get:5 http://ports.ubuntu.com/ubuntu-ports jammy-proposed/main s390x qemu-utils s390x 1:6.2+dfsg-2ubuntu6.5 [1461 kB]
Fetched 7665 kB in 1s (7003 kB/s)
(Reading database ... 78617 files and directories currently installed.)
Preparing to unpack .../qemu-block-extra_1%3a6.2+dfsg-2ubuntu6.5_s390x.deb ...
Unpacking qemu-block-extra (1:6.2+dfsg-2ubuntu6.5) over (1:6.2+dfsg-2ubuntu6.4) ...
Preparing to unpack .../qemu-system-data_1%3a6.2+dfsg-2ubuntu6.5_all.deb ...
Unpacking qemu-system-data (1:6.2+dfsg-2ubuntu6.5) over (1:6.2+dfsg-2ubuntu6.4) ...
Preparing to unpack .../qemu-system-s390x_1%3a6.2+dfsg-2ubuntu6.5_s390x.deb ...
Unpacking qemu-system-s390x (1:6.2+dfsg-2ubuntu6.5) over (1:6.2+dfsg-2ubuntu6.4) ...
Preparing to unpack .../qemu-system-common_1%3a6.2+dfsg-2ubuntu6.5_s390x.deb ...
Unpacking qemu-system-common (1:6.2+dfsg-2ubuntu6.5) over (1:6.2+dfsg-2ubuntu6.4) ...
Preparing to unpack .../qemu-utils_1%3a6.2+dfsg-2ubuntu6.5_s390x.deb ...
Unpacking qemu-utils (1:6.2+dfsg-2ubuntu6.5) over (1:6.2+dfsg-2ubuntu6.4) ...
Setting up qemu-system-common (1:6.2+dfsg-2ubuntu6.5) ...
Setting up qemu-system-data (1:6.2+dfsg-2ubuntu6.5) ...
Setting up qemu-utils (1:6.2+dfsg-2ubuntu6.5) ...
Setting up qemu-system-s390x (1:6.2+dfsg-2ubuntu6.5) ...
Setting up qemu-block-extra (1:6.2+dfsg-2ubuntu6.5) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Scanning processes...
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated b...

Christian Ehrhardt  (paelzer) wrote :

The remaining test failure is on amd64 livecd-rootfs.
But there all tests fail with:

+ tar --xattrs --sort=name -czf ../livecd.ubuntu-cpc.wsl.rootfs.tar.gz bin boot dev etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin snap srv sys tmp usr var
gzip: stdout: No space left on device

And then subsequent issues due to disk space.
That isn't caused by qemu, but I'll ask whether Foundations knows something about it.

For the time being I'll run a migration-reference/0 run against it to clear the view.

Christian Ehrhardt  (paelzer) wrote :

The final test is resolved as well, it is currently broken by other unrelated changes.
The tooling knows about that and is all green now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:6.2+dfsg-2ubuntu6.5

---------------
qemu (1:6.2+dfsg-2ubuntu6.5) jammy; urgency=medium

  * d/p/u/lp-1981339-*: Fix s390x emulation of newer kernels (LP: #1981339)

 -- Christian Ehrhardt <email address hidden> Tue, 13 Sep 2022 10:23:19 +0200

Changed in qemu (Ubuntu Jammy):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin2204
removed: targetmilestone-inin---