Wrong code execution of s390x code with qemu TCG

Bug #2046439 reported by bugproxy
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
High
Skipper Bug Screeners
qemu (Ubuntu)
Fix Released
Undecided
Unassigned
Jammy
Fix Released
Medium
Sergio Durigan Junior
Lunar
Won't Fix
Undecided
Unassigned
Mantic
Fix Released
Undecided
Unassigned

Bug Description

SRU Justification:

[ Impact ]

 * Wrong code execution with qemu.

 * Frequently used s390x code sequences are wrongly executed
   when running with qemu instruction set emulation.

 * This happens only in KVM VMs, not while running natively on s390x.

 * For example also with
   gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
   on WSL (Windows 11_5.15.90.1-microsoft-standard-WSL2)
   with any kind of Build Optimization Options: O0, O1, O2, O3
   on KVM - like reported.

 * The problem was probably introduced with:
   Convert COMPARE, COMPARE LOGICAL
   https://gitlab.com/qemu-project/qemu/-/commit/a7e836d5
   and got fixed with:
   https://gitlab.com/qemu-project/qemu/-/commit/17b032c6

[ Test Plan ]

 * An Ubuntu Server 22.04 LTS installed on an s390x LPAR as KVM host
   and a KVM guest running on top - again 22.04.

 * Have a build environment installed with gcc 11(.4).

 * Now compiling this reproducer:
   #include <stdio.h>

   signed short v1 = 1;
   signed int v2 = 2;
   unsigned long long bug = 0;

   int main ()
   {
       if ((v1 < v2))
       {
           bug = v2;
       }
       printf("bug = %llu\n", bug);

       return 0;
   }
   with:
   gcc -o bug0 bug.c -O0 -fsanitize=undefined

 * Now running it:
   qemu-s390x-static -L /usr/s390x-linux-gnu/ ./bug0

 * Expected output (on KVM host, that natively runs Ubuntu):
   O0: 2
   O1: 2
   O2: 2
   O3: 2
   Actual output (on un-fixed qemu environment):
   O0: 2
   O1: 2
   O2: 0
   O3: 0

[ Where problems could occur ]

 * The fix is in COMPARE HALFWORD RELATIVE LONG and two files
   are touched in tcg:
   target/s390x/tcg/translate.c
   target/s390x/tcg/insn-data.h.inc

 * Problems can for example occur in the newly introduced in2_mri2_16s
   in case the pointer handling is wrong, or wrong arguments are taken
   (not only in in2_mri2_16s, but also in tcg_gen_qemu_ld16s).

 * Issues could also happen is something relies on mri2_32s or mri2_64.

 * The problem and fix is limited to s390x.

[ Other Info ]

 * The issue was initially reported at gcc upstream:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986
   but tunred out to be a qemue problem.
   Nevertheless, there is a reproducer mentioned that got
   picked here as test case.

 * This issue is fixed in qemu 7, but qemu 6.2.0 in Ubuntu 22.04
   is still affected, hence this SRU.

__________

---Problem Description---
Wrong code execution with qemu

---Steps to Reproduce---
please have a look at the following bug:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986

------------------------------------------------------------------------
Contact Information = Andreas Krebbel <email address hidden>

Machine Type = IBM Z

Userspace tool common name: qemu

The userspace tool has the following bit modes: 64 bit

Userspace deb: - 1:6.2+dfsg-2ubuntu6.15
------------------------------------------------------------------------

Frequently used s390x code sequences are wrongly executed when running with qemu instruction set emulation.

The problem has been fixed in upstream qemu already. A backport for qemu 7 branch has been committed as well. The qemu 6.2.0 version used in Ubuntu 22.04 needs a backport of a trivial fix to work properly:

From the GCC BZ:
Problem fixed in v8.0.0 (https://gitlab.com/qemu-project/qemu/-/commit/54fce97cfcaf5463ee5f325bc1f1d4adc2772f38).
The fix was backported to v7.2.2 (https://gitlab.com/qemu-project/qemu/-/commit/17b032c6598ea756889f25e8d3e4cd9f2036669b), but not to v6.

Please consider picking up
https://gitlab.com/qemu-project/qemu/-/commit/17b032c6598ea756889f25e8d3e4cd9f2036669b
for the Ubuntu 22.04 qemu package 1:6.2+dfsg-2ubuntu6.15

Related branches

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-204491 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
affects: linux (Ubuntu) → qemu (Ubuntu)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in qemu (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
Changed in ubuntu-z-systems:
importance: Undecided → High
status: New → Triaged
tags: added: server-next
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank Andreas, this is helpful and I appreciate the work.

But I'd not say this is something I'd rush or hold people back from a Christmas PTO.
Since that means the fix will land in ~january we do IMHO not really have to consider Lunar which goes EOL there.
But for jammy this will be a good fix to have.

Changed in qemu (Ubuntu Mantic):
status: New → Fix Released
Changed in qemu (Ubuntu Lunar):
status: New → Won't Fix
Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu):
status: New → Fix Released
Changed in qemu (Ubuntu Jammy):
importance: Undecided → Medium
tags: added: server-todo
removed: server-next
Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hello,

I'm working on this bug now, and I need at least one more patch backported from QEMU upstream:

https://gitlab.com/qemu-project/qemu/-/commit/bdbc87e323ee417735141ed2b11dab0091b57593

However, there are other fixes that have been added on top of the patch mentioned above, namely:

https://gitlab.com/qemu-project/qemu/-/commit/703d03a4aaf38f285555ef5422ba5ce075416fc4
https://gitlab.com/qemu-project/qemu/-/commit/349372ff9e3e7c047e258383f061a8617f66adc3

These fixes seem important, albeit unrelated to this specific problem we're dealing with.

I'd like to ask for IBM's input on the best approach to take here. If we decide to go along with backporting the fixes above, we might have to expand the scope of this bug.

Thanks.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-01-04 08:43 EDT-------
I think these fixes are worth backporting.

In general, QEMU maintains a few stable branches; it might be an interesting approach to monitor them for s390x fixes. For example here:

https://gitlab.com/qemu-project/qemu/-/commits/stable-8.1/target/s390x?ref_type=heads

"target/s390x: Fix CLC corrupting cc_src" immediately stands out as a pretty serious issue, preventing installation of certain Linux distros.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the reply.

Given that there are more patches that may be needed, I would like to ask if you could prepare the complete set of backported patches to be applied, please. I would feel more comfortable this way.

Thank you in advance!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-01-09 11:03 EDT-------
I would recommend taking following patches:

$ git log --oneline --grep 'Fixes:' --grep stable@ $(git merge-base v6.2.0 upstream/stable-7.2) upstream/stable-7.2 target/s390x/tcg/
[-] 154760b6db6 target/s390x: Fix LAALG not updating cc_src
[+] e7ecf6a45f4 target/s390x: Check reserved bits of VFMIN/VFMAX's M5
[+] 7760cfd9c82 target/s390x: Fix VSTL with a large length
[+] 7c4ce14b414 target/s390x: Use a 16-bit immediate in VREP
[-] 08a4e6da12d target/s390x: Fix the "ignored match" case in VSTRS
[+] c1bdd3cdc46 target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13
[+] cdd6b6a764d target/s390x: Make MC raise specification exception when class >= 16
[+] 0ef0b83104e target/s390x: Fix ICM with M3=0
[-] 7cf33584512 target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs
[+] 34009bfd681 target/s390x: Fix CLM with M3=0
[+] 772caa5f231 target/s390x: Make CKSM raise an exception if R2 is odd
[+] cc271aa410e target/s390x: Fix LOCFHR taking the wrong half of R2
[+] 347714a28cc target/s390x: Fix LCBB overwriting the top 32 bits
[+] 00acdd8a9f6 s390x/tcg: Fix LDER instruction format

commit bdbc87e323ee417735141ed2b11dab0091b57593
Author: Richard Henderson <email address hidden>
Date: Sun Feb 26 17:55:31 2023 -1000

target/s390x: Split out gen_ri2

[+] 17b032c6598 target/s390x: Fix emulation of C(G)HRL
[+] f7d81a351d6 target/s390x: Fix emulation of the VISTR instruction
[+] 131aafa7eff s390x/tcg: Fix opcode for lzrf
[+] d324c21ba0b target/s390x: Fix CLFIT and CLGIT immediate size
[+] 13c59eb09bd target/s390x: fix handling of zeroes in vfmin/vfmax
[+] db67a6ff480 target/s390x: Fix writeback to v1 in helper_vstl
[+] 21641ee5a9b target/s390x: Fix the accumulation of ccm in op_icm
[+] 16ed5f14215 s390x/tcg: Fix BRCL with a large negative offset
[+] fc3dd86a290 s390x/tcg: Fix BRASL with a large negative offset
[+] 2092fdd97c2 s390x: sck: load into a temporary not into in1
[+] 6da170beda3 target/s390x: Fix shifting 32-bit values for more than 31 bits
[+] df103c09bc2 target/s390x: Fix cc_calc_sla_64() missing overflows
[+] 57556b28afd target/s390x: Fix SRDA CC calculation
[+] 521130f2672 target/s390x: Fix SLDA sign bit index

The patches with [-] are irrelevant or cannot be applied without extensive modifications.
bdbc87e323ee417735141ed2b11dab0091b57593 is a pre-req, as you already identified.
2092fdd97c2 has a trivial conflicts which can be resolved by taking the new code.

So the following is looking good to me:

$ git checkout v6.2.0
$ git cherry-pick 521130f2672 57556b28afd df103c09bc2 6da170beda3 2092fdd97c2 fc3dd86a290 16ed5f14215 21641ee5a9b db67a6ff480 13c59eb09bd d324c21ba0b 131aafa7eff f7d81a351d6 17b032c6598 00acdd8a9f6 bdbc87e323ee417735141ed2b11dab0091b57593 347714a28cc cc271aa410e 772caa5f231 34009bfd681 0ef0b83104e cdd6b6a764d c1bdd3cdc46 7c4ce14b414 7760cfd9c82 e7ecf6a45f4

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Ilya, thx for your response and list of commits.

But please notice that the Ubuntu SRU process is asking for a complete, but minimal list of commits/patches to fix a certain issue (this is to minimize potential and additional risk that might be introduced due to too many and maybe partially unrelated fixes).
On top it's desired to have one LP bug per problem (and it's fix) - so in case it makes sense to solve issues on top, we need to think about splitting them up into different LP bugs.

So I sorry, but we may ask you about such a complete but minimal list of commits needed to fix this bug (I believe it's probably a bit less than the commits in 'git cherry-pick' of the last comment, no?!)

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-01-17 10:32 EDT-------
Just for this issue, the original list from @sergiodj is enough:

https://gitlab.com/qemu-project/qemu/-/commit/bdbc87e323ee417735141ed2b11dab0091b57593
https://gitlab.com/qemu-project/qemu/-/commit/17b032c6598ea756889f25e8d3e4cd9f2036669b

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thank you for providing the minimal set of patches for this bug fix. As Frank said, if some other fix is needed for qemu then it's best to open other bugs.

I prepared a backport for the package and put it here:

https://launchpad.net/~sergiodj/+archive/ubuntu/qemu-bug2046439/+packages

Could you please confirm that the backported patches are indeed enough to fix the problem? Also, we will need an SRU template for the bug so that it can be properly processed by the SRU team.

Thank you.

Changed in qemu (Ubuntu Jammy):
status: Triaged → In Progress
Frank Heimes (fheimes)
description: updated
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-02-16 07:42 EDT-------
The fix works, thanks!

# qemu-s390x /tmp/chrl
chrl: /home/iii/myrepos/qemu/tests/tcg/s390x/chrl.c:22: test_chrl: Assertion `!cc' failed.

# apt install -y qemu-user=1:6.2+dfsg-2ubuntu6.17~ppa2 --allow-downgrades

# qemu-s390x /tmp/chrl ; echo $?
0

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for confirming the fix!

I'm preparing the upload. The package should be ready for the SRU team later today.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted qemu into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:6.2+dfsg-2ubuntu6.18 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy)
tags: added: targetmilestone-inin2204
removed: targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-02-23 05:26 EDT-------
It works, thanks!

# dpkg -s qemu-user | grep ^Version:
Version: 1:6.2+dfsg-2ubuntu6.18

# qemu-s390x /tmp/chrl ; echo $?
0

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-02-23 05:46 EDT-------
Thanks Ilya for your successful verification!

With that, I am setting the tags to "verfication-done" and "verification-done-jammy"

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:6.2+dfsg-2ubuntu6.18)

All autopkgtests for the newly accepted qemu (1:6.2+dfsg-2ubuntu6.18) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

casper/1.470.2 (amd64)
cinder/2:20.3.1-0ubuntu1.1 (amd64)
edk2/2022.02-3ubuntu0.22.04.2 (armhf)
systemd/249.11-0ubuntu3.12 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Frank Heimes (fheimes) wrote :

Potential regressions were resolved, now blocked by freeze.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

If I understood the test plan correctly, it's meant to build that sample .c program 4 times, each one with different optimization levels (-O0 through -O3), and then run it and show the output.

The verification step in comment #12 didn't seem to do that. It ran a binary once, and showed exit status.

Could someone please clarify?

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Hi, could the question from comment #16 be clarified please?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-04-02 13:14 EDT-------
Hi, yes, I used the upstream testcase there.
I just checked the C program with 4 levels of optimization, and it works fine too.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:6.2+dfsg-2ubuntu6.18

---------------
qemu (1:6.2+dfsg-2ubuntu6.18) jammy; urgency=medium

  * d/p/u/lp-2046439-s390x-*.patch: Fix emulation of
    "COMPARE HALFWORD RELATIVE LONG" on s390x.
    (LP: #2046439)

 -- Sergio Durigan Junior <email address hidden> Wed, 21 Feb 2024 15:44:50 -0500

Changed in qemu (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.