spapr_hcall from ubuntu_kvm_unit_test failed on ppc64el with Z-hwe kernel

Bug #1712803 reported by Po-Hsu Lin on 2017-08-24
This bug affects 1 person
Affects                  Importance   Assigned to
ubuntu-kernel-tests      Undecided    Unassigned
linux (Ubuntu)           Medium       Unassigned
  Xenial                 Undecided    Unassigned
qemu (Ubuntu)            Undecided    Unassigned
  Xenial                 Low          Unassigned

Bug Description

[Impact]

 * Xenial with the HWE kernel (matching newer releases) and qemu from the
   main archive rather than the Ubuntu Cloud Archive (not matching newer
   releases) can trigger hypercalls that Xenial's qemu does not support.

 * So far no "real" workload other than kvm-unit-tests is known to trigger
   it, but the fix is small and well contained, so we might apply it
   proactively.

[Test Case]

  1. deploy xenial + HWE kernel on a ppc64el box
  2. sudo apt-get install qemu-kvm -y
  3. git clone --depth=1 https://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
  4. cd kvm-unit-tests
  5. ./configure --endian=little; make
  6. sudo ppc64_cpu --smt=off
  7. sudo ./run_tests.sh -v

[Regression Potential]

 * The changes have all been upstream for a long time (lowering the risk
   of simple mistakes).
 * The changes are isolated to ppc, so in the worst case only this arch
   should regress.
 * The fix implements new hcalls, so the only conceivable regression is
   software that expected them to fail and now breaks because they succeed
   (unlikely, given the nature of these hcalls).

[Other Info]

 * n/a

---

kernel: 4.10.0-33.37~16.04.1

I think this issue is caused by the old qemu version (a similar issue was spotted on Xenial before); I will need to investigate this further.

qemu-system-ppc64 -machine pseries,accel=kvm -bios powerpc/boot_rom.bin -display none -serial stdio -kernel powerpc/spapr_hcall.elf -smp 1
FAIL: hypercall: h_set_sprg0: sprg0 = 0xcafebabedeadbeef
FAIL: hypercall: h_set_sprg0: sprg0 = 0xaaaaaaaa55555555
FAIL: hypercall: h_set_sprg0: sprg0 = 0x41a588
FAIL: hypercall: h_page_init: h_zero_page
FAIL: hypercall: h_page_init: h_copy_page
FAIL: hypercall: h_page_init: h_copy_page+h_zero_page
FAIL: hypercall: h_page_init: h_zero_page unaligned dst
FAIL: hypercall: h_page_init: h_copy_page unaligned src
XFAIL: hypercall: h_random: h-call available

SUMMARY: 9 tests, 8 unexpected failures, 1 expected failures
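
To compare a run like the one above against later runs, the result lines can be tallied mechanically. This is a minimal Python sketch, not part of kvm-unit-tests: `summarize_results` and `decode_exit_status` are hypothetical helpers. It assumes that XPASS counts as an unexpected failure, and that the guest prints EXIT: STATUS=(code << 1) | 1, which is consistent with STATUS=1 for the passing run and STATUS=3 for the failing run later in this report.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Result prefixes as printed by the test guest, e.g.
# "FAIL: hypercall: h_page_init: h_zero_page".
RESULT_RE = re.compile(r"^(PASS|FAIL|XFAIL|XPASS): ")
EXIT_RE = re.compile(r"EXIT: STATUS=(\d+)")


def summarize_results(log_text: str) -> Tuple[int, int, int]:
    """Return (tests, unexpected_failures, expected_failures).

    Assumption: XPASS counts as unexpected; SKIP and other lines are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    tests = sum(counts.values())
    unexpected = counts["FAIL"] + counts["XPASS"]
    return tests, unexpected, counts["XFAIL"]


def decode_exit_status(log_text: str) -> Optional[int]:
    """Map "EXIT: STATUS=n" back to the test's own exit code.

    Assumption: the guest encodes its exit code as (code << 1) | 1.
    """
    match = EXIT_RE.search(log_text)
    return int(match.group(1)) >> 1 if match else None


if __name__ == "__main__":
    sample = """\
FAIL: hypercall: h_set_sprg0: sprg0 = 0xcafebabedeadbeef
FAIL: hypercall: h_page_init: h_zero_page
XFAIL: hypercall: h_random: h-call available
EXIT: STATUS=3
"""
    print(summarize_results(sample))  # (3, 2, 1)
    print(decode_exit_status(sample))  # 1
```

Feeding it the nine result lines above should reproduce the "9 tests, 8 unexpected failures, 1 expected failures" summary.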

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1712803

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: kernel-da-key
Po-Hsu Lin (cypressyew) wrote :

Tested with qemu-2.7.1 built from source. This test has passed on the same node.
The old qemu version is 2.5 on this system.
$ qemu-system-ppc --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright (c) 2003-2008 Fabrice Bellard

$ sudo /home/ubuntu/qemu-2.7.1/ppc64-softmmu/qemu-system-ppc64 -machine pseries,accel=kvm -bios powerpc/boot_rom.bin -display none -serial stdio -kernel powerpc/spapr_hcall.elf -smp 1 # -initrd /tmp/tmp.p0YqiDq4CZ
PASS: hypercall: h_set_sprg0: sprg0 = 0xcafebabedeadbeef
PASS: hypercall: h_set_sprg0: sprg0 = 0xaaaaaaaa55555555
PASS: hypercall: h_set_sprg0: sprg0 = 0x41a800
PASS: hypercall: h_page_init: h_zero_page
PASS: hypercall: h_page_init: h_copy_page
PASS: hypercall: h_page_init: h_copy_page+h_zero_page
PASS: hypercall: h_page_init: h_zero_page unaligned dst
PASS: hypercall: h_page_init: h_copy_page unaligned src
XFAIL: hypercall: h_random: h-call available
SUMMARY: 9 tests, 1 expected failures

EXIT: STATUS=1

affects: qemu-kvm (Ubuntu) → qemu (Ubuntu)

Hi Po-Hsu,
what do you mean by "introduced by the old qemu version"?
I assume from the kernel-da-key tag that you are testing new kernels?
So do you mean "the new (maybe HWE) kernel fails this test on xenial", and your assumption is that it triggers there because qemu doesn't have the right fixes to be able to handle it?
If so we should mirror this bug to IBM so they can assist in picking the right ppc patches needed.

Caught Po-Hsu on IRC.

He will outline the steps to get the test elf image and then we should mirror to IBM so they can check which ppc changes would be needed. Based on knowing that we can then decide to SRU or not.

tags: added: pp64el
tags: added: ppc64el
removed: pp64el
Po-Hsu Lin (cypressyew) wrote :

Steps to reproduce:
1. deploy xenial + HWE kernel on a ppc64el box
2. sudo apt-get install qemu-kvm -y
3. git clone --depth=1 https://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
4. cd kvm-unit-tests
5. ./configure --endian=little; make
6. sudo ppc64_cpu --smt=off
7. sudo ./run_tests.sh -v

From here you will see that it uses the ELF files from kvm-unit-tests/powerpc and that 3 tests fail:
 - spapr_hcall
 - emulator (bug 1723914)
 - sprs (bug 1723904)

To run it with a newer qemu built from source:
1. wget http://download.qemu-project.org/qemu-2.10.0.tar.xz
2. tar -xf qemu-2.10.0.tar.xz
3. cd qemu-2.10.0
4. ./configure; make
5. cd ~/kvm-unit-tests
6. sudo su
7. export QEMU=/home/ubuntu/qemu-2.10.0/ppc64-softmmu/qemu-system-ppc64
8. ./run_tests.sh -v
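
If this comparison is repeated often, the override from step 7 can be wrapped in a small helper. This Python sketch is a convenience under stated assumptions, not part of kvm-unit-tests: `build_invocation` and `run_suite` are hypothetical names, the paths are the example ones from the steps above, and it relies only on the `QEMU` environment variable the steps already use. It still has to run as root, as in step 6.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_invocation(qemu_binary: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for running the suite with a
    specific qemu-system-ppc64 binary, selected via the QEMU variable."""
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["QEMU"] = qemu_binary
    return ["./run_tests.sh", "-v"], env


def run_suite(qemu_binary: str, test_dir: str) -> int:
    cmd, env = build_invocation(qemu_binary)
    # check=False: inspect the return code and logs/ instead of raising.
    return subprocess.run(cmd, cwd=test_dir, env=env, check=False).returncode


if __name__ == "__main__":
    print(run_suite("/home/ubuntu/qemu-2.10.0/ppc64-softmmu/qemu-system-ppc64",
                    os.path.expanduser("~/kvm-unit-tests")))
```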

bugproxy (bugproxy) on 2017-10-24
tags: added: architecture-ppc64le bugnameltc-160546 severity-medium targetmilestone-inin16043

------- Comment From <email address hidden> 2018-01-24 14:13 EDT-------
This bug occurs because the h_set_sprg0 and h_page_init hypervisor calls were not yet implemented in QEMU 2.5. The problem was solved by two patches, each implementing one hypercall.

https://github.com/qemu/qemu/commit/423576f771db8c6dbb944ddb8dc15b472f62de4a

This is a very simple hypercall that only sets up the SPRG0
register for the guest (since writing to SPRG0 was only permitted
to the hypervisor in older versions of the PowerISA).
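
The failure mode is easy to see in a toy model of hcall dispatch: an hcall the hypervisor does not implement is rejected, so the guest's SPRG0 never changes and the test reports FAIL. The sketch below is illustrative Python, not QEMU code. The class and table names are hypothetical, and the return codes H_SUCCESS = 0 and H_FUNCTION = -2 are the PAPR values as we understand them.

```python
from typing import Callable, Dict

H_SUCCESS = 0
H_FUNCTION = -2  # "function not supported"; assumed PAPR value
MASK64 = (1 << 64) - 1


class GuestCPU:
    """Hypothetical stand-in for the guest CPU state touched by the hcall."""

    def __init__(self) -> None:
        self.sprg0 = 0


def h_set_sprg0(cpu: GuestCPU, value: int) -> int:
    # SPRG0 is a 64-bit register; setting it for the guest is all this does.
    cpu.sprg0 = value & MASK64
    return H_SUCCESS


HcallTable = Dict[str, Callable[[GuestCPU, int], int]]

# Hypothetical tables: 2.5 lacks the handler, 2.6 and later provide it.
QEMU_2_5_HCALLS: HcallTable = {}
QEMU_2_6_HCALLS: HcallTable = {"H_SET_SPRG0": h_set_sprg0}


def dispatch(table: HcallTable, name: str, cpu: GuestCPU, arg: int) -> int:
    handler = table.get(name)
    if handler is None:
        return H_FUNCTION  # unimplemented hcall: guest state stays unchanged
    return handler(cpu, arg)


if __name__ == "__main__":
    cpu = GuestCPU()
    print(dispatch(QEMU_2_5_HCALLS, "H_SET_SPRG0", cpu, 0xCAFEBABEDEADBEEF), hex(cpu.sprg0))  # -2 0x0
    print(dispatch(QEMU_2_6_HCALLS, "H_SET_SPRG0", cpu, 0xCAFEBABEDEADBEEF), hex(cpu.sprg0))  # 0 0xcafebabedeadbeef
```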

https://github.com/qemu/qemu/commit/3240dd9a6924df18dfccb83defa0914065da076e

This hypercall either initializes a page with zeros, or copies
another page.
According to LoPAPR, the i-cache of the page should also be
flushed if using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE,
and the d-cache should be synchronized to the RAM if the
H_ICACHE_SYNCHRONIZE flag is used. For this, two new functions
are introduced, kvmppc_dcbst_range() and kvmppc_icbi_range(), which
use the corresponding assembler instructions to flush the caches
if running with KVM on Power. If the code runs with TCG instead,
the code only uses tb_flush(), assuming that this will be
enough for synchronization.
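
The page-init behaviour described in the commit message can be modelled on a flat bytearray standing in for guest RAM. This is a simplified Python sketch, not the QEMU implementation. The flag bit values are hypothetical (chosen only for this model, not the PAPR encodings), `is_aligned_ram_page()` plays the role of QEMU's `is_ram_address()` check plus the alignment test, copying is checked before zeroing when both flags are set, and the cache flags are accepted but have no effect because a bytearray has no caches.

```python
PAGE_SIZE = 4096  # assumed 4 KiB page for this model

H_SUCCESS = 0
H_PARAMETER = -4  # assumed PAPR "bad parameter" value

# Hypothetical bit values for this sketch only.
H_ZERO_PAGE = 1 << 0
H_COPY_PAGE = 1 << 1
H_ICACHE_INVALIDATE = 1 << 2
H_ICACHE_SYNCHRONIZE = 1 << 3
VALID_FLAGS = H_ZERO_PAGE | H_COPY_PAGE | H_ICACHE_INVALIDATE | H_ICACHE_SYNCHRONIZE


class GuestRAM:
    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)

    def is_aligned_ram_page(self, addr: int) -> bool:
        # Page-aligned and fully backed by RAM.
        return addr % PAGE_SIZE == 0 and 0 <= addr and addr + PAGE_SIZE <= len(self.mem)


def h_page_init(ram: GuestRAM, flags: int, dst: int, src: int = 0) -> int:
    if flags & ~VALID_FLAGS:
        return H_PARAMETER  # unknown flag bits
    if not ram.is_aligned_ram_page(dst):
        return H_PARAMETER  # e.g. the "unaligned dst" test case
    if flags & H_COPY_PAGE:
        if not ram.is_aligned_ram_page(src):
            return H_PARAMETER  # e.g. the "unaligned src" test case
        ram.mem[dst:dst + PAGE_SIZE] = ram.mem[src:src + PAGE_SIZE]
    elif flags & H_ZERO_PAGE:
        ram.mem[dst:dst + PAGE_SIZE] = bytes(PAGE_SIZE)
    # H_ICACHE_*: under KVM, QEMU flushes via kvmppc_dcbst_range() and
    # kvmppc_icbi_range(); under TCG via tb_flush(). Nothing to do here.
    return H_SUCCESS


if __name__ == "__main__":
    ram = GuestRAM(4 * PAGE_SIZE)
    ram.mem[PAGE_SIZE:2 * PAGE_SIZE] = b"\xab" * PAGE_SIZE
    print(h_page_init(ram, H_COPY_PAGE, 2 * PAGE_SIZE, PAGE_SIZE))  # 0
    print(h_page_init(ram, H_ZERO_PAGE, 2 * PAGE_SIZE + 8))  # -4 (unaligned dst)
```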

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-24 14:47 EDT-------
Canonical Team,

Do we need to have these patches applied in Qemu for Ubuntu 16.04.3? If so, could you tell me the repository/branch/mailing list to where I should send the backport, please?

Thanks for identifying the changes. Both have been upstream since QEMU 2.6, so I'm first marking the bug tasks accordingly.
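
Because both hcalls arrived upstream in QEMU 2.6, the `qemu-system-ppc --version` banner gives a first hint of whether a host is affected. The Python sketch below is illustrative: `parse_qemu_version` and `upstream_has_new_hcalls` are hypothetical helpers, and versions are compared as integer tuples so that 2.10 correctly sorts above 2.6 (a plain string comparison would get that wrong).

```python
import re
from typing import Tuple

BANNER_RE = re.compile(r"QEMU emulator version (\d+)\.(\d+)(?:\.(\d+))?")


def parse_qemu_version(banner: str) -> Tuple[int, int, int]:
    """Extract (major, minor, micro) from a `qemu-system-* --version` banner."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError("not a qemu --version banner: %r" % banner)
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def upstream_has_new_hcalls(banner: str) -> bool:
    # H_SET_SPRG0 and H_PAGE_INIT have been upstream since 2.6.
    return parse_qemu_version(banner) >= (2, 6, 0)
```

For the Xenial banner quoted earlier ("QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16) ...") this returns False. Note the limitation: a distro backport such as the eventual 1:2.5+dfsg-5ubuntu10.21 still reports upstream 2.5.0, so for Ubuntu packages the package version, not the upstream version, decides whether the hcalls are present.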

Changed in qemu (Ubuntu Xenial):
status: New → Triaged
Changed in qemu (Ubuntu):
status: New → Fix Released

The patches seem easy enough; only the second has some very minor noise when applying.
Let me know if I should do the backport or if you want to provide patches against 2.5 (patches against v2.5 from the qemu git tree are fine).

In terms of SRU scheduling I think the priority is low, but the fix is easy enough that it seems right to do it.
On the other hand, several qemu security updates are in flight.

My suggestion would be to:
1. backport (or get it from you)
2. provide a build in a ppa to have the solution verified
3. once the Spectre fixes have settled, upload on top of them

Are you OK with that? Please choose which option you prefer for #1.

Changed in qemu (Ubuntu Xenial):
importance: Undecided → Low
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-25 13:20 EDT-------
That's fine. Yes, I can provide the patches. Please tell me how to proceed: do I submit the patches via Launchpad so you can apply them?

Hi Yasmins,
you can either attach the patches as files here on the LP bug or point me to a git branch of yours that I can reach.
Unless you want to do the related packaging work as well that is all I need.
I can easily do the wrap up of the patches into the packaging for you.

TL;DR: I'm not going to block you with "restrictive workflow processes"; get the patches to me any way you like and I'll make it work :-)

Ok Christian I'll just send the patches here then. Let me know if you need something.

One minor detail I should mention is that you're going to see 3 patches in the file instead of 2. That is because the h_page_init hypercall depends on a function, `is_ram_address`, that didn't exist in v2.5; the patch introducing it has been backported as well.

The current version is 1:2.5+dfsg-5ubuntu10.16, and while a security update to 1:2.5+dfsg-5ubuntu10.17 is incoming, we can (as discussed) test from a PPA.

To stay ahead of the soon-to-be-expected 1:2.5+dfsg-5ubuntu10.17, I'll call mine 1:2.5+dfsg-5ubuntu10.18~ppa1 (skipping 17 for now so your test build stays ahead).
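
The naming relies on Debian version ordering, where `~` sorts before everything, even the end of the string: 10.18~ppa1 is newer than 10.17 but older than a final 10.18, so the real upload supersedes the PPA build. On a Debian/Ubuntu system `dpkg --compare-versions` is the authoritative check; the Python function below is our simplified re-implementation of dpkg's ordering rules, for illustration only, assuming well-formed version strings.

```python
def _is_digit(ch: str) -> bool:
    return ch != "" and "0" <= ch <= "9"


def _order(ch: str) -> int:
    """dpkg-style weight of a non-digit character ("" means end of string)."""
    if ch == "" or _is_digit(ch):
        return 0
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return ord(ch)
    if ch == "~":
        return -1  # sorts before everything, including end of string
    return ord(ch) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream versions (or two revisions)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one by one.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and _is_digit(a[i]) and _is_digit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and _is_digit(a[i]):
            return 1  # a has the longer number
        if j < len(b) and _is_digit(b[j]):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split "epoch:upstream-revision" into its three parts."""
    epoch, rest = "0", version
    if ":" in version:
        epoch, rest = version.split(":", 1)
    upstream, revision = rest, ""
    if "-" in rest:
        upstream, revision = rest.rsplit("-", 1)
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    result = ea - eb if ea != eb else (_verrevcmp(ua, ub) or _verrevcmp(ra, rb))
    return (result > 0) - (result < 0)
```

For example, `compare_versions("1:2.5+dfsg-5ubuntu10.18~ppa1", "1:2.5+dfsg-5ubuntu10.17")` is 1, while comparing the same PPA version against "1:2.5+dfsg-5ubuntu10.18" gives -1.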

@Yasmins: your email address would be nice to have so you are correctly credited in the patch headers and changelog; so far I'm using "(IBM)" as the address :-)

The proposed fix is now building in [1] and should be available in ~1h.
@Po-Hsu and Yasmins: please check this PPA build with your tests. If they pass and my regression tests pass as well, we are as prepared as we can be to fix this once the security update is complete.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3120

description: updated

I prepared the SRU template here in the bug; the builds in the PPA are also complete.

I tested this new build and it's good. You can use my IBM email address, that's fine :-)

It also passed all regression tests on ppc without a hiccup.
Other than waiting for the security update to pass, we are ready.

I refreshed the tested changes on top of the security update and pushed it for SRU review into Xenial.

Changed in qemu (Ubuntu Xenial):
status: Triaged → In Progress

Now waiting for the SRU Team to accept it into xenial-proposed.

Hello Po-Hsu, or anyone else affected,

Accepted qemu into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.5+dfsg-5ubuntu10.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial

On Xenial + HWE as-is:

ubuntu@wichita:~/kvm-unit-tests$ sudo ./run_tests.sh -v; cat logs/spapr_hcall.log
TESTNAME=selftest-setup TIMEOUT=90s ACCEL= ./powerpc/run powerpc/selftest.elf -smp 2 -m 256 -append 'setup smp=2 mem=256'
PASS selftest-setup
TESTNAME=spapr_hcall TIMEOUT=90s ACCEL= ./powerpc/run powerpc/spapr_hcall.elf -smp 1
FAIL spapr_hcall
TESTNAME=rtas-get-time-of-day TIMEOUT=5 ACCEL= ./powerpc/run powerpc/rtas.elf -smp 1 -append "get-time-of-day date=$(date +%s)"
PASS rtas-get-time-of-day
TESTNAME=rtas-get-time-of-day-base TIMEOUT=5 ACCEL= ./powerpc/run powerpc/rtas.elf -smp 1 -rtc base="2006-06-17" -append "get-time-of-day date=$(date --date="2006-06-17 UTC" +%s)"
PASS rtas-get-time-of-day-base
TESTNAME=rtas-set-time-of-day TIMEOUT=5 ACCEL= ./powerpc/run powerpc/rtas.elf -smp 1 -append "set-time-of-day"
PASS rtas-set-time-of-day
TESTNAME=emulator TIMEOUT=90s ACCEL= ./powerpc/run powerpc/emulator.elf -smp 1
FAIL emulator
SKIP h_cede_tm (test marked as manual run only)
MIGRATION=yes TESTNAME=sprs TIMEOUT=90s ACCEL= ./powerpc/run powerpc/sprs.elf -smp 1 -append '-w'
FAIL sprs
timeout -k 1s --foreground 90s /usr/bin/qemu-system-ppc64 -nodefaults -machine pseries,accel=kvm -bios powerpc/boot_rom.bin -display none -serial stdio -kernel powerpc/spapr_hcall.elf -smp 1 # -initrd /tmp/tmp.mp3tm4K2lz
FAIL: hypercall: h_set_sprg0: sprg0 = 0xcafebabedeadbeef
FAIL: hypercall: h_set_sprg0: sprg0 = 0xaaaaaaaa55555555
FAIL: hypercall: h_set_sprg0: sprg0 = 0x419744
FAIL: hypercall: h_page_init: h_zero_page
FAIL: hypercall: h_page_init: h_copy_page
FAIL: hypercall: h_page_init: h_copy_page+h_zero_page
FAIL: hypercall: h_page_init: h_zero_page unaligned dst
FAIL: hypercall: h_page_init: h_copy_page unaligned src
XFAIL: hypercall: h_random: h-call available
SUMMARY: 9 tests, 8 unexpected failures, 1 expected failures

EXIT: STATUS=3

Upgrade to the version in proposed:
$ sudo apt install qemu-kvm qemu-block-extra qemu-system-ppc
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  qemu-system-common qemu-utils
Suggested packages:
  samba vde2 openbios-ppc openhackware debootstrap
The following packages will be upgraded:
  qemu-block-extra qemu-kvm qemu-system-common qemu-system-ppc qemu-utils
5 upgraded, 0 newly installed, 0 to remove and 27 not upgraded.
Need to get 3.276 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-system-common ppc64el 1:2.5+dfsg-5ubuntu10.21 [279 kB]
Get:2 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-utils ppc64el 1:2.5+dfsg-5ubuntu10.21 [474 kB]
Get:3 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-block-extra ppc64el 1:2.5+dfsg-5ubuntu10.21 [31,7 kB]
Get:4 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-kvm ppc64el 1:2.5+dfsg-5ubuntu10.21 [6.662 B]
Get:5 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main ppc64el qemu-system-ppc ppc64el 1:2.5+dfsg-5ubuntu10.21 [2.485 kB]
Fetched 3.276 kB in 0s (3.806 kB/...


tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.21

---------------
qemu (1:2.5+dfsg-5ubuntu10.21) xenial; urgency=medium

  * debian/patches/ubuntu/lp-1712803-hypercalls_backport.patch: support
    newer hcalls; Thanks to Yasmin Beatriz Alves da Silva. (LP: #1712803).

 -- Christian Ehrhardt <email address hidden> Thu, 08 Feb 2018 14:06:39 +0100

Changed in qemu (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Po-Hsu Lin (cypressyew) on 2018-06-04
Changed in ubuntu-kernel-tests:
status: New → Fix Released
Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Xenial):
status: New → Invalid