QEMU - count cache flush Spectre v2 mitigation (CVE) (required for POWER9 DD2.3)

Bug #1832622 reported by bugproxy
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Fix Released
High
Unassigned
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Undecided
Unassigned
Disco
Fix Released
High
Unassigned
qemu (Ubuntu)
Fix Released
Undecided
Unassigned
Xenial
Won't Fix
Undecided
Unassigned
Bionic
Fix Released
High
Canonical Server
Cosmic
Won't Fix
Undecided
Unassigned
Disco
Fix Released
High
Canonical Server
Eoan
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * This belongs to the overall context of spectre mitigations and even
   more the try to minimize the related performance impacts.
   On ppc64el there is a new chip revision (DD 2.3) which provides
   a facility that helps to better mitigate some of this.

 * Backport the patches that will make the feature (if supported by the
   HW) will pass the capability to the guest - to allow guests that
   support the improved mitigation to use it.

[Test Case]

 * Start guests with and without this capability
   * Check if the capability is guest visible as intented
   * Check if there are any issues on pre DD2.3 HW
 * Test migrations (IBM outlined the intented paths that will work
   below)
 * The problem with the above (and also the reasons I didn't add a list
   of commands this time) is that it needs special HW (mentioned DD2.3
   revision) of the chips which aren't available to us right now.
   Due to that testing / verification of this on all releases is on IBM

[Regression Potential]

 * Adding new capabilities usually works fine, there are three common
   pitfalls which here are the regression potential.
   - (severe) the code would announce a capability that isn't really
     available. The guest tries to use it and crashes
   - (medium) several migration paths especially from systems with the
     new cap to older (un-updated systems) will fail. But that applies
     to any "from machine with Feature to machine without that feature"
     and isn't really a new regression. As outlined by IBM below they
     even tried to make it somewhat compatible (by being a new value in
     an existing cap)
   - (low) the guest will see new caps and or facilities. A really odd
     guest could stumble due to that (would actually be a guest bug
     then)
  Overall all of the above was considered by IBM when developing this
  and should be ok. For archive wide SRU considerations, this has NO
  effect on non ppc64el.

[Other Info]

 * n/a

---

Power9 DD 2.3 CPUs running updated firmware will use a new Spectre v2 mitigation. The new mitigation improves performance of branch heavy workloads, but also requires kernel support in order to be fully secure.

Without the kernel support there is a risk of a Spectre v2 attack across a process context switch, though it has not been demonstrated in practice.

QEMU portion - platform definition needs to account for this new mitigation action.. so attribute for this needs to be added.

In terms of support for virtualisation there are 2 sides, kvm and qemu support. Patch list for each,

KVM:
2b57ecd0208f KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
This is part of LP1822870 already.

QEMU:
8ff43ee404 target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
399b2896d4 target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS

The KVM side is upstream as of v5.1-rc1.
The QEMU side is upstream as of v4.0.0-rc0.

In terms of migration the state is as follows.

In order to specify to the guest to use the count cache flush workaround we use the spapr-cap cap-ibs (indirect branch speculation) with the value workaround. Previously the only valid values were broken, fixed-ibs (indirect branch serialisation) and fixed-ccd (count cache disabled). And add a new cap cap-ccf-assist (count cache flush assist) to specify the availability of the hardware assisted flush variant.

Note the the way spapr caps work you can migrate to a host that supports a higher value, but not to one which doesn't support the current value (i.e. only supports lower values). Where for cap-ibs these are defined as:
0 - Broken
1 - Workaround
2 - fixed-ibs
3 - fixed-ccd

So the following migrations would be valid for example:
broken -> fixed-ccd, broken -> workaround, workaround -> fixed-ccd

While the following would be invalid:
fixed-ccd -> workaround, workaround ->broken, fixed-ccd -> broken

This is done to maintain at least the level of protection specified on the command line on migration.
Since the workaround must be communicated to the guest kernel at boot we cannot migrate a guest from a host with fixed-ccd to one with workaround since the guest wouldn't know to do the flush and so would be wholly unprotected.

This means that to migrate a guest from 2.2 and before to 2.3 would require the guest to either be have been booted with broken previously, or to be rebooted with workaround specified on the command line which would allow the migration to succeed to a 2.3.

== MICHAEL D. ROTH ==
I've tested a backport of count-cache-flush support consisting of the following patches applied (cleanly) on top of bionic's QEMU 2.11+dfsg-1ubuntu7.14 source:

  target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
  ppc/spapr-caps: Change migration macro to take full spapr-cap name
  target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
  target/ppc: Factor out the parsing in kvmppc_get_cpu_characteristics()

The following tests were done using a DD 2.3 Witherspoon machine and the results seem to align with what's expected in the original summary:

== enablement tests (using 4.15.0-51-generic in both host and guests) ==

with cap-ibs=workaround,cap-ccf-assist=on:
  mdroth@ubuntu:~$ dmesg | grep cache-flush
  [ 0.000000] count-cache-flush: hardware assisted flush sequence enabled

with cap-ibs=workaround,cap-ccf-assist=off:
  mdroth@ubuntu:~$ dmesg | grep cache-flush
  [ 0.000000] count-cache-flush: full software flush sequence enabled.

with cap-ibs=broken
  mdroth@ubuntu:~$ dmesg | grep cache-flush
  [ 0.000000] count-cache-flush: software flush disabled.

== migration tests (using 4.15.0-51-generic in both host and guests) ==

Note that pseries-2.11-sxxm/bionic-sxxm defaults to:

    smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_WORKAROUND;
    smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_WORKAROUND;
    smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_FIXED_CCD

but SPAPR_CAP_FIXED_CCD is not available on the DD 2.3 system I tested on (no fw-count-cache-disabled/enabled in host fw-features device tree), so I used pseries-2.11-sxxm,cap-ibs=broken as the base-level

cross-migration: qemu 2.11+dfsg-1ubuntu7.14 -> 2.11+dfsg-1ubuntu7.14+ccf-backport

source: -M bionic-sxxm,cap-ibs=broken
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=off
    expected: warning
    actual: warning
      "cap-ibs lower level (0) in incoming stream than on destination (1))"
    software ccf enabled after reboot? yes
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=on
    expected: warning
    actual: warning
      "cap-ccf-assist lower level (0) in incoming stream than on destination (1))"
    hardware ccf enabled after reboot? yes
  target: -M bionic-sxxm,cap-ibs=broken
    expected: success
    actual: success

migration: 2.11+dfsg-1ubuntu7.14+ccf-backport -> 2.11+dfsg-1ubuntu7.14+ccf-backport

source: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=off
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=off
    expected: success
    actual: success
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=on
    expected: warning
    actual: warning
      "cap-ccf-assist lower level (0) in incoming stream than on destination (1)"
    hardware ccf enabled after reboot? yes
  target: -M bionic-sxxm,cap-ibs=broken
    expected: fail
    actual: fail
      "cap-ibs higher level (1) in incoming stream than on destination (0)"

source: -M bionic-sxxm,cap-ibs=workaround,ccf-assist=on
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=on
    expected: success
    actual: success
  target: -M bionic-sxxm,cap-ibs=workaround,cap-ccf-assist=off
    expected: fail
    actual: fail, "cap-ccf-assist higher level (1) in incoming stream than on destination (0)"
  target: cap-ibs=broken (expected: fail, actual: )
    expected: fail
    actual: fail
      "cap-ibs higher level (1) in incoming stream than on destination (0)"
      "cap-ccf-assist higher level (1) in incoming stream than on destination (0)"

Sorry, I forgot that I needed some fix-ups for the 4th/last patch, "target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST".

I've gone ahead and posted my git tree, which is based on top of the qemu_2.11+dfsg-1ubuntu7.14 source, so the 4 patches there should apply cleanly. There's are notes in the commit notes on what changes were needed for patch 4.

https://github.com/mdroth/qemu/commits/spectre-ccf-ubuntu-bionic-1ubuntu7.14

Related branches

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-176932 severity-critical targetmilestone-inin18041
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → qemu (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Server Team (canonical-server)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm glad that the kernel patch is already integrated by bug 1822870 in >=Bionic - no dependency on the kernel here then.

The patches themselve look small and clean.
Thanks for identifying the extra dependencies to:
- 8fea7044 (>=3.0) target/ppc: Factor out the parsing in kvmppc_get_cpu_characteristics()
- 8c5909c4 (>=2.12) ppc/spapr-caps: Change migration macro to take full spapr-cap name

That overall makes the request to apply:
- 8c5909c4 (>=2.12) ppc/spapr-caps: Change migration macro to take full spapr-cap name
- 8fea7044 (>=3.0) target/ppc: Factor out the parsing in kvmppc_get_cpu_characteristics()
- 399b2896 (>=4.0) target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
- 8ff43ee4 (>=4.0) target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST

By reading the bug top down I ran into issues with patch #4, but then I read the rest and found that you already handled that. Taking the backport from the referenced git worked great, thanks Michael.

There was some minor noise bringing that to 2.12 and 3.0 but it worked rather straight forward as expected for 2.12. In qemu 3.0 thou we need something else for the fourth patch. Neither the upstream original (9 rejects), nor the backport you provided for 2.11 apply (10 rejects).
Upstream is a bit closer, the lack of "large decr" in qemu 3.0 shows up as context change a few times, but those were resovable.

For "SPAPR_CAP_CCF_ASSIST" I followed your backport of leaving no holes in the cap numbering (the alternative would be to retain it being 0x9, but leave some in between undefined which would break when iterating).

TODO
check cosmic applied include/hw/ppc/spapr.h SPAPR_CAP_CCF_ASSIST for wholes

IIRC Xenial has no P9 support and probably would be much harder to backport, so unless further discussion this is a Won't Fix for Xenial.

Timing: we have a qemu SRU in the pipe that needs verification and release. Once done we will enqueue that one.

But until then we can still work on this.
I opend MPs for internal review with the backports for Bionic/Cosmic/Disco/Eoan (linked to the bug here) and a PPA [1].
If you want to test anything ahead of proposed please feel free to take a look at MPs and/or the PPA.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1832622-qemu-spectre-ppc

Changed in qemu (Ubuntu Xenial):
status: New → Won't Fix
Changed in qemu (Ubuntu Bionic):
status: New → Triaged
Changed in qemu (Ubuntu Cosmic):
status: New → Triaged
Changed in qemu (Ubuntu Disco):
status: New → Triaged
Changed in qemu (Ubuntu Eoan):
status: New → Triaged
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There is a rather similar set of patches for new Intel CPU revisions in the pipe. And in between will be a set of general security fixes to the virt stack.

I'd prefer to push both at the same upload, to avoid users having to download qemu too often.
I'd assume that this bug here is important, but then also not super-urgent as DD2.3 availability (right now) still should be very low anyway right?

If this is rather urgent then please let us know and test the PPA asap on all releases. If that is ok I'll ask the security Team to base their coming fixes on this instead of what is in proposed.

Revision history for this message
Mike Ranweiler (mranweil) wrote :

That's correct on DD 2.3 - still not very available - and is ok. Will still post test results.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

In Eoan the merge of qemu 4.0 will fix this, this is ongoing and I added bug reference to its changelog so this bug will get an update once complete.

Rafael started to review my MPs for B/C/D and it seems ok so far.
The work on the similar and to-be-grouped upload for bug 1828495 is going well too.

A precheck by IBM on the PPA that the backports are working as expected on Bionic/Cosmic/Disco DD 2.3 HW would help tremendously to raise the confidence in this going forward towards SRUs then.

tags: added: qemu-19.10
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (7.0 KiB)

This bug was fixed in the package qemu - 1:4.0+dfsg-0ubuntu1

---------------
qemu (1:4.0+dfsg-0ubuntu1) eoan; urgency=medium

  * Merge with Upstream release of qemu 4.0.
    Among many other things this fixes LP Bugs:
    LP: #1782206 - SnowRidge Accelerator Interfacing Architecture (AIA)
    LP: #1828038 - Update s390x CPU Model for more HW support
    LP: #1832622 - count cache flush Spectre v2 mitigation for ppc64el
    Remaining Changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-system-common.qemu-kvm.service: systemd unit to call
        qemu-kvm-init
      - d/qemu-system-common.install: install helper script
      - d/qemu-system-common.maintscript: clean old sysv and upstart scripts
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: call dh_installinit and dh_installsystemd for qemu-kvm
    - Enable nesting by default
      - d/qemu-system-x86.modprobe: set nested=1 module option on intel.
        (is default on amd)
      - d/qemu-system-x86.postinst: re-load kvm_intel.ko if it was loaded
        without nested=1
      - d/p/ubuntu/expose-vmx_qemu64cpu.patch: expose nested kvm by default
        in qemu64 cpu type.
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
      - d/qemu-system-x86.README.Debian: document intention of nested being
        default is comfort, not full support
    - Distribution specific machine type (LP: 1304107 1621042)
      - d/p/ubuntu/define-ubuntu-machine-types.patch: define distro machine
        types
      - d/qemu-system-x86.NEWS Info on fixed machine type defintions
        for host-phys-bits=true (LP: 1776189)
      - add an info about -hpb machine type in debian/qemu-system-x86.NEWS
      - provide pseries-bionic-2.11-sxxm type as convenience with all
        meltdown/spectre workarounds enabled by default. (LP: 1761372).
    - improved dependencies
      - Make qemu-system-common depend on qemu-block-extra
      - Make qemu-utils depend on qemu-block-extra
      - let qemu-utils recommend sharutils
    - s390x support
      - Create qemu-system-s390x package
      - Enable numa support for s390x
    - arch aware kvm wrappers
    - d/control: update VCS links
    - qemu-guest-agent: freeze-hook fixes (LP: 1484990)
      - d/qemu-guest-agent.install: provide /etc/qemu/fsfreeze-hook
      - d/qemu-guest-agent.dirs: provide /etc/qemu/fsfreeze-hook.d
    - d/control-in: enable RDMA support in qemu (LP: 1692476)
        - enable RDMA config option
        - add libibumad-dev build-dep
    - tolerate ipxe size change on migrations to >=18.04 (LP: 1713490)
      - d/p/ubuntu/pre-bionic-256k-ipxe-efi-roms.patch: old machine types
        reference 256k path
      - d/control-in: depend on ipxe-qemu-256k-compat-efi-roms to be able to
        handle incoming migrations from former releases.
    - d/control-in: Disable capstone disassembler library support (universe)
    - Move s390x roms to a new qemu-system-data-s390x
      - d/qemu-system-data.install: install s390x roms as archi...

Read more...

Changed in qemu (Ubuntu Eoan):
status: Triaged → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Done in Eoan.

Setting the SRU tasks to incomplete to better reflect that we at least would want to get a positive reply from a sniff test on Bionic from the PPA [1] before thrwoing that into the SRU queue.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1832622-qemu-spectre-ppc

Changed in qemu (Ubuntu Disco):
status: Triaged → Incomplete
Changed in qemu (Ubuntu Cosmic):
status: Triaged → Incomplete
Changed in qemu (Ubuntu Bionic):
status: Triaged → Incomplete
Manoj Iyer (manjo)
Changed in qemu (Ubuntu Eoan):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server)
Changed in qemu (Ubuntu Disco):
assignee: nobody → Canonical Server Team (canonical-server)
Changed in qemu (Ubuntu Cosmic):
assignee: nobody → Canonical Server Team (canonical-server)
Changed in qemu (Ubuntu Bionic):
assignee: nobody → Canonical Server Team (canonical-server)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Cosmic is about to end full support, lets reduce the test matrix a bit by already dropping the Cosmic task.

@IBM - I'm still waiting on a positive feedback on this sniff test.
Without I can't reliable make it part of the next coming (soon) qemu upload.
Also to be aware once SRUs on this are accepted by the SRU Team the same tests will be needed for Bionic and Disco.

Changed in qemu (Ubuntu Cosmic):
status: Incomplete → Won't Fix
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: Since I can't check this on the HW shared with us and lacking feedback on the PPA I have backed these changes out of the now started SRU update.

That gives you some more time to get this testing done ... and me the confidence to not rush something that will fail and we might have known if only we checked in advance.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The next Qemu SRU is about to start - probably somewhen this week.
Any chance that these checks are completed now to include this fix?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Given there was no reply I can't see how we hold this up as "critical" severity.
I have marked our tasks as low, given that without the feedback they aren't actionable at all.

I'd ask project tracking task to be lowered as well and unassigned from the server team (for now at least)

Changed in qemu (Ubuntu Bionic):
importance: Undecided → Low
Changed in qemu (Ubuntu Disco):
importance: Undecided → Low
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Critical → Medium
assignee: Canonical Server Team (canonical-server) → nobody
Revision history for this message
Diane Brent (drbrent) wrote :

What causes the status for Bionic to be "incomplete" and low priority?

Revision history for this message
Frank Heimes (fheimes) wrote :

Hello, since a test of the qemu test-build package was requested (available from the PPA mentioned in comment #1, made available mid of June), and the engineer/maintainer is waiting for some feedback since a while (please notice that we can not test this by ourselves), a prioritization was needed to unlock resources and to re-focus on further tickets (partly also other qemu bugs).
Once the package got successfully tested, the work on this one will promptly proceed and the states again adjusted. Hope this explains the procedure ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
since we are waiting quite some time for ther verification of the version in the PPAit got surpassed by other SRUs. I know your engineers know how to test explicit versions from the PPA (with apt install <pkg>=version), but to make things even easier I created (just for bionic) a respin rebased to the new version.

If it helps you, then you you might use PPA [1] for your test on the DD2.3 HW.

[1]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1832622-qemu-spectre-ppc-rebuild

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla
Download full text (3.2 KiB)

------- Comment From <email address hidden> 2019-08-21 03:21 EDT-------
I did testing on this and got the same results. The different scenarios are listed here and all match up with original results. I tested with 1:2.11+dfsg-1ubuntu7.18~ppa1 .

No migration:

max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on:
count-cache-flush: hardware assisted flush sequence enabled

max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off:
count-cache-flush: full software flush sequence enabled.

max-cpu-compat=power9,cap-ibs=broken:
count-cache-flush: software flush disabled.

First set:
Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off

Result: worked w/warning:
qemu-system-ppc64le: warning: cap-ibs lower level (0) in incoming stream than on destination (1)

Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on

Result: worked w/warning:
qemu-system-ppc64le: warning: cap-ibs lower level (0) in incoming stream than on destination (1)
qemu-system-ppc64le: warning: cap-ccf-assist lower level (0) in incoming stream than on destination (1)

Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=broken

Result: worked

Second set:
Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off

Result: worked

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on

Result: worked w/warning
qemu-system-ppc64le: warning: cap-ccf-assist lower level (0) in incoming stream than on destination (1)
[ 0.000000] count-cache-flush: full software flush sequence enabled.

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=broken

Result: fail
qemu-system-ppc64le: cap-ibs higher level (1) in incoming stream than on destination (0)
qemu-system-ppc64le: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64le: load of migration failed: Invalid argument

Third set:

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on

Result: worked
count-cache-flush: hardware assisted flush sequence enabled

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off

Result: fail
qemu-system-ppc64le: cap-ccf-assist higher level (1) in incoming stream than on destination (0)
qemu-system-ppc64le: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64le: load of migration failed: Invalid argument

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
Target: max-cpu-compat=power9,cap-ibs=broken

Result: fail
qemu-system-ppc64le: cap-ibs higher level (1) in incoming stream than on destination (0)
qemu-system-ppc64le: cap-ccf-assist higher level (1) in incoming stream than on destination (0)
qemu-system-ppc64le: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64le...

Read more...

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Moving 'bionic' series back to 'triaged' to review Michael's test results (comment #14).

Changed in qemu (Ubuntu Bionic):
status: Incomplete → Confirmed
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

...correction: moved to 'confirmed'.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for doign that Test Michael.
It is a lot of text so I'll summarize (e.g. for the SRU team later):
Section "No migration"
=> mitigation in the guest is detected correctly
Section with migrations has three elements:
=> source == target config -> migration works
=> source older than target config -> migration works with warning
=> source newer than target config -> migration fails

That is exactly as predicted/expected which means we can go on with this as an SRU.

Changed in qemu (Ubuntu Disco):
status: Incomplete → Confirmed
importance: Low → High
Changed in qemu (Ubuntu Bionic):
importance: Low → High
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-08-21 08:29 EDT-------
(In reply to comment #28)
> Thanks for doign that Test Michael.
> It is a lot of text so I'll summarize (e.g. for the SRU team later):
> Section "No migration"
> => mitigation in the guest is detected correctly
> Section with migrations has three elements:
> => source == target config -> migration works
> => source older than target config -> migration works with warning
> => source newer than target config -> migration fails
>
> That is exactly as predicted/expected which means we can go on with this as
> an SRU.

Have tested and raised two issues
One is on migration:
Migration from cap-ibs=workaround -> cap-ibs=broken crashes guest rather to fail the migration gracefully.
expected the source guest continue to be in running state after the migration failure, but the guest crashes at destination and leaves the guest in source in paused state.
Raised Bug 180734 for the same.

Another is on usability of the hardware assisted flush(cap-ccf-assist=on), right now it has be set explicity in qemu-cmdline though we have HW support, but other layers like libvirt etc will not know about it. So it is not possible for user to set the capability though underlying HW is capable.
Raised Bug 180735 for the same.

Regards,
-Satheesh

Revision history for this message
bugproxy (bugproxy) wrote : Test logs

------- Comment on attachment From <email address hidden> 2019-08-21 08:34 EDT-------

Complete test logs, apart from above two cases, other scenarios are working fine.
Regards,
-Satheesh

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@IBM - so my working assumption then is that you'll get to us with whatever is needed/recommended for your new bugs 180734 / 180735 later on but for now want the patches we discussed and tested here to be pushed.

TL;DR: provide the security fix as tested now, potentially refine it later.

A confirmation of this would be great.

Changed in qemu (Ubuntu Eoan):
assignee: Canonical Server Team (canonical-server) → nobody
Changed in qemu (Ubuntu Cosmic):
assignee: Canonical Server Team (canonical-server) → nobody
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-08-21 11:25 EDT-------
Is this ready to move out of Reopened state and to submitted or verified or something?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I'm not sure if that is a question about internal bugzilla statuses, or about external launchpad statuses.

In launchpad, this issue is https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1832622 and has tasks opened against Bionic and Disco series, meaning those series are still to be fixed.

It has been fixed in the development series already (eoan), and will not be fixed in xenial/cosmic.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-08-21 18:53 EDT-------
It was IBM bugzilla status, I'll move it all back.

I took a look at the new bugs - 180734 and 180735. The first (180734) I can recreate on my system if I do it exactly (or nearly so) as you do - the status shows paused (postmigrate) and it's no longer responsive.. With my setup with more options it works fine for me. I have yet figured out which option triggers the change for me.

When I use my original options and directly to qemu-system-ppc64 it doesn't crash. It is an invalid migration - going from workaround to broken should fail. I get a similar warning message when I try it - but then the source remains active. Here's what I was originally using:
/usr/bin/qemu-system-ppc64le -m 20480 -smp 32,maxcpus=32,sockets=4,cores=8,threads=1 -object memory-backend-ram,id=ram-node0,size=10737418240 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=10737418240 -numa node,nodeid=1,cpus=8-15,memdev=ram-node1 -realtime mlock=off -rtc base=utc -no-shutdown -boot strict=on -msg timestamp=on -device qemu-xhci,id=usb,bus=pci.0 -device spapr-vscsi,id=scsi0,reg=0x2000 -drive file=/home/ubuntu/u1804-root.qcow2.snap0.radix0,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,iommu_platform=off,disable-legacy=on -drive file=/home/ubuntu/secondary.qcow2,format=qcow2,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-scsi0-0-0-0,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -netdev user,id=hostnet0,hostfwd=tcp:127.0.0.1:2222-:22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:66:32:28,bus=pci.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0 -monitor unix:/tmp/mdroth-vm0-hmp.sock,server,nowait -nographic -vnc none -L /usr/share/qemu/ -machine pseries-bionic-sxxm,accel=kvm,usb=off,dump-guest-core=off,max-cpu-compat=power9,cap-ibs=workaround

The second (180735) is a feature request.

It seems like we should move forward with the SRU now and fix bug 180734 as there becomes a fix available - it doesn't look like there is one now.

Suraj/Satheesh - you agree?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-08-22 04:00 EDT-------
Michael, sounds like the correct approach to take

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We have reviewed and tested the branch individually already.
I now had a test set running over night with the ones applied together that I intend to push in one SRU. All worked fine, uploading to -unapproved for the SRU Team to take a look.

Changed in qemu (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in qemu (Ubuntu Disco):
status: Confirmed → In Progress
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted qemu into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:3.1+dfsg-2ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Changed in qemu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Robie Basak (racb) wrote :

Hello bugproxy, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.18 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Diane Brent (drbrent) wrote :

IBMm will verify this today.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla
Download full text (4.0 KiB)

------- Comment From <email address hidden> 2019-08-27 12:42 EDT-------
I tested this out with the same verification checks as above and got the same results. The summary is that the mitigations are detected correctly in the guest and the migrations works when it should, warns when it should, and fails when it should.

ii qemu-block-extra:ppc64el 1:2.11+dfsg-1ubuntu7.18 ppc64el extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm 1:2.11+dfsg-1ubuntu7.18 ppc64el QEMU Full virtualization on x86 hardware
ii qemu-system-common 1:2.11+dfsg-1ubuntu7.18 ppc64el QEMU full system emulation binaries (common files)
ii qemu-system-ppc 1:2.11+dfsg-1ubuntu7.18 ppc64el QEMU full system emulation binaries (ppc)
ii qemu-utils 1:2.11+dfsg-1ubuntu7.18 ppc64el QEMU utilities

No migration:
max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
count-cache-flush: hardware assisted flush sequence enabled

max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
count-cache-flush: full software flush sequence enabled.

max-cpu-compat=power9,cap-ibs=broken
count-cache-flush: software flush disabled.

Migrations:
Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off

Worked w/warning:
qemu-system-ppc64le: warning: cap-ibs lower level (0) in incoming stream than on destination (1)
count-cache-flush: software flush disabled.

Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on

Worked w/warning:
count-cache-flush: software flush disabled.
qemu-system-ppc64le: warning: cap-ibs lower level (0) in incoming stream than on destination (1)
qemu-system-ppc64le: warning: cap-ccf-assist lower level (0) in incoming stream than on destination (1)
count-cache-flush: software flush disabled.

Source: max-cpu-compat=power9,cap-ibs=broken
Target: max-cpu-compat=power9,cap-ibs=broken

Worked
count-cache-flush: software flush disabled.

Set 2:

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Worked
count-cache-flush: full software flush sequence enabled.

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
Worked w/warning
count-cache-flush: full software flush sequence enabled.
qemu-system-ppc64le: warning: cap-ccf-assist lower level (0) in incoming stream than on destination (1)

Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=off
Target: max-cpu-compat=power9,cap-ibs=broken

Failed:
qemu-system-ppc64le: cap-ibs higher level (1) in incoming stream than on destination (0)
qemu-system-ppc64le: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64le: load of migration failed: Invalid argument

Third set:
Source: max-cpu-compat=power9,cap-ibs=workaround,cap-ccf-assist=on
Target: max-cpu-compat=power9,cap-ibs=wo...

Read more...

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Many thanks Michael for the bionic testing. Updating the bionic tags accordingly.

Are you also able to test the disco -proposed package 1:3.1+dfsg-2ubuntu3.4?

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-08-29 03:08 EDT-------
Sorry, my machine had a hw issue, but Satheesh made a DD 2.3 available with a fresh disco install. I had trouble with the disco qemu, though:
root@ws-g48-2d81-host:~# /usr/bin/qemu-system-ppc64le --version
QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-2ubuntu3.4)

For either cap-ccf-assist=off or cap-ccf-assist=on qemu doesn't start:
qemu-system-ppc64le: Requested safe indirect branch capability level not supported by kvm, try cap-ibs=broken

So maybe we're missing a patch, here or in the kernel, for disco.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Thanks for testing Michael. I've marked disco as verification-failed.

tags: added: verification-failed-disco
removed: verification-needed-disco
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It is the same set of patches as we have on Bionic.
Bionic has
1. 8fea70440eb0d095442de7e80d586a285cf96be5
2. 399b2896d4948a1ec0278d896ea3a561df768d64
3. 8c5909c41916f25b47bfdc465059a926603c1319
4. 8ff43ee404d3e295839d1fd4e9e6571ca7a62a66

Disco for this bug has #2+#4 while #1+#3 are already part of the base version that is in qemu of Disco.

Due to different contexts they are slightly different.
Upstream defines it as
+#define SPAPR_CAP_CCF_ASSIST 0x09
Due to the context change in Bionic and Disco 0x06 and 0x08 respectively.
That index matters if it would be off in the capability_table[SPAPR_CAP_NUM].
I recounted the field to ensure there is no off by one and also otherwise compared the diffs of the upstream commits and the bionic/disco backports. There doesn't seem to be an issue in those.

@Michael could you retest this on Disco and the kernel you used (and worked) from Bionic.
If it is a kernel issue I'm fine and we can open a kernel task for it for Disco? That would help as we would not have to stop/gate qemu in that case.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.18)

All autopkgtests for the newly accepted qemu (1:2.11+dfsg-1ubuntu7.18) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

ubuntu-image/1.7+18.04ubuntu1 (ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:3.1+dfsg-2ubuntu3.4)

All autopkgtests for the newly accepted qemu (1:3.1+dfsg-2ubuntu3.4) for disco have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/240-6ubuntu5.3 (i386, ppc64el)
nova/2:19.0.1-0ubuntu2.1 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-08-30 00:57 EDT-------
I did confirm it on bionic as a kernel issue - I could recreate the error on bionic with the bionic proposed qemu and the disco kernel (and additionally with an older bionic kernel, too). I wasn't able to get a setup for disco to confirm it working, or get the exact patch needed yet.

Changed in linux (Ubuntu Disco):
status: New → Confirmed
importance: Undecided → High
no longer affects: linux (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Eoan)
no longer affects: linux (Ubuntu Xenial)
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
status: New → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Confirmed
Revision history for this message
Frank Heimes (fheimes) wrote :

May I ask which kernel was used while testing on disco - was is the kernel from main/updates or proposed (5.0.0.27)?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-08-30 12:17 EDT-------
From -proposed - 5.0.0-27.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - the related autopkgtest issues would now be resolved.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (5.4 KiB)

Lacking better options I gave this some extra testing on a pre DD2.3 P9 box.
revision : 2.2 (pvr 004e 1202)
I though at least CCF=off I should be able to test with these chips and that worked fine.

Summary:
- the new versions make cap-ibs=fixed-ibs work on DD2.2
- CCF=off works with Bionic and Disco kernels on DD 2.2
- CCF=on untestable without DD 2.3 HW as expected
- Working in Disco just as much as in Bionic

Are you 100% sure on the FW and HW levels that are on the DD2.3 machine that you used to test Disco?
Given my results are all good and your Bionic results were good with essentially the same code as in Disco I'm beginning to wonder if it might be an issue on the borrowed DD2.3 machine that you used for the Disco test.

@IBM - can you get a machine on which you first check that it works for CCF with Bionic (to ensure we know the HW/FW is good) and then directly upgrade this very same machine to Disco to verify it there?

FYI - the ongoing SRU contains more than just this change, and at some point I'll need to unblock the others.
Therefore I'd set a limit of ~48h from now. If we can't find a way to resolve the verification issue on this bug as-is until then I'll have to reroll the current SRU without this fix to get things going.

--- Tests Details ---

Note:
- Start basic guest with (and check it boots the bootloader):
  This can be done after just installing qemu-system-ppc
sudo /usr/bin/qemu-system-ppc64 -name guest=bionic,debug-threads=on -m 512 -smp 1 -no-user-config -nodefaults -nographic -chardev stdio,mux=on,id=char0 -mon chardev=char0,mode=readline -serial chardev:char0 -machine pseries-bionic,accel=kvm,usb=off,dump-guest-core=off,cap-ccf-assist=off,cap-ibs=fixed-ccd

This can be done with disks for a full linux boot, but doesn't have to for this test. To do so add:
 -boot strict=on -drive file=/var/lib/uvtool/libvirt/images/eoan.qcow,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1

#1: Bionic as-is
- qemu: 1:2.11+dfsg-1ubuntu7.17 kernel: 4.15.0.58.60
=> works (guest can be started as-is)
=> reports (-machine...?):
 cap-sbbc=string (Speculation Barrier Bounds Checking (broken, workaround, fixed)(null))
 cap-cfpc=string (Cache Flush on Privilege Change (broken, workaround, fixed)(null))
 cap-ibs=string (Indirect Branch Speculation (broken, fixed-ibs, fixed-ccd)(null))
Test IBS modes adding ,cap-ibs=:
- broken - ok
- fixed-ccd - ok
- fixed-ibs - "not supported by kvm"
Test CCF modes ,cap-ccf-assist=
- (doesn't exist here)

#2: Bionic proposed qemu
- qemu 1:2.11+dfsg-1ubuntu7.18 kernel: 4.15.0.58.60 (same as above)
=> works (guest can be started as-is)
=> reports (-machine...?):
 cap-sbbc=string (Speculation Barrier Bounds Checking (broken, workaround, fixed)(null))
 cap-cfpc=string (Cache Flush on Privilege Change (broken, workaround, fixed)(null))
 cap-ibs=string (Indirect Branch Speculation (broken, fixed-ibs, fixed-ccd)(null))
+cap-ccf-assist=bool (Count Cache Flush Assist via HW Instruction(null))
Test IBS modes adding ,cap-ibs=:
- broken - ok
- fixed-ccd - ok
- fixed-ibs - ok
Test CCF modes adding ,cap-ccf-assist=
- o...

Read more...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I think I found the missing kernel bit.

As reported it needs:
2b57ecd0208f KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()

Which was brought into Bionic/Cosmic already as part of bug LP1822870.
This is only needed when I'd be on new HW/FW

Bionic: $ grep -Hrn KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST *
arch/powerpc/kvm/powerpc.c:1949: KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
arch/powerpc/kvm/powerpc.c:2014: cp->character |= KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
arch/powerpc/kvm/powerpc.c:2021: KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
arch/powerpc/include/uapi/asm/kvm.h:466:#define KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST (1ull << 54)

Disco: the same grep finds nothing.

$ git tag --contains 2b57ecd0208f
v5.1
...
Disco is on 5.0.0.27.28, so it needs this commit.

Comparing git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git with git://kernel.ubuntu.com/ubuntu/ubuntu-disco.git confirms, this was lost on the path to Disco.

@IBM - can we release the qemu portion of this now and the kernel Team will include that on the next kernel SRU cycle? Or does the addition of this to Qemu without the related kernel change break anything. It didn't seem so to me in my DD 2.2 Tests.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Back in bug 1822870 it was reported that the Disco kernel is only missing 92edf8df which is still applied to Disco these days. Maybe due to that 2b57ecd0208f was lost.

@Kernel Team - could you go through all changes that made up bug 1822870 and ensure whatever is missing will be added to Disco?

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Bumping priority up to high after discussions with IBM.

Changed in ubuntu-power-systems:
importance: Medium → High
Revision history for this message
Juerg Haefliger (juergh) wrote :

Confirmed that the Disco kernel is only missing 2b57ecd0208f ("KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()") from the patchset referenced in bug 1822870.

Changed in linux (Ubuntu Disco):
status: Confirmed → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Per my Tests we already know that on DD2.0 HW things are fine, you can't enable CCF which is expected, but it doesn't break formerly working cases there.
And I'm not sure if there is DD2.3 HW in the wild already.

Furthermore I was in contact with Leonardo yesterday, he is working with the Authors of the patches to let us know if we can safely release the qemu changes before the kernel OR if we have to unroll them for now until this is fixed in the kernel.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-09-03 19:45 EDT-------
I ran the tests mentioned in launchpad comment #40 on a DD2.3 witherspoon machine with GA firmware. Aside from the issue caused by the missing kernel patch, QEMU behaved as expected.

One thing of note is that the following firmware features are disabled:

ibm,opal/fw-features/fw-bcctrl-serialized
ibm,opal/fw-features/fw-count-cache-disabled

which means that 'cap-ibs=fixed-ibs' and 'cap-ibs=fixed-ccd' are always refused by KVM in this machine.

I attached the test results as qemu-dd2.3-sanity.txt

Revision history for this message
bugproxy (bugproxy) wrote : qemu-dd2.3-sanity.txt

------- Comment (attachment only) From <email address hidden> 2019-09-03 19:44 EDT-------

Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: Confirmed → Fix Committed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks a lot <email address hidden>.
Especially for noting the known firmware featues influencing this in your case and then combining cap-ibs=workaround,cap-ccf-assist=on to prove the new features work.

I see that cap-ccf-assist=on can be used and successfully grants the guest
[ 0.000000] count-cache-flush: hardware assisted flush sequence enabled

The one thing I wondered is your #7 showing cap-ibs=workaround not working.
Could that be another missed kernel patch as we have seen it working in #2.

Could you please add and run the following cases to your list:
*** 8- Bionic-proposed kernel + Disco-updates QEMU
*** 9- Bionic-proposed kernel + Disco-proposed QEMU
In those (at least) test "cap-ibs=workaround" and "cap-ibs=workaround,cap-ccf-assist=on"

With those two tests on top we can check if:
- if cap-ibs=workaround works in #8 but we know it failed in #7
  => the Disco kernel broke it in #7
  => We'd need to find what else the Disco kernel misses vs Bionic.
- if cap-ibs=workaround works in #8 but fails in #9
  => the new disco qemu update breaks it
  => We'd need to find why

Revision history for this message
Fabiano Rosas (farosas) wrote :

That is the effect of the lack of "2b57ecd0208f KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()" in Disco.

QEMU checks for KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE which is introduced in the above commit:

(From lp-1832622-0002-target-ppc-spapr-Add-workaround-option-to-SPAPR_CAP_.patch)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index f0f5bf9391..4d46314276 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2392,7 +2392,13 @@ static int parse_cap_ppc_safe_bounds_check(struct kvm_ppc_cpu_char c)

 static int parse_cap_ppc_safe_indirect_branch(struct kvm_ppc_cpu_char c)
 {
- if (c.character & c.character_mask & H_CPU_CHAR_CACHE_COUNT_DIS) {
+ if ((~c.behaviour & c.behaviour_mask & H_CPU_BEHAV_FLUSH_COUNT_CACHE) &&
+ (~c.character & c.character_mask & H_CPU_CHAR_CACHE_COUNT_DIS) &&
+ (~c.character & c.character_mask & H_CPU_CHAR_BCCTRL_SERIALISED)) {
+ return SPAPR_CAP_FIXED_NA;
+ } else if (c.behaviour & c.behaviour_mask & H_CPU_BEHAV_FLUSH_COUNT_CACHE) { <---
+ return SPAPR_CAP_WORKAROUND;
+ } else if (c.character & c.character_mask & H_CPU_CHAR_CACHE_COUNT_DIS) {
         return SPAPR_CAP_FIXED_CCD;
     } else if (c.character & c.character_mask & H_CPU_CHAR_BCCTRL_SERIALISED) {
         return SPAPR_CAP_FIXED_IBS;

But I'll test the extra two scenarios anyway.

Revision history for this message
Fabiano Rosas (farosas) wrote :

Here is test #9 (#8 is the same as #4 from my previous tests. And not of much help since Disco-updates QEMU (v=1:3.1+dfsg-2ubuntu3.3) does not have cap-ibs=workaround):

*** 9- Bionic-proposed kernel + Disco-proposed QEMU
 $ uname -r; qemu-system-ppc64 --version | head -n 1
 4.15.0-60-generic
 QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-2ubuntu3.4)

 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: hardware assisted flush sequence enabled

 $ qemu-system-ppc64 -machine pseries,? 2>&1 | grep "\|ibs\|ccf"
 cap-ibs=string (Indirect Branch Speculation (broken, workaround, fixed-ibs,fixed-ccd, fixed-na))
 cap-ccf-assist=bool (Count Cache Flush Assist via HW Instruction)

 - cap-ibs=broken
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: software flush disabled.

 - cap-ibs=workaround
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: full software flush sequence enabled.

 - cap-ibs=fixed-ccd
 qemu-system-ppc64: Requested safe indirect branch capability level not supported by kvm, try cap-ibs=workaround

 - cap-ibs=fixed-ibs
 qemu-system-ppc64: Requested safe indirect branch capability level not supported by kvm, try cap-ibs=workaround

 - cap-ccf-assist=off
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: software flush disabled.

 - cap-ccf-assist=on
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: software flush disabled.

 - cap-ibs=workaround,cap-ccf-assist=on
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: hardware assisted flush sequence enabled

 - cap-ibs=workaround,cap-ccf-assist=off
 $ dmesg | grep count-cache
 [ 0.000000] count-cache-flush: full software flush sequence enabled.

So my interpretation of the results is that the Disco kernel is indeed to blame for cap-ibs=workaround not working with QEMU 1:3.1+dfsg-2ubuntu3.4 and a DD 2.3 machine.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks a lot Fabiano!

So I summarize:
- #7 is in no way a degradation to #4:
  - all cap-ibs= modes are failing on that before and after
  - that means the new qemu didn't break anything in that regard
- #9 confirms that as soon as we have a fixed kernel under that new disco-qemu it will work for cap-ibs=workaround as well as cap-ccf-assist=off/on.

And IMHO that means we have confirmed that:
a) the new fix in qemu works
b) the new fix in qemu does not degrade it if used on the current kernel
c) we need the kernel change to eventually fully work (well we have known that)

With that I think we can declare qemu in disco verified and let it release.
And the upcoming kernel update will resolve ibs/ccf to be really usable in Disco.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

After discussing this with the Team I really think it is ok to release this.
As stated before we confirmed:
- that on a good kernel the fix works
- the fix doesn't break features if not running on the new kernel
- the fix is confirmed to get in the kernel soon (this kernel cycle)

In addition releasing this now gives us the benefit of reaching earlier CloudArchive based on Disco which on the Bionic kernel will work right away.

People can always run with a newer/older kernel, so in this case just as with other SRUs where we say confirmed by install and "configuration" here the "configuration" for now in Disco is to provide a kernel with the change applied.

Therefore I'm now marking it verified in Disco.

Thanks everyone for all your involvement and looking forward to the kernel change verified and then landing at probably the end of this month.

tags: added: verification-done verification-done-disco
removed: verification-failed-disco verification-needed
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.18

---------------
qemu (1:2.11+dfsg-1ubuntu7.18) bionic; urgency=medium

  * d/p/ubuntu/lp-1832622-*: count cache flush Spectre v2 mitigation for ppc64
    (LP: #1832622)
  * d/p/ubuntu/lp-1840745-*: add amd ssbd / no-ssbd features (LP: #1840745)
  * d/p/ubuntu/lp-1836154-*: add HW CPU model for newer s390x machines
    (LP: #1836154)

 -- Christian Ehrhardt <email address hidden> Thu, 13 Jun 2019 08:08:33 +0200

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:3.1+dfsg-2ubuntu3.4

---------------
qemu (1:3.1+dfsg-2ubuntu3.4) disco; urgency=medium

  * d/p/ubuntu/lp-1832622-*: count cache flush Spectre v2 mitigation for ppc64
    (LP: #1832622)
  * d/p/ubuntu/lp-1836154-*: add HW CPU model for newer s390x machines
    (LP: #1836154)

 -- Christian Ehrhardt <email address hidden> Thu, 13 Jun 2019 08:40:55 +0200

Changed in qemu (Ubuntu Disco):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (126.4 KiB)

This bug was fixed in the package linux - 5.0.0-31.33

---------------
linux (5.0.0-31.33) disco; urgency=medium

  * disco/linux: 5.0.0-31.33 -proposed tracker (LP: #1846026)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * /proc/self/maps paths missing on live session (was vlc won't start; eoan
    19.10 & bionic 18.04 ubuntu/lubuntu/kubuntu/xubuntu/ubuntu-mate dailies)
    (LP: #1842382)
    - SAUCE: Revert "UBUNTU: SAUCE: shiftfs: enable overlayfs on shiftfs"

linux (5.0.0-30.32) disco; urgency=medium

  * disco/linux: 5.0.0-30.32 -proposed tracker (LP: #1844362)

  * Disco update: upstream stable patchset 2019-08-20 (LP: #1840846)
    - Revert "e1000e: fix cyclic resets at link up with active tx"
    - e1000e: start network tx queue only when link is up
    - Input: synaptics - enable SMBUS on T480 thinkpad trackpad
    - nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
    - drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
    - firmware: improve LSM/IMA security behaviour
    - irqchip/gic-v3-its: Fix command queue pointer comparison bug
    - clk: ti: clkctrl: Fix returning uninitialized data
    - efi/bgrt: Drop BGRT status field reserved bits check
    - perf/core: Fix perf_sample_regs_user() mm check
    - ARM: dts: gemini Fix up DNS-313 compatible string
    - ARM: omap2: remove incorrect __init annotation
    - afs: Fix uninitialised spinlock afs_volume::cb_break_lock
    - x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
    - be2net: fix link failure after ethtool offline test
    - ppp: mppe: Add softdep to arc4
    - sis900: fix TX completion
    - ARM: dts: imx6ul: fix PWM[1-4] interrupts
    - pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
    - dm table: don't copy from a NULL pointer in realloc_argv()
    - dm verity: use message limit for data block corruption message
    - x86/boot/64: Fix crash if kernel image crosses page table boundary
    - x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
    - HID: chicony: add another quirk for PixArt mouse
    - pinctrl: mediatek: Ignore interrupts that are wake only during resume
    - cpu/hotplug: Fix out-of-bounds read when setting fail state
    - pinctrl: mediatek: Update cur_mask in mask/mask ops
    - linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
    - genirq: Delay deactivation in free_irq()
    - genirq: Fix misleading synchronize_irq() documentation
    - genirq: Add optional hardware synchronization for shutdown
    - x86/ioapic: Implement irq_get_irqchip_state() callback
    - x86/irq: Handle spurious interrupt after shutdown gracefully
    - x86/irq: Seperate unused system vectors from spurious entry again
    - ARC: hide unused function unw_hdr_alloc
    - s390: fix stfle zero padding
    - s390/qdio: (re-)initialize tiqdio list entries
    - s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
    - crypto: talitos - move struct talitos_edesc into talitos.h
    - crypto: talitos - fix hash on SEC1.
    - crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
    - drm/udl: introduce a macro to convert dev t...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.