nested virtualization w/first level trusty guests has odd MDS behavior

Bug #1829555 reported by Steve Beattie
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Undecided   Unassigned
qemu (Ubuntu)   Confirmed  Low         Unassigned

Bug Description

When nested kvm virtualization is used (with host-passthrough), if the first level guest is a trusty vm, odd behavior is seen in the second level guest:

  host os:
  disco/5.0.0-15.16-generic/qemu 1:3.1+dfsg-2ubuntu3.1
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
     Mitigation: Clear CPU buffers; SMT vulnerable

  1st level vm:
  trusty/4.4.0-148.174~14.04.1-generic/qemu 2.0.0+dfsg-2ubuntu1.46
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Mitigation: Clear CPU buffers; SMT Host state unknown

  2nd level vm:
  bionic/4.15.0-50.54-generic
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Not affected

This behavior is not seen when the first level guest is a xenial or bionic vm (same bare metal hardware):

  1st level vm:
  bionic/4.15.0-50.54-generic/qemu 1:2.11+dfsg-1ubuntu7.13
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Mitigation: Clear CPU buffers; SMT Host state unknown

  2nd level vm:
  bionic/4.15.0-50.54-generic
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Mitigation: Clear CPU buffers; SMT Host state unknown

and:

  1st level vm:
  xenial/4.4.0-148.174-generic/qemu 1:2.5+dfsg-5ubuntu10.39
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Mitigation: Clear CPU buffers; SMT Host state unknown

  2nd level vm:
  bionic/4.15.0-50.54-generic
  contents of /sys/devices/system/cpu/vulnerabilities/mds:
    Mitigation: Clear CPU buffers; SMT Host state unknown

It's not clear whether this is an issue with linux/kvm or qemu in trusty.
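
The status strings compared above all come from the same sysfs file. As a minimal sketch, assuming Python 3 is available in each guest, the three observed forms could be split into comparable fields like this. The function name `classify_mds` and the dictionary keys are illustrative only and are not part of any kernel or Ubuntu tool.

```python
from pathlib import Path

MDS_PATH = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def classify_mds(text):
    """Split an mds sysfs line into state, detail and SMT note."""
    line = text.strip()
    if line == "Not affected":
        return {"state": "Not affected", "detail": "", "smt": ""}
    # e.g. "Mitigation: Clear CPU buffers; SMT Host state unknown"
    state, _, rest = line.partition(": ")
    detail, _, smt = rest.partition("; ")
    return {"state": state, "detail": detail, "smt": smt}


if __name__ == "__main__":
    # Only reads the file where the running kernel actually exposes it.
    if MDS_PATH.exists():
        print(classify_mds(MDS_PATH.read_text()))
```

Running it at each level makes the odd case stand out: only the 2nd level guest below a trusty 1st level guest would report the state "Not affected".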
---
ApportVersion: 2.14.1-0ubuntu3.29
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 2239 F.... pulseaudio
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=4fa9460d-7ed4-49db-8e22-86a5107d0062
InstallationDate: Installed on 2019-02-14 (92 days ago)
InstallationMedia: Ubuntu 14.04.5 LTS "Trusty Tahr" - Release amd64 (20160803)
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
Package: qemu 2.0.0+dfsg-2ubuntu1.46
PackageArchitecture: amd64
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-148-generic root=UUID=9a35107e-83fa-4010-81e1-235a4ea14fe6 ro quiet splash vt.handoff=7
ProcVersionSignature: User Name 4.4.0-148.174~14.04.1-generic 4.4.177
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-148-generic N/A
 linux-backports-modules-4.4.0-148-generic N/A
 linux-firmware 1.127.24
RfKill:

Tags: trusty trusty
Uname: Linux 4.4.0-148-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirtd lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.12.0-1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-bionic
dmi.modalias: dmi:bvnSeaBIOS:bvr1.12.0-1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-bionic:cvnQEMU:ct1:cvrpc-i440fx-bionic:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-bionic
dmi.sys.vendor: QEMU

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1829555

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Revision history for this message
Steve Beattie (sbeattie) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Steve Beattie (sbeattie) wrote : BootDmesg.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : CRDA.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : Dependencies.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : IwConfig.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : Lspci.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : ProcModules.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : PulseList.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : UdevDb.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : UdevLog.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote : WifiSyslog.txt

apport information

Revision history for this message
Steve Beattie (sbeattie) wrote :

Information above collected from the trusty 1st level guest.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Steve Beattie (sbeattie) wrote :

Oh, I just realized that the trusty and xenial 1st level vms are both using the 4.4 kernel, so this is likely an issue with trusty's qemu.

Revision history for this message
Steve Beattie (sbeattie) wrote :

For the record, attempting to boot a bionic guest in a trusty vm running the current trusty 3.13 kernel 3.13.0-170.220-generic results in the bionic kernel oopsing repeatedly.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

TL;DR only occurs with HWE kernel - odd

Results of:
$ cat /sys/devices/system/cpu/vulnerabilities/mds

Host Bionic: 4.18.0-20-generic / 2.11+dfsg-1ubuntu7.13

Guest-lvl1: 3.13.0-170-generic / 2.0.0+dfsg-2ubuntu1.46
  Vulnerable: Clear CPU buffers attempted, SMT Host state unknown
Guest-lvl2: 3.13.0-170-generic
  Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

It is "ok" that without passthrough it hits "Clear CPU buffers attempted"

Using passthrough:
Guest-lvl1: 3.13.0-170-generic / 2.0.0+dfsg-2ubuntu1.46 + passthrough
  Mitigation: Clear CPU buffers; SMT Host state unknown
Guest-lvl2: 3.13.0-170-generic + passthrough
  Mitigation: Clear CPU buffers; SMT Host state unknown

So this does not trigger with 3.13 kernels for me.

Upgrading to 4.4 in both trusty levels now.

Guest-lvl1: 4.4.0-148-generic / 2.0.0+dfsg-2ubuntu1.46 + passthrough
  Mitigation: Clear CPU buffers; SMT Host state unknown
Guest-lvl2: 4.4.0-148-generic
$ cat /sys/devices/system/cpu/vulnerabilities/mds
Not affected

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

E.g. in /proc/cpuinfo the whole section is missing:
  bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
^^ this is not there when "Not affected" is reported

Separating kernels:
3.13 on both: Works
4.4 on lvl1, 3.13 on lvl2: Fail
3.13 on lvl1, 4.4 on lvl2: Works

So it is only with the 4.4 kernel on lvl1.
I'll take a look to combine that with some qemu versions tomorrow.
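
The "bugs" check above can be scripted. This is a minimal sketch, assuming Python 3 in the guest; `cpu_bugs` is a made-up helper name. Note that an empty result is ambiguous: it is also what you get on kernels that do not print a "bugs" line at all.

```python
def cpu_bugs(cpuinfo_text):
    """Return the entries of the first 'bugs' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "bugs":
            return set(value.split())
    return set()


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            bugs = cpu_bugs(handle.read())
    except OSError:
        bugs = set()
    print("mds listed in bugs:", "mds" in bugs)
```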

Revision history for this message
Steve Beattie (sbeattie) wrote :

Interesting, I see different behaviors in my setup:

4.4 on lvl1, 4.15 lvl2, trusty lvl1 qemu:
/proc/cpuinfo in lvl2 contains:
  bugs : cpu_meltdown spectre_v1 spectre_v2 l1tf
(note missing mds, hence "Not Affected")

4.4 on lvl1, 4.15 lvl2, xenial lvl1 qemu:
/proc/cpuinfo in lvl2 contains:
  bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Note: support for nested virtualization is, and always was, "best effort", as it is famously known to work great until it doesn't. Recently upstream's stance on this changed, and in the last few versions nested x86 got some love (due to some big players using it now), but I'm looking more to 20.04 than to anything before it to call it good. So after this analysis we might call it "known but won't fix" (depending on complexity). IIRC the rule was always "feel free to use it and it will be great, but not for production, as there might be dragons".

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Further I realized that I can trigger this with T3.13/Q2.5/B4.15:

trusty-lvl1-mitaka kernel: [ 931.946357] kvm [2356]: vcpu0 unhandled rdmsr: 0x140
trusty-lvl1-mitaka kernel: [ 932.236914] kvm [2356]: vcpu0 unhandled rdmsr: 0x1c9
trusty-lvl1-mitaka kernel: [ 932.238337] kvm [2356]: vcpu0 unhandled rdmsr: 0x1a6
trusty-lvl1-mitaka kernel: [ 932.239622] kvm [2356]: vcpu0 unhandled rdmsr: 0x1a7
trusty-lvl1-mitaka kernel: [ 932.240956] kvm [2356]: vcpu0 unhandled rdmsr: 0x3f6
trusty-lvl1-mitaka kernel: [ 932.242179] kvm [2356]: vcpu0 unhandled rdmsr: 0x3f7
trusty-lvl1-mitaka kernel: [ 935.038854] kvm [2356]: vcpu0 unhandled rdmsr: 0x64e
trusty-lvl1-mitaka kernel: [ 935.040086] kvm [2356]: vcpu0 unhandled rdmsr: 0x34
Which in the guest is a crash
[ 0.000000] XSAVE consistency problem, dumping leaves
[ 0.000000] WARNING: CPU: 0 PID: 0 at /build/linux-3btXxq/linux-4.15.0/arch/x86/kernel/fpu/xstate.c:614 do_extra_xstate_size_checks+0x303/0x3e6
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-50-generic #54-Ubuntu
[ 0.000000] RIP: 0010:do_extra_xstate_size_checks+0x303/0x3e6
[ 0.000000] RSP: 0000:ffffffffa6003d50 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.000000] RAX: 0000000000000000 RBX: 000000000000000a RCX: ffffffffa60627a8
[ 0.000000] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000047
[ 0.000000] RBP: ffffffffa6003d90 R08: 657661656c20676e R09: 0000000000000007
[ 0.000000] R10: ffffffffa625a600 R11: 0000000000000000 R12: 0000000000000100
[ 0.000000] R13: 0000000000000340 R14: ffffffffa6003d54 R15: ffffffffa6003d50
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffffa627f000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: ffff88800008a000 CR3: 0000000013d22000 CR4: 00000000000406a0
[ 0.000000] Call Trace:
[ 0.000000] ? init_scattered_cpuid_features+0x86/0x110
[ 0.000000] fpu__init_system_xstate+0x183/0x484
[ 0.000000] fpu__init_system+0x213/0x265
[ 0.000000] ? early_init_intel+0x270/0x450
[ 0.000000] early_cpu_init+0x269/0x270
[ 0.000000] ? 0xffffffffa4c00000
[ 0.000000] setup_arch+0xcb/0xc82
[ 0.000000] ? printk+0x52/0x6e
[ 0.000000] start_kernel+0x6d/0x4fd
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x74/0x77
[ 0.000000] secondary_startup_64+0xa5/0xb0

I can avoid that particular error with a modification like:
  <cpu mode='host-passthrough'>
    <feature policy='disable' name='xsave'/>
  </cpu>

But then another issue shows up ... (and so on)

I eventually got things running (for the tests) with
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <feature policy='require' name='md-clear'/>
  </cpu>

That might be an issue with xsave and other features in old nested, but this further underlines my point on nested being nice but unreliable - at least "in the past".

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

oO :-/, that crash also happens with the older qemu on trusty as soon as you have more than 1 lvl1 or lvl2 guest, it seems. Anyway - the same workaround to test MDS applies

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Of course I spoke too soon, on T3.13/Q2.0/B4.15 I now hit an FPU issue.
That builds up to a kernel stack crash (recursive)

[ 2.394255] Bad FPU state detected at fpu__clear+0x6b/0xd0, reinitializing FPU registers.
[...]
BUG: stack guard page was hit at (ptrval) (stack is (ptrval).. (ptrval))

That is again related to MSR handling.
So disabling a few but keeping MDS as needed for this test helps:
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>kvm64</model>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='pcid'/>
  </cpu>

You have not really tested with 3.13 at the LVL1 as far as I read your updates.
I'm expecting that even 3.13 -> 4.4 already has quite some nested fixes that made this "better but not perfect" - so you haven't seen it.

Have I already said that nested KVM on x86 can be unreliable?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm, with that workaround, on 4.15 I get the mds bug flag in the guest, but not the md-clear feature.

$ uname -r; cat /sys/devices/system/cpu/vulnerabilities/mds; cat /proc/cpuinfo | grep -e ^bug -e ^flags | grep md
4.15.0-50-generic
Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds

Despite the commandline adding the feature
  -cpu kvm64,+ssbd,+md-clear,+pdpe1gb,+pcid

Maybe going down to "kvm64" was too much.
I was looking for a middle ground for a while, but it seems that in this combination "T3.13/Q2.0/B4.15" I can only get it to either hang or run without md-clear.
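
To compare a <cpu> definition with the resulting command line, the mapping can be approximated as below. This is only a sketch of the old-style "+feature/-feature" syntax seen in this comment, not libvirt's actual implementation. The helper name `cpu_arg_from_xml` is hypothetical, only the "require" and "disable" policies used in this thread are handled, and host-model is rejected because libvirt expands it itself.

```python
import xml.etree.ElementTree as ET

# Only the two policies used in this thread are mapped; libvirt knows more.
POLICY_SIGN = {"require": "+", "disable": "-"}


def cpu_arg_from_xml(cpu_xml):
    """Approximate the '-cpu model,+feat,-feat' string for a <cpu> element."""
    cpu = ET.fromstring(cpu_xml)
    mode = cpu.get("mode")
    if mode == "host-passthrough":
        model = "host"
    elif mode == "custom":
        model = (cpu.findtext("model") or "").strip()
    else:
        # host-model is expanded by libvirt and cannot be guessed here.
        raise ValueError("unsupported cpu mode: %r" % mode)
    parts = [model]
    for feature in cpu.findall("feature"):
        sign = POLICY_SIGN.get(feature.get("policy"))
        if sign is not None:
            parts.append(sign + feature.get("name"))
    return ",".join(parts)
```

Fed with the kvm64 definition from the previous comment, this yields the same string as the command line quoted above, which at least confirms the feature was requested and that the loss happens later.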

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I re-checked everything that you and I found; let's write a list of all that we know and see if there are patterns.

Host (should not matter, but be rather new) - in my case B4.18 Q2.11

For new qemu I'm using Mitaka.
In this case being from https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/mitaka-staging
to get those libvirt/qemu with the MDS fixes which are still waiting to be released.

The check is like:
$ uname -r; cat /sys/devices/system/cpu/vulnerabilities/mds; cat /proc/cpuinfo | grep -e ^bug -e ^flags | grep md

An example result would look like
a) 4.4.0-148-generic
b) Mitigation: Clear CPU buffers; SMT Host state unknown
c) flags : [...] md_clear
d) bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds

a) to verify the kernel level is as expected
b) to show what the kernel thinks about the Mitigation status
c) check the md_clear in cpu flags
d) show the bugs the cpu is affected by (not present on 3.13)

Only if all those above are ok we call it good, otherwise add a comment what fails
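
The a) to d) check could also be automated. This is a minimal sketch, assuming the three outputs were captured as strings; the function name `check_guest` and its parameters are made up for illustration. Check d) is skipped when no "bugs" line exists, as on 3.13.

```python
def check_guest(uname_r, mds_status, cpuinfo_text, expected_kernel_prefix):
    """Return a list of failed checks; an empty list means 'ok'."""
    flags, bugs = set(), None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "flags" and not flags:
            flags = set(value.split())
        elif key == "bugs" and bugs is None:
            bugs = set(value.split())

    problems = []
    # a) kernel level as expected
    if not uname_r.startswith(expected_kernel_prefix):
        problems.append("unexpected kernel " + uname_r.strip())
    # b) what the kernel thinks about the mitigation status
    if not mds_status.startswith("Mitigation: Clear CPU buffers"):
        problems.append("mds status: " + mds_status.strip())
    # c) md_clear in the cpu flags
    if "md_clear" not in flags:
        problems.append("md_clear flag missing")
    # d) mds in the bugs line, only if the kernel prints one
    if bugs is not None and "mds" not in bugs:
        problems.append("mds missing from bugs")
    return problems
```

An empty list corresponds to "ok" in the table; anything else is the comment to add.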

Test list:
lvl 1 kernel T3.13 / T4.4
lvl 2 kernel T3.13 / T4.4 / B4.15
Qemu T2.0 / M2.5

T   LVL1          LVL2   Result
01  T3.13 / Q2.0  T3.13  ok
02  T3.13 / Q2.0  T4.4   ok
03  T3.13 / Q2.0  B4.15  full passthrough crashes, md-clear feature not passed
04  T3.13 / Q2.5  T3.13  ok
05  T3.13 / Q2.5  T4.4   ok
06  T3.13 / Q2.5  B4.15  ok
07  T4.4 / Q2.0   T3.13  shows not-affected, md-clear available
08  T4.4 / Q2.0   T4.4   shows not-affected, md-clear available
09  T4.4 / Q2.0   B4.15  shows not-affected, md-clear available
10  T4.4 / Q2.5   T3.13  ok
11  T4.4 / Q2.5   T4.4   ok
12  T4.4 / Q2.5   B4.15  ok

Of these testcases we have two classes of errors.
#03 : base Trusty with a rather new guest having issues
      Fix to that seems to be in the kernel as 3.13 -> 4.4 fixes it

#07-09: The qemu 2.0 in trusty seems to have issues if used with the HWE 4.4 kernel
        The fix to that seems to be in a newer qemu as 2.0 -> 2.5 fixes it
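
To make the pattern explicit, the test list can be encoded and grouped by the LVL1 combination. This is just a sketch that restates the table above; the tuple layout and the name `failing_groups` are my own.

```python
# (test, lvl1 kernel, lvl1 qemu, lvl2 kernel, result) as listed in the table
RESULTS = [
    ("01", "T3.13", "Q2.0", "T3.13", "ok"),
    ("02", "T3.13", "Q2.0", "T4.4", "ok"),
    ("03", "T3.13", "Q2.0", "B4.15", "crash"),
    ("04", "T3.13", "Q2.5", "T3.13", "ok"),
    ("05", "T3.13", "Q2.5", "T4.4", "ok"),
    ("06", "T3.13", "Q2.5", "B4.15", "ok"),
    ("07", "T4.4", "Q2.0", "T3.13", "not-affected"),
    ("08", "T4.4", "Q2.0", "T4.4", "not-affected"),
    ("09", "T4.4", "Q2.0", "B4.15", "not-affected"),
    ("10", "T4.4", "Q2.5", "T3.13", "ok"),
    ("11", "T4.4", "Q2.5", "T4.4", "ok"),
    ("12", "T4.4", "Q2.5", "B4.15", "ok"),
]


def failing_groups(results):
    """Group the failing test numbers by (lvl1 kernel, lvl1 qemu)."""
    groups = {}
    for test, kernel, qemu, _lvl2, result in results:
        if result != "ok":
            groups.setdefault((kernel, qemu), []).append(test)
    return groups


if __name__ == "__main__":
    for combo, tests in failing_groups(RESULTS).items():
        print(combo, tests)
```

Every failing group has Q2.0 on LVL1, and the "not-affected" cases do not depend on the LVL2 kernel at all.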

Changed in qemu (Ubuntu):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

My initial testcase will be 07 "T4.4 / Q2.0 T3.13"

Bisect is rather complex as we'd need the md-clear patches on top at each step.
Sorry that it took a while.

Adaptations:
- non-ubuntu machine type (using 2.0 to work on all builds)
- remove VNC in xml as we built a reduced-feature qemu
- place built qemu in the system's path

Attaching the bisect scripts and a set of patches (rebased spectre and MDS as needed) used in this case.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bisect result is

git bisect start
# good: [a9e8aeb3755bccb7b51174adcf4a3fc427e0d147] Update version for v2.0.0 release
git bisect good a9e8aeb3755bccb7b51174adcf4a3fc427e0d147
# bad: [a8c40fa2d667e585382080db36ac44e216b37a1c] Update version for v2.5.0 release
git bisect bad a8c40fa2d667e585382080db36ac44e216b37a1c
# bad: [73706bd1275bef2e7b3962d1be18a20cb8df7f66] virtio-balloon: use standard headers
git bisect bad 73706bd1275bef2e7b3962d1be18a20cb8df7f66
# bad: [fe08275db9b88ecf3a30c7540b894c25aec150c2] vfio: Enable NVIDIA 88000 region quirk regardless of VGA
git bisect bad fe08275db9b88ecf3a30c7540b894c25aec150c2
# good: [3016dca06cba0ef9511f1c81c7e73bfc805fb254] PPC: e500: implement PCI INTx routing
git bisect good 3016dca06cba0ef9511f1c81c7e73bfc805fb254
# bad: [1399c60d70a261acbeb65f614a00eab2dbf4237b] virtio-net: use virtio wrappers to access headers
git bisect bad 1399c60d70a261acbeb65f614a00eab2dbf4237b
# good: [0a99aae5fab5ed260aab96049c274b0334eb4085] Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
git bisect good 0a99aae5fab5ed260aab96049c274b0334eb4085
# good: [857aee337c22dc6c56304146e938efacaf587e1c] target-i386: Simplify reporting of unavailable features
git bisect good 857aee337c22dc6c56304146e938efacaf587e1c
# bad: [ac8076ac8638428e2a96d5f6c7e80f2014f9e379] Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
git bisect bad ac8076ac8638428e2a96d5f6c7e80f2014f9e379
# bad: [6026db4501f773caaa2895cde7f93022960c7169] spapr: Define a 2.1 pseries machine
git bisect bad 6026db4501f773caaa2895cde7f93022960c7169
# bad: [8589744aaf07b62e7be4233727c45b8866d27d43] Merge remote-tracking branch 'remotes/afaerber/tags/qom-cpu-for-2.1' into staging
git bisect bad 8589744aaf07b62e7be4233727c45b8866d27d43
# good: [fefb41bf3485a1c9a44c15e382d28035c6fb5f4b] target-i386: Support check/enforce flags in TCG mode, too
git bisect good fefb41bf3485a1c9a44c15e382d28035c6fb5f4b
# bad: [303752a9068bfe84b9b05f1cd5ad5ff65b7f3ea6] target-i386: Support "invariant tsc" flag
git bisect bad 303752a9068bfe84b9b05f1cd5ad5ff65b7f3ea6
# bad: [120eee7d1fdb2eba15766cfff7b9bcdc902690b4] target-i386: Set migratable=yes by default on "host" CPU mooel
git bisect bad 120eee7d1fdb2eba15766cfff7b9bcdc902690b4
# good: [84f1b92f974fbb19967c5f10ac6c3f4a04fb86dd] target-i386: Add "migratable" property to "host" CPU model
git bisect good 84f1b92f974fbb19967c5f10ac6c3f4a04fb86dd
# first bad commit: [120eee7d1fdb2eba15766cfff7b9bcdc902690b4] target-i386: Set migratable=yes by default on "host" CPU mooel
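
For reference, a log like the one above can be summarized mechanically. This is a small sketch, assuming the usual "# verdict: [sha] subject" comment lines of `git bisect log`; `parse_bisect_log` is a made-up name. Keep in mind that this bisect was hunting the commit where the problem went away, so "bad" here marks builds that no longer showed the issue.

```python
import re

# Matches the comment lines git writes into the bisect log.
ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(text):
    """Return (steps, first_bad) parsed from a git bisect log."""
    steps, first_bad = [], None
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skip the plain 'git bisect ...' command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            steps.append((verdict, sha, subject))
    return steps, first_bad
```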

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

That finds us the fix as:

commit 120eee7d1fdb2eba15766cfff7b9bcdc902690b4
Author: Eduardo Habkost <email address hidden>
Date: Tue Jun 17 17:31:53 2014 -0300

    target-i386: Set migratable=yes by default on "host" CPU mooel

    Having only migratable flags reported by default on the "host" CPU model
    is safer for the following reasons:

     * Existing users may expect "-cpu host" to be migration-safe, if they
       take care of always using compatible host CPUs, host kernels, and
       QEMU versions.
     * Users who don't care aboug migration and want to enable all features
       supported by the host kernel can simply change their setup to use
       migratable=no.

    Without this change, people using "-cpu host" will stop being able to
    migrate, because now "invtsc" is getting enabled by default.

    We are not setting migratable=yes by default on all X86CPU subclasses,
    because users should be able to get non-migratable features enabled if
    they ask for them explicitly.

    Reviewed-by: Marcelo Tosatti <email address hidden>
    Signed-off-by: Eduardo Habkost <email address hidden>
    Signed-off-by: Andreas Färber <email address hidden>

Which is interesting, because that just is a flip in a default setting of qemu.
Maybe the interaction between qemu 2.5 and kernel 4.4 (remember, it does not occur with the 3.13 kernel) comes down to a user-accessible switch?

If we are running as "host-passthrough" (which we do), this change means it no longer passes "all" flags. Instead, only flags considered "migratable" are passed to the guest.

In qemu 2.0 this differentiation didn't even exist (the infrastructure for it was added in qemu 2.1).
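
As a toy model of that difference (not qemu code, and the feature sets are invented for illustration; only "invtsc" is named as non-migratable by the commit message), the effect of the default flip can be pictured like this:

```python
def guest_features(host_features, unmigratable, migratable=True):
    """Toy model: which host features a '-cpu host' guest would see."""
    if migratable:
        # qemu >= 2.1 default: drop everything marked non-migratable
        return set(host_features) - set(unmigratable)
    # qemu 2.0 behaviour (and migratable=no): pass everything through
    return set(host_features)


if __name__ == "__main__":
    host = {"fpu", "sse2", "invtsc"}  # illustrative host feature set
    print(sorted(guest_features(host, {"invtsc"})))
    print(sorted(guest_features(host, {"invtsc"}, migratable=False)))
```

So a guest under qemu 2.0 can end up with features that a newer qemu would have filtered out by default.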

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Theory given what we know so far:
- only fails if LVL1 is at 4.4
- not failing if LVL1 is at 3.13
- 4.4 might have more CPU features
- qemu 2.0 when using host-passthrough is passing ALL features
- qemu 2.5 works, but we now know it filters some flags that 2.0 doesn't
=> one of these extra flags disturbs the guests bug detection

Check extra flags in LVL1 between 3.13 and 4.4

3.13 -> 4.4 has in addition (Host):
> clflushopt
> kaiser
> mpx
> tsc_known_freq
> xgetbv1
> xsavec
< eagerfpu

Comparing LVL2 between case 07 and 10
< arch_capabilities
> arat

So interestingly, none of the flags that are added with 4.4 on LVL1 show up in the guest.
But one additional flag that does show up, and seems interesting, is "arch_capabilities".
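
The "<" / ">" lists above are plain set differences of the flags lines. A minimal sketch of how to reproduce them, assuming the two flags strings were copied from /proc/cpuinfo of each guest (`flag_diff` is a made-up name, and the sample strings are shortened to a few flags):

```python
def flag_diff(left, right):
    """Return (only in left, only in right) for two space-separated flag strings."""
    left_set, right_set = set(left.split()), set(right.split())
    return sorted(left_set - right_set), sorted(right_set - left_set)


if __name__ == "__main__":
    case_07 = "fpu md_clear arch_capabilities"  # shortened sample
    case_10 = "fpu md_clear arat"               # shortened sample
    only_07, only_10 = flag_diff(case_07, case_10)
    for flag in only_07:
        print("<", flag)
    for flag in only_10:
        print(">", flag)
```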

I haven't found a good way to control arch_capabilities yet.
It is part of the Spectre backports actually like [1] - I haven't seen it like that in the code that you added to qemu 2.0 but it is at least related.

So the LVL1 4.4 kernel has some empty flags/features that the older qemu 2.0 does not filter, and hence the guest gets a broken MSR for MSR_IA32_ARCH_CAPABILITIES.
That is what breaks the guests.

Given that:
- nested (especially in these much older versions of KVM/Qemu) is not very well supported
- this issue seems to depend on other security fixes (in the 4.4 kernel)
- qemu 2.0 is out in ESM, and this is not a fix required for that

I'd call it confirmed but prio wishlist, and unless convinced otherwise I probably won't work on it for now.

I hope the analysis helps if e.g. the security team wants to take a look at all MSR_IA32_ARCH_CAPABILITIES related changes. One could, e.g., actually read CPUID_7_0_EDX_ARCH_CAPABILITIES in the broken LVL2 guest. I'm rather sure it has malformed or incomplete content.

[1]: https://lwn.net/Articles/746119/
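
If someone wants to look at the MSR content in the broken LVL2 guest, a sketch like the following could help. Assumptions: Python 3, the "msr" kernel module loaded, and root access to /dev/cpu/0/msr. The bit names follow the definitions of IA32_ARCH_CAPABILITIES (MSR 0x10a) as I know them from the kernel headers; nobody in this report has actually read the value yet, so whether it matches the theory is unverified.

```python
import os
import struct

MSR_IA32_ARCH_CAPABILITIES = 0x10A

# Low bits of IA32_ARCH_CAPABILITIES; higher bits are not decoded here.
ARCH_CAP_BITS = {
    0: "RDCL_NO",
    1: "IBRS_ALL",
    2: "RSBA",
    3: "SKIP_L1DFL_VMENTRY",
    4: "SSB_NO",
    5: "MDS_NO",
}


def decode_arch_capabilities(value):
    """Return the names of the known bits set in the MSR value."""
    return [name for bit, name in sorted(ARCH_CAP_BITS.items())
            if (value >> bit) & 1]


def read_msr(index, cpu=0):
    """Read one 64-bit MSR via the msr driver (needs root and 'modprobe msr')."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, index))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        value = read_msr(MSR_IA32_ARCH_CAPABILITIES)
        print(hex(value), decode_arch_capabilities(value))
    except OSError as err:
        print("could not read MSR:", err)
```

A guest that sees MDS_NO set would not be flagged for mds by its kernel, which would fit the "Not affected" result, but that stays a hypothesis until the value is read in the failing setup.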

Changed in qemu (Ubuntu):
importance: Undecided → Low