Ubuntu-5.0.0-33.35 introduces KVM regression with old Intel CPUs and Linux guests

Bug #1851709 reported by Thomas Lamprecht on 2019-11-07
This bug affects 2 people
Affects          Importance  Assigned to
linux (Ubuntu)   Undecided   Unassigned
Bionic           High        Thadeu Lima de Souza Cascardo
Disco            High        Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
On CPUs without EPT support, or when kvm-intel EPT support is disabled with the ept=0 module parameter, users are not able to launch a Linux VM.

[Test case]
# modprobe kvm-intel ept=0
# cat /sys/module/kvm_intel/parameters/ept
N
# qemu-system-x86_64 -enable-kvm -kernel /boot/vmlinuz-4.15.0-68-generic

Make sure you get any console log at all. With the bug, there is not a single line of output.
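
Before running the test case it can help to confirm which EPT state the loaded module is actually in. A minimal sketch, assuming the sysfs path shown in the test case above; the function name ept_state and its optional path argument are illustrative, not part of any existing tool:

```shell
#!/bin/sh
# Sketch: report whether kvm_intel currently runs with EPT.
# ept_state [PARAM_FILE] -- the argument exists only so the function
# can be pointed at a different file; it defaults to the sysfs path.
ept_state() {
    param_file="${1:-/sys/module/kvm_intel/parameters/ept}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"            # module not loaded, or not an Intel host
        return 0
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;   # the configuration affected by this bug
        *)   echo "unknown" ;;
    esac
}
```

Calling ept_state after the modprobe step should print "disabled" if the ept=0 parameter took effect.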

[Regression potential]
The fix might cause some very specific uses of virtualization to fail, but no practical case is known.

===============================

Mostly the same info as in a related kernel.org bugzilla entry [0].

[0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441

We got reports of issues with old Intel CPUs and Linux guests run with QEMU/KVM after a recent kernel update, which is based on Ubuntu-5.0.0-33.35.

I bisected this here, with the following result:
git bisect log
# bad: [3b931173c97b0d73f80ea55b72bb2966a246167f] UBUNTU: Ubuntu-5.0.0-33.35
# good: [5d5a6b36e94909962297fae609bff487de3cc43a] UBUNTU: Ubuntu-5.0.0-30.32
git bisect start '3b931173c97b0d73f80ea55b72bb2966a246167f' '5d5a6b36e94909962297fae609bff487de3cc43a'
# good: [7b4f844b33969ab166800f8936beef153fab736e] net/ibmvnic: free reset work of removed device from queue
git bisect good 7b4f844b33969ab166800f8936beef153fab736e
# bad: [6c1fc88702a4f33886b44ce5b6f374893b95e369] arm64: tlb: Ensure we execute an ISB following walk cache invalidation
git bisect bad 6c1fc88702a4f33886b44ce5b6f374893b95e369
# good: [e627a027b54eccc95f9e374d69aead7f1498877b] loop: Add LOOP_SET_DIRECT_IO to compat ioctl
git bisect good e627a027b54eccc95f9e374d69aead7f1498877b
# good: [29919eff6333bc67ec580b454afdd8b49883df2f] libata/ahci: Drop PCS quirk for Denverton and beyond
git bisect good 29919eff6333bc67ec580b454afdd8b49883df2f
# good: [cb44193f94af73928f8df049ffbb6b4a0be136ae] PM / devfreq: passive: fix compiler warning
git bisect good cb44193f94af73928f8df049ffbb6b4a0be136ae
# good: [b1d479b27b26966aea931094b31864979d7f8102] scsi: implement .cleanup_rq callback
git bisect good b1d479b27b26966aea931094b31864979d7f8102
# bad: [ec15813844b05d8cbd4352c65a20e57d16f9f936] media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
git bisect bad ec15813844b05d8cbd4352c65a20e57d16f9f936
# good: [e83601f51a90d9739ced9ff42b6f202f8f802c72] parisc: Disable HP HSC-PCI Cards to prevent kernel crash
git bisect good e83601f51a90d9739ced9ff42b6f202f8f802c72
# good: [6d393bdf3b3f4b629070329488d3c6a3e142602b] KVM: x86: set ctxt->have_exception in x86_decode_insn()
git bisect good 6d393bdf3b3f4b629070329488d3c6a3e142602b
# bad: [208007519a7385a57b0c0a3c180142a521594876] KVM: x86: Manually calculate reserved bits when loading PDPTRS
git bisect bad 208007519a7385a57b0c0a3c180142a521594876
# first bad commit: [208007519a7385a57b0c0a3c180142a521594876] KVM: x86: Manually calculate reserved bits when loading PDPTRS
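
A bisect log like this one can be saved and reused. A small sketch, assuming the log was stored in a file (the name bisect.log and the helper first_bad_commit are hypothetical); note that `git bisect replay` reads one command per line, so a log whose lines were wrapped by a mail client or web page needs them rejoined first:

```shell
#!/bin/sh
# Print the hash named on the "# first bad commit:" line of a bisect log.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)\].*/\1/p' "$1"
}

# Possible usage, inside a checkout of the Ubuntu kernel tree:
#   git bisect replay bisect.log
#   git show --stat "$(first_bad_commit bisect.log)"
```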

Which is:

   KVM: x86: Manually calculate reserved bits when loading PDPTRS

    BugLink: https://bugs.launchpad.net/bugs/1848367

    commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.

    Manually generate the PDPTR reserved bit mask when explicitly loading
    PDPTRs. The reserved bits that are being tracked by the MMU reflect the
    current paging mode, which is unlikely to be PAE paging in the vast
    majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
    __set_sregs(), etc... This can cause KVM to incorrectly signal a bad
    PDPTR, or more likely, miss a reserved bit check and subsequently fail
    a VM-Enter due to a bad VMCS.GUEST_PDPTR.

    Add a one off helper to generate the reserved bits instead of sharing
    code across the MMU's calculations and the PDPTR emulation. The PDPTR
    reserved bits are basically set in stone, and pushing a helper into
    the MMU's calculation adds unnecessary complexity without improving
    readability.

    Opportunistically fix/update the comment for load_pdptrs().

    Note, the buggy commit also introduced a deliberate functional change,
    "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
    effectively (and correctly) reverted by commit cd9ae5fe47df ("KVM: x86:
    Fix page-tables reserved bits"). A bit of SDM archaeology shows that
    the SDM from late 2008 had a bug (likely a copy+paste error) where it
    listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
    for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and
    always have been reserved.

    Fixes: 20c466b56168d ("KVM: Use rsvd_bits_mask in load_pdptrs()")
    Cc: <email address hidden>
    Cc: Nadav Amit <email address hidden>
    Reported-by: Doug Reiland <email address hidden>
    Signed-off-by: Sean Christopherson <email address hidden>
    Reviewed-by: Peter Xu <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>
    Signed-off-by: Kleber Sacilotto de Souza <email address hidden>
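
To make the commit message more concrete, here is a rough sketch of the reserved-bit check it describes, written in shell arithmetic rather than kernel C. It assumes the PAE PDPTE layout from the Intel SDM (bits 2:1, bits 8:5 and all bits from MAXPHYADDR up to 63 are reserved) and relies on 64-bit wrapping shell arithmetic as found in bash; the function names and the MAXPHYADDR value of 36 used below are illustrative assumptions, not the kernel's code:

```shell
#!/usr/bin/env bash
# rsvd_bits S E: mask with bits S..E (inclusive) set.
rsvd_bits() {
    echo $(( ((1 << ($2 - $1 + 1)) - 1) << $1 ))
}

# pdptr_rsvd_mask MAXPHYADDR: bits that must be zero in a present PAE PDPTE:
# 2:1, 8:5 and everything from MAXPHYADDR up to bit 63.
pdptr_rsvd_mask() {
    echo $(( $(rsvd_bits "$1" 63) | $(rsvd_bits 5 8) | $(rsvd_bits 1 2) ))
}

# pdpte_is_valid PDPTE MAXPHYADDR: succeed if the entry is not present,
# or is present with no reserved bit set.
pdpte_is_valid() {
    if [ $(( $1 & 1 )) -eq 0 ]; then
        return 0
    fi
    [ $(( $1 & $(pdptr_rsvd_mask "$2") )) -eq 0 ]
}
```

For example, `pdpte_is_valid 0x12345021 36` fails because bit 5 is set in a present entry, while `pdpte_is_valid 0x12345001 36` succeeds. The point of the commit is that this mask depends only on MAXPHYADDR, not on the paging mode the MMU is currently tracking.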

This commit is also included in 4.19.81 (more precisely, it has been there since
v4.19.77) as commit 496cf984a60edb5534118a596613cc9971e406e8 [0], which is
upstream commit 16cfacc8085782dab8e365979356ce1ca87fd6cc [1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=v4.19.82&id=496cf984a60edb5534118a596613cc9971e406e8
[1]: https://git.kernel.org/torvalds/c/16cfacc8085782dab8e365979356ce1ca87fd6cc

Funny thing is: I cannot reproduce this with a 5.3.7 (Eoan) kernel, which _also_
includes the above commit. So possibly another patch is missing in the backport;
I did not find anything obvious, though...

So, summary for a reproducer:
* Dust off a host with an old Intel CPU, e.g. Intel Core2Duo CPU E8500 @3.16GHz
  (something else like Westmere, Conroe or the like should work too, or anything
  released over 10 years ago).
* Install a Linux distro, or just boot its installer, in a VM. I used Debian 9,
  as our users had issues with that but *not* with an Ubuntu 19.10 VM.
* See how it boot-loops once a stable kernel with the above [0] backported
  is used on the host.
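
Whether a given host qualifies as "old" in the sense above can be checked from the CPU flags rather than the release date. A minimal sketch, assuming the kernel lists "ept" on the flags (or, on newer kernels, "vmx flags") line of /proc/cpuinfo; the function name cpu_has_ept and its optional file argument are illustrative:

```shell
#!/bin/sh
# cpu_has_ept [CPUINFO]: succeed if the CPU flags advertise EPT.
# -w keeps related flags such as "ept_ad" from matching on their own.
cpu_has_ept() {
    grep -E '^(vmx )?flags' "${1:-/proc/cpuinfo}" | grep -qw ept
}

# Possible usage:
#   cpu_has_ept || echo "no EPT: this host should reproduce the bug"
```

On a host that does have EPT, loading kvm-intel with ept=0 as in the test case should give the same setup.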


This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1851709

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Thomas Lamprecht (t-lamprecht) wrote :

Thanks, Mr. Kernel Bot, but I really do not think that this bug is missing logs :)

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

https://<email address hidden>/ contains the upstream discussion, with mention of a backported fix for 4.14 and 4.19.

https://lore<email address hidden>/ is the fix for 4.19

Changed in linux (Ubuntu Bionic):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Disco):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Stefan Bader (smb) on 2019-11-12
Changed in linux (Ubuntu Disco):
importance: Undecided → High
description: updated
Changed in linux (Ubuntu Disco):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Stefan Bader (smb) on 2019-11-12
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Stefan Bader (smb) on 2019-11-12
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

Patch submitted to the list was tested on disco as well. It fixed the problem and no new regressions were found compared to previous tests.

Cascardo.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.0.0-36.39

---------------
linux (5.0.0-36.39) disco; urgency=medium

  * Ubuntu-5.0.0-33.35 introduces KVM regression with old Intel CPUs and Linux
    guests (LP: #1851709)
    - Revert "KVM: x86: Manually calculate reserved bits when loading PDPTRS"

  * Incomplete i915 fix for 64-bit x86 kernels (LP: #1852141) // CVE-2019-0155
    - SAUCE: drm/i915/cmdparser: Fix jump whitelist clearing

 -- Stefan Bader <email address hidden> Tue, 12 Nov 2019 10:33:14 +0100

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-70.79

---------------
linux (4.15.0-70.79) bionic; urgency=medium

  * Ubuntu-5.0.0-33.35 introduces KVM regression with old Intel CPUs and Linux
    guests (LP: #1851709)
    - Revert "KVM: x86: Manually calculate reserved bits when loading PDPTRS"

  * Incomplete i915 fix for 64-bit x86 kernels (LP: #1852141) // CVE-2019-0155
    - SAUCE: drm/i915/cmdparser: Fix jump whitelist clearing

 -- Stefan Bader <email address hidden> Tue, 12 Nov 2019 10:54:50 +0100

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released