Bionic QEMU with Bionic Kernel hangs in AMD FX-8350 with cpu-host as passthrough

Bug #1834522 reported by Rafael David Tinoco on 2019-06-27
Affects          Importance  Assigned to
linux (Ubuntu)   Medium      Unassigned
  Bionic         Medium      Rafael David Tinoco

Bug Description

[Impact]

 * Nested AMD KVM guests do not work on AMD CPUs when host-passthrough is used as the CPU mode.
 * QEMU does not start, hanging before the VM initialization.

[Test Case]

 * Run a Bionic KVM guest that tries to use nested KVM on an AMD CPU.
 * Use the following XML file: https://paste.ubuntu.com/p/BSyFY7ksR5/
 * Use an AMD FX(tm)-8350 Eight-Core Processor or a similar CPU.
 * For comparison: Xenial QEMU on top of an HWE kernel works.
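Before running the test case it can help to confirm that the first-level guest actually sees SVM and that nested virtualization is switched on. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and the default paths are the usual Linux locations for the CPU flags and the kvm_amd "nested" module parameter.

```shell
# Both helpers take the file to inspect as an argument so they can be tried
# on sample files; the defaults are the usual Linux locations (assumed here).

# Succeeds if the cpuinfo-style file lists the "svm" CPU flag.
has_svm_flag() {
    grep -E '^flags' "${1:-/proc/cpuinfo}" | grep -qw svm
}

# Succeeds if the kvm_amd "nested" module parameter is on.
# Depending on the kernel version it reads back as 1 or Y.
nested_svm_enabled() {
    case "$(cat "${1:-/sys/module/kvm_amd/parameters/nested}" 2>/dev/null)" in
        1|Y|y) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Run inside the Bionic guest, `has_svm_flag && nested_svm_enabled && echo "nested SVM looks usable"` gives a quick sanity check; it does not by itself reproduce the hang.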

[Regression Potential]

 * KVM SVM could be affected, but the patch comes from upstream, fixes this specific issue, and has been tested by me on my current development workstation (heavy usage).

[Other Info]

 * Patches have been sent to kernel team mailing list.


Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in linux (Ubuntu):
importance: Medium → Undecided
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
status: Confirmed → Fix Released
importance: Undecided → Medium
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc

# BISECT LOG

git bisect start
# bad: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
git bisect bad 0adb32858b0bddf4ada5f364a84ed60b196dbcda
# good: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
git bisect good d8a5b80568a9cb66810e75b182018e9edb68e8ff
# good: [c14376de3a1befa70d9811ca2872d47367b48767] printk: Wake klogd when passing console_lock owner
git bisect good c14376de3a1befa70d9811ca2872d47367b48767
# good: [2246edfaf88dc368e8671b04afd54412625df60a] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good 2246edfaf88dc368e8671b04afd54412625df60a
# good: [dfe8db22372873d205c78a9fd5370b1b088a2b87] Merge tag 'drm-misc-fixes-2018-02-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
git bisect good dfe8db22372873d205c78a9fd5370b1b088a2b87
# bad: [4665c6b04651e96c1e2eb9129a30d6055040ff73] Merge tag 'linux-can-fixes-for-4.16-20180312' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
git bisect bad 4665c6b04651e96c1e2eb9129a30d6055040ff73
# bad: [3499de32fa6b608ba646380ac3838d30a2558ead] Merge tag 'linux-kselftest-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect bad 3499de32fa6b608ba646380ac3838d30a2558ead
# good: [65738c6b461a8bb0b056e024299738f7cc9a28b7] Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect good 65738c6b461a8bb0b056e024299738f7cc9a28b7
# good: [c23a75759191e84f4ba15b85ea4f97bd544b5362] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good c23a75759191e84f4ba15b85ea4f97bd544b5362
# bad: [d4858aaf6bd8a90e2dacc0dfec2077e334dcedbf] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad d4858aaf6bd8a90e2dacc0dfec2077e334dcedbf
# good: [0eb578009a1d530a11846d7c4733a5db04730884] tools/kvm_stat: use a more pythonic way to iterate over dictionaries
git bisect good 0eb578009a1d530a11846d7c4733a5db04730884
# good: [fe2a3027e74e40a3ece3a4c1e4e51403090a907a] KVM: x86: fix backward migration with async_PF
git bisect good fe2a3027e74e40a3ece3a4c1e4e51403090a907a
# bad: [7607b7174405aec7441ff6c970833c463114040a] KVM: SVM: install RSM intercept
git bisect bad 7607b7174405aec7441ff6c970833c463114040a
# good: [e5699f56bc91a286f006b0728085e0b4e8f5749b] crypto: ccp: Fix sparse, use plain integer as NULL pointer
git bisect good e5699f56bc91a286f006b0728085e0b4e8f5749b
# good: [3e233385ef4a217a2812115ed84d4be36eb16817] KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command
git bisect good 3e233385ef4a217a2812115ed84d4be36eb16817
# first bad commit: [7607b7174405aec7441ff6c970833c463114040a] KVM: SVM: install RSM intercept

# NOTE

I was doing an "inverted" bisection, so the "first bad commit" is actually the one that seems to have fixed the issue:

commit 7607b7174405aec7441ff6c970833c463114040a
Author: Brijesh Singh <email address hidden>
Date: Mon Feb 19 10:14:44 2018 -0600

    KVM: SVM: install RSM intercept

    RSM instruction is used by the SMM handler to return from SMM mode.
    Currently, rsm causes a #UD - which results in instruction fetch, decode,
    and emulate. By installing the RSM inte...
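
As a side note on searching for a fixing commit: git bisect also accepts custom terms, which avoids having to swap "good" and "bad" mentally. The sketch below is a toy demonstration under stated assumptions, not the procedure used for the log above: the function name, the throwaway repository, and the `state` file are all made up, and it needs a git version that supports `--term-old`/`--term-new`.

```shell
# Toy sketch: locate the commit that FIXED a bug with custom bisect terms.
# The repository, file names and commit subjects are made up for the demo.
find_fixing_commit_demo() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q . 2>/dev/null || exit 1
    # Six linear commits; the pretend bug disappears in "commit 4".
    for i in 1 2 3 4 5 6; do
        if [ "$i" -ge 4 ]; then echo fixed > state; else echo broken > state; fi
        echo "$i" > counter
        git add state counter
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $i" || exit 1
    done
    first=$(git rev-list --max-parents=0 HEAD)
    git bisect start --term-old=broken --term-new=fixed >/dev/null 2>&1 || exit 1
    git bisect fixed HEAD >/dev/null 2>&1
    git bisect broken "$first" >/dev/null 2>&1
    # The test command's exit status 0 marks a commit with the old term
    # ("broken"); exit status 1 marks it with the new term ("fixed").
    git bisect run sh -c 'grep -q broken state' >/dev/null 2>&1
    # refs/bisect/<new term> ends up pointing at the first "fixed" commit.
    git log -1 --format=%s refs/bisect/fixed
    rm -rf "$repo"
)
```

Calling `find_fixing_commit_demo` prints the subject of the first fixed commit in the toy history. On a real kernel tree the same terms would be used by hand, e.g. `git bisect fixed` after a boot that works and `git bisect broken` after one that hangs.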


Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress

I've included the patch above in the Bionic tree:

commit 9bff5f095923aab04411cf4e9135b975b70e3ead (tag: Ubuntu-4.15.0-58.64, origin/master, origin/HEAD)
Author: Stefan Bader <email address hidden>
Date: Tue Aug 6 10:45:37 2019

    UBUNTU: Ubuntu-4.15.0-58.64

And it, indeed, fixed the issue.

Also needed:

commit 35be0aded76b54a24dc8aa678a71bca22273e8d8
Author: Sean Christopherson <email address hidden>
Date: Thu Aug 23 17:56:47 2018

    KVM: x86: SVM: Set EMULTYPE_NO_REEXECUTE for RSM emulation

    Re-execution after an emulation decode failure is only intended to
    handle a case where two or more vCPUs race to write a shadowed page, i.e.
    we should never re-execute an instruction as part of RSM emulation.

    Add a new helper, kvm_emulate_instruction_from_buffer(), to support
    emulating from a pre-defined buffer. This eliminates the last direct
    call to x86_emulate_instruction() outside of kvm_mmu_page_fault(),
    which means x86_emulate_instruction() can be unexported in a future
    patch.

    Fixes: 7607b7174405 ("KVM: SVM: install RSM intercept")
    Cc: Brijesh Singh <email address hidden>
    Signed-off-by: Sean Christopherson <email address hidden>
    Cc: <email address hidden>
    Signed-off-by: Radim Krčmář <email address hidden>

Waiting for kernel team sponsorship. Thanks.

BTW - Thanks for the work on this Rafael!

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

It solves the problem and I'm sorry for taking so long to verify...

Commit:

Author: Sean Christopherson <email address hidden>
Date: Thu Aug 29 14:06:58 2019

    KVM: x86: SVM: Set EMULTYPE_NO_REEXECUTE for RSM emulation

    BugLink: https://bugs.launchpad.net/bugs/1834522

Indeed solves the problem.

Thank you!

tags: added: verification-done verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-65.74

---------------
linux (4.15.0-65.74) bionic; urgency=medium

  * bionic/linux: 4.15.0-65.74 -proposed tracker (LP: #1844403)

  * arm64: large modules fail to load (LP: #1841109)
    - arm64/kernel: kaslr: reduce module randomization range to 4 GB
    - arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
    - arm64: fix undefined reference to 'printk'
    - arm64/kernel: rename module_emit_adrp_veneer->module_emit_veneer_for_adrp
    - [config] Remove CONFIG_ARM64_MODULE_CMODEL_LARGE

  * CVE-2018-20976
    - xfs: clear sb->s_fs_info on mount failure

  * br_netfilter: namespace sysctl operations (LP: #1836910)
    - net: bridge: add bitfield for options and convert vlan opts
    - net: bridge: convert nf call options to bits
    - netfilter: bridge: port sysctls to use brnf_net
    - netfilter: bridge: namespace bridge netfilter sysctls
    - netfilter: bridge: prevent UAF in brnf_exit_net()

  * tuntap: correctly set SOCKWQ_ASYNC_NOSPACE (LP: #1830756)
    - tuntap: correctly set SOCKWQ_ASYNC_NOSPACE

  * Bionic update: upstream stable patchset 2019-08-30 (LP: #1842114)
    - HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
    - MIPS: kernel: only use i8253 clocksource with periodic clockevent
    - mips: fix cacheinfo
    - netfilter: ebtables: fix a memory leak bug in compat
    - ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks
    - bonding: Force slave speed check after link state recovery for 802.3ad
    - can: dev: call netif_carrier_off() in register_candev()
    - ASoC: Fail card instantiation if DAI format setup fails
    - st21nfca_connectivity_event_received: null check the allocation
    - st_nci_hci_connectivity_event_received: null check the allocation
    - ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
    - net: usb: qmi_wwan: Add the BroadMobi BM818 card
    - qed: RDMA - Fix the hw_ver returned in device attributes
    - isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in
      start_isoc_chain()
    - netfilter: ipset: Fix rename concurrency with listing
    - isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
    - perf bench numa: Fix cpu0 binding
    - can: sja1000: force the string buffer NULL-terminated
    - can: peak_usb: force the string buffer NULL-terminated
    - net/ethernet/qlogic/qed: force the string buffer NULL-terminated
    - NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
    - HID: input: fix a4tech horizontal wheel custom usage
    - SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
    - net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
    - net: hisilicon: make hip04_tx_reclaim non-reentrant
    - net: hisilicon: fix hip04-xmit never return TX_BUSY
    - net: hisilicon: Fix dma_map_single failed on arm64
    - libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
    - libata: add SG safety checks in SFF pio transfers
    - x86/lib/cpu: Address missing prototypes warning
    - drm/vmwgfx: fix memory leak when too many retries have occurred
    - perf ftrace: Fix failure to set cpuma...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
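
To check whether a given machine already carries the fixed package version (4.15.0-65.74 or later), a version comparison along these lines may be useful. This is only a sketch: `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available; `dpkg --compare-versions` is the more exact tool on Ubuntu.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort), which is an assumption about the
# environment rather than something the bug report prescribes.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

One possible use, assuming the running kernel was installed from the matching `linux-image-$(uname -r)` package: `version_ge "$(dpkg-query -W -f='${Version}' "linux-image-$(uname -r)")" 4.15.0-65.74 && echo "fix should be present"`.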