Regression: KVM no longer supports Intel CPUs without Virtual NMI

Bug #1741655 reported by Nathan Rennie-Waldock on 2018-01-06
This bug affects 4 people
Affects          Importance   Assigned to
linux (Ubuntu)   High         Joseph Salisbury
Artful           High         Joseph Salisbury
Bionic           High         Joseph Salisbury

Bug Description

== SRU Justification ==
The following mainline commit introduced a regression in v4.12-rc1:
2c82878b0cb3 ("KVM: VMX: require virtual NMI support")

This regression caused the kvm-intel module to fail to load with the following error:
"modprobe: ERROR: could not insert 'kvm_intel': Input/output error"

This error occurred because support for CPUs without virtual NMI was removed
by commit 2c82878b0cb3.

Mainline commit 8a1b43922d0d fixes this regression and was added to mainline in v4.15-rc1.

== Fix ==
commit 8a1b43922d0d1279e7936ba85c4c2a870403c95f
Author: Paolo Bonzini <email address hidden>
Date: Mon Nov 6 13:31:12 2017 +0100

    kvm: vmx: Reinstate support for CPUs without virtual NMI

== Regression Potential ==
Low. This patch fixes a current regression. It was cc'd to upstream stable
so had additional upstream review.

## Original Bug Description ##
Since upgrading from zesty to artful, I'm no longer able to use KVM on my server:
# modprobe kvm-intel
modprobe: ERROR: could not insert 'kvm_intel': Input/output error

Searching tells me this is caused by requiring Virtual NMI support[1]

Running the script provided on the mailing list[2] to check virtualization features confirms my CPU (Xeon E5345) doesn't support Virtual NMIs:
# python features.py | grep NMI
  NMI exiting yes
  Virtual NMIs no
  NMI-window exiting no
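
For readers without the mailing-list script at hand, the three lines above can be reproduced by decoding the VMX capability MSRs. The following is a minimal sketch, not the `features.py` script itself: the helper names are made up for illustration, and the bit positions and MSR indices are taken from the Intel SDM (pin-based controls bit 3 and bit 5, primary processor-based controls bit 22). Reading the MSRs is assumed to go through `/dev/cpu/N/msr`, which needs root and the `msr` kernel module.

```python
import os
import struct

# MSR indices and bit positions as documented in the Intel SDM.
IA32_VMX_PINBASED_CTLS = 0x481
IA32_VMX_PROCBASED_CTLS = 0x482
PIN_NMI_EXITING = 1 << 3            # pin-based controls, bit 3
PIN_VIRTUAL_NMIS = 1 << 5           # pin-based controls, bit 5
PROC_NMI_WINDOW_EXITING = 1 << 22   # primary processor-based controls, bit 22


def allowed_1(msr_value):
    """Return the high 32 bits of a VMX capability MSR (controls that may be set)."""
    return (msr_value >> 32) & 0xFFFFFFFF


def nmi_features(pinbased_msr, procbased_msr):
    """Decode the NMI-related capabilities from two raw 64-bit MSR values."""
    pin = allowed_1(pinbased_msr)
    proc = allowed_1(procbased_msr)
    return {
        "NMI exiting": bool(pin & PIN_NMI_EXITING),
        "Virtual NMIs": bool(pin & PIN_VIRTUAL_NMIS),
        "NMI-window exiting": bool(proc & PROC_NMI_WINDOW_EXITING),
    }


def read_msr(index, cpu=0):
    """Hypothetical helper: read one MSR via /dev/cpu/N/msr (root + 'msr' module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, index))[0]
    finally:
        os.close(fd)
```

On a live system one would call `nmi_features(read_msr(IA32_VMX_PINBASED_CTLS), read_msr(IA32_VMX_PROCBASED_CTLS))`. A CPU like the one in this report would be expected to show "Virtual NMIs" as false while "NMI exiting" is true.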

Virtual NMI support became a requirement in v4.12[3], and the requirement was later dropped in v4.14.3[4] because some models (including Xeons) don't support it, even if others with the same core do.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1490803
[2] https://lkml.org/lkml/2017/8/7/231
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/arch/x86/kvm/vmx.c?h=linux-4.13.y&id=2c82878b0cb38fd516fd612c67852a6bbf282003
[4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/arch/x86/kvm/vmx.c?h=linux-4.14.y&id=a77360e989f3dc06e4f177a0837d533d13a20d91

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1741655

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc6

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key

4.15-rc6 is functional:
# lsmod | grep kvm
kvm_intel 200704 0
kvm 581632 1 kvm_intel
irqbypass 16384 1 kvm
# kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

As is 4.13.0-21-generic with that single commit applied.
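
The `lsmod` check above can also be scripted when several machines or test kernels need verifying. This is a small sketch under the assumption that the input is plain `lsmod` output (name, size, use count, optional users); the function names are illustrative and not part of any existing tool.

```python
def loaded_modules(lsmod_text):
    """Map module name -> (size, use count) from lsmod-style text."""
    modules = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any malformed line.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        modules[fields[0]] = (int(fields[1]), int(fields[2]))
    return modules


def kvm_intel_loaded(lsmod_text):
    """True when both kvm and kvm_intel appear in the module list."""
    mods = loaded_modules(lsmod_text)
    return "kvm" in mods and "kvm_intel" in mods


# Output quoted from the comment above.
SAMPLE = """\
kvm_intel 200704 0
kvm 581632 1 kvm_intel
irqbypass 16384 1 kvm
"""
```

With the sample text, `kvm_intel_loaded(SAMPLE)` is true. On an affected kernel, where `modprobe kvm-intel` fails, `kvm_intel` would be missing from the list.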

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Artful):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu Artful):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a pick of commit 8a1b439. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1741655

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Oliver (ojehle) wrote :

The problem now also exists in 16.04 LTS with the HWE stack and the newest 4.13 kernel series with KPTI.

Oliver (ojehle) wrote :

The test kernel is working. Please also fix the HWE kernel for 16.04 with the patches.

Linux XXXXXXXX 4.13.0-25-generic #29~lp1741655 SMP

~# lsmod | grep kvm
kvm_intel 200704 7
kvm 585728 1 kvm_intel

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5110 @ 1.60GHz
stepping : 6
microcode : 0xd2
cpu MHz : 1596.587
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm pti tpr_shadow dtherm
bugs : cpu_insecure
bogomips : 3193.17
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
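
The flags line in the dump above lists `vmx` and `tpr_shadow` but no `vnmi`, which is consistent with a CPU that lacks Virtual NMIs. A rough way to spot such machines is sketched below. It assumes the kernel reports Virtual NMI support as a `vnmi` entry on the `flags` line of /proc/cpuinfo, as kernels of this era do; newer kernels may list it on a separate line, so treat this as a heuristic. The function names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def lacks_virtual_nmi(flags):
    """True for a VMX-capable CPU whose flags do not include vnmi (assumed flag name)."""
    return "vmx" in flags and "vnmi" not in flags


# Typical use on a live system:
#     with open("/proc/cpuinfo") as f:
#         print(lacks_virtual_nmi(cpu_flags(f.read())))
```

A result of true only suggests that the CPU is the kind affected by this regression; the MSR-based check is the more direct test.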

That build is also working for me:
$ uname -a
Linux localhost 4.13.0-25-generic #29~lp1741655 SMP Thu Jan 11 19:45:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lsmod | grep kvm
kvm_intel 200704 24
kvm 585728 1 kvm_intel
irqbypass 16384 21 kvm
$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

Oliver (ojehle) wrote :

The server has now been running for over 20 hours with approximately 10 VMs without any problems, so I think the problem is solved for me. Let's hope the fix is included in the 4.13 HWE packages soon.

no longer affects: linux-hwe (Ubuntu Bionic)
no longer affects: linux-hwe (Ubuntu Artful)
no longer affects: linux-hwe (Ubuntu)
Joseph Salisbury (jsalisbury) wrote :
description: updated
Seth Forshee (sforshee) on 2018-01-19
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Súper de Algar (superdealgar) wrote :

The latest 4.13 lowlatency kernel in 16.04.3 (xenial) still has the regression.

bohemius (bohemius) wrote :

After installing the HWE stack, getting to 16.04.4, and updating the kernel to 4.13.0-36, I still get this issue. So yes, the latest Xenial 16.04.4 still has it.

Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Oliver (ojehle) wrote :

Tested it on artful and 16.04 LTS. On 16.04 LTS, the kernel from comment #6 has been working since mid-January without any problems.

added tag verification-done-artful

tags: added: verification-done-artful
removed: verification-needed-artful
Súper de Algar (superdealgar) wrote :

Yes, -proposed works as it should.
There's one thing I'd like to know, though. If the fix landed upstream so soon after the regression was introduced, why does Ubuntu still have the regression after so long?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-38.43

---------------
linux (4.13.0-38.43) artful; urgency=medium

  * linux: 4.13.0-38.43 -proposed tracker (LP: #1755762)

  * Servers going OOM after updating kernel from 4.10 to 4.13 (LP: #1748408)
    - i40e: Fix memory leak related filter programming status
    - i40e: Add programming descriptors to cleaned_count

  * [SRU] Lenovo E41 Mic mute hotkey is not responding (LP: #1753347)
    - platform/x86: ideapad-laptop: Increase timeout to wait for EC answer

  * fails to dump with latest kpti fixes (LP: #1750021)
    - kdump: write correct address of mem_section into vmcoreinfo

  * headset mic can't be detected on two Dell machines (LP: #1748807)
    - ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
    - ALSA: hda - Fix headset mic detection problem for two Dell machines
    - ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines

  * CIFS SMB2/SMB3 does not work for domain based DFS (LP: #1747572)
    - CIFS: make IPC a regular tcon
    - CIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl
    - CIFS: dump IPC tcon in debug proc file

  * i2c-thunderx: erroneous error message "unhandled state: 0" (LP: #1754076)
    - i2c: octeon: Prevent error message on bus error

  * hisi_sas: Add disk LED support (LP: #1752695)
    - scsi: hisi_sas: directly attached disk LED feature for v2 hw

  * EDAC, sb_edac: Backport 1 patch to Ubuntu 17.10 (Fix missing DIMM sysfs
    entries with KNL SNC2/SNC4 mode) (LP: #1743856)
    - EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode

  * [regression] Colour banding and artefacts appear system-wide on an Asus
    Zenbook UX303LA with Intel HD 4400 graphics (LP: #1749420)
    - drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

  * DVB Card with SAA7146 chipset not working (LP: #1742316)
    - vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems

  * [Asus UX360UA] battery status in unity-panel is not changing when battery is
    being charged (LP: #1661876) // AC adapter status not detected on Asus
    ZenBook UX410UAK (LP: #1745032)
    - ACPI / battery: Add quirk for Asus UX360UA and UX410UAK

  * ASUS UX305LA - Battery state not detected correctly (LP: #1482390)
    - ACPI / battery: Add quirk for Asus GL502VSK and UX305LA

  * support thunderx2 vendor pmu events (LP: #1747523)
    - perf pmu: Extract function to get JSON alias map
    - perf pmu: Pass pmu as a parameter to get_cpuid_str()
    - perf tools arm64: Add support for get_cpuid_str function.
    - perf pmu: Add helper function is_pmu_core to detect PMU CORE devices
    - perf vendor events arm64: Add ThunderX2 implementation defined pmu core
      events
    - perf pmu: Add check for valid cpuid in perf_pmu__find_map()

  * lpfc.ko module doesn't work (LP: #1746970)
    - scsi: lpfc: Fix loop mode target discovery

  * Ubuntu 17.10 crashes on vmalloc.c (LP: #1739498)
    - powerpc/mm/book3s64: Make KERN_IO_START a variable
    - powerpc/mm/slb: Move comment next to the code it's referring to
    - powerpc/mm/hash64: Make vmalloc 56T on hash

  * ethtool -p fails to light NIC LED on HiSilicon D05 systems (LP: #1748567)
    - net...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released

This has been fixed for some time, so when can we expect this to be in an updated Bionic kernel?

Thanks!
