kvm_intel not loadable in a quantal guest

Bug #1031090 reported by Jamie Strandboge
This bug affects 5 people
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Invalid        High         Unassigned
Precise          Fix Released   High         Stefan Bader

Bug Description

SRU Justification:
Impact: A KVM guest running a v3.3+ kernel fails to load the kvm_intel module when running on a v3.2 host. The newer module checks the VMX capability MSRs passed to the guest for a CPU-specific flag, CPU_BASED_RDPMC_EXITING, which any real CPU has. Older host kernels do not set that flag in the nested case.

Fix: Add the required flag to the MSRs passed to guests. The change is picked from the upstream patch that enabled the feature, but it enables nothing beyond the flag. It has been reviewed upstream and sent to upstream stable.

Testcase: KVM host (Intel CPU) running Precise (3.2). A Quantal KVM guest cannot modprobe kvm-intel (while a Precise guest can). With this change, the Quantal guest can load the module too. Successfully installed another Quantal guest in the L1 guest. Running "perf test" in the Quantal guest will fail the RDPMC test as unsupported.
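
For anyone reproducing the testcase, the flag can also be inspected directly from inside the L1 guest by decoding the VMX capability MSR. The following is a minimal Python sketch, not part of the fix. It assumes the msr module is loaded (modprobe msr), root privileges, that IA32_VMX_PROCBASED_CTLS is MSR 0x482, and that RDPMC exiting is bit 11 (0x800) of the primary processor-based controls. The helper names are made up for illustration.

```python
import os
import struct

# Assumed constants: the VMX capability MSR for the primary
# processor-based controls, and the RDPMC-exiting control bit.
IA32_VMX_PROCBASED_CTLS = 0x482
CPU_BASED_RDPMC_EXITING = 0x800  # bit 11


def rdpmc_exiting_allowed(msr_value):
    """Return True if the 64-bit MSR value permits RDPMC exiting.

    The high 32 bits of the MSR list the controls that may be set to 1.
    """
    allowed_one = (msr_value >> 32) & 0xFFFFFFFF
    return bool(allowed_one & CPU_BASED_RDPMC_EXITING)


def read_msr(msr, cpu=0):
    """Read one MSR via /dev/cpu/<cpu>/msr (hypothetical helper)."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR number.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

Run as root in the guest, rdpmc_exiting_allowed(read_msr(IA32_VMX_PROCBASED_CTLS)) would be expected to return False on an unfixed 3.2 host and True once the host carries the fix.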

---

12.04 LTS supports nested virtualization. If I use a 12.04 LTS kernel with kvm_intel nested=1 (the default on 12.04 LTS) and boot a 12.04 LTS guest, I am able to load the kvm_intel kernel module in the guest. If I use the same host and boot a 12.10 guest, the kvm_intel module does not load:

$ sudo modprobe kvm_intel
FATAL: Error inserting kvm_intel (/lib/modules/3.5.0-6-generic/kernel/arch/x86/kvm/kvm-intel.ko): Input/output error

The kvm module loads fine in the guest. The guest is up to date 12.10 amd64. The host is up to date 12.04 LTS amd64 using an Intel i7 CPU.
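
Before suspecting this bug, two preconditions are worth ruling out: that nesting is really enabled on the host, and that the guest CPU exposes the vmx flag. A small Python sketch of both checks follows. It assumes the usual sysfs path /sys/module/kvm_intel/parameters/nested (which prints Y or N for a boolean parameter) and the flags line of /proc/cpuinfo. The function names are made up for illustration.

```python
def nested_enabled(param_text):
    """Interpret the contents of the kvm_intel 'nested' parameter file."""
    return param_text.strip() in ("Y", "1")


def cpu_has_vmx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo text lists vmx."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "vmx" in flags.split():
                return True
    return False


def read_text(path):
    """Read a small text file such as a sysfs or procfs entry."""
    with open(path) as f:
        return f.read()
```

On the host, nested_enabled(read_text("/sys/module/kvm_intel/parameters/nested")) should be True. In the guest, cpu_has_vmx(read_text("/proc/cpuinfo")) shows whether VMX is exposed at all.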

description: updated
Dave Walker (davewalker) wrote :

Confirmed that this works with a Precise host and an AMD guest, but fails with an Intel guest.

$ sudo modprobe kvm_intel
FATAL: Error inserting kvm_intel (/lib/modules/3.5.0-6-generic/kernel/arch/x86/kvm/kvm-intel.ko): Input/output error

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Jamie Strandboge (jdstrand) wrote :

FYI, this affects parts of Canonistack as well as security team support for OpenStack, since we all have Intel.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key quantal
Stefan Bader (smb)
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Stefan Bader (smb) wrote :

To narrow this down I tested with a Precise and a Quantal user-space and a mainline v3.3 kernel build. Both fail in the same way (-EIO) in module init. So this seems to have been broken for quite a while.

Stefan Bader (smb) wrote :

Bisection turned up the following patch (confirmed by reverting it in the Quantal kernel):

commit fee84b079d5ddee2247b5c1f53162c330c622902
Author: Avi Kivity <email address hidden>
Date: Thu Nov 10 14:57:25 2011 +0200

    KVM: VMX: Intercept RDPMC

    Intercept RDPMC and forward it to the PMU emulation code.

    Signed-off-by: Avi Kivity <email address hidden>
    Signed-off-by: Gleb Natapov <email address hidden>
    Signed-off-by: Avi Kivity <email address hidden>

For Intel this seems to expect a certain CPU feature, but trying to use/activate it from the nested setup fails because the outer kvm module has no support for it.

Stefan Bader (smb) wrote :

FYI, that patch was part of v3.3-rc1. So basically things were broken for the whole of Quantal.

Stefan Bader (smb) wrote :

On 08/01/2012 06:07 PM, Nadav Har'El wrote:
> On Wed, Aug 01, 2012, Avi Kivity wrote about "Re: Nested kvm_intel broken on pre 3.3 hosts":
>> Right - it's not just kvm-as-a-guest that will trip on this. But
>> there's no point in everyone backporting it on their own. If you're
>> doing the backport, please post it here and we'll forward it to the
>> stable branch.
>
> If I understand correctly, the failure occurs because new versions of
> KVM refuse to work if the processor doesn't support CPU_BASED_RDPMC_EXITING -
> which older versions of nested VMX didn't say that they did.

Right.

> But must the KVM guest refuse to work if this feature isn't supported?
> I.e., why not move in setup_vmcs_config() the CPU_BASED_RDPMC_EXITING
> from "min" to "opt"? Isn't losing the PMU feature a lesser evil than
> not working at all? In any case, perhaps the original reporter can use
> this as a workaround, at least, because it requires modifying the (L1)
> guest, not the host.

Real processors that don't support RDPMC exiting don't exist (and
logically cannot exist unless they also drop support for the RDPMC
instruction). Given it's a clear host bug I'd rather fix it than making
the guest more complicated, even by a small amount.

The hypervisor needs to be updated on a regular schedule anyway, so
there's no risk of locking users out for more than a short while.
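
The "min" versus "opt" distinction discussed in this exchange can be illustrated with a simplified Python model. It is loosely based on the check described in the thread, not on the actual kernel source. Required ("min") bits that the capability MSR does not allow make the check fail with -EIO, which modprobe reports as "Input/output error". Optional ("opt") bits that are not allowed are silently dropped. The function name, the choice of HLT exiting as a second control, and the capability words are illustrative assumptions.

```python
import errno

CPU_BASED_HLT_EXITING = 0x080    # used here only as a second example bit
CPU_BASED_RDPMC_EXITING = 0x800


def adjust_controls(ctl_min, ctl_opt, msr_low, msr_high):
    """Model of the required/optional control check.

    msr_low:  controls that must be 1.
    msr_high: controls that may be 1.
    Returns (0, controls) on success, or (-EIO, None) if a required
    control is not available.
    """
    ctl = ctl_min | ctl_opt
    ctl &= msr_high   # drop everything the CPU does not allow
    ctl |= msr_low    # force everything the CPU insists on
    if ctl_min & ~ctl:
        return -errno.EIO, None
    return 0, ctl


# Hypothetical capability words as seen by the L1 guest.
OLD_HOST_HIGH = CPU_BASED_HLT_EXITING
FIXED_HOST_HIGH = CPU_BASED_HLT_EXITING | CPU_BASED_RDPMC_EXITING

# Current guest behaviour: RDPMC exiting is required, old host fails.
required = adjust_controls(
    CPU_BASED_HLT_EXITING | CPU_BASED_RDPMC_EXITING, 0, 0, OLD_HOST_HIGH)

# Nadav's proposed workaround: RDPMC exiting is optional, load succeeds
# but the control stays off.
optional = adjust_controls(
    CPU_BASED_HLT_EXITING, CPU_BASED_RDPMC_EXITING, 0, OLD_HOST_HIGH)

# Avi's preferred fix: the host advertises the bit, guest is unchanged.
fixed = adjust_controls(
    CPU_BASED_HLT_EXITING | CPU_BASED_RDPMC_EXITING, 0, 0, FIXED_HOST_HIGH)
```

In this model, required fails with -EIO, optional succeeds with only HLT exiting set, and fixed succeeds with both controls set. That matches the trade-off in the discussion: the workaround changes the guest, while the fix changes only the host.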

Stefan Bader (smb) wrote :

IOW, this is seen as something that should be fixed by adding support to the host kvm module. Since this was done in 3.3, the only affected kernel (probably around the first that supported nesting) is 3.2. But that requires carefully picking patches and, hopefully, a way to test. So it's hard to give any ETA.

Stefan Bader (smb)
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux (Ubuntu Precise):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Precise in -proposed solves the problem (3.2.0-30.47). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
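
When verifying, it helps to confirm that the host is really running the proposed kernel. Below is a minimal Python sketch that compares an Ubuntu kernel version string against 3.2.0-30.47. It assumes the string comes from /proc/version_signature and has a form like "Ubuntu 3.2.0-30.47-generic 3.2.27" (the sample strings are illustrative, not copied from this bug). The comparison is only meant for Precise 3.2 kernels, and the function names are made up.

```python
import re

# First Precise upload carrying the fix, per this bug report.
FIXED = (3, 2, 0, 30, 47)


def parse_ubuntu_kernel(signature):
    """Extract (major, minor, patch, abi, upload) from a version string."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", signature)
    if m is None:
        raise ValueError("no Ubuntu kernel version in %r" % signature)
    return tuple(int(g) for g in m.groups())


def has_fix(signature):
    """Return True if the kernel version is at least 3.2.0-30.47."""
    return parse_ubuntu_kernel(signature) >= FIXED
```

For example, has_fix(open("/proc/version_signature").read()) on the host should return True before the guest test is attempted.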

tags: added: verification-needed-precise
Stefan Bader (smb) wrote :

Verified running the 3.2.0-30.47 kernel on a kvm host and loading the kvm-intel module in a quantal guest.

Luis Henriques (henrix) wrote :

Tagging as verified as per comment #10

tags: added: verification-done-precise
removed: verification-needed-precise
Andy Whitcroft (apw) wrote :

The bug here is actually in the outer host, so the Quantal task should be Invalid (there is no bug there) rather than Won't Fix.

Changed in linux (Ubuntu):
status: Won't Fix → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-30.48

---------------
linux (3.2.0-30.48) precise-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1041217

  [ Upstream Kernel Changes ]

  * mutex: Place lock in contended state after fastpath_lock failure
    - LP: #1041114

linux (3.2.0-30.47) precise-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1036581

  [ Andy Whitcroft ]

  * add support for generating binary device trees and install them in
    /lib/firmware
    - LP: #1030600
  * [Config] add dtb_file configuration for highbank
    - LP: #1030600

  [ Chris Van Hoof ]

  * SAUCE: dell-laptop: additional rfkill blacklist Dell XPS 13
    - LP: #1030957
  * [Config] Add cifs support to the nfs-modules list
    - LP: #1031398

  [ Daniel P. Berrange ]

  * SAUCE: (drop after 3.6) Forbid invocation of kexec_load() outside
    initial PID namespace
    - LP: #1034125

  [ Dann Frazier ]

  * [Config] Compile the rtc-pl031 driver builtin on the highbank kernel
    flavour
    - LP: #1035110

  [ Douglas Bagnall ]

  * SAUCE: Unlock the rc_dev lock when the raw device is missing
    - LP: #1015836

  [ Rob Herring ]

  * SAUCE: ARM: highbank: add soft power and reset key event handling
    - LP: #1033853
  * SAUCE: ARM: highbank: use writel_relaxed variant for pwr requests
    - LP: #1033853
  * SAUCE: ahci: un-staticize ahci_dev_classify
    - LP: #1033853
  * SAUCE: ahci_platform: add custom hard reset for Calxeda ahci ctrlr
    - LP: #1033853

  [ Stefan Bader ]

  * (pre-stable) KVM: VMX: Set CPU_BASED_RDPMC_EXITING for nested
    - LP: #1031090

  [ Tim Gardner ]

  * [Config] updateconfigs

  [ Upstream Kernel Changes ]

  * ideapad: generate valid key event only
    - LP: #1029834
  * mm: reduce the amount of work done when updating min_free_kbytes
    - LP: #1032640
  * mm: compaction: allow compaction to isolate dirty pages
    - LP: #1032640
  * mm: compaction: determine if dirty pages can be migrated without
    blocking within ->migratepage
    - LP: #1032640
  * mm: page allocator: do not call direct reclaim for THP allocations
    while compaction is deferred
    - LP: #1032640
  * mm: compaction: make isolate_lru_page() filter-aware again
    - LP: #1032640
  * mm: compaction: introduce sync-light migration for use by compaction
    - LP: #1032640
  * mm: vmscan: when reclaiming for compaction, ensure there are sufficient
    free pages available
    - LP: #1032640
  * mm: vmscan: do not OOM if aborting reclaim to start compaction
    - LP: #1032640
  * mm: vmscan: check if reclaim should really abort even if
    compaction_ready() is true for one zone
    - LP: #1032640
  * vmscan: promote shared file mapped pages
    - LP: #1032640
  * vmscan: activate executable pages after first usage
    - LP: #1032640
  * mm/vmscan.c: consider swap space when deciding whether to continue
    reclaim
    - LP: #1032640
  * mm: test PageSwapBacked in lumpy reclaim
    - LP: #1032640
  * mm: vmscan: convert global reclaim to per-memcg LRU lists
    - LP: #1032640
  * cpuset: mm: reduce large amounts of memory barrier related damage v3
    - LP: #1032640
  * mm/hugetlb: fix warni...

Changed in linux (Ubuntu Precise):
status: In Progress → Fix Released
Chris Halse Rogers (raof) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
