Nested KVM fails on Intel hardware - KVM: entry failed, hardware error 0x0
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Medium | Unassigned | |
| linux (Ubuntu Trusty) | Fix Released | Medium | Chris J Arges | |
Bug Description
[Impact]
Nested KVM does not work on some Intel hosts: the L2 guest fails to boot with "KVM: entry failed, hardware error 0x0".
[Test Case]
A script to make this easier is posted here:
https:/
1) enable nested KVM:
sudo modprobe -r kvm_intel
sudo modprobe kvm_intel nested=1
cat /sys/module/
# should say Y
2) generate an L1 guest and then generate an L2 guest inside the L1 guest
- ensure L1 has enough memory to boot L2
- if using libvirt you may need to edit the default bridge to use a different subnet than the L1 guest
3) boot the L2 guest
4) L2 guest should boot
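
Step 1 can be checked from a script rather than by eye. The following is a minimal sketch, not part of the original test script: the function name is hypothetical, and the default path is the usual sysfs location of the kvm_intel `nested` parameter, which is an assumption here because the path in the description is truncated.

```shell
#!/bin/sh
# Report whether the kvm_intel module was loaded with nested=1.
# The default path is the usual sysfs location of the module parameter;
# it is an assumption, since the bug description truncates the path.
nested_enabled() {
    param_file="${1:-/sys/module/kvm_intel/parameters/nested}"
    [ -r "$param_file" ] || return 2      # module not loaded, or not an Intel host
    case "$(cat "$param_file")" in
        Y|1) return 0 ;;                  # nested virtualization is enabled
        *)   return 1 ;;                  # module loaded, but nested is off
    esac
}
```

Calling `nested_enabled && echo "nested KVM is on"` on the L0 host before creating the L1 guest should avoid chasing an L2 boot failure that is only a missing module option.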
[Fix]
These three upstream patches needed to be backported to 3.13:
* 533558bcb69ef28
- This provides the code changes needed to make backporting easier. However, the vmx_leave_nested function had not yet been added, so the modification to that function was dropped.
* b6b8a1451fc4041
- This patch is needed so that the L1 guest doesn't crash when only 696dfd95 is applied. I had to remove the MPX references from the cherry-pick, as that feature hasn't been added yet.
* 696dfd95ba98383
- This patch fixes the issue and was the result of the bisection. The APIC virtualization features need to be disabled as they cause L2 guests to not boot depending on the CPU.
--
If the L2 guest doesn't boot you can see the log:
sudo cat /var/log/
<snip>
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000000
DR6=00000000fff
EFER=0000000000
Code=00 66 89 d8 66 e8 02 f7 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
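
To check several guest logs for this failure without reading each register dump, a small helper along these lines may be enough. It is only a sketch: the function name is hypothetical, and the log path is left to the caller because the path in the description is truncated.

```shell
#!/bin/sh
# Return 0 if the given guest log contains the VM-entry failure signature
# quoted in this report, non-zero otherwise.
# The log path is supplied by the caller; the report truncates it.
l2_entry_failed() {
    grep -qF 'KVM: entry failed, hardware error 0x0' "$1"
}
```

For example, `l2_entry_failed /path/to/l2-guest.log && echo "affected"`, where the path is a placeholder for wherever the L1 guest keeps the L2 guest's log.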
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jun 13 18:26 seq
crw-rw---- 1 root audio 116, 33 Jun 13 18:26 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Intel Corporation S2600WTT
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.127.2
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty uec-images
Uname: Linux 3.13.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy libvirtd netdev plugdev sudo video
_MarkForUpload: True
dmi.bios.date: 05/06/2014
dmi.bios.vendor: Intel Corporation
dmi.bios.version: GRNDSDP1.
dmi.board.
dmi.board.name: S2600WTT
dmi.board.vendor: Intel Corporation
dmi.board.version: H30334-201
dmi.chassis.
dmi.chassis.type: 23
dmi.chassis.vendor: .......
dmi.chassis.
dmi.modalias: dmi:bvnIntelCor
dmi.product.name: S2600WTT
dmi.product.
dmi.sys.vendor: Intel Corporation
| affects: | ubuntu → linux (Ubuntu) |
| Changed in linux (Ubuntu Trusty): | |
| assignee: | nobody → Chris J Arges (arges) |
| status: | New → In Progress |
| importance: | Undecided → Medium |
| Chris J Arges (arges) wrote : | #1 |
| Chris J Arges (arges) wrote : | #2 |
Works on v3.15, so fixed in Utopic.
| Changed in linux (Ubuntu): | |
| assignee: | Chris J Arges (arges) → nobody |
| status: | In Progress → Fix Released |
| Chris J Arges (arges) wrote : | #3 |
v3.15rc7 fails
v3.15rc8 works
Fix is somewhere in there...
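
Since the search here is for the commit that fixed the bug rather than the one that introduced it, git's custom bisect terms keep the good/bad labels from getting inverted. The sketch below only prints the commands instead of running them; the function name is hypothetical, and it assumes git 2.7 or later (for `--term-old`/`--term-new`) and the upstream tag names `v3.15-rc7` and `v3.15-rc8`.

```shell
#!/bin/sh
# Print the git commands for a "reverse" bisect that looks for the commit
# that FIXED the bug rather than the one that broke it.
# Custom terms (--term-old/--term-new) need git 2.7 or later.
bisect_plan() {
    broken_tag="$1"   # last revision known to fail, e.g. v3.15-rc7
    fixed_tag="$2"    # first revision known to work, e.g. v3.15-rc8
    echo "git bisect start --term-old=broken --term-new=fixed"
    echo "git bisect broken $broken_tag"
    echo "git bisect fixed $fixed_tag"
}
```

Running `bisect_plan v3.15-rc7 v3.15-rc8` prints the three setup commands; after that, each test kernel is marked with `git bisect broken` or `git bisect fixed` depending on whether the L2 guest boots.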
| Chris J Arges (arges) wrote : | #4 |
This commit fixes the issue:
696dfd95ba98383
| Chris J Arges (arges) wrote : BootDmesg.txt | #5 |
apport information
| tags: | added: apport-collected uec-images |
| description: | updated |
| Chris J Arges (arges) wrote : CRDA.txt | #6 |
apport information
| Chris J Arges (arges) wrote : CurrentDmesg.txt | #7 |
apport information
| Chris J Arges (arges) wrote : Lspci.txt | #8 |
apport information
| Chris J Arges (arges) wrote : Lsusb.txt | #9 |
apport information
| Chris J Arges (arges) wrote : ProcCpuinfo.txt | #10 |
apport information
apport information
| Chris J Arges (arges) wrote : ProcModules.txt | #12 |
apport information
| Chris J Arges (arges) wrote : UdevDb.txt | #13 |
apport information
| Chris J Arges (arges) wrote : UdevLog.txt | #14 |
apport information
| Chris J Arges (arges) wrote : | #16 |
Attached info from an affected machine.
| Chris J Arges (arges) wrote : | #17 |
So I've been able to get this working with the following patches (and notes about how I resolved the conflicts):
533558bcb69ef28
b6b8a1451fc4041
f4124500c2c13eb
696dfd95ba98383
| madbiologist (me-again) wrote : | #18 |
Glad to hear that a fix is in the pipeline.
I don't know anything about KVM, but I saw this the other day:
| Chris J Arges (arges) wrote : | #19 |
Test kernel posted here:
http://
| Chris J Arges (arges) wrote : | #21 |
OK, here is a simplified patchset. It has worked for me with limited testing:
http://
| description: | updated |
| description: | updated |
| mage2 (t-w-otto) wrote : | #22 |
I have been testing the initial kernel, and so far it is looking good.
I'm going to install the new version in a few minutes and will circle back.
| mage2 (t-w-otto) wrote : | #23 |
Running the newer patch. Looks good so far.
| Changed in linux (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| Kiran Koushik Agrahara (kkoushik) wrote : | #24 |
Works for me. Tested it on devstack.
| Brad Figg (brad-figg) wrote : | #25 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
| tags: | added: verification-needed-trusty |
| mage2 (t-w-otto) wrote : | #26 |
I tested and I am currently using the latest patch. It works for me.
| tags: | added: verification-done-trusty removed: verification-needed-trusty |
| tags: | added: ua |
| Launchpad Janitor (janitor) wrote : | #27 |
This bug was fixed in the package linux - 3.13.0-36.63
---------------
linux (3.13.0-36.63) trusty; urgency=low
[ Joseph Salisbury ]
* Release Tracking Bug
- LP: #1365052
[ Feng Kan ]
* SAUCE: (no-up) irqchip:gic: change access of gicc_ctrl register to read
modify write.
- LP: #1357527
* SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
assembly code
- LP: #1358949
[ Ming Lei ]
* SAUCE: (no-up) Drop APM X-Gene SoC Ethernet driver
- LP: #1360140
* [Config] Drop XGENE entries
- LP: #1360140
* [Config] CONFIG_NET_XGENE=m for arm64
- LP: #1360140
[ Stefan Bader ]
* SAUCE: Add compat macro for skb_get_hash
- LP: #1358162
* SAUCE: bcache: prevent crash on changing writeback_running
- LP: #1357295
[ Suman Tripathi ]
* SAUCE: (no-up) arm64: Fix the csr-mask for APM X-Gene SoC AHCI SATA PHY
clock DTS node.
- LP: #1359489
* SAUCE: (no-up) ahci_xgene: Skip the PHY and clock initialization if
already configured by the firmware.
- LP: #1359501
* SAUCE: (no-up) ahci_xgene: Fix the link down in first attempt for the
APM X-Gene SoC AHCI SATA host controller driver.
- LP: #1359507
[ Tuan Phan ]
* SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
- LP: #1359514
[ Upstream Kernel Changes ]
* iwlwifi: mvm: Add a missed beacons threshold
- LP: #1349572
* mac80211: reset probe_send_count also in HW_CONNECTION_
- LP: #1349572
* genirq: Add an accessor for IRQ_PER_CPU flag
- LP: #1357527
* arm64: perf: add support for percpu pmu interrupt
- LP: #1357527
* cifs: sanity check length of data to send before sending
- LP: #1283101
* KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
- LP: #1329434
* KVM: nVMX: Rework interception of IRQs and NMIs
- LP: #1329434
* KVM: vmx: disable APIC virtualization in nested guests
- LP: #1329434
* HID: Add transport-driver functions to the USB HID interface.
- LP: #1353021
* ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host
Controller driver.
- LP: #1358498
* fold d_kill() and d_free()
- LP: #1354234
* fold try_prune_
- LP: #1354234
* new helper: dentry_free()
- LP: #1354234
* expand the call of dentry_lru_del() in dentry_kill()
- LP: #1354234
* dentry_kill(): don't try to remove from shrink list
- LP: #1354234
* don't remove from shrink list in select_collect()
- LP: #1354234
* more graceful recovery in umount_collect()
- LP: #1354234
* dcache: don't need rcu in shrink_
- LP: #1354234
* lift the "already marked killed" case into shrink_
* split dentry_kill()
- LP: #1354234
* expand dentry_kill(dentry, 0) in shrink_
- LP: #1354234
* shrink_
- LP: #1354234
* dealing with the rest of shrink_
- LP: #1354234
* dentry_kill() doesn't need the second argument now
- LP: #1354234
* dcache: add missing lockdep annotation
- LP: #1354234
* fs: convert use of typedef ctl_table to struct ctl_table
...
| Changed in linux (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
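
Comment #27 reports the fix in linux 3.13.0-36.63, while the affected machine was running 3.13.0-24-generic. A rough way to tell whether a Trusty host already runs a fixed kernel is to compare the ABI number in `uname -r` against 36. This is a sketch under assumptions: the function name is hypothetical, the ABI number is only a proxy (the upload number .63 is not visible in `uname -r`), and the check is meaningful only for the Ubuntu 3.13.0-N series.

```shell
#!/bin/sh
# Return 0 if a Trusty kernel release string (as printed by `uname -r`,
# e.g. 3.13.0-24-generic) is at or above ABI 36, 1 if it is below,
# and 2 if the string is not a 3.13.0-N release at all.
trusty_kernel_has_fix() {
    release="${1:-$(uname -r)}"
    abi="${release#3.13.0-}"                 # strip version prefix -> 24-generic
    [ "$abi" != "$release" ] || return 2     # prefix absent: not a 3.13.0-N kernel
    abi="${abi%%-*}"                         # strip flavour suffix -> 24
    case "$abi" in ''|*[!0-9]*) return 2 ;; esac
    [ "$abi" -ge 36 ]
}
```

With no argument the function inspects the running kernel; passing a string such as `3.13.0-24-generic` checks a release without booting it.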


Another report on different hardware: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1278531
I've been able to test with a mainline 3.16 kernel (dfb945473ae8528fd885607b6fa843c676745e0c)
and it worked fine. Time to bisect...