L2 guest failed to boot under nested KVM: entry failed, hardware error 0x0

Bug #1739585 reported by James Page
This bug affects 3 people
Affects                      Status        Importance  Assigned to  Milestone
OpenStack Charm Test Infra   Fix Released  Critical    David Ames
Ubuntu Cloud Archive         Invalid       Undecided   Unassigned
qemu (Ubuntu)                Invalid       Undecided   Unassigned

Bug Description

During testing of the Queens b2 milestone, I see this particular error when the test cloud attempts to boot instances on specific hosts on our cloud.

The base cloud is running:

  4.4.0-72-generic

The test instance on the cloud saw the same issue with these kernels:

  4.10.0-42-generic
  4.4.0-97-generic

I don't think we're seeing the same issue with pre-bionic versions of libvirt/qemu on these hosts.

Error from libvirt qemu instance log:

KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306d2
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff 66 5b 66 83 c4 08 66 5b 66 5e 66 c3 cd 19 cb cd 18 cb <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Error on host:

[22353622.446568] nested_vmx_exit_handled failed vm entry 7
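
To find which instances on a compute node hit this failure, the instance logs can be searched for the signature shown above. This is a minimal sketch, not part of the original report: the function name is hypothetical, and the default directory /var/log/libvirt/qemu is an assumption about where libvirt keeps its per-instance qemu logs on these hosts.

```shell
#!/bin/sh
# List the instance logs that contain the KVM entry-failure signature.
# The first argument overrides the log directory; the default path is an
# assumption about the host layout, not something stated in this report.
find_entry_failures() {
    log_dir="${1:-/var/log/libvirt/qemu}"
    # -l prints only the names of files with at least one match.
    grep -l 'KVM: entry failed, hardware error' "$log_dir"/*.log 2>/dev/null
}
```

Running find_entry_failures with no argument on a compute node prints one log path per affected instance, or nothing if no log matches.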

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: qemu-system-x86 1:2.10+dfsg-0ubuntu5~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 4.10.0-42.46~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-42-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.14
Architecture: amd64
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Thu Dec 21 09:58:30 2017
Ec2AMI: ami-00000259
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: OpenStack Foundation OpenStack Nova
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.0-42-generic root=UUID=d7006b2f-ace6-464d-8b21-17180b3ed360 ro console=tty1 console=ttyS0
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.10.1-1ubuntu1~cloud0
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-zesty
dmi.modalias: dmi:bvnSeaBIOS:bvr1.10.1-1ubuntu1~cloud0:bd04/01/2014:svnOpenStackFoundation:pnOpenStackNova:pvr15.0.7:cvnQEMU:ct1:cvrpc-i440fx-zesty:
dmi.product.name: OpenStack Nova
dmi.product.version: 15.0.7
dmi.sys.vendor: OpenStack Foundation

James Page (james-page) wrote :
Christian Ehrhardt  (paelzer) wrote :

Discussed on IRC, clarification:

HW: Xenial (4.4) + Ocata stack (~zesty)
running
L1: Xenial (4.4 or 4.10) + Queens stack (~bionic)
running guests
L2: Xenial (4.4 or 4.10)

The "specific HW" seems to be the newer systems.

Christian Ehrhardt  (paelzer) wrote :

Tried a local repro with

HW: Xenial + Ocata
L1: Bionic
L2: Bionic

But in my case it just works.
We already knew it is HW related.

There was a set of similar issues in 2014/2015.
There it was around kernel 3.10/3.13 in RH.
See:
https://bugzilla.redhat.com/show_bug.cgi?id=1086058
https://bugzilla.redhat.com/show_bug.cgi?id=1069089
https://www.spinics.net/lists/kvm/msg102458.html

Back then it was related to features being passed through that should not have been, which then failed on L2.
Chances are high that it is in this area again.

OTOH my defaults are not specifying CPU and only have base features set.
So the guests on both levels are like:
# no cpu
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>

IIRC OpenStack will try to do a host-model or even host-passthrough spec.
That might enable "too much" and thereby break.
Since those definitions, and what is meant to be detected/added, are version dependent, that might even explain why you think you only see it on newer libvirt/qemu.

I checked my Test env:

## HW / L0 ##

/sys/module/kvm_intel/parameters/emulate_invalid_guest_state : Y
/sys/module/kvm_intel/parameters/enable_apicv : N
/sys/module/kvm_intel/parameters/enable_shadow_vmcs : N
/sys/module/kvm_intel/parameters/ept : Y
/sys/module/kvm_intel/parameters/eptad : Y
/sys/module/kvm_intel/parameters/fasteoi : Y
/sys/module/kvm_intel/parameters/flexpriority : Y
/sys/module/kvm_intel/parameters/nested : Y
/sys/module/kvm_intel/parameters/ple_gap : 128
/sys/module/kvm_intel/parameters/ple_window : 4096
/sys/module/kvm_intel/parameters/ple_window_grow : 2
/sys/module/kvm_intel/parameters/ple_window_max : 1073741823
/sys/module/kvm_intel/parameters/ple_window_shrink : 0
/sys/module/kvm_intel/parameters/pml : N
/sys/module/kvm_intel/parameters/unrestricted_guest : Y
/sys/module/kvm_intel/parameters/vmm_exclusive : Y
/sys/module/kvm_intel/parameters/vpid : Y

## L1 ##

$ for i in /sys/module/kvm_intel/parameters/*; do echo "$i : $(cat $i)"; done
/sys/module/kvm_intel/parameters/emulate_invalid_guest_state : Y
/sys/module/kvm_intel/parameters/enable_apicv : N
/sys/module/kvm_intel/parameters/enable_shadow_vmcs : N
/sys/module/kvm_intel/parameters/ept : Y
/sys/module/kvm_intel/parameters/eptad : N
/sys/module/kvm_intel/parameters/fasteoi : Y
/sys/module/kvm_intel/parameters/flexpriority : Y
/sys/module/kvm_intel/parameters/nested : Y
/sys/module/kvm_intel/parameters/ple_gap : 0
/sys/module/kvm_intel/parameters/ple_window : 4096
/sys/module/kvm_intel/parameters/ple_window_grow : 2
/sys/module/kvm_intel/parameters/ple_window_max : 1073741823
/sys/module/kvm_intel/parameters/ple_window_shrink : 0
/sys/module/kvm_intel/parameters/pml : N
/sys/module/kvm_intel/parameters/preemption_timer : Y
/sys/module/kvm_intel/parameters/unrestricted_guest : Y
/sys/module/kvm_intel/parameters/vpid : Y

$ cat /proc/cpuinfo
model name : QEMU Virtual CPU version 2.5+
flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology cpuid pni vmx cx16 x2apic hypervisor lahf_lm tpr_shadow vnmi flexpriority ept vpid

## L2 ##
model name : QEMU Virtual CPU version 2.5+
flags : fpu de ...


Changed in cloud-archive:
status: New → Incomplete
Christian Ehrhardt  (paelzer) wrote :

@James - since you have good/bad (=old/new) systems comparing those on the same SW level might be great as well.
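
One way to do that comparison is to snapshot the kvm_intel module parameters on each host in a diff-friendly form. The following is a minimal sketch built on the loop already shown in this thread. The function name and the output file names are hypothetical, and the directory argument exists only so the function can be pointed at a copy of the parameters.

```shell
#!/bin/sh
# Print every kvm_intel module parameter as "name=value", sorted by name,
# so that snapshots taken on two hosts can be compared with diff(1).
# The first argument overrides the sysfs directory (default: the path
# used elsewhere in this report).
dump_kvm_params() {
    param_dir="${1:-/sys/module/kvm_intel/parameters}"
    for f in "$param_dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -f "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done | sort
}

# Usage (run on a good and a bad host, then compare the two files):
#   dump_kvm_params > "kvm_intel.$(hostname).txt"
#   diff kvm_intel.<good-host>.txt kvm_intel.<bad-host>.txt
```

Any line that differs between the two snapshots is a candidate for explaining why only some hosts fail.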

James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :
James Page (james-page) wrote :

L1 in that set of attachments is a new hypervisor host.

James Page (james-page) wrote :
James Page (james-page) wrote :

L2-Guest-Running.xml is from an older compute host where the nested KVM instance launches successfully.

Christian Ehrhardt  (paelzer) wrote :

So the construction of this custom cpu model for L1 and/or the follow on for the same in L2 might be the reason why this affects just this system.

As you already pointed out with the detection of an AMD CPU, this all seems really awkward.

I added a single:
<cpu mode="host-model" match="exact">
<topology sockets="1" cores="1" threads="1"/>
</cpu>
as you attached it in comment #9
I wanted to see whether host-model on its own goes so crazy and auto-extends to what we see in comments #12 and #13.

But it worked well on both levels for me.
Also I do not see how the XML in comments #12 and #9 relate - that should be the same L2 XML, right? Maybe understanding this difference is also a way to solve this.

In general, is there a way you could stop OpenStack doing all the cpu mode=custom magic for these nested tests?
I'd expect that if you could just leave out the cpu it might even work (falling back on likely better defaults).

James Page (james-page) wrote :

#9 is what nova passes to libvirt, #12 is what I get from a dumpxml from libvirt for the resulting machine.
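
To see exactly what libvirt changed between the two, the <cpu> element can be cut out of both XML files and compared. This is a rough sketch only: the function name and the file names are hypothetical, and it assumes the <cpu> element spans several lines and ends with a closing </cpu> tag, as in the snippet quoted earlier in this thread.

```shell
#!/bin/sh
# Print only the <cpu>...</cpu> element of a libvirt domain XML file.
# Assumes a multi-line element with an explicit </cpu> closing tag; a
# self-closing <cpu .../> is not handled by this sketch.
extract_cpu() {
    sed -n '/<cpu[ >]/,/<\/cpu>/p' "$1"
}

# Usage (file names are placeholders):
#   virsh dumpxml instance-00000001 > running.xml
#   extract_cpu nova-request.xml > cpu.requested
#   extract_cpu running.xml      > cpu.running
#   diff -u cpu.requested cpu.running
```

The pattern requires a space or ">" right after "<cpu", so elements such as <cputune> are not picked up by mistake.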

Christian Ehrhardt  (paelzer) wrote :

Odd. I started host-model guests, but unless I looked at something wrong they do not change the runtime definition to the "feature listed" version.

Note from IRC: this issue seems to be a longstanding one; it seems to also happen on Xenial/Pike.

Changed in qemu (Ubuntu):
status: New → Incomplete
Christian Ehrhardt  (paelzer) wrote :

Between Christmas and New Year I tried two more systems I had (host-model on both levels). Neither triggered the issue, but then neither was new enough, I guess.

@James - Any further updates to this like testing with more combinations of cpu specifications or anything else that guides us to where we would actually tackle this issue?

@James - another chance to debug could be to just do host-model on both levels like I do in my tests (to be close to openstack), but to do that on your systems. Can you (or I) get a host login to those to try that?

Kai (kaixxx) wrote :

I have the same error (as described) and guess this is related to #1682077

Ryan Beisner (1chb1n) wrote :

Reconfirming that we are consistently hitting this bug:

https://paste.ubuntu.com/26397596/

Linux caipora 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

processor : 47
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping : 1
microcode : 0xb00001e
cpu MHz : 1710.000
cache size : 30720 KB
physical id : 1
siblings : 24
core id : 13
cpu cores : 12
apicid : 59
initial apicid : 59
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bugs :
bogomips : 4397.44
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Ryan Beisner (1chb1n) wrote :

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-nova-cloud-controller/527133/7/1019/index.html

2018-01-15 15:59:41.690+0000: starting up libvirt version: 1.2.16, package: 1.2.16-2ubuntu11.15.10.4~cloud0, qemu version: 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.4~cloud4)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name instance-00000001 -S -machine pc-i440fx-vivid,accel=kvm,usb=off -cpu Broadwell,+abm,+pdpe1gb,+hypervisor,+rdrand,+f16c,+osxsave,+vmx,+ss,+vme -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 1ae8c174-48d4-4889-8461-892b344a0c47 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=12.0.6,serial=b85ff9d3-9e9f-429a-9c36-7fa58bdf4335,uuid=1ae8c174-48d4-4889-8461-892b344a0c47,family=Virtual Machine -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000001.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/1ae8c174-48d4-4889-8461-892b344a0c47/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev file,id=charserial0,path=/var/lib/nova/instances/1ae8c174-48d4-4889-8461-892b344a0c47/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on
char device redirected to /dev/pts/5 (label charserial1)
KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306d2
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 95 ab ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Changed in charm-test-infra:
status: New → Confirmed
importance: Undecided → Critical
David Ames (thedac) wrote :

This may be a duplicate of [0]

After finding hints in the above bug, we checked whether APICv was enabled. On three of our compute nodes it was:

cat /sys/module/kvm_intel/parameters/enable_apicv
Y

We disabled APICv by setting the following in /etc/modprobe.d/qemu-system-x86.conf and rebooting per [1]:
options kvm-intel nested=y enable_apicv=n

Now
cat /sys/module/kvm_intel/parameters/enable_apicv
N

In initial testing we have had a number of successful nested KVM guests on the compute nodes in question.

[0] https://bugs.launchpad.net/ubuntu/+source/linux-lts-xenial/+bug/1682077
[1] https://www.juniper.net/documentation/en_US/vsrx/topics/task/installation/security-vsrx-kvm-nested-virt-enable.html
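
The check above can be wrapped in a small function for auditing further compute nodes. This is a sketch under assumptions, not the procedure actually used here: the function name and return codes are hypothetical, and it only reports the current value. Changing it still needs the modprobe option and a reboot as described above.

```shell
#!/bin/sh
# Report whether kvm_intel currently has APICv enabled on this host.
# The first argument overrides the sysfs file (default: the path used in
# this report). Return codes are a convention chosen for this sketch:
#   0 = APICv off, 1 = APICv on, 2 = parameter not readable.
check_apicv() {
    param="${1:-/sys/module/kvm_intel/parameters/enable_apicv}"
    if [ ! -r "$param" ]; then
        echo "unknown: $param is not readable (is kvm_intel loaded?)"
        return 2
    fi
    case "$(cat "$param")" in
        Y|1)
            echo "enable_apicv is on; nested guests may hit the entry failure"
            return 1
            ;;
        *)
            echo "enable_apicv is off"
            return 0
            ;;
    esac
}
```

A non-zero result on a compute node marks it as a candidate for the enable_apicv=n setting.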

Changed in charm-test-infra:
status: Confirmed → Fix Released
assignee: nobody → David Ames (thedac)
Christian Ehrhardt  (paelzer) wrote :

Thanks David, so it is a configuration detail when going into nested virt.
No change in qemu needed, setting tasks accordingly.

Changed in cloud-archive:
status: Incomplete → Invalid
Changed in qemu (Ubuntu):
status: Incomplete → Invalid