qemu guest hangs on nested kvm startup with host kernel oops

Bug #1448269 reported by Serge Hallyn
This bug affects 2 people
Affects          Status         Importance   Assigned to      Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Utopic           Fix Released   Medium       Chris J Arges

Bug Description

[Impact]
Users of nested KVM may see the L1 VM hang when booting an L2 VM. The cause appears to be external interrupts failing to reach L1 while L2 boots.

[Test Case]
Run a nested KVM instance:
https://gist.github.com/arges/9d21c6da03a8c10d3980

[Fix]
commit 4fa7734c62cdd8c07edd54fa5a5e91482273071a
commit f3380ca5d7edb5e31932998ab2e29dfdce39c5ed

--

I'm creating a vivid qemu guest on a trusty host with the 3.13.0-48-generic kernel. When I start a guest inside that guest, I get the oops below on the host, while the first guest hangs and must be (virsh) destroyed.

Apr 24 20:40:08 sergeh2 kernel: [1575627.844208] ------------[ cut here ]------------
Apr 24 20:40:08 sergeh2 kernel: [1575627.844227] WARNING: CPU: 2 PID: 17176 at /build/buildd/linux-3.13.0/arch/x86/kvm/vmx.c:8414 nested_vmx_vmexit+0x11c/0x150 [kvm_intel]()
Apr 24 20:40:08 sergeh2 kernel: [1575627.844229] Modules linked in: vhost_net vhost macvtap macvlan xts gf128mul xt_conntrack ipt_REJECT ip6table_filter ip6_tables ebtable_nat ebtables veth xt_nat xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables dm_crypt gpio_ich coretemp kvm_intel kvm i7core_edac edac_core lpc_ich shpchp mac_hid serio_raw lp parport btrfs libcrc32c raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear dm_snapshot raid1 nouveau mxm_wmi video i2c_algo_bit ttm drm_kms_helper drm ahci r8169 libahci mii wmi
Apr 24 20:40:08 sergeh2 kernel: [1575627.844281] CPU: 2 PID: 17176 Comm: qemu-system-x86 Not tainted 3.13.0-48-generic #80-Ubuntu
Apr 24 20:40:08 sergeh2 kernel: [1575627.844283] Hardware name: MSI MS-7522/MSI X58 Pro (MS-7522) , BIOS V8.14B8 11/09/2012
Apr 24 20:40:08 sergeh2 kernel: [1575627.844286] 0000000000000009 ffff880907561c98 ffffffff81721506 0000000000000000
Apr 24 20:40:08 sergeh2 kernel: [1575627.844290] ffff880907561cd0 ffffffff810677dd ffff880bfa808000 0000000000000014
Apr 24 20:40:08 sergeh2 kernel: [1575627.844293] ffff8806da7a7000 ffff880bfca9c800 0000000000000000 ffff880907561ce0
Apr 24 20:40:08 sergeh2 kernel: [1575627.844297] Call Trace:
Apr 24 20:40:08 sergeh2 kernel: [1575627.844305] [<ffffffff81721506>] dump_stack+0x45/0x56
Apr 24 20:40:08 sergeh2 kernel: [1575627.844310] [<ffffffff810677dd>] warn_slowpath_common+0x7d/0xa0
Apr 24 20:40:08 sergeh2 kernel: [1575627.844314] [<ffffffff810678ba>] warn_slowpath_null+0x1a/0x20
Apr 24 20:40:08 sergeh2 kernel: [1575627.844321] [<ffffffffa081f8ec>] nested_vmx_vmexit+0x11c/0x150 [kvm_intel]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844327] [<ffffffffa081fafd>] vmx_queue_exception+0xfd/0x140 [kvm_intel]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844347] [<ffffffffa03b7020>] vcpu_enter_guest+0x9f0/0xce0 [kvm]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844364] [<ffffffffa03bb2d8>] kvm_arch_vcpu_ioctl_run+0x1e8/0x460 [kvm]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844376] [<ffffffffa03a5042>] kvm_vcpu_ioctl+0x2a2/0x5e0 [kvm]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844381] [<ffffffff810aaa38>] ? __wake_up_common+0x58/0x90
Apr 24 20:40:08 sergeh2 kernel: [1575627.844387] [<ffffffff811ffc91>] ? fsnotify+0x241/0x320
Apr 24 20:40:08 sergeh2 kernel: [1575627.844391] [<ffffffff811d11c0>] do_vfs_ioctl+0x2e0/0x4c0
Apr 24 20:40:08 sergeh2 kernel: [1575627.844406] [<ffffffffa03b0504>] ? kvm_on_user_return+0x74/0x80 [kvm]
Apr 24 20:40:08 sergeh2 kernel: [1575627.844409] [<ffffffff811d1421>] SyS_ioctl+0x81/0xa0
Apr 24 20:40:08 sergeh2 kernel: [1575627.844414] [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
Apr 24 20:40:08 sergeh2 kernel: [1575627.844416] ---[ end trace 351396e62b6ef224 ]---
Apr 24 20:48:29 sergeh2 dnsmasq-dhcp[1409]: DHCPREQUEST(lxcbr0) 10.0.3.104 00:16:3e:72:73:32
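
For anyone triaging similar traces, here is a minimal sketch of pulling the source location and symbol out of the WARNING line. The regular expression and the function name parse_warning are my own illustration and only assume the line format shown in the 3.13 log above; other kernel versions may print this line differently.

```python
import re

# Matches the "WARNING: CPU: ... at <file>:<line> <symbol>+<off>/<size> [<module>]()"
# layout seen in the log above. This layout is an assumption tied to that log.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def parse_warning(log_line):
    """Return a dict describing the WARNING line, or None if it does not match."""
    match = WARN_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields so callers can compare them directly.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    return info


sample = (
    "Apr 24 20:40:08 sergeh2 kernel: [1575627.844227] WARNING: CPU: 2 PID: 17176 at "
    "/build/buildd/linux-3.13.0/arch/x86/kvm/vmx.c:8414 "
    "nested_vmx_vmexit+0x11c/0x150 [kvm_intel]()"
)
info = parse_warning(sample)
print(info["file"], info["line"], info["symbol"], info["module"])
```

The extracted file and line number are what point at the WARN_ON_ONCE in nested_vmx_vmexit discussed further down in this report.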

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-48-generic 3.13.0-48.80
ProcVersionSignature: Ubuntu 3.13.0-48.80-generic 3.13.11-ckt16
Uname: Linux 3.13.0-48-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 10 14:22 seq
 crw-rw---- 1 root audio 116, 33 Apr 10 14:22 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg: Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
Date: Fri Apr 24 20:59:31 2015
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: MSI MS-7522
PciMultimedia:

ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-48-generic root=UUID=d1920c3b-419d-484b-b1f2-5cbc69ef62f5 ro nomodeset intel_pstate=enable nomdmonddf nomdmonisw
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 11/09/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: V8.14B8
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MSI X58 Pro (MS-7522)
dmi.board.vendor: MSI
dmi.board.version: 3.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: MICRO-STAR INTERNATIONAL CO.,LTD
dmi.chassis.version: 3.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV8.14B8:bd11/09/2012:svnMSI:pnMS-7522:pvr3.0:rvnMSI:rnMSIX58Pro(MS-7522):rvr3.0:cvnMICRO-STARINTERNATIONALCO.,LTD:ct3:cvr3.0:
dmi.product.name: MS-7522
dmi.product.version: 3.0
dmi.sys.vendor: MSI

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1448269

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
tags: added: bot-stop-nagging
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Chris J Arges (arges) wrote :

1) Can you dump the domain XML (if using libvirt), or qemu command used to invoke the VM. I'm wondering if there is some cpu feature mismatch going on.
2) Can you do 'tail /sys/module/kvm_intel/parameters/*'?

Looks like WARN here:
/*
 * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
 * and modify vmcs12 to make it see what it would expect to see there if
 * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
 */
static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
                              u32 exit_intr_info,
                              unsigned long exit_qualification)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        int cpu;
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

        /* trying to cancel vmlaunch/vmresume is a bug */
        WARN_ON_ONCE(vmx->nested.nested_run_pending);
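
As a rough illustration of why the host log shows this trace only once even if the condition recurs, here is a toy Python model of "warn once" behaviour. It is not kernel code: the class WarnOnce and the function emulate_nested_vmexit are hypothetical names, and the model only assumes that the check reports the first true condition and stays quiet afterwards while still returning the condition.

```python
class WarnOnce:
    """Toy stand-in for a warn-once check: report a true condition only the first time."""

    def __init__(self, label):
        self.label = label
        self.fired = False
        self.reports = []

    def check(self, condition):
        # Record a report only on the first true condition; later hits are silent.
        if condition and not self.fired:
            self.fired = True
            self.reports.append("WARNING: " + self.label)
        # The caller still learns whether the condition held this time.
        return bool(condition)


# Hypothetical model of the check at the top of nested_vmx_vmexit():
# emulating an L2 -> L1 exit while a VMLAUNCH/VMRESUME is still pending is flagged.
pending_warn = WarnOnce("nested_run_pending set during emulated L2->L1 exit")


def emulate_nested_vmexit(nested_run_pending):
    return pending_warn.check(nested_run_pending)


emulate_nested_vmexit(False)  # normal case, nothing reported
emulate_nested_vmexit(True)   # first violation, one report
emulate_nested_vmexit(True)   # repeated violation, no new report
print(pending_warn.reports)
```

So a single trace in the log does not by itself say how often the bad state was hit.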

Changed in linux (Ubuntu):
assignee: nobody → Chris J Arges (arges)
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1448269] Re: qemu guest hangs on nested kvm startup with host kernel oops

Quoting Chris J Arges (<email address hidden>):
> 1) Can you dump the domain XML (if using libvirt), or qemu command used to invoke the VM. I'm wondering if there is some cpu feature mismatch going on.

<domain type='kvm'>
  <name>p9</name>
  <uuid>558ac100-65d0-437b-b19b-7d8946b92b8d</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/var/lib/uvtool/libvirt/images/p9.qcow'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='unsafe'/>
      <source file='/var/lib/uvtool/libvirt/images/p9-ds.qcow'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='network'>
      <mac address='52:54:00:0b:4c:35'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

> 2) Can you do 'tail /sys/module/kvm_intel/parameters/*'?

==> /sys/module/kvm_intel/parameters/emulate_invalid_guest_state <==
Y

==> /sys/module/kvm_intel/parameters/enable_apicv <==
N

==> /sys/module/kvm_intel/parameters/enable_shadow_vmcs <==
N

==> /sys/module/kvm_intel/parameters/ept <==
Y

==> /sys/module/kvm_intel/parameters/eptad <==
N

==> /sys/module/kvm_intel/parameters/fasteoi <==
Y

==> /sys/module/kvm_intel/parameters/flexpriority <==
Y

==> /sys/module/kvm_intel/parameters/nested <==
Y

==> /sys/module/kvm_intel/parameters/ple_gap <==
0

==> /sys/module/kvm_intel/parameters/ple_window <==
4096

==> /sys/module/kvm_intel/parameters/unrestricted_guest <==
N

==> /sys/module/kvm_intel/parameters/vmm_exclusive <==
Y

==> /sys/module/kvm_in...
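
If it helps when comparing hosts, the multi-file output of tail above can be turned into a dictionary with a few lines of Python. This is only a sketch: parse_tail_output is a hypothetical helper, and the sample reuses a handful of the values quoted above rather than the full (truncated) listing.

```python
import re

# tail prints "==> <path> <==" before each file when given several files.
HEADER_RE = re.compile(r"^==> (?P<path>.+) <==$")


def parse_tail_output(text):
    """Map each parameter name (last path component) to its last non-empty line."""
    params = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group("path").rsplit("/", 1)[-1]
            params[current] = ""
        elif current is not None and line:
            params[current] = line
    return params


sample = """\
==> /sys/module/kvm_intel/parameters/enable_shadow_vmcs <==
N

==> /sys/module/kvm_intel/parameters/ept <==
Y

==> /sys/module/kvm_intel/parameters/nested <==
Y

==> /sys/module/kvm_intel/parameters/ple_window <==
4096
"""
params = parse_tail_output(sample)
print(params)
```

With the values from this report, the result shows nested and ept enabled and shadow VMCS disabled on the host.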

Serge Hallyn (serge-hallyn) wrote :

With 3.16.0-37-generic on the host, I no longer get an oops on the host, and the guest instantly reboots rather than hanging.

Serge Hallyn (serge-hallyn) wrote :

This happens with both (first-level) guest kernels, 3.19.0-20-generic and 3.19.0-18-generic.

Chris J Arges (arges)
Changed in linux (Ubuntu):
status: Triaged → In Progress
Stefan Kooman (stefan-n1) wrote :

I had this issue (kernel oops) when I explicitly set "ignore_msrs" (Ubuntu Trusty, 3.13.0-54-generic):

cat /etc/modprobe.d/qemu-system-x86.conf
options kvm-intel nested=y ept=y
options kvm ignore_msrs=1
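
To compare configurations like the one above across machines, a small parser for "options" lines can help. The following is a sketch under stated assumptions: parse_modprobe_options is a hypothetical helper, it only understands "options <module> key=value ..." lines and "#" comments, and it folds "-" into "_" in module names because modprobe treats the two as equivalent.

```python
def parse_modprobe_options(text):
    """Return {module: {param: value}} for the 'options' lines in a modprobe.d file."""
    options = {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2 or fields[0] != "options":
            continue
        # kvm-intel and kvm_intel name the same module.
        module = fields[1].replace("-", "_")
        params = options.setdefault(module, {})
        for item in fields[2:]:
            key, _, value = item.partition("=")
            params[key] = value
    return options


sample = """\
options kvm-intel nested=y ept=y
options kvm ignore_msrs=1
"""
opts = parse_modprobe_options(sample)
print(opts)
```

For the file quoted above this yields nested=y and ept=y for kvm_intel, and ignore_msrs=1 for kvm.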

Removing the option "kvm ignore_msrs=1" made the guest (L2) run, but it gave a "KVM: entry failed, hardware error 0x7" when the L1 guest (guest hypervisor) was booted with these cpu flags (libvirt):

<cpu> <arch>x86_64</arch> <model>Nehalem</model> <vendor>Intel</vendor> <topology sockets='1' cores='2' threads='2'/> <feature name='rdtscp'/> <feature name='dca'/> <feature name='pdcm'/> <feature name='xtpr'/> <feature name='tm2'/> <feature name='est'/> <feature name='vmx'/> <feature name='ds_cpl'/> <feature name='monitor'/> <feature name='dtes64'/> <feature name='pbe'/> <feature name='tm'/> <feature name='ht'/> <feature name='ss'/> <feature name='acpi'/> <feature name='ds'/> <feature name='vme'/> </cpu>

Guest VM (L2) running fine with these cpu flags (libvirt) for guest VM (L1):

<cpu match='exact'> <cpu mode='host-passthrough'/> <model>Nehalem</model> <feature policy='require' name='vmx'/> </cpu>
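
A quick way to check whether a given cpu element asks for vmx is sketched below. The helper names are hypothetical, the first sample is an abbreviated copy of the Nehalem definition above (most features omitted), and the second sample is invented purely to show the policy handling. Features whose policy is 'disable' or 'forbid' are treated as not exposed.

```python
import xml.etree.ElementTree as ET


def cpu_feature_names(cpu_xml):
    """List feature names in a libvirt <cpu> element, skipping disabled/forbidden ones."""
    root = ET.fromstring(cpu_xml)
    names = []
    for feature in root.iter("feature"):
        if feature.get("policy") in ("disable", "forbid"):
            continue
        names.append(feature.get("name"))
    return names


def exposes_vmx(cpu_xml):
    return "vmx" in cpu_feature_names(cpu_xml)


# Abbreviated from the definition quoted above.
nehalem = (
    "<cpu><arch>x86_64</arch><model>Nehalem</model><vendor>Intel</vendor>"
    "<feature name='rdtscp'/><feature name='vmx'/><feature name='vme'/></cpu>"
)
# Invented example: vmx explicitly disabled.
no_vmx = "<cpu><model>Nehalem</model><feature policy='disable' name='vmx'/></cpu>"

print(exposes_vmx(nehalem), exposes_vmx(no_vmx))
```

Note that this only inspects the XML as written; it says nothing about what qemu actually exposes to the guest.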

@Serge Hallyn: I wonder what cpu parameters you have defined for your L1 guest (guest hypervisor)

Serge Hallyn (serge-hallyn) wrote :

I had made no changes, i.e.

options kvm_intel nested=1

I've changed that to 0 to get past crashes during automated testing,
but have not added ignore_msrs=1.

Chris J Arges (arges) wrote :

Ok I can repro on this end. I'll start debugging this.

Chris J Arges (arges)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → nobody
status: In Progress → Fix Released
Changed in linux (Ubuntu Utopic):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Medium → Undecided
Changed in linux (Ubuntu Utopic):
status: New → In Progress
Chris J Arges (arges)
description: updated
Chris J Arges (arges) wrote :

A test build with the fix is available here:
http://people.canonical.com/~arges/lp1448269/

Chris J Arges (arges) wrote :

I've tested this on my own workstation starting nested VMs and also testing other types of VMs I run on my system.
In addition I've also run kvm-unit-tests on this patchset and it has the same results as before the patches.

Serge Hallyn (serge-hallyn) wrote :

@arges

that kernel allows me to run accelerated kvm nested - thanks!

Brad Figg (brad-figg)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Released
status: Fix Released → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
Chris J Arges (arges) wrote :

Verified this on my desktop.

tags: added: verification-done-utopic
removed: verification-needed-utopic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-44.59

---------------
linux (3.16.0-44.59) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1472030

  [ Iyappan Subramanian ]

  * SAUCE: (no-up) drivers: net: xgene: fix: Out of order descriptor bytes
    read
    - LP: #1425576

  [ Upstream Kernel Changes ]

  * Revert "tools/vm: fix page-flags build"
    - LP: #1471170
  * NVMe: Add shutdown timeout as module parameter.
    - LP: #1465136
  * Drivers: hv: vmbus: Add support for VMBus panic notifier handler
    - LP: #1463584
  * Drivers: hv: vmbus: Correcting truncation error for constant
    HV_CRASH_CTL_CRASH_NOTIFY
    - LP: #1463584
  * KVM: nVMX: fix lifetime issues for vmcs02
    - LP: #1448269
  * KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
    - LP: #1448269
  * mm/slab_common: support the slub_debug boot option on specific object
    size
    - LP: #1456952
  * kvm: x86: fix kvm_apic_has_events to check for NULL pointer
  * cpuidle: powernv: Populate cpuidle state details by querying the
    device-tree
    - LP: #1470404
  * cpuidle: powernv: Read target_residency value of idle states from DT if
    available
    - LP: #1470404
  * cpuidle: powernv: Avoid endianness conversions while parsing DT
    - LP: #1470404
  * cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
    - LP: #1470404
  * iio: adis16400: Report pressure channel scale
    - LP: #1471170
  * iio: adis16400: Use != channel indices for the two voltage channels
    - LP: #1471170
  * iio: adis16400: Compute the scan mask from channel indices
    - LP: #1471170
  * iio: adis16400: Remove unused variable
    - LP: #1471170
  * iio: adis16400: Fix burst mode
    - LP: #1471170
  * iio: adis16400: Fix burst transfer for adis16448
    - LP: #1471170
  * USB: serial: ftdi_sio: Add support for a Motion Tracker Development
    Board
    - LP: #1471170
  * iio: adc: twl6030-gpadc: Fix modalias
    - LP: #1471170
  * serial: imx: Fix DMA handling for IDLE condition aborts
    - LP: #1471170
  * usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros
    - LP: #1471170
  * ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
    - LP: #1471170
  * n_tty: Fix auditing support for cannonical mode
    - LP: #1471170
  * drm/i915/hsw: Fix workaround for server AUX channel clock divisor
    - LP: #1471170
  * x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
    - LP: #1471170
  * lib: Fix strnlen_user() to not touch memory after specified maximum
    - LP: #1471170
  * Input: elantech - fix detection of touchpads where the revision matches
    a known rate
    - LP: #1471170
  * ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420
    - LP: #1471170
  * ALSA: usb-audio: add MAYA44 USB+ mixer control names
    - LP: #1471170
  * ALSA: usb-audio: fix missing input volume controls in MAYA44 USB(+)
    - LP: #1471170
  * USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
    - LP: #1471170
  * Input: elantech - add new icbody type
    - LP: #1471170
  * MIPS: Fix enabling of DEBUG_STACKOVERFLOW
    - LP: #1471170
  * xfrm: fix a race in xfrm_state_lookup_byspi
    ...
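
To pick out which changelog entries belong to this bug, the "LP: #" tags can be matched mechanically. Below is a minimal sketch: changes_for_bug is a hypothetical helper that assumes the layout used above ("* title" lines, optionally wrapped, followed by "- LP: #NNN"), and the sample is a short excerpt of the changelog.

```python
import re


def changes_for_bug(changelog, bug_id):
    """Return the titles of changelog entries tagged with 'LP: #<bug_id>'."""
    wanted = "#" + str(bug_id)
    titles = []
    current = None
    for raw in changelog.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            # Blank lines and "[ Section ]" headers end the current entry.
            current = None
        elif line.startswith("* "):
            current = line[2:]
        elif line.startswith("- LP:"):
            if current is not None and wanted in re.findall(r"#\d+", line):
                titles.append(current)
        elif current is not None:
            # Continuation of a wrapped entry title.
            current += " " + line
    return titles


sample = """\
  * Drivers: hv: vmbus: Correcting truncation error for constant
    HV_CRASH_CTL_CRASH_NOTIFY
    - LP: #1463584
  * KVM: nVMX: fix lifetime issues for vmcs02
    - LP: #1448269
  * KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
    - LP: #1448269
  * kvm: x86: fix kvm_apic_has_events to check for NULL pointer
"""
fixes = changes_for_bug(sample, 1448269)
print(fixes)
```

For this report the two matching entries are the KVM: nVMX changes listed in the changelog above.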


Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Phil Regnauld (regnauld-f) wrote :

I'm seeing a similar issue with trying to install the latest 16.04.1 i386 edition (http://releases.ubuntu.com/16.04/ubuntu-16.04.1-server-i386.iso) under libvirt/virt-manager, on an amd64 VM running 16.04.1 (4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux).

qemu-kvm hangs, have to virsh destroy/kill the KVM process.

cat /etc/modprobe.d/qemu-system-x86.conf:

options kvm_intel nested=1

... and kvm-ok says acceleration is enabled.

Should I open a new bug report on this?

Serge Hallyn (serge-hallyn) wrote :

Please open a new bug - thanks.
