[Regression] Stuck CPU1-x when booting as Xen HVM guest on certain Intel hosts

Bug #1157757 reported by Stefan Bader on 2013-03-20
This bug affects 1 person
Affects            Importance   Assigned to
linux (Ubuntu)     High         Unassigned
  Precise          Undecided    Unassigned
  Quantal          Undecided    Unassigned
xen (Ubuntu)       High         Stefan Bader
  Precise          Medium       Unassigned
  Quantal          Medium       Unassigned

Bug Description

SRU Justification:

Impact: When booting a kernel version 3.5 or later in an HVM guest with multiple VCPUs on a host that supports Supervisor Mode Execution Protection (SMEP), only the boot processor comes up. All additional VCPUs get stuck.
This happens because Xen is using paging even if the guest VCPU has not enabled paging mode, but the pages are not set up to grant execution rights.

Fix: A set of three patches backported from upstream Xen will mask off the SMEP bit from the hardware register as long as the guest VCPU is not in paging mode.
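The masking rule described in the fix can be sketched in a few lines. This is only an illustration of the logic, not the actual Xen patch: the function name hardware_cr4 is hypothetical, and only the architectural bit positions (CR0.PG is bit 31, CR4.SMEP is bit 20) are taken as given.

```python
# Illustrative sketch only; this is not the code from the Xen patches.
CR0_PG = 1 << 31    # CR0.PG: guest has enabled paging
CR4_SMEP = 1 << 20  # CR4.SMEP: supervisor mode execution protection


def hardware_cr4(guest_cr0: int, guest_cr4: int) -> int:
    """Return the CR4 value to load into hardware for a guest VCPU.

    While the guest has not set CR0.PG, SMEP is masked off so that the
    hypervisor's own page tables cannot cause SMEP faults.
    """
    if guest_cr0 & CR0_PG:
        return guest_cr4
    return guest_cr4 & ~CR4_SMEP


# 0x1407f0 is the CR4 value reported later in this bug.
print(hex(hardware_cr4(0x00000011, 0x1407F0)))  # guest not paging: SMEP cleared
print(hex(hardware_cr4(0x80000011, 0x1407F0)))  # guest paging: SMEP kept
```

The guest still sees the CR4 value it wrote; only the value applied to the hardware register differs while paging is off.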

Testcase: Set up a Xen host (Intel CPU that supports SMEP), install an HVM guest (Quantal or later) with more than one VCPU. After boot, /proc/cpuinfo only shows one CPU and dmesg contains "Stuck" messages. With the fix, all CPUs come up.
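A rough way to automate the check from the testcase, assuming the contents of /proc/cpuinfo and the dmesg output have already been captured as text. The helper name check_guest is made up for this sketch; the "Stuck" pattern follows the log lines quoted in this report.

```python
import re


def check_guest(cpuinfo_text: str, dmesg_text: str, expected_vcpus: int) -> bool:
    """Return True if all expected VCPUs are online and none was reported stuck."""
    # Each online CPU contributes one "processor : N" line to /proc/cpuinfo.
    online = len(re.findall(r"^processor\s*:", cpuinfo_text, flags=re.MULTILINE))
    # smpboot prints "CPU<n>: Stuck ??" when an AP fails to call in.
    stuck = re.findall(r"CPU(\d+): Stuck", dmesg_text)
    return online == expected_vcpus and not stuck


sample_cpuinfo = "processor\t: 0\nmodel name\t: example\n"
sample_dmesg = "[ 5.051050] smpboot: CPU1: Stuck ??\n"
print(check_guest(sample_cpuinfo, sample_dmesg, expected_vcpus=2))  # False
```

On a real guest the two strings would come from reading /proc/cpuinfo and from the output of dmesg.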

---

Architecture: amd64
Xen version: 4.2.1

When testing I found that when I boot a Xen HVM guest on newer Intel based systems (maybe starting with Sandy Bridge) none of the additional VCPUs come online:

cpu 1 spinlock event irq 70
Booting Node 0, Processors #1
CPU1: Stuck ??

This does not happen on my AMD Opteron host, nor on a box with an old i7 (one of the first ones that came out). This started with kernels between 3.4 and 3.5-rc1, so Quantal and onwards. I was able to limit the range via bisect (unfortunately within that range the kernel does not build):

323f90a xen-acpi-processor: Add missing #include <xen/xen.h>
2ee93ab acpi, bgrd: Add missing <linux/io.h> to drivers/acpi/bgrt.c
638d957 x86, realmode: Change EFER to a single u64 field
1371270 x86, realmode: Move kernel/realmode.c to realmode/init.c
51edbe6 x86, realmode: Move not-common bits out of trampoline_common.S
7960387 x86, realmode: Mask out EFER.LMA when saving trampoline EFER
34d0b02 x86, realmode: Fix no cache bits test in reboot_32.S
0f6f11eb x86, realmode: Make sure all generated files are listed in targets
c5403ae x86, realmode: build fix: remove duplicate build
cda846f x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline
bf8b88e x86, realmode: fixes compilation issue in tboot.c
f2604c1 x86, realmode: move relocs from scripts/ to arch/x86/tools
f37240f x86, realmode: header for trampoline code
c484547 x86, realmode: flattened rm hierachy
b429dbf x86, realmode: don't copy real_mode_header
8e029fc x86, realmode: fix 64-bit wakeup sequence
6feb592 x86, realmode: Fix always-zero test in reboot_32.S
be60828 x86, realmode: Move trampoline_*.S early in the link order
e5684ec x86, realmode: Replace open-coded ljmpw with a macro
968ff9e x86, realmode: Remove indirect jumps in trampoline_32 and wakeup_asm
056a43a x86, realmode: Remove indirect jumps in trampoline_64.S
f7436a9 x86, realmode: Align .data section in trampoline_32.S

Not sure why this only affects certain Intel CPUs, maybe some VMX feature that has some side-effect on the changes in the realmode code.

Stefan Bader (smb) on 2013-03-20
description: updated
Stefan Bader (smb) wrote :

Verified that the issue still exists on v3.9-rc3 upstream.

Stefan Bader (smb) wrote :

A bit more detail gathered with dynamic debugging from smpboot.c:

[ 0.060482] Switched APIC routing to physical flat.
[ 0.064169] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[ 0.104917] smpboot: CPU0: Genuine Intel(R) CPU 0000 @ 2.60GHz (fam: 06, model: 3c, stepping: 02)
[ 0.108010] Xen: using vcpuop timer interface
[ 0.108983] installing Xen timer for CPU 0
[ 0.110050] cpu 0 spinlock event irq 70
[ 0.110990] Performance Events: generic architected perfmon, Intel PMU driver.
[ 0.112000] ... version: 3
[ 0.112014] ... bit width: 48
[ 0.112444] ... generic registers: 8
[ 0.112855] ... value mask: 0000ffffffffffff
[ 0.113308] ... max period: 000000007fffffff
[ 0.113773] ... fixed-purpose events: 3
[ 0.114198] ... event mask: 00000007000000ff
[ 0.115947] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.116065] register_vcpu_info failed: err=-22
[ 0.116530] cpu 1 spinlock event irq 71
[ 0.117006] smpboot: ++++++++++++++++++++=_---CPU UP 1
[ 0.117454] smpboot: Booting Node 0, Processors #1
[ 0.118035] smpboot: Setting warm reset code and vector.
[ 0.118865] smpboot: Asserting INIT
[ 0.119271] smpboot: Waiting for send to finish...
[ 0.129476] smpboot: Deasserting INIT
[ 0.130030] smpboot: Waiting for send to finish...
[ 0.131008] smpboot: #startup loops: 2
[ 0.131987] smpboot: Sending STARTUP #1
[ 0.132018] smpboot: After apic_write
[ 0.133330] smpboot: Startup point 1
[ 0.134282] smpboot: Waiting for send to finish...
[ 0.135493] smpboot: Sending STARTUP #2
[ 0.136021] smpboot: After apic_write
[ 0.137283] smpboot: Startup point 1
[ 0.138224] smpboot: Waiting for send to finish...
[ 0.139395] smpboot: After Startup
[ 0.140016] smpboot: Before Callout 1
[ 0.140982] smpboot: After Callout 1
[ 5.051050] smpboot: CPU1: Stuck ??
[ 5.051712] smpboot: do_boot_cpu failed 1
[ 5.052062] Brought up 1 CPUs
[ 5.052520] smpboot: Boot done

On Wed, Mar 20, 2013 at 03:51:41PM -0000, Stefan Bader wrote:
> A bit more detail gather with dynamic debugging from smpboot.c:

.. snip..
> [ 0.115947] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> [ 0.116065] register_vcpu_info failed: err=-22

That fails. Any idea why?

Stefan Bader (smb) wrote :

No, but it also fails in pre v3.5-rc1 kernels (which succeed in bringing up all VCPUs on the same host), so I ignore(d) it for the time being. Certainly another thing to find out.

Stefan Bader (smb) wrote :

The message "Stuck ??" seems to indicate that the trampoline has run but the VCPU did not report back via the callin flags. I assume that would happen once the IPI signalling succeeds, so the problem seems to be (pointing vaguely) somewhere in there: the interaction of APIC, NMI signalling, ...

Stefan Bader (smb) wrote :

I got around to manually narrowing down the bisection by re-ordering the patches in the already obtained range. This got me to

* cda846f x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

I also got around to wiring up my Intel test box with a real serial port and then using it to handle Xen debug keys. Dumping the registers with a stuck HVM VCPU shows that eax and cr4 are still the same. That would indicate that code execution got at least to the place that assigns CR4 but not much further (EAX would get replaced quite soon).

So the contents written into CR4 were: 0x1407f0. My first suspect was the PGE flag since that looks to be depending on the PG flag in CR0 to be set first. However masking that off had no effect. What turned out to be the offender was the SMEP (supervisor mode execution protection) which is also set in the CR4 contents that seem to be passed in by Xen. By manually masking that off in trampoline_64.S:startup_32 all APs again get started successfully.
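To see which flags are actually set in 0x1407f0, the value can be decoded bit by bit. The sketch below uses the CR4 bit assignments from the Intel SDM as I understand them; decode_cr4 is just an illustrative helper.

```python
# CR4 bit positions (architectural names); reserved bits are omitted.
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    13: "VMXE", 14: "SMXE", 16: "FSGSBASE", 17: "PCIDE",
    18: "OSXSAVE", 20: "SMEP",
}


def decode_cr4(value: int) -> list[str]:
    """Return the names of the known CR4 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR4_BITS.items()) if value & (1 << bit)]


print(decode_cr4(0x1407F0))
# ['PSE', 'PAE', 'MCE', 'PGE', 'PCE', 'OSFXSR', 'OSXMMEXCPT', 'OSXSAVE', 'SMEP']

# Masking off SMEP (bit 20), as done manually in the trampoline for testing:
print(hex(0x1407F0 & ~(1 << 20)))  # 0x407f0
```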

Now the question is probably whether the realmode code should be more conservative or whether it is the responsibility of the hypervisor to hide this from the system. Even more so because, to my understanding, the SMEP bit in CR4 should actually not be set at all on this CPU, as CPUID[7] does not indicate support in bit 7 of EBX (looked at that after a boot into bare-metal mode).
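For reference, the CPUID check mentioned here boils down to testing bit 7 of EBX from leaf 7 (subleaf 0). A small sketch, assuming the raw EBX value has been obtained by some other means (for example a cpuid dump tool); the second helper looks for the "smep" flag that Linux lists in /proc/cpuinfo. Both function names are made up for illustration.

```python
CPUID7_EBX_SMEP = 1 << 7  # CPUID.(EAX=7,ECX=0):EBX bit 7 advertises SMEP


def cpuid_reports_smep(ebx: int) -> bool:
    """Check a raw CPUID leaf 7 EBX value for the SMEP feature bit."""
    return bool(ebx & CPUID7_EBX_SMEP)


def cpuinfo_reports_smep(cpuinfo_text: str) -> bool:
    """Check the first 'flags' line of /proc/cpuinfo text for 'smep'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "smep" in line.partition(":")[2].split()
    return False


print(cpuid_reports_smep(0x00000080))                      # True
print(cpuinfo_reports_smep("flags\t\t: fpu vme de pse\n"))  # False
```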

Stefan Bader (smb) wrote :

Some information that I got while discussing this upstream: This is a problem with Xen. In fact the same flags get used when doing a bare-metal boot. The explanation for the different behaviour is that Xen uses paging even when the guest is in non-paging mode (just an identity-mapped table). SMEP would be ignored in real non-paging mode, but in the Xen case it takes effect while the pages are not set up correctly.

Right now this can be worked around by using "smep=0" as a hypervisor boot argument, or "nosmep" on the grub command line of the guest.
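When scripting the workaround, the option should only be added if it is not already on the command line. A minimal sketch of that step; add_boot_param is a hypothetical helper and the example command lines are made up. Where the resulting string has to be written depends on the bootloader setup and is not covered here.

```python
def add_boot_param(cmdline: str, param: str) -> str:
    """Append param to a space-separated boot command line unless already present."""
    params = cmdline.split()
    if param not in params:
        params.append(param)
    return " ".join(params)


# Guest kernel command line (example contents are assumptions):
print(add_boot_param("quiet splash", "nosmep"))  # quiet splash nosmep
# Hypervisor command line (example contents are assumptions):
print(add_boot_param("dom0_mem=1024M", "smep=0"))  # dom0_mem=1024M smep=0
```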

There was already a change that fixes a similar issue but in my testing it seems not to be in effect for this problem.

Stefan Bader (smb) wrote :

It looks like with this specific setup (using xm because of libvirt; enabling libxl in libvirt would first require changing the xen packaging so that libxen-dev includes all required parts) the problem is that the code in newer Xen versions that would filter SMEP only works if HAP is used (which it is not here). But somehow, despite the comment in the code saying otherwise, Xen always seems to be in paging mode, even if the guest VCPU is not. This makes me think that maybe the check should rely only on !hvm_paging_enabled(v). Using this patch on the Xen source prevents the hangs.
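The difference between the two conditions can be written out as a truth table. This is a simplified model based only on the description in this comment, not a transcription of the Xen source; both function names are hypothetical and guest_paging stands in for hvm_paging_enabled(v).

```python
def filter_smep_hap_only(hap_enabled: bool, guest_paging: bool) -> bool:
    """Modelled existing behaviour: SMEP is filtered only with HAP in use."""
    return hap_enabled and not guest_paging


def filter_smep_proposed(hap_enabled: bool, guest_paging: bool) -> bool:
    """Proposed behaviour: filter whenever the guest VCPU is not paging."""
    return not guest_paging


for hap in (False, True):
    for paging in (False, True):
        print(f"hap={hap!s:5} guest_paging={paging!s:5} "
              f"existing={filter_smep_hap_only(hap, paging)!s:5} "
              f"proposed={filter_smep_proposed(hap, paging)}")
```

The only row where the two differ is hap=False with guest_paging=False, which matches the setup described above.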

Changed in xen (Ubuntu):
importance: Undecided → High
status: New → In Progress
assignee: nobody → Stefan Bader (stefan-bader-canonical)
tags: added: patch
Stefan Bader (smb) wrote :

While this is caused by a change in the Linux kernel, this only breaks running as a Xen guest because only Xen seems to emulate non-paging mode in a way that uses identity-mapped paging. On bare-metal the SMEP bit has no effect while the CPU is not in paging-mode. So I would not expect upstream to make any changes for that in Linux code.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
status: In Progress → Won't Fix
Stefan Bader (smb) on 2013-04-05
Changed in linux (Ubuntu Precise):
status: New → Won't Fix
Changed in linux (Ubuntu Quantal):
status: New → Won't Fix
Changed in xen (Ubuntu Precise):
status: New → Triaged
Changed in xen (Ubuntu Quantal):
status: New → Triaged
Changed in xen (Ubuntu Precise):
importance: Undecided → High
importance: High → Medium
Changed in xen (Ubuntu Quantal):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.2.1-0ubuntu3

---------------
xen (4.2.1-0ubuntu3) raring; urgency=low

  * Fix FTBS on i386
    - 0007-x86-Fix-i386-virtual-apic.patch
  * Fix HVM VCPUs getting stuck on boot when host supports SMEP (LP: #1157757)
    - 0008-vmx-Simplify-cr0-update-handling-by-deferring-cr4-ch.patch
    - 0009-VMX-disable-SMEP-feature-when-guest-is-in-non-paging.patch
    - 0010-VMX-Always-disable-SMEP-when-guest-is-in-non-paging-.patch
 -- Stefan Bader <email address hidden> Fri, 05 Apr 2013 16:39:45 +0200

Changed in xen (Ubuntu):
status: In Progress → Fix Released
Stefan Bader (smb) wrote :
description: updated
Stefan Bader (smb) wrote :

Tested on Intel based host for Quantal and Precise and cross-checked on an AMD based host for Precise.

Hello Stefan, or anyone else affected,

Accepted xen into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/xen/4.1.2-2ubuntu2.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in xen (Ubuntu Precise):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in xen (Ubuntu Quantal):
status: Triaged → Fix Committed
Adam Conrad (adconrad) wrote :

Hello Stefan, or anyone else affected,

Accepted xen into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/xen/4.1.3-3ubuntu1.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Stefan Bader (smb) wrote :

I got the proposed packages installed (both Quantal and Precise) and see the bug fixed while I have not observed a regression.

tags: added: verification-done
removed: verification-needed

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.1.3-3ubuntu1.4

---------------
xen (4.1.3-3ubuntu1.4) quantal-proposed; urgency=low

  * Fix HVM VCPUs getting stuck on boot when host supports SMEP (LP: #1157757)
    - 0008-vmx-Simplify-cr0-update-handling-by-deferring-cr4-ch.patch
    - 0009-VMX-disable-SMEP-feature-when-guest-is-in-non-paging.patch
    - 0010-VMX-Always-disable-SMEP-when-guest-is-in-non-paging-.patch
 -- Stefan Bader <email address hidden> Mon, 08 Apr 2013 14:37:33 +0200

Changed in xen (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.1.2-2ubuntu2.7

---------------
xen (4.1.2-2ubuntu2.7) precise-proposed; urgency=low

  * Fix HVM VCPUs getting stuck on boot when host supports SMEP (LP: #1157757)
    - 0008-vmx-Simplify-cr0-update-handling-by-deferring-cr4-ch.patch
    - 0009-VMX-disable-SMEP-feature-when-guest-is-in-non-paging.patch
    - 0010-VMX-Always-disable-SMEP-when-guest-is-in-non-paging-.patch
 -- Stefan Bader <email address hidden> Mon, 08 Apr 2013 17:53:45 +0200

Changed in xen (Ubuntu Precise):
status: Fix Committed → Fix Released