lxc c2-m2 focal VM causes KVM internal error during PCI init

Bug #1935880 reported by dann frazier
This bug affects 7 people
Affects             Status     Importance  Assigned to  Milestone
linux (Ubuntu)      Confirmed  Undecided   Unassigned
linux-kvm (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Launching a 2 CPU, 2G VM with lxc often causes a KVM internal error during boot.

Reproducer:
lxc launch ubuntu:20.04 dannf-test2 -t c2-m2 --vm

QEMU will report a KVM internal error:

KVM internal error. Suberror: 3
extra data[0]: 800000ec
extra data[1]: 31
extra data[2]: 81
extra data[3]: 30000
RAX=0000000000000000 RBX=0000000000000001 RCX=0000000000000001 RDX=000000000000021e
RSI=ffff88807851cba8 RDI=0000000000000001 RBP=ffffc90000077e90 RSP=ffffc90000077e78
R8 =0000000029417eca R9 =0000000000000000 R10=0000000000000400 R11=0000000000000400
R12=0000000000000001 R13=ffff8880001c8c80 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff81757a74 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff888078500000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffffe0000036000 0000206f 00008b00 DPL=0 TSS64-busy
GDT= fffffe0000034000 0000007f
IDT= fffffe0000000000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=000000000240a000 CR4=001006a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 85 c0 7e 07 0f 00 2d 16 93 4b 00 fb f4 8b 05 14 61 78 00 <65> 44 8b 25 b4 86 8b 7e 85 c0 0f 8f 85 00 00 00 5b 41 5c 41 5d 5d c3 65 8b 05 9e 86 8b 7e

The last lines on the console:
acpi PNP0A08:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x7a100000-0xafffffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [mem 0x800000000-0xfffffffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000
pci 0000:00:01.0: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.0: reg 0x10: [mem 0xc1245000-0xc1245fff]
pci 0000:00:01.1: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.1: reg 0x10: [mem 0xc1244000-0xc1244fff]
pci 0000:00:01.2: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.2: reg 0x10: [mem 0xc1243000-0xc1243fff]
pci 0000:00:01.3: [1b36:000c] type 01 class 0x060400

Revision history for this message
dann frazier (dannf) wrote :

I attached gdb to the VM[*] and retrieved the following backtrace from the guest kernel:

(gdb) bt full
#0 pci_conf1_read (seg=<optimized out>, bus=<optimized out>, devfn=<optimized out>, reg=16, len=4, value=0xffffc9000001fa0c) at /home/dannf/linux-kvm-5.4.0/arch/x86/pci/direct.c:35
        flags = 582
        __dummy = <optimized out>
        __dummy2 = <optimized out>
        __dummy = <optimized out>
        __dummy2 = <optimized out>
#1 0xffffffff815a1964 in raw_pci_read (domain=<optimized out>, bus=<optimized out>, devfn=<optimized out>, reg=<optimized out>, len=<optimized out>, val=<optimized out>) at /home/dannf/linux-kvm-5.4.0/arch/x86/pci/common.c:46
No locals.
#2 0xffffffff815a19a7 in pci_read (bus=<optimized out>, devfn=<optimized out>, where=<optimized out>, size=<optimized out>, value=<optimized out>) at /home/dannf/linux-kvm-5.4.0/arch/x86/include/asm/pci.h:45
No locals.
#3 0xffffffff8144f4c5 in pci_bus_read_config_dword (bus=<optimized out>, devfn=<optimized out>, pos=<optimized out>, value=0xffffc9000001fa58) at /home/dannf/linux-kvm-5.4.0/drivers/pci/access.c:65
        res = -2147480816
        flags = <optimized out>
        data = 0
#4 0xffffffff8144f7f2 in pci_read_config_dword (dev=<optimized out>, where=<optimized out>, val=<optimized out>) at /home/dannf/linux-kvm-5.4.0/drivers/pci/access.c:550
No locals.
#5 0xffffffff8145142a in __pci_read_base (dev=0xffff888000d8d000, type=pci_bar_unknown, res=0xffff888000d8d278, pos=16) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:196
        l = 0
        sz = 0
        mask = <optimized out>
        l64 = <optimized out>
        sz64 = <optimized out>
        mask64 = <optimized out>
        orig_cmd = 7
        region = {start = 18446683600570153584, end = 18446744071584749924}
        inverted_region = {start = 18446683600570153600, end = 18446744071584749991}
#6 0xffffffff81451757 in pci_read_bases (dev=0xffff888000d8d000, howmany=2, rom=56) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:334
        res = <optimized out>
        pos = 0
        reg = <optimized out>
        res = <optimized out>
#7 0xffffffff81451dd4 in pci_setup_device (dev=0xffff888000d8d000) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:1854
        class = 1540
        cmd = 34750
        hdr_type = <optimized out>
        pos = <optimized out>
        region = {start = 18446683600670950407, end = 18446744071583369946}
        res = <optimized out>
#8 0xffffffff81452950 in pci_scan_device (devfn=<optimized out>, bus=<optimized out>) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:2301
        dev = 0xffff888000d8d000
        l = 793398
        dev = <optimized out>
        l = <optimized out>
#9 pci_scan_single_device (devfn=<optimized out>, bus=<optimized out>) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:2474
        dev = <optimized out>
        dev = <optimized out>
#10 pci_scan_single_device (bus=0xffff888000c76c00, devfn=11) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:2464
        dev = 0x0 <fixed_percpu_data>
#11 0xffffffff81452a02 in pci_scan_slot (bus=0xffff888000c76c00, devfn=8) at /home/dannf/linux-kvm-5.4.0/drivers/pci/probe.c:2560
        fn = 3
       ...
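
As a reading aid for the backtrace, here is a minimal sketch of how the devfn values in frames #10 and #11 map to slot and function numbers. It assumes the standard PCI encoding (slot in the upper 5 bits, function in the lower 3); the helper names are hypothetical, and domain 0 / bus 0 are taken from the console output rather than from the (optimized out) frame arguments.

```python
def pci_slot(devfn: int) -> int:
    """Slot (device) number: upper 5 bits of the 8-bit devfn."""
    return (devfn >> 3) & 0x1F


def pci_func(devfn: int) -> int:
    """Function number: lower 3 bits of devfn."""
    return devfn & 0x07


def format_bdf(domain: int, bus: int, devfn: int) -> str:
    """Format an address the way the kernel log prints it, e.g. 0000:00:01.3."""
    return f"{domain:04x}:{bus:02x}:{pci_slot(devfn):02x}.{pci_func(devfn)}"


# devfn=11 from frame #10; domain and bus are assumed to be 0 (see console log).
device = format_bdf(0, 0, 11)
# devfn=8 from frame #11 is the first function of the same slot.
slot_base = format_bdf(0, 0, 8)
# reg=16 in frame #0 is config space offset 0x10, the first BAR.
bar_offset = hex(16)
print(device, slot_base, bar_offset)
```

If that reading is right, the read in frame #0 is the first BAR of 0000:00:01.3, which is the last device printed on the console before the error.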


Changed in linux (Ubuntu):
status: New → Confirmed
description: updated
Revision history for this message
dann frazier (dannf) wrote :

I filed this issue on the kernel because it initially appeared to follow the host kernel version. While it's very reproducible w/ v5.4, I wasn't able to reproduce it with v5.8 or v5.7. However, I've now found that was a false negative - it is still reproducible with v5.13, so the kernel may not be the correct component.

Revision history for this message
dann frazier (dannf) wrote :

I got back to this a couple of days ago; here's what I've learned.

I tried decoding the QEMU syndrome - but be warned, I've never done this before. I believe what it is reporting is an EPT[*] misconfiguration.
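
For anyone who wants to repeat the decoding, here is a minimal sketch of how the "extra data" words from the description could be read. It is not authoritative: it assumes that for suberror 3 the words are, in order, the IDT-vectoring information, the VM exit reason, the exit qualification and (for an EPT misconfiguration) the guest physical address, and that exit reason 49 (0x31) is an EPT misconfiguration as listed in the Intel SDM. The function name is hypothetical.

```python
EXIT_REASON_EPT_MISCONFIG = 49  # 0x31, basic exit reason per the Intel SDM


def decode_delivery_error(extra):
    """Decode the 'extra data' words of a KVM internal error, suberror 3.

    Assumed layout: [vectoring info, exit reason, exit qualification, GPA].
    """
    vectoring_info, exit_reason, exit_qual = extra[0], extra[1], extra[2]
    decoded = {
        "event_valid": bool(vectoring_info & 0x80000000),  # bit 31
        "vector": vectoring_info & 0xFF,                   # bits 7:0
        "event_type": (vectoring_info >> 8) & 0x7,         # 0 = external interrupt
        "exit_reason": exit_reason & 0xFFFF,               # basic exit reason
        "exit_qualification": exit_qual,
    }
    # The fourth word is only assumed to be present for EPT misconfigurations.
    if decoded["exit_reason"] == EXIT_REASON_EPT_MISCONFIG and len(extra) > 3:
        decoded["guest_physical_address"] = extra[3]
    return decoded


# Values copied from the error output in the bug description.
info = decode_delivery_error([0x800000EC, 0x31, 0x81, 0x30000])
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) else value)
```

Under those assumptions, the error says that an EPT misconfiguration exit occurred while an external interrupt (vector 0xec) was being delivered, and the faulting guest physical address was 0x30000.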

I noticed that the 5.4 -generic kernel does not reproduce the issue, while the -kvm one does. I tried comparing the configs, but there's really nothing obvious there. Disabling DYNAMIC_FTRACE did seem to cause the -generic kernel to fail, but turning it on did not fix the -kvm kernel, so that seems like a red herring.

I noticed that newer guest kernels do not seem to reproduce the problem. I bisected the guest kernels and hit this commit:

  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM

This was introduced upstream in v5.13 and appears to be a workaround for BIOSes that corrupt memory under 1M. By default, both the -generic and -kvm kernels build with CONFIG_X86_RESERVE_LOW=64, which avoids using the first 64K of memory. So this suggests that something might be getting corrupted in the 64K->1M region. There's also a kernel parameter you can use to modify this called "reservelow", and if I add "reservelow=1024k" to the cmdline, the -kvm kernel no longer crashes.
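
To make the 64K vs. 1M difference concrete, here is a small sketch that checks whether a guest physical address falls inside the low-memory reservation. The address 0x30000 is "extra data[3]" from the error output; treating it as the faulting guest physical address is my assumption, and the helper names are hypothetical.

```python
def parse_kib(text: str) -> int:
    """Parse a size such as '64k' or '1024k' into bytes (only a k suffix is handled)."""
    return int(text.lower().rstrip("k")) * 1024


def is_reserved(gpa: int, reservelow: str) -> bool:
    """True if gpa lies in [0, reservelow), the range the kernel would avoid."""
    return 0 <= gpa < parse_kib(reservelow)


suspect_gpa = 0x30000  # extra data[3]; assumed to be a guest physical address

default_covers = is_reserved(suspect_gpa, "64k")      # CONFIG_X86_RESERVE_LOW=64
workaround_covers = is_reserved(suspect_gpa, "1024k")  # reservelow=1024k
print(default_covers, workaround_covers)
```

If the assumption holds, the address sits at 192K: outside the default 64K reservation but inside the 1M one, which would be consistent with reservelow=1024k hiding the problem.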

It seems like some kind of corruption may be going on, possibly by edk2/ovmf. As a next step I think I'll try to figure out if I can run the VM under GDB and have it trap writes to that memory area.

[*] https://en.wikipedia.org/wiki/Second_Level_Address_Translation#Extended_Page_Tables

Revision history for this message
Stéphane Graber (stgraber) wrote :

Adding linux-kvm to the bug. It looks like if we can have the commit above backported, it would take care of this issue for most users.

Changed in linux-kvm (Ubuntu):
status: New → Confirmed
Revision history for this message
dann frazier (dannf) wrote :

fyi, I do recall trying what I suggested in Comment #3, trying to debug under GDB. However, I didn't find a way to trap writes to that region, and didn't have any other good ideas.
