Ubuntu
qemu package

Not able to passthrough > 32 PCIe devices to a KVM Guest

Bug #1771238 reported by David Coronel on 2018-05-15

This bug affects 1 person

Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid   Undecided    Unassigned
qemu (Ubuntu)    Invalid   Undecided    Unassigned

Bug Description

Using an Ubuntu Server 16.04-based host with the KVM hypervisor installed, we are unable to launch a vanilla Ubuntu Server 16.04.4 guest with >= 32 PCIe devices. The failure is 100% reproducible; using fewer PCIe devices works fine. We are using the vanilla kvm and qemu packages from the Canonical repos.

The ultimate goal is to create a KVM guest to which I can pass through 44 PCI devices.

When a KVM guest launches, it also has some internal PCIe devices, including the host bridge, USB, IDE (for the virtual disk), and the virtual NIC.

The script used to launch the guest with all devices looks like this:

#!/bin/bash
NAME=16gpuvm

sudo qemu-img create -f qcow2 /home/lab/kvm/images/${NAME}.img 40G &&
sudo virt-install \
--name ${NAME} \
--ram 716800 \
--vcpus 88 \
--disk path=/home/lab/kvm/images/${NAME}.img,format=qcow2 \
--network bridge=virbr0 \
--graphics none \
--host-device 34:00.0 \
--host-device 36:00.0 \
--host-device 39:00.0 \
--host-device 3b:00.0 \
--host-device 57:00.0 \
--host-device 59:00.0 \
--host-device 5c:00.0 \
--host-device 5e:00.0 \
--host-device 61:00.0 \
--host-device 62:00.0 \
--host-device 63:00.0 \
--host-device 65:00.0 \
--host-device 66:00.0 \
--host-device 67:00.0 \
--host-device 35:00.0 \
--host-device 3a:00.0 \
--host-device 58:00.0 \
--host-device 5d:00.0 \
--host-device 2e:00.0 \
--host-device 2f:00.0 \
--host-device 51:00.0 \
--host-device 52:00.0 \
--host-device b7:00.0 \
--host-device b9:00.0 \
--host-device bc:00.0 \
--host-device be:00.0 \
--host-device e0:00.0 \
--host-device e2:00.0 \
--host-device e5:00.0 \
--host-device e7:00.0 \
--host-device c1:00.0 \
--host-device c2:00.0 \
--host-device c3:00.0 \
--host-device c5:00.0 \
--host-device c6:00.0 \
--host-device c7:00.0 \
--host-device b8:00.0 \
--host-device bd:00.0 \
--host-device e1:00.0 \
--host-device e6:00.0 \
--host-device b1:00.0 \
--host-device b2:00.0 \
--host-device da:00.0 \
--host-device db:00.0 \
--console pty,target_type=serial \
--location http://ftp.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64 \
--initrd-inject=/home/lab/kvm/images/preseed.cfg \
--extra-args="
console=ttyS0,115200
locale=en_US
console-keymaps-at/keymap=us
console-setup/ask_detect=false
console-setup/layoutcode=us
keyboard-configuration/layout=USA
keyboard-configuration/variant=USA
hostname=${NAME}
file=file:/preseed.cfg
"

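The long run of --host-device options can also be generated from a list of addresses, which makes it easier to retry with a subset of the devices. A minimal sketch, assuming a helper named host_device_args (a hypothetical name, not part of virt-install):

```shell
# Print one "--host-device ADDR" pair per line for every address given.
# host_device_args is a hypothetical helper, not a virt-install feature.
host_device_args() {
    for addr in "$@"; do
        printf '%s %s\n' --host-device "$addr"
    done
}

# Example: show the options for the first four devices of the script above.
host_device_args 34:00.0 36:00.0 39:00.0 3b:00.0
```

The output can be spliced into the virt-install call with an unquoted command substitution, for example `sudo virt-install --name ${NAME} ... $(host_device_args 34:00.0 36:00.0) ...`, since the addresses contain no whitespace.
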
Passing more than 32 devices triggers the issue: the 32nd device hits a DPC (Downstream Port Containment) error and the host/hypervisor crashes:

Apr 25 22:34:35 xpl-evt-16 kernel: [18125.977496] dpc 0000:5b:10.0:pcie210: DPC containment event, status:0x0009 source:0x0000
Apr 25 22:34:35 xpl-evt-16 kernel: [18125.977500] dpc 0000:5b:10.0:pcie210: DPC unmasked uncorrectable error detected, remove downstream devices
Apr 25 22:34:35 xpl-evt-16 kernel: [18125.994326] vfio-pci 0000:5e:00.0: Refused to change power state, currently in D3
Apr 25 22:34:35 xpl-evt-16 kernel: [18125.994427] iommu: Removing device 0000:5e:00.0 from group 92
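
To see which ports reported containment events, the kernel log can be filtered for lines like the first one above. A minimal sketch, assuming the message format shown in this excerpt (the function name dpc_event_ports is hypothetical, and other kernel versions may word the message differently):

```shell
# Read kernel log lines on stdin and print the PCI address of every port
# that logged a "DPC containment event" line (format as in the excerpt above).
dpc_event_ports() {
    sed -n 's/.*dpc \([0-9a-f:.]*\):pcie[0-9]*: DPC containment event.*/\1/p'
}

# Example with the line from this report:
printf '%s\n' 'Apr 25 22:34:35 xpl-evt-16 kernel: [18125.977496] dpc 0000:5b:10.0:pcie210: DPC containment event, status:0x0009 source:0x0000' | dpc_event_ports
```

On a live host this could be fed from `dmesg` or from the syslog file, with `sort -u` appended to list each port once; `lspci -tv` then shows which devices sit below that port.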

From syslog (attached)

Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194358] dpc 0000:bb:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194387] dpc 0000:bb:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194413] dpc 0000:d9:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194439] dpc 0000:d9:01.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194472] dpc 0000:d9:02.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194499] dpc 0000:d9:03.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194526] dpc 0000:d9:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194553] dpc 0000:d9:0c.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194583] dpc 0000:df:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194619] dpc 0000:df:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194649] dpc 0000:df:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194679] dpc 0000:e4:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194709] dpc 0000:e4:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194742] dpc 0000:e4:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194763] pciehp 0000:00:1c.0:pcie004: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ LLActRep+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195036] pciehp 0000:60:02.0:pcie204: Slot #2 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195278] pciehp 0000:60:0a.0:pcie204: Slot #10 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195513] pciehp 0000:c0:02.0:pcie204: Slot #2 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195753] pciehp 0000:c0:0a.0:pcie204: Slot #10 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196196] efifb: probing for efifb
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196242] efifb: framebuffer at 0x9c000000, using 1920k, total 1920k
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196247] efifb: mode is 800x600x32, linelength=3200, pages=1
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196250] efifb: scrolling: redraw
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196254] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.206652] Console: switching to colour frame buffer device 100x37
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217034] fb0: EFI VGA frame buffer device
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217173] intel_idle: MWAIT substates: 0x2020
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217174] intel_idle: v0.4.1 model 0x55
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.220874] intel_idle: lapic_timer_reliable_states 0xffffffff
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.221219] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.221590] ACPI: Power Button [PWRF]
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231089] ERST: Error Record Serialization Table (ERST) support is initialized.
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231312] pstore: using zlib compression
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231444] pstore: Registered erst as persistent store backend
Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.232503] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.

All PCI devices go offline, including NVMe.

The OS drives go away, RAID-1 is remounted read-only, and eventually the system crashes.

Apr 25 22:37:13 xpl-evt-16 rsyslogd-2007: action 'action 9' suspended, next retry is Wed Apr 25 22:37:43 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Apr 25 22:37:13 xpl-evt-16 systemd-udevd[1383]: Process '/sbin/mdadm --incremental /dev/nvme1n1p2 --offroot' failed with exit code 1.
Apr 25 22:37:13 xpl-evt-16 systemd-udevd[1371]: Process '/sbin/mdadm --incremental /dev/nvme0n1p2 --offroot' failed with exit code 1.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Apply Kernel Variables...
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted Configuration File System.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted FUSE Control File System.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started Apply Kernel Variables.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Found device /dev/disk/by-uuid/269E-631A.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting File System Check on /dev/disk/by-uuid/269E-631A...
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started File System Check Daemon to report status.
Apr 25 22:37:13 xpl-evt-16 systemd-fsck[1576]: fsck.fat 3.0.28 (2015-05-16)
Apr 25 22:37:13 xpl-evt-16 systemd-fsck[1576]: /dev/nvme0n1p1: 10 files, 1168/130812 clusters
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started File System Check on /dev/disk/by-uuid/269E-631A.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounting /boot/efi...
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted /boot/efi.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Reached target Local File Systems.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Preprocess NFS configuration...
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Create Volatile Files and Directories...
Apr 25 22:37:13 xpl-evt-16 systemd-tmpfiles[1714]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting openibd - configure Mellanox devices...
Apr 25 22:37:13 xpl-evt-16 kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x42/0x50d with crng_init=0

David Coronel (davecore) on 2018-05-15

Changed in qemu (Ubuntu):
importance:	Undecided → Critical

David Coronel (davecore) on 2018-05-15

Changed in qemu (Ubuntu):
importance:	Critical → High

Thomas Huth (th-huth) wrote on 2018-05-15:

If the host kernel crashes, this is almost certainly a KVM bug rather than a QEMU bug, so you should report it to the KVM / kernel mailing list instead of opening an (upstream) QEMU bug ticket.

David Coronel (davecore) on 2018-05-15

Changed in qemu (Ubuntu):
importance:	High → Undecided

Khaled El Mously (kmously) wrote on 2018-05-15:

@David Coronel: It's not clear to me - is this a regression?

David Coronel (davecore) wrote on 2018-05-15:

@Khaled El Mously: It's more a feature request.

no longer affects: qemu

David Coronel (davecore) wrote on 2018-05-15:

I removed upstream QEMU from this bug pending further analysis and so we can go through the right channels if this needs to go upstream.

Christian Ehrhardt  (paelzer) wrote on 2018-05-16:

At least for the DPC protection it seems more a kernel issue in regard to PCIe handling than anything else for now. Due to that I'm adding a task for the kernel as well.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2018-05-16: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1771238

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Andy Whitcroft (apw) on 2018-05-16

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Alex Williamson (alex-l-williamson) wrote on 2018-05-17:

Agreed with the initial analysis: there's nothing in device assignment that limits a guest to 32 devices, except where downstream distros have intentionally added a limit for support purposes. The issue here is that the host hit a PCIe Downstream Port Containment (DPC) uncorrectable error, apparently causing at least a sub-hierarchy of the PCIe topology to go offline. This is potentially more of a hardware issue than a software issue. It may be possible to mask the issue by unbinding the interconnect devices in the affected sub-hierarchy from the dpc driver. It might also be interesting to test with a subset of devices to see whether specific devices are triggering spurious DPC errors. It may be only a subset, or a single device, that triggers them, or perhaps it's the succession of bus resets for GPU assignment that triggers such a fault. The system firmware logs may provide additional information regarding the source(s) of the fault.
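
The unbinding suggestion in this comment can be illustrated with a dry-run sketch. It assumes that the DPC port services are bound under /sys/bus/pci_express/drivers/dpc and are named like the devices in the log (for example 0000:5b:10.0:pcie210); both the path and the naming should be checked on the actual host, and the function name dpc_unbind_commands is hypothetical. It only prints the commands, it does not run them:

```shell
# Print (do not execute) one unbind command per port service that is
# currently bound to the dpc driver.
# $1: driver directory in sysfs; the default below is an assumption.
dpc_unbind_commands() {
    drv="${1:-/sys/bus/pci_express/drivers/dpc}"
    for link in "$drv"/*:pcie*; do
        # Skip the unexpanded pattern when nothing is bound.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf 'echo %s > %s/unbind\n' "${link##*/}" "$drv"
    done
}

# Usage (as root, after reviewing the printed list):
#   dpc_unbind_commands
```

Restricting the printed list to the affected sub-hierarchy (for example with `grep 0000:5b:`) keeps DPC active elsewhere. Unbinding only hides the containment events; it does not address whatever is raising the uncorrectable error.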

Christian Ehrhardt  (paelzer) wrote on 2018-06-06:

Thanks Alex for cross checking.
I got handed a seabios change that was mentioned to make this work.
I'll clean it up and build/test on my own.

Once I have a good feeling I'll submit an RFC upstream for review and set you on CC.

Christian Ehrhardt  (paelzer) wrote on 2018-06-11:

I got access to such a machine and successfully added >32 cards via hotplug as well as statically in the initial guest xml of libvirt.

I face maybe related issue around "accel/kvm/kvm-all.c:952: kvm_irqchip_commit_routes: Assertion `ret == 0' failed." now but noting seems like the old DPC issue.

Closing this bug (and considering to open a new one for the different issue mentioned above).
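
For readers who want to try the hotplug path mentioned in this comment, a minimal sketch that generates a libvirt hostdev element from a host PCI address in the bus:slot.function form used by the launch script. The function name hostdev_xml is hypothetical, and domain 0x0000 and managed='yes' are assumptions that may not fit every setup:

```shell
# Print a libvirt <hostdev> element for a PCI address given as BB:SS.F
# (for example 34:00.0). PCI domain 0x0000 and managed='yes' are assumed.
hostdev_xml() {
    bus="${1%%:*}"      # part before the colon
    rest="${1#*:}"      # part after the colon (SS.F)
    slot="${rest%%.*}"  # part before the dot
    func="${rest#*.}"   # part after the dot
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
}

# Example: element for the first device of the launch script.
hostdev_xml 34:00.0
```

The output can be saved to a file and passed to `virsh attach-device 16gpuvm FILE --live` for hotplug, or pasted into the devices section of the guest XML for static assignment.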

Changed in qemu (Ubuntu):
status:	New → Invalid
Changed in linux (Ubuntu):
status:	Confirmed → Invalid
