[Zesty/Artful] On ARM64 PCIE physical function passthrough guest fails to boot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Critical | Manoj Iyer |
Zesty | Won't Fix | Critical | Manoj Iyer |
Artful | Fix Released | Critical | Manoj Iyer |
Bionic | Fix Released | Critical | Manoj Iyer |
Bug Description
[Impact]
Passing through a PCIe physical function, such as the Mellanox PCIe Ethernet controller, causes the guest to fail to boot, and the host reports a Hardware Error.
== Host ==
[109920.834703] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
[109920.842142] {1}[Hardware Error]: event severity: recoverable
[109920.847848] {1}[Hardware Error]: precise tstamp: 2017-11-16 23:20:05
[109920.854385] {1}[Hardware Error]: Error 0, type: recoverable
[109920.860111] {1}[Hardware Error]: section_type: PCIe error
[109920.865718] {1}[Hardware Error]: port_type: 0, PCIe end point
[109920.871708] {1}[Hardware Error]: version: 3.0
[109920.876343] {1}[Hardware Error]: command: 0x0006, status: 0x0010
[109920.882559] {1}[Hardware Error]: device_id: 0000:01:00.0
[109920.888113] {1}[Hardware Error]: slot: 0
[109920.892285] {1}[Hardware Error]: secondary_bus: 0x00
[109920.897489] {1}[Hardware Error]: vendor_id: 0x15b3, device_id: 0x1013
[109920.904172] {1}[Hardware Error]: class_code: 000002
[109920.909378] vfio-pci 0000:01:00.0: aer_status: 0x00040000, aer_mask: 0x00000000
[109920.916675] Malformed TLP
[109920.916678] vfio-pci 0000:01:00.0: aer_layer=
[109920.924573] vfio-pci 0000:01:00.0: aer_uncor_severity: 0x00062010
[109920.930736] vfio-pci 0000:01:00.0: TLP Header: 4a008040 00000100 01000000 00000000
[109920.938548] vfio-pci 0000:01:00.0: broadcast error_detected message
[109921.965056] pcieport 0000:00:00.0: downstream link has been reset
[109921.965062] vfio-pci 0000:01:00.0: broadcast mmio_enabled message
[109921.965066] vfio-pci 0000:01:00.0: broadcast resume message
[109921.965070] vfio-pci 0000:01:00.0: AER: Device recovery successful
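The register values in the host log can be decoded by hand. The following is a minimal sketch, not part of the original report, that maps `aer_status` to the bit names of the PCIe AER Uncorrectable Error Status register and extracts the format, type, and payload length from the first dword of the logged TLP header. The function names `decode_aer_uncor` and `decode_tlp_dw0` are hypothetical helpers introduced for illustration.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_aer_uncor(status):
    """Return the names of all known error bits set in an AER status value."""
    return [name for bit, name in sorted(AER_UNCOR_BITS.items())
            if status & (1 << bit)]


def decode_tlp_dw0(dw0):
    """Decode the first dword of a TLP header as printed in the AER log.

    Returns (fmt, tlp_type, payload_bytes). Fmt occupies bits 31:29,
    Type bits 28:24, and Length (in dwords) bits 9:0.
    """
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    length_dw = dw0 & 0x3FF
    if length_dw == 0:
        length_dw = 1024  # a Length field of 0 encodes 1024 dwords
    has_data = fmt in (0b010, 0b011)  # 3DW or 4DW header with data
    payload_bytes = length_dw * 4 if has_data else 0
    return fmt, tlp_type, payload_bytes


if __name__ == "__main__":
    print(decode_aer_uncor(0x00040000))
    print(decode_tlp_dw0(0x4A008040))
```

For the values in the log above, `aer_status` 0x00040000 has only the Malformed TLP bit set, and the header dword 0x4a008040 decodes to a completion with data (Fmt 0b010, Type 0b01010) carrying 256 bytes, which is larger than a 128-byte payload limit.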
== Guest ==
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[ 1.518252] kvm [1]: HYP mode not available
[ 2.578929] mlx5_core 0000:05:00.0: mlx5_core_
[ 2.582424] mlx5_core 0000:05:00.0: failed to set issi
[ 2.616756] mlx5_core 0000:05:00.0: mlx5_load_one failed with error code -1
This happens because virtualization of physical functions is broken on systems with a Maximum Payload Size (MPS) larger than 128 bytes. The QDF2400 firmware tries to maximize this setting; we have observed an MPS of 512 on QDF2400 systems.
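For reference, both settings touched by this bug live in the PCIe Device Control register: Max_Payload_Size in bits 7:5 and Max_Read_Request_Size in bits 14:12, each encoded as 128 bytes shifted left by the field value. The sketch below is an illustration only and assumes a raw 16-bit Device Control value has already been read from the device's config space; `decode_devctl` is a hypothetical helper, and the sample value 0x2040 is made up rather than taken from the affected system.

```python
def _field_to_bytes(field):
    """Convert a 3-bit MPS/MRRS field to a size in bytes (128 << field)."""
    if field > 5:
        # Encodings 0b110 and 0b111 are reserved by the PCIe specification.
        raise ValueError("reserved encoding: %d" % field)
    return 128 << field


def decode_devctl(devctl):
    """Return (max_payload_size, max_read_request_size) in bytes.

    devctl is the 16-bit PCIe Device Control register value.
    """
    mps_field = (devctl >> 5) & 0x7     # bits 7:5
    mrrs_field = (devctl >> 12) & 0x7   # bits 14:12
    return _field_to_bytes(mps_field), _field_to_bytes(mrrs_field)


if __name__ == "__main__":
    # Hypothetical register value with both fields set to 0b010.
    mps, mrrs = decode_devctl(0x2040)
    print("MPS=%d bytes, MRRS=%d bytes" % (mps, mrrs))
```

An MPS of 512, as observed on QDF2400, corresponds to a field value of 0b010, while the 128-byte minimum corresponds to 0b000.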
[Fix]
Patches are in linux-next:
523184972b28 vfio/pci: Virtualize Maximum Payload Size
cf0d53ba4947 vfio/pci: Virtualize Maximum Read Request Size
[Testing]
With the above patches applied, the guest boots when a PCIe physical function is passed through, and the errors no longer appear on the host system.
== On the Guest ==
ubuntu@
00:00.0 Host bridge: Red Hat, Inc. Device 0008
00:01.0 PCI bridge: Red Hat, Inc. Device 000c
00:01.1 PCI bridge: Red Hat, Inc. Device 000c
00:01.2 PCI bridge: Red Hat, Inc. Device 000c
00:01.3 PCI bridge: Red Hat, Inc. Device 000c
00:01.4 PCI bridge: Red Hat, Inc. Device 000c
00:01.5 PCI bridge: Red Hat, Inc. Device 000c
01:00.0 Ethernet controller: Red Hat, Inc Virtio network device (rev 01)
02:00.0 Communication controller: Red Hat, Inc Virtio console (rev 01)
03:00.0 SCSI storage controller: Red Hat, Inc Virtio block device (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc Virtio block device (rev 01)
05:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
ubuntu@
mlx5_core 471040 0
devlink 36864 1 mlx5_core
ptp 28672 1 mlx5_core
[Regression Potential]
Two patches to drivers/vfio/pci were cleanly cherry-picked from linux-next and applied to Artful/Zesty. They were tested on an ARM64 QDF2400 system, and no regressions were found.
description: | updated |
description: | updated |
Changed in linux (Ubuntu): | |
assignee: | Manoj Iyer (manjo) → Canonical Kernel Team (canonical-kernel-team) |
status: | Incomplete → In Progress |
Changed in linux (Ubuntu Artful): | |
status: | New → In Progress |
Changed in linux (Ubuntu Zesty): | |
status: | New → In Progress |
Changed in linux (Ubuntu Artful): | |
importance: | Undecided → Critical |
Changed in linux (Ubuntu Zesty): | |
importance: | Undecided → Critical |
Changed in linux (Ubuntu Bionic): | |
assignee: | Canonical Kernel Team (canonical-kernel-team) → Manoj Iyer (manjo) |
Changed in linux (Ubuntu Artful): | |
assignee: | nobody → Manoj Iyer (manjo) |
Changed in linux (Ubuntu Zesty): | |
assignee: | nobody → Manoj Iyer (manjo) |
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Artful): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu): | |
status: | Fix Committed → Fix Released |
Changed in linux (Ubuntu Bionic): | |
status: | Fix Committed → Fix Released |
tags: | added: cscc |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1732804
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.