[SRU]PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU

Bug #1937295 reported by Jeff Lane 
This bug affects 2 people
Affects                      Status         Importance  Assigned to  Milestone
HWE Next                     Fix Released   Undecided   Unassigned
subiquity                    Invalid        Undecided   Unassigned
linux (Ubuntu)               Fix Released   Medium      Jeff Lane
  Bionic                     Invalid        Undecided   Unassigned
  Focal                      In Progress    Medium      Jeff Lane
  Impish                     Fix Released   Medium      Jeff Lane
  Jammy                      Fix Released   Medium      Jeff Lane
linux-oem-5.14 (Ubuntu)      Invalid        Undecided   Unassigned
  Bionic                     Invalid        Undecided   Unassigned
  Focal                      Won't Fix      Undecided   koba
  Impish                     Invalid        Undecided   Unassigned
  Jammy                      Invalid        Undecided   Unassigned
linux-oem-5.17 (Ubuntu)      Invalid        Undecided   Unassigned
  Bionic                     Invalid        Undecided   Unassigned
  Focal                      Invalid        Undecided   Unassigned
  Impish                     Invalid        Undecided   Unassigned
  Jammy                      Invalid        Undecided   koba

Bug Description

[Impact]

A hardware partner discovered they were unable to install Ubuntu on some servers using VROC setups. They point to this issue involving DMAR that is blocking discovery of the VROC RAID devices:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2565e5b69c44b4e42469afea3cc5a97e74d1ed45

`git bisect` points to the offending commit ee81ee84f873 ("PCI:
vmd: Disable MSI-X remapping when possible"), which disables VMD MSI-X
remapping. With remapping disabled, the IOMMU hardware blocks the
compatibility format interrupt request because Interrupt Remapping
Enable Status (IRES) and Extended Interrupt Mode Enable (EIME) are
enabled. Please refer to section "5.1.4 Interrupt-Remapping Hardware
Operation" in the Intel VT-d spec.

To fix the issue, the VMD driver keeps MSI-X remapping enabled,
irrespective of VMD_FEAT_CAN_BYPASS_MSI_REMAP, whenever the IOMMU
subsystem has enabled interrupt remapping.
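
The decision described above can be modelled in a few lines. This is a hedged sketch in Python rather than the kernel code itself: the function name `must_remap_msix` and the bit value assigned to the flag are assumptions made for illustration, and only the flag name VMD_FEAT_CAN_BYPASS_MSI_REMAP comes from the description.

```python
# Illustrative model only; the real logic lives in drivers/pci/controller/vmd.c.
# The bit value below is a placeholder chosen for this sketch.
VMD_FEAT_CAN_BYPASS_MSI_REMAP = 1 << 4


def must_remap_msix(features: int, iommu_intr_remap_enabled: bool) -> bool:
    """Return True when VMD should keep MSI-X remapping enabled."""
    # When the IOMMU has interrupt remapping on, bypassing VMD remapping leads
    # to compatibility-format interrupt requests, which the IOMMU blocks.
    if iommu_intr_remap_enabled:
        return True
    # Otherwise remapping is only needed when the hardware cannot bypass it.
    return not (features & VMD_FEAT_CAN_BYPASS_MSI_REMAP)
```

In this model, the regression corresponds to consulting only the feature flag; the fix adds the IOMMU check ahead of it.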

[Fix]

2565e5b69c44 PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU

[Test Plan]

    1. Boot into the VROC controller in UEFI Setup and create a RAID10 disk.
    2. Install the affected Ubuntu release on the RAID10.
    3. Without the fix, the system hangs at "Partitions formatting 33%"; with the patched kernel, installation should proceed past this point.

[Where problems could occur]

The fix itself is a very small change to drivers/pci/controller/vmd.c and problems should not occur. The root cause was discovered by the hardware partner's engineers, who tested and submitted it upstream where it was accepted and landed in 5.16.

That said, I doubt this will fix installs from 18.04.6, as that would require a respin to get the patched kernel onto the ISO. 20.04 should pick it up on the ISO in 20.04.5; until then, installs from existing ISOs could still hit the initial issue, since those ISOs lack the patched kernel.

[Other Info]

As noted, this would need to not only land in the kernel but land in the kernel in the ISO to resolve the issue in the installation process. I'll bring this back as far as Focal with the expectation that while 20.04.4 is too late, it will be present using the GA kernel in 20.04.5 later on.

*************************************************************************

Original Bug Summary:

A hardware partner has been testing 18.04 subiquity images on their servers with VROC enabled and configured in a RAID 10 setup.

In their own words:
Steps to reproduce:

    1. Boot into VRoC controller in uEFI Setup and create a raid10 disk.

    2. Install Ubuntu 18.04.5 on the RAID10.

    3. The system hangs at "Partitions formatting 33%".

After looking at the launchpad (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/), the fix was included in the updated kernel.

   [Quotes from the launchpad]

      The released kernels are:

              Hirsute: 5.11.0-22-generic
              Groovy: 5.8.0-59-generic
              Focal: 5.4.0-77-generic
              Bionic: 4.15.0-147-generic
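
To check whether a given installer kernel predates the versions listed above, the release strings can be compared numerically. The following is a minimal sketch, assuming the running kernel comes from the same base version as the listed one (HWE or OEM kernels carry their own ABI numbering and are rejected here). The helper names `kernel_key` and `has_fix` are hypothetical.

```python
import re

# Minimum fixed kernels per series, copied from the list quoted above.
FIXED_KERNELS = {
    "hirsute": "5.11.0-22-generic",
    "groovy": "5.8.0-59-generic",
    "focal": "5.4.0-77-generic",
    "bionic": "4.15.0-147-generic",
}


def kernel_key(release: str) -> tuple:
    """Turn '5.4.0-77-generic' into (5, 4, 0, 77) for comparison."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(series: str, running: str) -> bool:
    """True if `running` is at least the fixed kernel listed for `series`."""
    fixed = kernel_key(FIXED_KERNELS[series])
    current = kernel_key(running)
    # ABI numbers are only comparable within the same base version.
    if current[:3] != fixed[:3]:
        raise ValueError("different kernel base version; cannot compare")
    return current[3] >= fixed[3]
```

The `running` argument would typically be the output of `uname -r` taken from the installer environment.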

I've asked them to also confirm this on 20.04.2, and check that 20.04.3 dailies fix the issue.

It is at least a very reasonable hypothesis that this will also break on all current ISO installs as none of them are respun once released to include updated SRUs in the installation media. This currently affects 20.04.2 but that will be resolved shortly when 20.04.3 releases as the GA and HWE kernels in that image should have the SRU that fixes this issue. However, 18.04 has no further releases, and even the 18.04.5 daily-live and daily images on cdimages.ubuntu.com are not built after 18.04.5 was released.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I'm a bit confused, I wouldn't expect vroc installs to work at all with 18.04.5. Is this after updating to the edge channel or something?

affects: syslinux (Ubuntu) → subiquity
Revision history for this message
shangsong (shangsong2) wrote :

Hi Michael,
  Yes, the VROC disk is shown after updating to the edge channel under 18.04.5.

Revision history for this message
shangsong (shangsong2) wrote :

I tried installing the latest 20.10; the installer also crashes, and resyncing takes a long time (about half an hour).

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

The 20.10 ISO probably has a bad kernel on too. Can you try the one from http://cdimage.ubuntu.com/ubuntu-server/daily-live/current/? That will become 20.04.3 in a few days.

I don't have any good ideas for bionic I'm afraid. It's probably possible to cobble together an ISO with a newer kernel but it would be a bit of a hack.

Revision history for this message
shangsong (shangsong2) wrote :

Hi Michael,
  The latest image (impish-live-server-amd64.iso, 2021-08-24 11:00) is worse: the VROC disk is not displayed in the installer.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1937295] Re: Installation hangs on VROC systems during Bionic installs

I don't see any evidence of a vroc (or any raid) device in the udev data in
that report, so I think something must be broken at a lower level than the
installer.

On Wed, 25 Aug 2021 at 22:41, shangsong <email address hidden> wrote:

> Hi Michael,
> The latest image (impish-live-server-amd64.iso, 2021-08-24 11:00) is
> worse: the VROC disk is not displayed in the installer.
>
> ** Attachment added:
> "sosreport-ubuntu-server-raid10-2021-08-25-nbgxkng.tar.xz"
>
> https://bugs.launchpad.net/subiquity/+bug/1937295/+attachment/5520451/+files/sosreport-ubuntu-server-raid10-2021-08-25-nbgxkng.tar.xz

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: Installation hangs on VROC systems during Bionic installs

They will retry the daily for Impish to see if this is fixed there

Revision history for this message
shangsong (shangsong2) wrote :

Ubuntu Server 18.04.6 LTS with the HWE kernel cannot discover the disk and reports a DMAR error.

Revision history for this message
Adrian Huang (ahuang12) wrote :

[from kern.log]
> Jan 18 13:21:31 ubuntu-server kernel: [ 3.997351] DMAR: DRHD: handling fault status reg 2
> Jan 18 13:21:31 ubuntu-server kernel: [ 3.997647] DMAR: [INTR-REMAP] Request device [64:00.5] fault index 0 [fault reason 37] Blocked a compatibility format interrupt request

The DMAR error is a known issue. Upstream commit 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU") fixes it. Please consider including the commit.

Fix commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2565e5b69c44b4e42469afea3cc5a97e74d1ed45
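
For triage, a log can be scanned for this fault signature. The sketch below is a hedged example: the regular expression is modelled only on the two kern.log lines quoted above, so other kernel versions may format the message differently, and the function name `find_intr_remap_faults` is hypothetical.

```python
import re

# Pattern modelled on the kern.log lines quoted above (an assumption; the
# exact message format can differ between kernel versions).
FAULT_RE = re.compile(
    r"DMAR: \[INTR-REMAP\] Request device \[(?P<device>[0-9a-fA-F:.]+)\] "
    r"fault index (?P<index>\d+) \[fault reason (?P<reason>\d+)\]"
)


def find_intr_remap_faults(log_text: str) -> list:
    """Return (device, fault_reason) for each INTR-REMAP fault in log_text."""
    return [
        (m.group("device"), int(m.group("reason")))
        for m in FAULT_RE.finditer(log_text)
    ]


sample = (
    "Jan 18 13:21:31 ubuntu-server kernel: [ 3.997351] DMAR: DRHD: "
    "handling fault status reg 2\n"
    "Jan 18 13:21:31 ubuntu-server kernel: [ 3.997647] DMAR: [INTR-REMAP] "
    "Request device [64:00.5] fault index 0 [fault reason 37] "
    "Blocked a compatibility format interrupt request\n"
)
print(find_intr_remap_faults(sample))  # [('64:00.5', 37)]
```

A fault reason of 37 on the VMD device, as in the sample, matches the "Blocked a compatibility format interrupt request" message reported in this bug.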

Jeff Lane  (bladernr)
Changed in subiquity:
status: New → Incomplete
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1937295

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Changed in linux (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Impish):
status: New → Incomplete
Jeff Lane  (bladernr)
summary: - Installation hangs on VROC systems during Bionic installs
+ Installation hangs on VROC systems during Bionic installsPCI: vmd: Do
+ not disable MSI-X remapping if interrupt remapping is enabled by IOMMU
summary: - Installation hangs on VROC systems during Bionic installsPCI: vmd: Do
- not disable MSI-X remapping if interrupt remapping is enabled by IOMMU
+ PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is
+ enabled by IOMMU
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Bionic):
status: Incomplete → Invalid
Changed in linux (Ubuntu Focal):
status: Incomplete → In Progress
Changed in linux (Ubuntu Impish):
status: Incomplete → In Progress
Changed in linux (Ubuntu Jammy):
status: Incomplete → In Progress
Changed in linux (Ubuntu Focal):
assignee: nobody → Jeff Lane (bladernr)
Changed in linux (Ubuntu Impish):
assignee: nobody → Jeff Lane (bladernr)
Changed in linux (Ubuntu Jammy):
assignee: nobody → Jeff Lane (bladernr)
importance: Undecided → Medium
Changed in linux (Ubuntu Impish):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
description: updated
description: updated
Jeff Lane  (bladernr)
summary: - PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is
+ [SRU]PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is
enabled by IOMMU
Revision history for this message
Jeff Lane  (bladernr) wrote :

Marking it invalid for subiquity, since they seem pretty confident this is a kernel problem rather than a subiquity one.

Changed in subiquity:
status: Incomplete → Invalid
Changed in linux (Ubuntu Impish):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-32.35 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
Revision history for this message
Adrian Huang (ahuang12) wrote :

Would it be possible to include the fix patch in Ubuntu 22.04? We need it in the Ubuntu 22.04 (Jammy) daily build because the issue can be reproduced during the installation stage with VMD enabled.

Revision history for this message
shangsong (shangsong2) wrote :

The issue does not reproduce after updating to linux 5.13.0-32.35 on impish.

Revision history for this message
Adrian Huang (ahuang12) wrote :

BTW, would it be possible to include the upstream commit (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/iommu/intel?id=b00833768e170a31af09268f7ab96aecfcca9623) in Ubuntu 22.04 kernel?

This is another VMD issue. It would be appreciated if it can be merged in 22.04 kernel.

The commit is also in stable kernel v5.15.27.

Revision history for this message
Adrian Huang (ahuang12) wrote :

Thanks, I see that both commits are in Ubuntu-5.15.0-23.23.

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Adding verification-done-impish based on comment #14. Thank you for the verification.

tags: added: verification-done-impish
removed: verification-needed-impish
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-5.13/5.13.0-1019.21~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-23.23

---------------
linux (5.15.0-23.23) jammy; urgency=medium

  * jammy/linux: 5.15.0-23.23 -proposed tracker (LP: #1964573)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync dkms-build{,--nvidia-N} from LRMv5
    - debian/dkms-versions -- update from kernel-versions (main/master)

  * [22.04 FEAT] KVM: Enable GISA support for Secure Execution guests
    (LP: #1959977)
    - KVM: s390: pv: make use of ultravisor AIV support

  * intel_iommu breaks Intel IPU6 camera: isys port open ready failed -16
    (LP: #1958004)
    - SAUCE: iommu: intel-ipu: use IOMMU passthrough mode for Intel IPUs

  * CVE-2022-23960
    - ARM: report Spectre v2 status through sysfs
    - ARM: early traps initialisation
    - ARM: use LOADADDR() to get load address of sections
    - ARM: Spectre-BHB workaround
    - ARM: include unprivileged BPF status in Spectre V2 reporting
    - arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
    - arm64: Add HWCAP for self-synchronising virtual counter
    - arm64: Add Cortex-X2 CPU part definition
    - arm64: add ID_AA64ISAR2_EL1 sys register
    - arm64: cpufeature: add HWCAP for FEAT_AFP
    - arm64: cpufeature: add HWCAP for FEAT_RPRES
    - arm64: entry.S: Add ventry overflow sanity checks
    - arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
    - KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
    - arm64: entry: Make the trampoline cleanup optional
    - arm64: entry: Free up another register on kpti's tramp_exit path
    - arm64: entry: Move the trampoline data page before the text page
    - arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
    - arm64: entry: Don't assume tramp_vectors is the start of the vectors
    - arm64: entry: Move trampoline macros out of ifdef'd section
    - arm64: entry: Make the kpti trampoline's kpti sequence optional
    - arm64: entry: Allow the trampoline text to occupy multiple pages
    - arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
    - arm64: entry: Add vectors that have the bhb mitigation sequences
    - arm64: entry: Add macro for reading symbol addresses from the trampoline
    - arm64: Add percpu vectors for EL1
    - arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
    - arm64: Mitigate spectre style branch history side channels
    - KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
    - arm64: Use the clearbhb instruction in mitigations
    - arm64: proton-pack: Include unprivileged eBPF status in Spectre v2
      mitigation reporting
    - ARM: fix build error when BPF_SYSCALL is disabled

  * CVE-2021-26401
    - x86/speculation: Use generic retpoline by default on AMD
    - x86/speculation: Update link to AMD speculation whitepaper
    - x86/speculation: Warn about Spectre v2 LFENCE mitigation
    - x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT

  * CVE-2022-0001
    - x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
    - x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
    - x86/speculation: Add eIBRS + Retpoline options
    - Document...

Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-37.42

---------------
linux (5.13.0-37.42) impish; urgency=medium

  * impish/linux: 5.13.0-37.42 -proposed tracker (LP: #1964959)

  * CVE-2022-0742
    - ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()

linux (5.13.0-36.41) impish; urgency=medium

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - debian/dkms-versions -- update from kernel-versions (main/2022.02.21)

  * Broken network on some AWS instances with focal/impish kernels
    (LP: #1961968)
    - SAUCE: Revert "PCI/MSI: Mask MSI-X vectors only on success"

  * [SRU]PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is
    enabled by IOMMU (LP: #1937295)
    - PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled
      by IOMMU

  * [UBUNTU 20.04] kernel: Add support for CPU-MF counter second version 7
    (LP: #1960182)
    - s390/cpumf: Support for CPU Measurement Facility CSVN 7
    - s390/cpumf: Support for CPU Measurement Sampling Facility LS bit

  * [UBUNTU 21.10] s390/cio: verify the driver availability for path_event call
    (LP: #1960875)
    - s390/cio: verify the driver availability for path_event call

  * Impish update: upstream stable patchset 2022-02-14 (LP: #1960861)
    - devtmpfs regression fix: reconfigure on each mount
    - orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()
    - remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided
    - perf: Protect perf_guest_cbs with RCU
    - KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
    - KVM: s390: Clarify SIGP orders versus STOP/RESTART
    - 9p: only copy valid iattrs in 9P2000.L setattr implementation
    - video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
    - media: uvcvideo: fix division by zero at stream start
    - rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with
      interrupts enabled
    - firmware: qemu_fw_cfg: fix sysfs information leak
    - firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entries
    - firmware: qemu_fw_cfg: fix kobject leak in probe error path
    - KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
    - ALSA: hda/realtek: Add speaker fixup for some Yoga 15ITL5 devices
    - ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master after
      reboot from Windows
    - ALSA: hda: ALC287: Add Lenovo IdeaPad Slim 9i 14ITL5 speaker quirk
    - ALSA: hda/realtek: Add quirk for Legion Y9000X 2020
    - ALSA: hda/realtek: Re-order quirk entries for Lenovo
    - powerpc/pseries: Get entry and uaccess flush required bits from
      H_GET_CPU_CHARACTERISTICS
    - mtd: fixup CFI on ixp4xx
    - KVM: x86: don't print when fail to read/write pv eoi memory
    - remoteproc: qcom: pas: Add missing power-domain "mxc" for CDSP
    - perf annotate: Avoid TUI crash when navigating in the annotation of
      recursive functions
    - ALSA: hda/realtek: Use ALC285_FIXUP_HP_GPIO_LED on another HP laptop
    - ALSA: hda/tegra: Fix Tegra194 HDA reset failure

  * CVE-2022-0516
    - KVM: s390: Return error on SIDA memop on normal guest

  * CVE-2022-04...

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released
Revision history for this message
shangsong (shangsong2) wrote :

Hi Jeff,
  Installing Ubuntu Server 22.04 with the latest ISO image now fails, but it succeeds after updating subiquity to the latest version. Can the latest subiquity be merged into the ISO image?

Jeff Lane  (bladernr)
tags: added: verification-done-focal
removed: verification-needed-focal
koba (kobako)
Changed in linux-oem-5.14 (Ubuntu Focal):
assignee: nobody → koba (kobako)
status: New → In Progress
Changed in linux-oem-5.17 (Ubuntu Jammy):
assignee: nobody → koba (kobako)
status: New → In Progress
koba (kobako)
tags: added: oem-priority originate-from-1967153 somerville
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.17 (Ubuntu Jammy):
status: In Progress → Invalid
Changed in linux-oem-5.14 (Ubuntu Bionic):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Impish):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Impish):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu Bionic):
status: New → Invalid
Changed in linux-oem-5.17 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Jammy):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Focal):
status: In Progress → Fix Committed
Revision history for this message
koba (kobako) wrote :

There's a regression, so changing it back to In Progress.

Timo Aaltonen (tjaalton)
Changed in linux-oem-5.14 (Ubuntu Focal):
status: Fix Committed → In Progress
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.14 (Ubuntu Focal):
status: In Progress → Won't Fix
Changed in hwe-next:
status: New → Fix Released