[SRU][22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI setup

Bug #2020022 reported by Adrian Huang
This bug affects 1 person
Affects         Status        Importance  Assigned to   Milestone
linux (Ubuntu)  Fix Released  Undecided   Jeff Lane
Kinetic         Won't Fix     Undecided   Unassigned
Lunar           Won't Fix     Undecided   Michael Reed
Mantic          Fix Released  Undecided   Jeff Lane
Noble           Fix Released  Undecided   Jeff Lane

Bug Description

[Impact]
When VMD is enabled in UEFI setup, the OS cannot boot successfully, and the resulting panic causes the system to reboot. The following log is shown:

[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
[ 166.612449] DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
...
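
To check a captured boot log for the same signature, a minimal sketch follows. It assumes the console output was saved to a file or string (the machine reboots on panic, so the log usually has to come from a serial console or similar capture), and the pattern is derived only from the lines quoted above. The function name is hypothetical.

```python
import re

# Assumption: the pattern is derived only from the three error lines quoted
# in this report; other kernel versions may word these messages differently.
DMAR_INVALIDATION_ERROR = re.compile(
    r"DMAR: VT-d detected Invalidation (Queue|Time-out|Completion) Error"
)


def find_dmar_invalidation_errors(log_text):
    """Return the lines of log_text that report a VT-d invalidation error."""
    return [
        line
        for line in log_text.splitlines()
        if DMAR_INVALIDATION_ERROR.search(line)
    ]


# Sample input copied from the log in this report.
sample = """\
[ 166.605518] DMAR: VT-d detected Invalidation Queue Error: Reason f
[ 166.605522] DMAR: VT-d detected Invalidation Time-out Error: SID ffff
[ 166.612445] DMAR: VT-d detected Invalidation Completion Error: SID ffff
[ 166.612447] DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
"""

for line in find_dmar_invalidation_errors(sample):
    print(line)
```

The "QI HEAD" and "QI PRIOR" lines are deliberately not matched; they are follow-up detail rather than the error report itself.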

Additional info:
  * The issue happens on both the Lenovo SE350 and the Lenovo SR850 v2 servers.

Debugging info and fix commit info:
  * `git bisect` indicates that the offending commit is 6aab5622296b ("PCI: vmd: Clean up domain before enumeration"). The root cause is that the VMD driver tries to clear a PCI configuration space range when resetting a VMD domain (https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L544), which leads to the failure.

[Fix]
  * Another `git bisect` indicates that the fix commit is 20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing"). I confirmed that this commit fixes the issue.

Would it be possible to include commit 20f3337d350c in the Ubuntu 22.04.2/23.10 kernels?
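
One way to check whether a given kernel tree already contains the fix commit is `git merge-base --is-ancestor`, which exits with status 0 when the commit is an ancestor of the ref and 1 when it is not. The following is a minimal sketch, assuming a local clone of the tree in question; the function name and the `run` parameter (there only so the git call can be swapped out) are hypothetical.

```python
import subprocess


def commit_in_ref(commit, ref="HEAD", repo=".", run=subprocess.run):
    """Return True if `commit` is an ancestor of `ref` in the git tree at `repo`.

    `git merge-base --is-ancestor` exits 0 for "is an ancestor" and 1 for
    "is not"; any other status is treated as an error (for example, the
    commit ID is unknown in this repository).
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git failed: {(result.stderr or '').strip()}")


# Example call (needs a kernel git tree; the path is a placeholder):
# commit_in_ref("20f3337d350c", "HEAD", repo="/path/to/linux")
```

A backported (cherry-picked) patch gets a new commit ID, so this check can miss a fix that is present in a distribution tree. Searching the log for the commit subject is a useful second check in that case.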

[Test Plan]

Steps to reproduce
1. Disable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled

2. Install the OS

3. Enable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled

4. Reboot; the issue reproduces on this boot
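
When verifying a fixed kernel, it helps to confirm from the running OS that VMD really is enabled, so a successful boot is not just the result of VMD being left disabled. The following is a minimal sketch under the assumption that the VMD driver registers in sysfs under the name "vmd"; the function names are hypothetical, and the PCI address in the comment is only an illustration.

```python
import os
import re

# Assumption: the VMD PCI driver registers under the name "vmd", so devices
# bound to it appear as symlinks in this sysfs directory.
VMD_DRIVER_DIR = "/sys/bus/pci/drivers/vmd"

# PCI addresses look like "0000:64:00.5" (domain:bus:device.function).
# The domain part may be longer than four hex digits.
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def is_pci_address(name):
    """Return True if name has the form of a PCI address."""
    return PCI_ADDRESS.match(name) is not None


def vmd_bound_devices(driver_dir=VMD_DRIVER_DIR):
    """List PCI addresses bound to the VMD driver, or [] if there are none."""
    try:
        entries = os.listdir(driver_dir)
    except OSError:
        # Directory missing or unreadable: driver not loaded or VMD disabled.
        return []
    return sorted(name for name in entries if is_pci_address(name))


print(vmd_bound_devices())
```

An empty list after step 3 would suggest that VMD is not active, in which case a clean boot says nothing about the fix.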

[ Where problems could occur ]
* Lenovo SE350 server and Lenovo SR850 v2 server
* The regression leads to a boot failure (the system cannot boot into the OS successfully).

[ Other Info ]

https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/lunar/+ref/enable_vmd_lp_2020022

Adrian Huang (ahuang12) wrote :
information type: Public → Private Security
information type: Private Security → Private
summary: - OS cannot boot successfully when enabling VMD in UEFI setup
+ [22.04.2] OS cannot boot successfully when enabling VMD in UEFI setup
summary: - [22.04.2] OS cannot boot successfully when enabling VMD in UEFI setup
+ [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI
+ setup
Adrian Huang (ahuang12) wrote : Re: [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI setup
description: updated
Adrian Huang (ahuang12)
affects: ubuntu → linux-hwe-5.19 (Ubuntu)
Jeff Lane  (bladernr) wrote :

Kernels on which this has been seen so far: 5.19 and 6.2.

Can you also try 5.15 (22.04 GA) and 5.4 (20.04 GA), as both of those are certified on both 22.04 and 20.04?

Changed in linux-hwe-5.19 (Ubuntu):
status: New → Incomplete
affects: linux-hwe-5.19 (Ubuntu) → linux (Ubuntu)
Adrian Huang (ahuang12) wrote :
Adrian Huang (ahuang12) wrote :
Jeff Lane  (bladernr) wrote :

Adrian, two things:

First: can you provide steps to recreate this on the SE350 (including whatever you're setting in BIOS), so I can see whether I can reproduce the failure on a local sample?

Second:
20f3337d350c ("x86: don't use REP_GOOD or ERMS for small memory clearing")

Which kernel tree is this commit in? I could not find it in mainline (unless it has a different mainline commit ID from being merged).

Adrian Huang (ahuang12) wrote :

Jeff,

[Steps to reproduce]
1. Disable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Disabled

2. Install the OS

3. Enable Intel VMD in BIOS settings
   System Settings --> Devices and I/O Ports --> Intel VMD technology --> Enable/Disable Intel VMD : Enabled

4. Reboot; the issue reproduces on this boot

[Commit 20f3337d350c]
This commit is from Linus's tree (merged in 6.4-rc1): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/lib/memset_64.S?id=20f3337d350c4e1b4ac66d731fd4e98565bf6cc0
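
Since the fix was merged in 6.4-rc1, a quick first check is to compare a kernel's base version against 6.4. The following is a minimal sketch with hypothetical function names and illustrative release strings; it looks only at the version number, so it cannot see backports and should be treated as a hint rather than proof.

```python
import re

# Assumption: the fix (20f3337d350c) first appeared in mainline 6.4-rc1, as
# stated in this report. Distribution kernels can carry backports, so an
# older version number does not prove that the fix is missing.
FIRST_MAINLINE_WITH_FIX = (6, 4)


def mainline_version(release):
    """Extract (major, minor) from a `uname -r` style release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def base_version_has_fix(release):
    """Return True if the mainline base version is 6.4 or newer."""
    return mainline_version(release) >= FIRST_MAINLINE_WITH_FIX


# The release strings below are illustrative, not taken from this report.
for release in ("5.19.0-43-generic", "6.2.0-39-generic", "6.5.0-9-generic"):
    print(release, base_version_has_fix(release))
```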

Jeff Lane  (bladernr)
Changed in linux (Ubuntu Kinetic):
status: New → Won't Fix
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Lunar):
assignee: nobody → Jeff Lane  (bladernr)
Changed in linux (Ubuntu Mantic):
assignee: nobody → Jeff Lane  (bladernr)
Changed in linux (Ubuntu Lunar):
status: New → In Progress
Changed in linux (Ubuntu Mantic):
status: Incomplete → In Progress
Jeff Lane  (bladernr) wrote :

No need for Mantic, which already contains the patch. Only need to pull this back to Lunar.

Changed in linux (Ubuntu Mantic):
status: In Progress → Invalid
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Mantic):
status: Invalid → Fix Released
Michael Reed (mreed8855) wrote :

I have created a test kernel. Please test it and provide feedback.

https://people.canonical.com/~mreed/lenovo/lp_2020022_vmd/lunar/

description: updated
Michael Reed (mreed8855) wrote :

Adrian,

Can you add to the "Where problems could occur" section and describe the regression risk?

summary: - [22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in UEFI
- setup
+ [SRU][22.04.2 & 23.10] OS cannot boot successfully when enabling VMD in
+ UEFI setup
Adrian Huang (ahuang12)
information type: Private → Private Security
Adrian Huang (ahuang12)
information type: Private Security → Public Security
Adrian Huang (ahuang12)
description: updated
Adrian Huang (ahuang12) wrote :

The test kernel still fails. I am not sure whether the patch was included correctly. Could you put the source deb package at your URL so that I can check?

Jeff Lane  (bladernr) wrote :

Can you install one of the daily ISOs for Mantic (23.10) and test whether the issue occurs there? The patches you mention are already included in the Mantic kernel, so it should not show the failure.

We're still trying to figure out how to get you a working 6.2 kernel.

Adrian Huang (ahuang12) wrote :

Confirmed that the kernel (v6.5) of 23.10 does not have the issue.

Michael Reed (mreed8855) wrote :

I created a 6.2 test kernel for Lunar. Please test it:

https://people.canonical.com/~mreed/lenovo/lp_2020022_vmd/lunar/12062023/

Jeff Lane  (bladernr)
Changed in linux (Ubuntu Lunar):
assignee: Jeff Lane  (bladernr) → Michael Reed (mreed8855)
Michael Reed (mreed8855)
description: updated
description: updated
Adrian Huang (ahuang12) wrote :

Confirmed that the test kernel in Comment #14 fixes the issue.

Thanks.

Changed in linux (Ubuntu Lunar):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-41.42 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux' to 'verification-done-lunar-linux'. If the problem still exists, change the tag 'verification-needed-lunar-linux' to 'verification-failed-lunar-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-v2 verification-needed-lunar-linux
Brian Murray (brian-murray) wrote :

Ubuntu 23.04 (Lunar Lobster) has reached end of life, so this bug will not be fixed for that specific release.

Changed in linux (Ubuntu Lunar):
status: Fix Committed → Won't Fix
