DMAR: ERROR: DMA PTE for vPFN 0x8e8fe already set

Bug #1971505 reported by Michael Brazda
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I got this error after a fresh install of Ubuntu 22.04.
The server is a Gen8 Microserver. I moved from Fedora 35 to Ubuntu. Everything was working before the format and reinstall.

Unfortunately, I cannot get to a command line to gather any other logs, as the server hangs mid-boot with this error flooding the screen.
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 7 19:28 seq
 crw-rw---- 1 root audio 116, 33 Apr 7 19:28 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2022-05-03 (0 days ago)
InstallationMedia: Ubuntu-Server 22.04 LTS "Jammy Jellyfish" - Release amd64 (20220421)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant MicroServer Gen8
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-27-generic root=UUID=6df4f807-876c-40e3-ba9d-667c4b1c504d ro
ProcVersionSignature: Ubuntu 5.15.0-27.28-generic 5.15.30
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-27-generic N/A
 linux-backports-modules-5.15.0-27-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.15.0-27-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 07/16/2015
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.30
dmi.modalias: dmi:bvnHP:bvrJ06:bd07/16/2015:efr2.30:svnHP:pnProLiantMicroServerGen8:pvr:cvnHP:ct7:cvr:sku819187-001:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.product.sku: 819187-001
dmi.sys.vendor: HP

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1971505

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Michael Brazda (9-mike-w) wrote : CurrentDmesg.txt

apport information

tags: added: apport-collected jammy uec-images
description: updated

Michael Brazda (9-mike-w) wrote : Lspci.txt

apport information

Michael Brazda (9-mike-w) wrote : Lspci-vt.txt

apport information

Michael Brazda (9-mike-w) wrote : Lsusb.txt

apport information

Michael Brazda (9-mike-w) wrote : Lsusb-t.txt

apport information

Michael Brazda (9-mike-w) wrote : Lsusb-v.txt

apport information

Michael Brazda (9-mike-w) wrote : ProcCpuinfo.txt

apport information

Michael Brazda (9-mike-w) wrote : ProcCpuinfoMinimal.txt

apport information

Michael Brazda (9-mike-w) wrote : ProcInterrupts.txt

apport information

Michael Brazda (9-mike-w) wrote : ProcModules.txt

apport information

Michael Brazda (9-mike-w) wrote : UdevDb.txt

apport information

Michael Brazda (9-mike-w) wrote : WifiSyslog.txt

apport information

Michael Brazda (9-mike-w) wrote : acpidump.txt

apport information

Mauricio Faria de Oliveira (mfo) wrote :

Hi Michael,

Thanks for reporting this bug.

We'd need some kernel messages to understand where it's coming from.

It looks like the Gen8 Microserver has iLO / serial console support.

Can you please check whether you can set up a serial console on it?
Then you might need the Linux kernel option 'console=ttyS1' (or ttyS0).

If that works and you get kernel boot output on it, you should be
able to record a boot by running 'script'; then, in the shell that
it opens to record terminal history/activity, connect to the SOL
serial console over LAN/network (e.g., SSH, iirc) and wait for the
issue to be logged. Then quit the console, 'exit' the script shell,
and upload the generated 'typescript' file.

Hope this helps,

Mauricio Faria de Oliveira (mfo) wrote :

Your attachments came through after I sent my last comment.

Were you able to get to a shell in order to run the commands?

CurrentDmesg.txt does show an interesting error (but
it starts in the middle of the flood) going to ext4/
scsi/ata/intel DMA, which is the DMAR handler/thing.

Having messages from before/when the error started
might still be helpful, if it's possible to get them.

Thanks!

Marian Rainer-Harbach (marianrh) wrote :

This might be a duplicate of bug #1970453, which also occurs on HPE servers starting with Ubuntu 22.04.

KIMATA Tetsuya (kimata24) wrote :

As a temporary measure, disabling IOMMU as follows seems to prevent the error.

sudo vi /etc/default/grub
+ GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=off"

sudo update-grub

TJ (tj) wrote :

This is likely the issue addressed in upstream stable v5.15.6 commit 724ee060 - the links in that commit message seem to be very close:

https://<email address hidden>/

---
It is observed that the new PTEs formed (on the host) are same
as the original PTEs, and thus following logs, accompanied by
stacktraces, overwhelm the logs :

......
 DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
 DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
......
---
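
The observation quoted above (the value being written equals the value already present) can be checked mechanically against a captured log. The following is a minimal sketch, not part of the original report: the function name `parse_dmar_errors` is hypothetical, the regular expression is inferred from the five sample lines above, and it assumes the first hex value is the existing PTE and the second is the one being written (the order does not affect the equality check).

```python
import re

# Pattern inferred from the sample lines quoted in this comment (an assumption,
# not taken from the kernel source).
PATTERN = re.compile(
    r"DMA PTE for vPFN (0x[0-9a-f]+) already set \(to ([0-9a-f]+) not ([0-9a-f]+)\)"
)


def parse_dmar_errors(lines):
    """Return (vpfn, first_value, second_value) integer tuples for matching lines."""
    entries = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            entries.append(tuple(int(group, 16) for group in match.groups()))
    return entries


# The five lines quoted above, used as sample input.
sample = [
    "DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)",
    "DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)",
    "DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)",
    "DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)",
    "DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)",
]

entries = parse_dmar_errors(sample)
# True when every reported PTE pair is identical, as described in the quote.
all_identical = all(first == second for _, first, second in entries)
print(len(entries), all_identical)
```

Feeding the lines of a saved dmesg capture to `parse_dmar_errors` instead of `sample` would show whether a given flood matches this pattern.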

commit 724ee060d0aba28f072fc7357a20366b0a519593
Author: Alex Williamson <email address hidden>
Date: Fri Nov 26 21:55:56 2021 +0800

    iommu/vt-d: Fix unmap_pages support

    [ Upstream commit 86dc40c7ea9c22f64571e0e45f695de73a0e2644 ]

    When supporting only the .map and .unmap callbacks of iommu_ops,
    the IOMMU driver can make assumptions about the size and alignment
    used for mappings based on the driver provided pgsize_bitmap. VT-d
    previously used essentially PAGE_MASK for this bitmap as any power
    of two mapping was acceptably filled by native page sizes.

    However, with the .map_pages and .unmap_pages interface we're now
    getting page-size and count arguments. If we simply combine these
    as (page-size * count) and make use of the previous map/unmap
    functions internally, any size and alignment assumptions are very
    different.

    As an example, a given vfio device assignment VM will often create
    a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
    does not support IOMMU super pages, the unmap_pages interface will
    ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
    will recurse down to level 2 of the page table where the first half
    of the pfn range exactly matches the entire pte level. We clear the
    pte, increment the pfn by the level size, but (oops) the next pte is
    on a new page, so we exit the loop and pop back up a level. When we
    then update the pfn based on that higher level, we seem to assume
    that the previous pfn value was at the start of the level. In this
    case the level size is 256K pfns, which we add to the base pfn and
    get a result of 0x7fe00, which is clearly greater than 0x401ff,
    so we're done. Meanwhile we never cleared the ptes for the remainder
    of the range. When the VM remaps this range, we're overwriting valid
    ptes and the VT-d driver complains loudly, as reported by the user
    report linked below.

    The fix for this seems relatively simple, if each iteration of the
    loop in dma_pte_clear_level() is assumed to clear to the end of the
    level pte page, then our next pfn should be calculated from level_pfn
    rather than our working pfn.

    Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
    Reported-by: Ajay Garg <email address hidden>
    Signed...
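
The pfn arithmetic described in the commit message can be reproduced with a small model. This is only a sketch under stated assumptions, not the driver code: it assumes 9 index bits per page-table level with level 1 as the 4KB leaf level, and the helper names `level_size` and `level_mask` are chosen here for illustration. It compares the next pfn computed from the working pfn (the behaviour the commit describes as buggy) with the next pfn computed from `level_pfn` (the described fix), using the example range [0x3fe00 - 0x401ff].

```python
LEVEL_STRIDE = 9  # assumed: 9 index bits per page-table level


def level_size(level: int) -> int:
    """Number of 4KB pfns covered by one pte at this level (level 1 = leaf)."""
    return 1 << ((level - 1) * LEVEL_STRIDE)


def level_mask(level: int) -> int:
    """Mask that rounds a pfn down to the start of its pte at this level."""
    return ~(level_size(level) - 1)


# Example range from the commit message: 1024 4KB pages.
start_pfn, last_pfn = 0x3FE00, 0x401FF

# Back at the higher level, where one pte covers 256K pfns.
level = 3
pfn = start_pfn                      # working pfn, not aligned to the level
level_pfn = pfn & level_mask(level)  # start of the pte covering pfn

buggy_next = pfn + level_size(level)        # steps from the working pfn
fixed_next = level_pfn + level_size(level)  # steps from the level start

print(hex(buggy_next), buggy_next > last_pfn)   # loop would stop early
print(hex(fixed_next), fixed_next <= last_pfn)  # loop would continue
```

In this model the first calculation lands past the end of the range, so the remaining ptes (0x40000 to 0x401ff) are never visited, while the second lands on 0x40000 and the loop continues into them.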


Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired