PCIe Bus Error: Uncorrected, Transaction Layer, device [8086:51b0],AER UnsupReq

Bug #1990272 reported by rustyx
This bug affects 5 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

My dmesg is getting spammed by these AER errors. The laptop (Lenovo X1 Extreme Gen 5, Intel Core i7-12700H) is otherwise working fine.

The errors come in random waves of 60..160 back-to-back errors with an interval of 130..140 microseconds.

Device [8086:51b0] is the PCI/Thunderbolt bridge.

[ 20.086628] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
[ 20.086671] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 20.086680] pcieport 0000:00:1d.0: device [8086:51b0] error status/mask=00100000/00004000
[ 20.086688] pcieport 0000:00:1d.0: [20] UnsupReq (First)
[ 20.086694] pcieport 0000:00:1d.0: AER: TLP Header: 34000000 20000052 00000000 00000000
[ 20.086810] thunderbolt 0000:22:00.0: AER: can't recover (no error_detected callback)
[ 20.086846] xhci_hcd 0000:56:00.0: AER: can't recover (no error_detected callback)
[ 20.086878] pcieport 0000:00:1d.0: AER: device recovery failed
[ 20.106033] pcieport 0000:00:1d.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:1d.0
[ 20.106082] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 20.106093] pcieport 0000:00:1d.0: device [8086:51b0] error status/mask=00100000/00004000
[ 20.106101] pcieport 0000:00:1d.0: [20] UnsupReq (First)
[ 20.106108] pcieport 0000:00:1d.0: AER: TLP Header: 34000000 20000052 00000000 00000000
[ 20.106219] thunderbolt 0000:22:00.0: AER: can't recover (no error_detected callback)
[ 20.106257] xhci_hcd 0000:56:00.0: AER: can't recover (no error_detected callback)
[ 20.106301] pcieport 0000:00:1d.0: AER: device recovery failed
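
The "error status/mask" values in these lines are bitfields from the AER Uncorrectable Error Status and Mask registers. As a rough sketch, they can be decoded as below. The bit table is a partial list written from memory of the PCIe AER layout, and the helper names (parse_status_mask, decode_bits) are made up for this illustration, so treat it as an aid for reading the log rather than a reference.

```python
import re

# Partial map of bit positions in the AER Uncorrectable Error Status/Mask
# registers (assumption: subset only, unknown bits are reported by number).
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}

STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def parse_status_mask(line):
    """Return (status, mask) as integers, or None if the line has no such field."""
    m = STATUS_MASK_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def decode_bits(value):
    """List the names of all bits set in a 32-bit status or mask value."""
    return [
        AER_UNCOR_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if (value >> bit) & 1
    ]


line = ("[ 20.086680] pcieport 0000:00:1d.0: device [8086:51b0] "
        "error status/mask=00100000/00004000")
status, mask = parse_status_mask(line)
print("reported:", decode_bits(status))  # ['Unsupported Request']
print("masked:  ", decode_bits(mask))    # ['Completion Timeout']
```

Bit 20 set in the status matches the "[20] UnsupReq" line that the kernel prints right after it.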

Further waves occurred at 418s, 440s, 1168s, etc.
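
For anyone who wants to measure wave sizes and intervals in their own log, a minimal sketch follows. It assumes dmesg-style lines with "[seconds.microseconds]" timestamps, treats every line containing "error received" as the start of one report, and starts a new wave whenever the gap exceeds one second. That threshold and the function names are arbitrary choices for this example, not anything taken from the kernel.

```python
import re

# Leading dmesg timestamp, e.g. "[ 20.086628]" (seconds.microseconds).
TS_RE = re.compile(r"^\[\s*(\d+)\.(\d{6})\]")


def error_timestamps_us(lines):
    """Timestamps, in integer microseconds, of lines that open an AER report."""
    stamps = []
    for line in lines:
        m = TS_RE.match(line)
        if m and "error received" in line:
            stamps.append(int(m.group(1)) * 1_000_000 + int(m.group(2)))
    return stamps


def group_waves(stamps_us, max_gap_us=1_000_000):
    """Split sorted timestamps into waves; a gap above max_gap_us starts a new one."""
    waves = []
    for ts in stamps_us:
        if waves and ts - waves[-1][-1] <= max_gap_us:
            waves[-1].append(ts)
        else:
            waves.append([ts])
    return waves


sample = [
    "[ 20.086628] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0",
    "[ 20.086878] pcieport 0000:00:1d.0: AER: device recovery failed",
    "[ 20.106033] pcieport 0000:00:1d.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:1d.0",
]

for wave in group_waves(error_timestamps_us(sample)):
    gaps = [b - a for a, b in zip(wave, wave[1:])]
    print(f"start={wave[0] / 1e6:.6f}s reports={len(wave)} gaps_us={gaps}")
```

To run it over a full log, replace the sample list with the lines of a saved dmesg output.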

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-48-generic 5.15.0-48.54
ProcVersionSignature: Ubuntu 5.15.0-48.54-generic 5.15.53
Uname: Linux 5.15.0-48-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: me 1827 F.... pulseaudio
 /dev/snd/controlC0: me 1827 F.... pulseaudio
CasperMD5CheckResult: pass
CurrentDesktop: KDE
Date: Tue Sep 20 13:34:43 2022
InstallationDate: Installed on 2022-08-31 (19 days ago)
InstallationMedia: Kubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: LENOVO 21DE001QMH
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-48-generic root=UUID=d4a7fdda-30e3-439b-b327-e87d7b2bc81e ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-48-generic N/A
 linux-backports-modules-5.15.0-48-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.5
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/21/2022
dmi.bios.release: 1.8
dmi.bios.vendor: LENOVO
dmi.bios.version: N3JET24W (1.08 )
dmi.board.asset.tag: Not Available
dmi.board.name: 21DE001QMH
dmi.board.vendor: LENOVO
dmi.board.version: SDK0T76530 WIN
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.5
dmi.modalias: dmi:bvnLENOVO:bvrN3JET24W(1.08):bd07/21/2022:br1.8:efr1.5:svnLENOVO:pn21DE001QMH:pvrThinkPadX1ExtremeGen5:rvnLENOVO:rn21DE001QMH:rvrSDK0T76530WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_21DE_BU_Think_FM_ThinkPadX1ExtremeGen5:
dmi.product.family: ThinkPad X1 Extreme Gen 5
dmi.product.name: 21DE001QMH
dmi.product.sku: LENOVO_MT_21DE_BU_Think_FM_ThinkPad X1 Extreme Gen 5
dmi.product.version: ThinkPad X1 Extreme Gen 5
dmi.sys.vendor: LENOVO

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Frederick Zhang (frederick888) wrote :

I'm also experiencing this issue on a ThinkPad X1 Extreme Gen 5, and it
actually causes more problems than just log spam.

First of all, I noticed that this only happens after a reboot when I
have an external display connected via USB Type-C. The other end can be
HDMI 2.0 or DisplayPort 1.2/1.4 (I also tested an LG monitor that has a
Type-C port, and the laptop doesn't detect it at all).

When this happens, my Lenovo Ethernet adapter [1] (Realtek 0bda:8156)
has to be reconnected to be detected. The same thing happens even if I
reboot into Windows. As a result, I have to do cold boots currently to
make sure I have Internet.

Another problem it causes is that after a shutdown in Linux, if I
connect a USB Type-C drive and boot up the laptop, it won't be detected
by BIOS. I have an Orico M2PAC3-G20 [2] and I have to connect it first
then reboot to see it in boot menu. It can always be detected after a
shutdown in Windows by the way.

I'm running Arch and I have this issue with both the mainline 6.0.7 and
LTS 5.15.77 kernels. I also tested nvidia-dkms, nvidia-open-dkms, and
nouveau, but the graphics driver seems to be unrelated. Things like
pci=nommconf, pcie_aspm=off, and disabling Thunderbolt 4 in the BIOS
didn't make any difference either.

PS: Not sure if [3] is the same issue.

[1] https://www.lenovo.com/us/en/p/4x91h17795
[2] https://www.orico.cc/us/product/detail/7207.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=215453

Bjorn Helgaas (bjorn-helgaas) wrote :

The Unsupported Request errors look like the same issue as https://bugzilla.kernel.org/show_bug.cgi?id=215453 and should be resolved by https://git.kernel.org/linus/c01163dbd1b8 ("PCI/PM: Always disable PTM for all devices during suspend"), which appeared in v6.1-rc1.

I don't know whether the NIC and USB drive issues are related.
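
For reference, the TLP header printed in the original log (34000000 20000052 00000000 00000000) can be decoded by hand, and it appears to describe a PTM Request message, which is consistent with a PTM-related fix. The sketch below shows the decoding. The field layout and the message codes 0x52/0x53 are written from memory of the PCIe specification, only the first two DWORDs are interpreted, and decode_tlp_header is a hypothetical helper, so double-check against the spec before relying on it.

```python
# Message codes relevant here (assumption: small subset from the PCIe spec).
MSG_CODES = {0x52: "PTM Request", 0x53: "PTM Response"}

# Message routing subfield, Type[2:0] (assumption: from the PCIe spec).
MSG_ROUTING = {
    0b000: "to Root Complex",
    0b001: "by address",
    0b010: "by ID",
    0b011: "broadcast from Root Complex",
    0b100: "local, terminate at receiver",
    0b101: "gathered and routed to Root Complex",
}


def decode_tlp_header(hex_words):
    """Decode DW0/DW1 of a TLP header given as hex strings, as AER prints them."""
    dw0, dw1 = (int(w, 16) for w in hex_words[:2])
    fmt = (dw0 >> 29) & 0x7           # header format (3DW/4DW, with/without data)
    tlp_type = (dw0 >> 24) & 0x1F     # TLP type
    rid = (dw1 >> 16) & 0xFFFF        # requester ID: bus[15:8] dev[7:3] fn[2:0]
    info = {
        "fmt": fmt,
        "type": tlp_type,
        "requester": f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}",
        "tag": (dw1 >> 8) & 0xFF,
    }
    # Message TLPs have Type[4:3] == 0b10; the low byte of DW1 is the message code.
    if (tlp_type & 0b11000) == 0b10000:
        code = dw1 & 0xFF
        info["routing"] = MSG_ROUTING.get(tlp_type & 0b111, "reserved")
        info["message"] = MSG_CODES.get(code, f"code 0x{code:02x}")
    return info


header = "34000000 20000052 00000000 00000000".split()
print(decode_tlp_header(header))
```

Under these assumptions the header is a 4DW message without data, sent by requester 20:00.0, with message code 0x52.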

Frederick Zhang (frederick888) wrote :

@Bjorn I installed v6.1-rc4 and I was glad to find that it fixed my NIC
issue :) Even the PXE boot option that mysteriously disappeared after
reboots now shows up reliably.

As for the USB drive issue, I realised that I'm not always able to boot
from USB Type-C drives after a shutdown in Windows either. I rarely use
Windows these days and it's probably just a coincidence that it worked
when I initially tested it. This could be something in Lenovo's
firmware, and thankfully it doesn't bother me nearly as much as the NIC
issue.

Anyway, thank you very much for the great work!

Frederick Zhang (frederick888) wrote :

@Bjorn I backported these patches onto Arch and the warning logs were
gone; however, the NIC issue still persisted. So I did a bisect, and the
commit that fixed it was [1]. After applying [1], its parent e7fd8b684,
and your patches, everything is back to normal for me :D

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38f34dba806a4cb54ef3b2256948e770699a5769

Tim Black (timblaktu) wrote :

@Frederick can you confirm whether the final fix got included in Arch's kernel?

I'm seeing these PCIe bus errors running the 6.2.12-arch1-1 kernel on a ThinkPad P1 Gen5 when using a Thunderbolt 3 10G NIC. On cold boot, the device/network works fine, but after some use it eventually breaks, and I now see these errors in the journal.

Frederick Zhang (frederick888) wrote : Re: [Bug 1990272] Re: PCIe Bus Error: Uncorrected, Transaction Layer, device [8086:51b0],AER UnsupReq

On 29/4/23 03:34, Tim Black wrote:
> @Frederick can you confirm whether the final fix got included in arch's
> kernel?
>
> I'm seeing these pcie bus errors running 6.2.12-arch1-1 kernel on a
> Thinkpad P1 Gen5 when using a Thunderbolt 3 10G NIC. On cold boot, the
> device/network works fine, but after some use it eventually breaks, and
> i see these errors recently in the journal.
>

Can you try disabling ACPI? [1] Maybe it's somehow related?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216863#c3


Dan Kortschak (dan-kortschak) wrote :

I am experiencing the same issue on a ThinkPad P1 Gen6.
