[r8169] Kernel loop PCIe Bus Error on RTL810xE

Bug #2015670 reported by corrado venturini
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

After login the kernel loops, logging PCIe Bus Error messages.
This happens with both kernels 6.2.0-18 and 6.2.0-19 on two different partitions of the same PC:
one installed from the beta and one installed from the daily ISO dated 2023-04-02.
There is no problem on the same PC with other partitions of the same disk running Ubuntu Focal, Jammy and Kinetic.
The problem does not occur with kernel 5.19.17-051917-generic installed from https://kernel.ubuntu.com/~kernel-ppa/mainline/
I am attaching the journalctl -b output related to the problem.

Note: I opened another bug for this problem blaming gnome-shell, but that was wrong:
https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/2015540

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: linux-image-6.2.0-19-generic 6.2.0-19.19
ProcVersionSignature: Ubuntu 6.2.0-19.19-generic 6.2.6
Uname: Linux 6.2.0-19-generic x86_64
ApportVersion: 2.26.0-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: corrado 1511 F.... wireplumber
 /dev/snd/seq: corrado 1508 F.... pipewire
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Sun Apr 9 09:33:06 2023
InstallationDate: Installed on 2023-04-07 (1 days ago)
InstallationMedia: Ubuntu 23.04 "Lunar Lobster" - Daily amd64 (20230402)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0bda:5520 Realtek Semiconductor Corp. Integrated_Webcam_HD
 Bus 001 Device 004: ID 0cf3:e009 Qualcomm Atheros Communications
 Bus 001 Device 002: ID 145f:02b5 Trust Trust Wireless Mouse
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. Inspiron 3793
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-19-generic root=UUID=78239576-b8af-4c66-bf15-8d5ee9a63d35 ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-6.2.0-19-generic N/A
 linux-backports-modules-6.2.0-19-generic N/A
 linux-firmware 20230323.gitbcdcfbcf-0ubuntu1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/17/2019
dmi.bios.release: 1.5
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.5.0
dmi.board.name: 0C1PF2
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.5.0:bd12/17/2019:br1.5:svnDellInc.:pnInspiron3793:pvr:rvnDellInc.:rn0C1PF2:rvrA00:cvnDellInc.:ct10:cvr:sku097A:
dmi.product.family: Inspiron
dmi.product.name: Inspiron 3793
dmi.product.sku: 097A
dmi.sys.vendor: Dell Inc.

corrado venturini (corradoventu) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
corrado venturini (corradoventu) wrote : Re: Kernel loop PCIe Bus Error on RTL810xE

Still the same problem with the new kernel 6.2.0-20-generic.

Daniel van Vugt (vanvugt) wrote :

Apr 09 09:30:55 corrado-n8-ll-0402 kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: [ 0] RxErr (First)
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: [ 0] RxErr (First)
Apr 09 09:30:56 corrado-n8-ll-0402 kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
Apr 09 09:30:56 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 09 09:30:56 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
Apr 09 09:30:56 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: [ 0] RxErr (First)
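
For anyone reading the `error status/mask` field in the log above: it appears to be the AER Correctable Error Status and Mask registers printed in hex. Here is a minimal Python sketch for decoding them, assuming the short bit names the Linux AER driver prints (`RxErr`, `BadTLP`, and so on). The `decode_correctable` helper is hypothetical and only for illustration.

```python
import re

# Bit positions of the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux AER driver prints (assumption).
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def bit_names(value):
    """Return the names of the known bits set in a register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if (value >> bit) & 1]


def decode_correctable(line):
    """Hypothetical helper: extract and decode status/mask from one log line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"status": bit_names(status), "mask": bit_names(mask)}


line = ("r8169 0000:01:00.0: device [10ec:8136] "
        "error status/mask=00000001/00006000")
print(decode_correctable(line))
```

For the r8169 line this reports `RxErr` as the only status bit set, which agrees with the `[ 0] RxErr (First)` line in the log, with `NonFatalErr` and `CorrIntErr` masked.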

summary: - Kernel loop PCIe Bus Error on RTL810xE
+ [r8169] Kernel loop PCIe Bus Error on RTL810xE
Kai-Heng Feng (kaihengfeng) wrote :

Can you please attach `sudo lspci -vvnn`?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
corrado venturini (corradoventu) wrote :

corrado@corrado-n8-ll-0402:~$ sudo lspci -vvnn
00:00.0 Host bridge [0600]: Intel Corporation Ice Lake-LP Processor Host Bridge/DRAM Registers [8086:8a12] (rev 03)
 Subsystem: Dell Ice Lake-LP Processor Host Bridge/DRAM Registers [1028:097a]
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
 Latency: 0
 Capabilities: [e0] Vendor Specific Information: Len=10 <?>
 Kernel driver in use: icl_uncore

00:02.0 VGA compatible controller [0300]: Intel Corporation Iris Plus Graphics G1 (Ice Lake) [8086:8a56] (rev 07) (prog-if 00 [VGA controller])
 DeviceName: To Be Filled by O.E.M.
 Subsystem: Dell Iris Plus Graphics G1 (Ice Lake) [1028:097a]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 139
 Region 0: Memory at 90000000 (64-bit, non-prefetchable) [size=16M]
 Region 2: Memory at 80000000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at 4000 [size=64]
 Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
 Capabilities: [40] Vendor Specific Information: Len=0c <?>
 Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0
   ExtTag- RBE+ FLReset+
  DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
   MaxPayload 128 bytes, MaxReadReq 128 bytes
  DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
  DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
    10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
    EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
    FRS-
    AtomicOpsCap: 32bit- 64bit- 128bitCAS-
  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
    AtomicOpsCtl: ReqEn-
 Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
  Address: fee00018 Data: 0000
  Masking: 00000000 Pending: 00000000
 Capabilities: [d0] Power Management version 2
  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [100 v1] Process Address Space ID (PASID)
  PASIDCap: Exec- Priv-, Max PASID Width: 14
  PASIDCtl: Enable- Exec- Priv-
 Capabilities: [200 v1] Address Translation Service (ATS)
  ATSCap: Invalidate Queue Depth: 00
  ATSCtl: Enable-, Smallest Translation Unit: 00
 Capabilities: [300 v1] Page Request Interface (PRI)
  PRICtl: Enable- Reset-
  PRISta: RF- UPRGI- Stopped+
  Page Request Capacity: 00008000, Page Request Allocation: 00000000
 Kernel driver in use: i915
 Kernel modules: i915

00:04.0 Signal processing controller [1180]: Intel Corporation Processor Power and Thermal Controller [8086:8a03] (rev 03)
 Subsystem: Dell Processor Power and Thermal Controller [1028:097a]
 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Dis...

corrado venturini (corradoventu) wrote :

attaching lspci as requested

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Norman Rieß (weuxel) wrote :

Same issue, but for me the device is an NVMe drive.

[64205.887633] pcieport 0000:00:06.0: AER: Corrected error received: 0000:02:00.0
[64205.888409] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[64205.888424] nvme 0000:02:00.0: device [144d:a80a] error status/mask=00000001/0000e000
[64205.888438] nvme 0000:02:00.0: [ 0] RxErr (First)
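
Since the reports in this thread come in both journalctl and dmesg formats, a small sketch for tallying the `PCIe Bus Error` lines per device may help compare them. It assumes the line layout seen in the excerpts here. `tally_bus_errors` is a hypothetical helper, not part of any tool.

```python
import re
from collections import Counter

# Matches the per-device "PCIe Bus Error" line; the prefix (journalctl
# timestamp and host, or dmesg "[seconds]") is ignored by using search().
BUS_ERROR_RE = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+),"
)


def tally_bus_errors(lines):
    """Count bus-error lines keyed by (driver, device, severity, type)."""
    counts = Counter()
    for line in lines:
        m = BUS_ERROR_RE.search(line)
        if m:
            counts[(m["driver"], m["bdf"], m["severity"], m["type"])] += 1
    return counts


sample = [
    "Apr 09 09:30:55 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: "
    "PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Apr 09 09:30:56 corrado-n8-ll-0402 kernel: r8169 0000:01:00.0: "
    "PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "[64205.888409] nvme 0000:02:00.0: "
    "PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
]

for key, count in tally_bus_errors(sample).items():
    print(count, *key)
```

In practice the lines could be read from a saved `journalctl -k -b` or `dmesg` capture instead of the inline sample.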

corrado venturini (corradoventu) wrote :

Still same problem with kernel 6.3
corrado@corrado-n8-ll-0402:~$ uname -a
Linux corrado-n8-ll-0402 6.3.0-060300-generic #202304232030 SMP PREEMPT_DYNAMIC Sun Apr 23 20:37:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
corrado@corrado-n8-ll-0402:~$

corrado venturini (corradoventu) wrote :

still same problem on Ubuntu 23.10
corrado@corrado-n8-mm-0506:~$ inxi -SCM
System:
  Host: corrado-n8-mm-0506 Kernel: 6.2.0-21-generic arch: x86_64 bits: 64
    Desktop: GNOME v: 44.0 Distro: Ubuntu 23.10 (Mantic Minotaur)
Machine:
  Type: Laptop System: Dell product: Inspiron 3793 v: N/A
    serial: <superuser required>
  Mobo: Dell model: 0C1PF2 v: A00 serial: <superuser required> UEFI: Dell
    v: 1.5.0 date: 12/17/2019
CPU:
  Info: quad core model: Intel Core i5-1035G1 bits: 64 type: MT MCP cache:
    L2: 2 MiB
  Speed (MHz): avg: 1168 min/max: 400/1000 cores: 1: 1200 2: 1200 3: 1200
    4: 1200 5: 1200 6: 944 7: 1200 8: 1200
corrado@corrado-n8-mm-0506:~$

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
https://iso.qa.ubuntu.com/qatracker/reports/bugs/2015670

tags: added: iso-testing
corrado venturini (corradoventu) wrote :

Same problem in Ubuntu 23.10 with kernel 6.4.0-060400rc2-generic

corrado venturini (corradoventu) wrote :

After an upgrade I now have the same problem on Jammy.

ago 05 09:36:54 corrado-n2-jammy kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: [ 0] RxErr (First)
ago 05 09:36:54 corrado-n2-jammy kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
ago 05 09:36:54 corrado-n2-jammy kernel: r8169 0000:01:00.0: [ 0] RxErr (First)
ago 05 09:36:55 corrado-n2-jammy kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
ago 05 09:36:55 corrado-n2-jammy kernel: r8169 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
ago 05 09:36:55 corrado-n2-jammy kernel: r8169 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000

corrado@corrado-n2-jammy:~$ inxi -SCNxc
System:
  Host: corrado-n2-jammy Kernel: 6.2.0-26-generic x86_64 bits: 64
    compiler: N/A Desktop: GNOME 42.9
    Distro: Ubuntu 22.04.2 LTS (Jammy Jellyfish)
CPU:
  Info: quad core model: Intel Core i5-1035G1 bits: 64 type: MT MCP
    arch: Ice Lake rev: 5 cache: L1: 320 KiB L2: 2 MiB L3: 6 MiB
  Speed (MHz): avg: 1153 high: 1200 min/max: 400/1000 cores: 1: 1200
    2: 1200 3: 1200 4: 1200 5: 828 6: 1200 7: 1200 8: 1200 bogomips: 19046
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Network:
  Device-1: Realtek RTL810xE PCI Express Fast Ethernet vendor: Dell
    driver: r8169 v: kernel port: 3000 bus-ID: 01:00.0
  Device-2: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter
    vendor: Dell driver: ath10k_pci v: kernel bus-ID: 02:00.0
corrado@corrado-n2-jammy:~$

corrado venturini (corradoventu) wrote :

Same on a new install of Mantic
corrado@corrado-n3-mm-0807:~$ uname -a
Linux corrado-n3-mm-0807 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 8 16:02:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
corrado@corrado-n3-mm-0807:~$

Adding "pcie_aspm=off" or "pci=nommconf" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, as suggested for a similar problem in https://askubuntu.com/questions/1401726/pcieport-0000001d-0-aer-corrected-error-received-00000400-0, had no effect.
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off "
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nommconf "
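
To make the tested variants easier to reproduce, here is a sketch of how a parameter is appended to the active `GRUB_CMDLINE_LINUX_DEFAULT` line, and how to check whether a given kernel command line contains it. `add_kernel_param` and `has_kernel_param` are hypothetical helpers. On Ubuntu the edited file normally only takes effect after `sudo update-grub` and a reboot.

```python
import re

# Only the active (uncommented) assignment is matched; "# GRUB_..." lines are left alone.
ASSIGN_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"\s*$')


def add_kernel_param(grub_text, param):
    """Return grub_text with param appended to the active assignment line."""
    out = []
    for line in grub_text.splitlines():
        m = ASSIGN_RE.match(line)
        if m and param not in m.group(2).split():
            joined = " ".join(m.group(2).split() + [param])
            line = m.group(1) + '"' + joined + '"'
        out.append(line)
    return "\n".join(out) + "\n"


def has_kernel_param(cmdline, param):
    """True if param is one of the whitespace-separated boot parameters."""
    return param in cmdline.split()


example = (
    '# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
)
print(add_kernel_param(example, "pcie_aspm=off"))

# After rebooting, the contents of /proc/cmdline can be passed to
# has_kernel_param() to confirm the running kernel received the parameter.
```

Checking `/proc/cmdline` after the reboot rules out the case where the parameter was edited into the file but never reached the kernel.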

corrado venturini (corradoventu) wrote :

still same problem on Noble installed from Ubuntu 24.04 "Noble Numbat" - Daily amd64 (20231118)

corrado venturini (corradoventu) wrote :

In Ubuntu 24.04 adding "pcie_aspm=off" worked.

Bjorn Helgaas (bjorn-helgaas) wrote :

The "pcie_aspm=off" kernel parameter hides a problem. I would really like to fix the underlying problem so the parameter isn't needed.

If anybody is willing to help fix it, please open a bug report at https://bugzilla.kernel.org/, product Drivers/PCI, mention the hardware platform, and attach:

  - complete dmesg log (I assume this will include some Correctable Errors)
  - output of "sudo lspci -vv"

If booting with the "pcie_aspm=off" kernel parameter makes a difference, please also attach similar dmesg and lspci output for this boot.
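
The two captures requested above could be collected with something like the following sketch. The output file names and the `tag` argument are made up for illustration. It assumes `dmesg` and `lspci` are installed and that `sudo` is available.

```python
import subprocess


def planned_captures(tag):
    """Map (hypothetical) output file names to the requested commands."""
    return {
        f"dmesg-{tag}.txt": ["sudo", "dmesg"],
        f"lspci-{tag}.txt": ["sudo", "lspci", "-vv"],
    }


def capture(tag):
    """Run each command and save its standard output to the matching file."""
    for filename, argv in planned_captures(tag).items():
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        with open(filename, "w") as out:
            out.write(result.stdout)


# Intended use, once per boot:
#   capture("default")    # normal boot
#   capture("aspm-off")   # after rebooting with pcie_aspm=off
```

Keeping one pair of files per boot makes it straightforward to attach both sets to the bugzilla report and compare them.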

This seems similar to https://bugzilla.kernel.org/show_bug.cgi?id=215027, which we originally thought was related to Intel VMD and/or the Samsung NVMe device you have, but I now suspect we might have an ASPM configuration problem.
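
When comparing boots, it may also be useful to record which ASPM policy the kernel was using. The sketch below assumes the policy is exposed at `/sys/module/pcie_aspm/parameters/policy` with the active entry in square brackets. The helper names are hypothetical. It only records the setting and does not diagnose the suspected misconfiguration.

```python
import re

# Assumed sysfs location of the ASPM policy list.
POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def active_policy(text):
    """Return the bracketed entry, e.g. 'default' from '[default] performance'."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def read_active_policy(path=POLICY_PATH):
    """Read the policy file and return the active entry, or None if absent."""
    try:
        with open(path) as f:
            return active_policy(f.read())
    except OSError:
        return None


print(active_policy("[default] performance powersave powersupersave"))
```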

corrado venturini (corradoventu) wrote :

Same problem on Ubuntu 24.04 with kernel 6.8.0-31-generic.
I created a kernel bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=218784
