AER: Corrected error received: id=00e0
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Unknown | Medium | |
linux (Ubuntu) | Triaged | Medium | Unassigned |
Xenial | Triaged | Medium | Unassigned |
Bug Description
WORKAROUND: add pci=noaer to your kernel command line:
1) edit /etc/default/grub and add pci=noaer to the line starting with GRUB_CMDLINE_
2) run "sudo update-grub"
3) reboot
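Step 1 can also be scripted. The following is a minimal sketch, assuming the line in question is GRUB_CMDLINE_LINUX_DEFAULT (an assumption, since the variable name is cut off above) and that its value is double-quoted. `add_kernel_param` is a hypothetical helper, not part of any GRUB tooling. It only transforms text; writing the file back and running update-grub are still manual steps.

```python
import re


def add_kernel_param(grub_text: str, param: str,
                     var: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Return grub_text with `param` appended to the quoted value of `var`.

    `var` defaults to GRUB_CMDLINE_LINUX_DEFAULT, which is an assumption.
    The text comes back unchanged if the parameter is already present
    or if the variable is not found.
    """
    # Match one whole line of the form VAR="..." (trailing blanks allowed).
    pattern = re.compile(
        r'^(' + re.escape(var) + r'=")([^"]*)(")[ \t]*$', re.MULTILINE)

    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(repl, grub_text)


sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, "pci=noaer"))
```

Running the helper a second time on its own output leaves the line unchanged, so it should be safe to re-run.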
----
My dmesg gets completely spammed with the following messages appearing over and over again. It stops after one s3 cycle; it only happens after reboot.
[ 5315.986588] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 5315.987249] pcieport 0000:00:1c.0: can't find device of ID00e0
[ 5315.995632] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 5315.995664] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[ 5315.995674] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 5315.995683] pcieport 0000:00:1c.0: [ 0] Receiver Error
[ 5316.002772] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 5316.002811] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[ 5316.002826] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 5316.002838] pcieport 0000:00:1c.0: [ 0] Receiver Error
[ 5316.009926] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 5316.009964] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[ 5316.009979] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 5316.009991] pcieport 0000:00:1c.0: [ 0] Receiver Error
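For anyone trying to work out which device the id field points at: it appears to follow the usual 16-bit PCI requester ID layout (bus number in the high byte, then 5 bits of device and 3 bits of function). Under that assumption, id=00e0 decodes to 00:1c.0, i.e. the root port that is printing the message. A minimal sketch, with `decode_aer_id` as a hypothetical helper name:

```python
def decode_aer_id(aer_id: str) -> str:
    """Decode a 16-bit hex requester ID into bus:device.function notation.

    Assumes the standard layout: bits 15-8 bus, bits 7-3 device,
    bits 2-0 function.
    """
    value = int(aer_id, 16)
    bus = (value >> 8) & 0xFF
    device = (value >> 3) & 0x1F
    function = value & 0x07
    return f"{bus:02x}:{device:02x}.{function:x}"


print(decode_aer_id("00e0"))  # 00:1c.0
```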
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.2.0-19-generic x86_64
ApportVersion: 2.19.2-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/pcmC0D0c: david 1502 F...m pulseaudio
/dev/snd/
CurrentDesktop: Unity
Date: Mon Nov 30 13:19:00 2015
EcryptfsInUse: Yes
HibernationDevice: RESUME=
InstallationDate: Installed on 2015-11-28 (2 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20151127)
MachineType: Dell Inc. Inspiron 13-7359
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.153
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/07/2015
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 01.00.00
dmi.board.name: 0NT3WX
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: Inspiron 13-7359
dmi.sys.vendor: Dell Inc.
David Henningsson (diwic) wrote : | #1 |
- AlsaInfo.txt (31.0 KiB, text/plain; charset="utf-8")
- CRDA.txt (392 bytes, text/plain; charset="utf-8")
- CurrentDmesg.txt (176.9 KiB, text/plain; charset="utf-8")
- Dependencies.txt (2.1 KiB, text/plain; charset="utf-8")
- IwConfig.txt (479 bytes, text/plain; charset="utf-8")
- JournalErrors.txt (3.7 MiB, text/plain; charset="utf-8")
- Lspci.txt (7.4 KiB, text/plain; charset="utf-8")
- Lsusb.txt (381 bytes, text/plain; charset="utf-8")
- ProcCpuinfo.txt (4.4 KiB, text/plain; charset="utf-8")
- ProcEnviron.txt (325 bytes, text/plain; charset="utf-8")
- ProcInterrupts.txt (2.8 KiB, text/plain; charset="utf-8")
- ProcModules.txt (8.0 KiB, text/plain; charset="utf-8")
- PulseList.txt (24.2 KiB, text/plain; charset="utf-8")
- RfKill.txt (112 bytes, text/plain; charset="utf-8")
- UdevDb.txt (177.0 KiB, text/plain; charset="utf-8")
- WifiSyslog.txt (5.1 MiB, text/plain; charset="utf-8")
Brad Figg (brad-figg) wrote : Status changed to Confirmed | #2 |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Joseph Salisbury (jsalisbury) wrote : | #3 |
Would it be possible for you to test the latest upstream kernel? Refer to https:/
If this bug is fixed in the mainline kernel, please add the following tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag: 'kernel-
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
[0] http://
tags: | added: kernel-da-key |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
David Henningsson (diwic) wrote : | #4 |
I've tried upgrading BIOS to 1.2.0 (latest version on Dell website) and also with the v4.4-rc3-wily kernel. The dmesg is still spammed with the same error.
tags: | added: kernel-bug-exists-upstream |
tags: | added: bios-outdated-1.2.0 |
tags: |
added: latest-bios-1.2.0 removed: bios-outdated-1.2.0 |
penalvch (penalvch) wrote : | #5 |
David Henningsson, pending you've already tested and reproduced in 4.4-rc4, the issue you are reporting is an upstream one. Could you please report this upstream (TO Bjorn Helgaas CC linux-pci) via https:/
Please provide a direct URL to your post to the mailing list when it becomes available so that it may be tracked.
Also, could you quantify your description comment "My dmesg gets completely spammed with the following messages appearing over and over again."?
For example, it increases the log file size by 1MB per hour in comparison to when this doesn't happen?
Thank you for your understanding.
tags: | added: kernel-bug-exists-upstream-4.4-rc3 |
Changed in linux (Ubuntu Xenial): | |
status: | Confirmed → Triaged |
David Henningsson (diwic) wrote : | #6 |
The spam rate is 150 lines per second. With ~80 characters per line, that's about 50 MB per hour, as a very rough measure.
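A quick check of that estimate, assuming one byte per character and 1 MB = 10^6 bytes (both assumptions; `log_growth_mb_per_hour` is a hypothetical helper name). The result lands in the same ballpark as the figure quoted above:

```python
def log_growth_mb_per_hour(lines_per_second: float, bytes_per_line: float) -> float:
    """Estimate log growth in MB per hour (1 MB taken as 10**6 bytes)."""
    seconds_per_hour = 3600
    return lines_per_second * bytes_per_line * seconds_per_hour / 1_000_000


print(round(log_growth_mb_per_hour(150, 80), 1))  # 43.2
```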
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #129 |
Created attachment 197891
Collection of outputs from X555U laptop
Good day.
I have updated this laptop to the latest vendor supplied BIOS 204 10/18/2015.
Attempted distribution: Ubuntu mate 15.10.
Had to use acpi=off boot parameter to install linux
Eventually found more hardware worked with the pci=nommconf boot parameter
With pci=nommconf the following still does not work:
- Realtek rtl8821ae 802.11ac wireless NIC PCIe will only run in 2.4GHz mode. 5GHz mode will not work.
- Laptop will not resume after suspend
Many boot errors show in dmesg:
ACPI: AE_NOT_FOUND errors
systemd: failed to insert module 'kdbus' function not implemented
If pci=nommconf is not used as a boot parameter, there is a looping PCIe error message that I can't break out of. From what I can read, it says:
printk messages dropped pcieport 0000:00:... id=00E5(Receiver ID)
In the attached file is the following when pci=nommconf boot parameter used:
sudo output of:
dmesg
uname -a
lspci -vvnn
dmidecode
Tarball of /proc/acpi directory
Note: I am unable to resume from hibernate everything is frozen. So I am not able to attach a copy of /var/log/kern.log.0
Beanow (beanow) wrote : | #7 |
Confirming same error messages on 4.2.0 kernel from jessie-backports with skylake i7-6700HQ. On pci port 0:1c:0, device ID [8086:a110].
According to lspci -tv this is connected to my Intel 3165 wireless card. Using a manually added ucode from https:/
Can you check with lspci -tv what device is connected to this pci slot?
Beanow (beanow) wrote : | #8 |
Found in your udev file that the slot that triggers the messages is also a wifi card: Realtek RTL8723BE PCIe Wireless Network Adapter.
So the common ground seems to be: 4.x kernel versions, PCIe wireless cards, Intel PCIe bus, Skylake CPU series laptop.
In Linux Kernel Bug Tracker #109691, rui.zhang (rui.zhang-linux-kernel-bugs) wrote : | #130 |
There are a couple of problems here
1. "pci=nommconf" is needed to boot
2. tpm_crb driver calltrace in dmesg
3. ieee80211_tx calltrace in dmesg
4. hibernate failure
IMO, any of the first three problems may break hibernation, thus we should try to fix the first three issues separately and then check how hibernation goes on this laptop.
Move to PCI category to get Problem 1 fixed first.
In Linux Kernel Bug Tracker #109691, bjorn (bjorn-linux-kernel-bugs) wrote : | #131 |
Thank you very much for this report. It's a pretty serious problem when we can't boot at all.
"pcieport 0000:00:... id=00E5(Receiver ID)" looks like an AER message. Please try turning off AER with "pci=noaer". If you can boot with "pci=noaer" and without "pci=nommconf", please attach the dmesg log.
Here's a report of another similar AER problem:
https:/
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #132 |
Created attachment 198481
Updated dmesg with pci=noaer
It booted no problem after replacing pci=nommconf with pci=noaer as suggested. See updated dmesg.txt as requested.
Thanks!
David Henningsson (diwic) wrote : Re: Dmesg filled with "AER: Corrected error received" | #9 |
Hi,
Indeed booting with pci=noaer (as suggested in the other bug) works
around this issue as well. I'll use that for the time being.
Thanks for working on it!
// David
On 2015-12-29 16:58, Bjorn Helgaas wrote:
> On Fri, Dec 18, 2015 at 11:30:33AM +0100, David Henningsson wrote:
>> Hi Linux PCI maintainers,
>>
>> My dmesg gets filled with a few lines repeated over and over again:
>>
>> pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
>> pcieport 0000:00:1c.0: can't find device of ID00e0
>> pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
>> pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected,
>> type=Physical Layer, id=00e0(Receiver ID)
>> pcieport 0000:00:1c.0: device [8086:9d14] error
>> status/
>> pcieport 0000:00:1c.0: [ 0] Receiver Error
>>
>> This happens 10-30 times per second (!), so dmesg fills up quickly.
>> The bug is present in both vanilla and Ubuntu kernels.
>
> This is a pretty obvious bug in our AER code. We normally clear
> correctable errors by writing the PCI_ERR_COR_STATUS register in
> handle_
>
> aer_isr_one_error
> aer_print_port_info
> if (find_source_
> aer_process_
> handle_error_source
> pci_write_
>
> In this case, find_source_
> ID00e0" [sic] and returned false, so we don't call
> aer_process_
> we discover it again and again.
>
> I'll work on fixing this. Incidentally, there's another report
> with similar symptoms here:
>
> https:/
>
> Bjorn
>
--
David Henningsson, Canonical Ltd.
https:/
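The loop described in the quoted analysis can be illustrated with a toy model. This is not kernel code: `count_reports` and its variables are hypothetical, and the model only assumes what the quoted text says, namely that the correctable-error status is cleared on the path that runs after the source device is found, and is left set otherwise.

```python
def count_reports(source_found: bool, max_polls: int = 5) -> int:
    """Toy model: count how often one latched correctable error is reported."""
    status_latched = True   # stands in for a bit set in PCI_ERR_COR_STATUS
    reports = 0
    for _ in range(max_polls):
        if not status_latched:
            break
        reports += 1        # "AER: Corrected error received"
        if source_found:
            # Error is processed and the status register is written back,
            # which clears the latched bit.
            status_latched = False
        # Otherwise: "can't find device of ID..." and the bit stays set,
        # so the same error is discovered again on the next pass.
    return reports


print(count_reports(source_found=True))   # 1
print(count_reports(source_found=False))  # 5
```

In the model the failing case is capped at `max_polls`; on real hardware nothing caps it, which would match the endless repetition seen in dmesg.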
In Linux Kernel Bug Tracker #109691, bjorn (bjorn-linux-kernel-bugs) wrote : | #133 |
Great, thank you! I understand the AER bug (see http://
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #134 |
Excellent.
Thanks Bjorn.
Great to see you have isolated the problem.
All the best in 2016!
If you require any other details from me, let me know and I will update this post.
Cheers!
SqUe (sque) wrote : | #10 |
Same error on Ubuntu GNOME 15.10 running 4.2, 4.3, or 4.4-rc8, and also on Debian testing with 4.3. I randomly get this kind of error:
[ 851.659186] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 851.659208] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
[ 851.659219] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 851.659227] pcieport 0000:00:1c.5: [ 0] Receiver Error (First)
SqUe (sque) wrote : | #11 |
..continuing (pressed post by mistake)
I am on an Intel i5-6200U and the PCI port is the one that the wireless card is connected to.
lspci -vt
-[0000:00]-+-00.0 Intel Corporation Sky Lake Host Bridge/DRAM Registers
+-02.0 Intel Corporation Sky Lake Integrated Graphics
+-14.0 Intel Corporation Device 9d2f
+-14.2 Intel Corporation Device 9d31
+-16.0 Intel Corporation Device 9d3a
+-17.0 Intel Corporation Device 9d03
+-1f.0 Intel Corporation Device 9d48
+-1f.2 Intel Corporation Device 9d21
+-1f.3 Intel Corporation Device 9d70
\-1f.4 Intel Corporation Device 9d23
The weird thing is that on some boots this error never appears, and on others it may show up sooner or later and then repeat.
tags: |
added: kernel-bug-exists-upstream-4.4.1 removed: kernel-bug-exists-upstream-4.4-rc3 |
tags: | added: wily |
Jordon Bedwell (envygeeks) wrote : | #12 |
I still get this problem in Xenial as well... randomly but it happens.
In Linux Kernel Bug Tracker #109691, bugs (bugs-linux-kernel-bugs) wrote : | #135 |
Looks like I have this same problem (with the same hardware). Adding my name to the list, using Ubuntu's Xubuntu 15.10 distro. The pci=noaer works, although pci=nomsi also works.
Strangely enough, Knoppix 7.6.1 boots just fine. Hmmm...
Ehsan (azarnasab) wrote : | #13 |
On 4.4.8-300.
```text
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: device [8086:a110] error status/
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: can't find device of ID00e0
May 05 14:02:57 dashesy.wavelet kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
```
That device is "PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)" and is used by "+-1c.0-
`pci=nomsi` solved the problem but so did `pci=noaer` which I will use for now.
I will gladly do debugging if there is a kernel to test.
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #136 |
Just an update.
confirmed Kelly Price's discovery: Knoppix 7.6.1 with kernel 4.2.6 boots fine.
Thanks Kelly.
I flash updated the BIOS to latest vendor supplied version 206 (2016/02/24).
Latest Ubuntu 16.04 with kernel 4.4 still has the same problem.
Abhishek Bhatia (abhigenie92) wrote : | #14 |
I tried the suggestion of pci=nomsi but it doesn't fix it. Here are the complete details. https:/
Abhishek Bhatia (abhigenie92) wrote : | #15 |
Any progress on this bug?
e633 (e633) wrote : | #16 |
Hello, I am affected too. Dell Latitude 3570, kernel 4.4.0-21-generic x86_64, and in my case the problematic device seems to be the Qualcomm Atheros AR9462 Wireless Network Adapter. Everything seems to work though.
Full PC specs: https:/
Error:
AER: Corrected error received: id=00e0
pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
pcieport 0000:00:1c.0: device [8086:9d14] error status/
pcieport 0000:00:1c.0: [12] Replay Timer Timeout
#lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Sky Lake Host Bridge/DRAM Registers [8086:1904] (rev 08)
00:02.0 VGA compatible controller [0300]: Intel Corporation Sky Lake Integrated Graphics [8086:1916] (rev 07)
00:14.0 USB controller [0c03]: Intel Corporation Device [8086:9d2f] (rev 21)
00:14.2 Signal processing controller [1180]: Intel Corporation Device [8086:9d31] (rev 21)
00:15.0 Signal processing controller [1180]: Intel Corporation Device [8086:9d60] (rev 21)
00:16.0 Communication controller [0780]: Intel Corporation Device [8086:9d3a] (rev 21)
00:17.0 SATA controller [0106]: Intel Corporation Device [8086:9d03] (rev 21)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d14] (rev f1)
00:1c.5 PCI bridge [0604]: Intel Corporation Device [8086:9d15] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:9d48] (rev 21)
00:1f.2 Memory controller [0580]: Intel Corporation Device [8086:9d21] (rev 21)
00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d70] (rev 21)
00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:9d23] (rev 21)
01:00.0 Network controller [0280]: Qualcomm Atheros AR9462 Wireless Network Adapter [168c:0034] (rev 01)
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
#lspci -vt
-[0000:00]-+-00.0 Intel Corporation Sky Lake Host Bridge/DRAM Registers
+-02.0 Intel Corporation Sky Lake Integrated Graphics
+-14.0 Intel Corporation Device 9d2f
+-14.2 Intel Corporation Device 9d31
+-15.0 Intel Corporation Device 9d60
+-16.0 Intel Corporation Device 9d3a
+-17.0 Intel Corporation Device 9d03
+-1f.0 Intel Corporation Device 9d48
+-1f.2 Intel Corporation Device 9d21
+-1f.3 Intel Corporation Device 9d70
\-1f.4 Intel Corporation Device 9d23
pci=noaer helps.
Игорь (ifree92) wrote : | #17 |
I have the same "spam" in my dmesg
And as said above... I have an "Intel Corporation Wireless 3165" card connected.
So strange....
erika jonell (erika-jonell) wrote : | #18 |
In order to suppress the error and boot at all, you must add pci=noaer to your kernel boot parameters. You can do it in the install launcher's GRUB menu or during boot, then regenerate your grub.cfg with it included for future boots.
This is not an Ubuntu-specific problem, as I can confirm it exists in other distros as well (Arch for one).
My belief is that it is an issue with Skylake chips, Intel-based motherboards, and the south-bridge PCI support within the kernel itself.
(I have an i7 6700 and an H110 chipset.)
Makda (makdamujji) wrote : | #19 |
This is my dmesg output:
[ 121.716206] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 121.716209] pcieport 0000:00:1c.5: [ 0] Receiver Error (First)
[ 121.716216] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 121.716616] pcieport 0000:00:1c.5: can't find device of ID00e5
[ 121.716619] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 121.717092] pcieport 0000:00:1c.5: can't find device of ID00e5
[ 121.717109] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 121.717129] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
my lspci:
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07)
00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 08)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 3D controller: NVIDIA Corporation Device 134e (rev a2)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)
03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
Most probably having something to do with 6th gen intel and Realtek hardware.
tags: |
added: yakketywily removed: wily |
tags: |
added: wily yakkety removed: yakketywily |
JujuLand (alain-aupeix) wrote : | #20 |
Same bug with a Dell XPS8900.
I can install 12.04, but it fails with 15.10 or 16.04.
Having installed 12.04 and updated to 14.04, I then updated to 16.04. It boots correctly, but syslog and kern.log are filled with these messages and / fills up (0 bytes free ...)
I tried to boot on 16.04 DVD, but impossible ...
Is there any progress about this bug ?
Thanks
A+
Bill Michaelson (t-launchpad-bill-from-net) wrote : | #21 |
I seem to have this issue too, but related to a different device. Running 16.04 with 4.4.0-31-generic. New (used) machine, so very concerning. It ran fine for about an hour, then spontaneously started spewing this:
Jul 26 13:28:05 twin kernel: [ 8.837650] pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
Jul 26 13:28:05 twin kernel: [ 8.837665] pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0018(Receiver ID)
Jul 26 13:28:05 twin kernel: [ 8.837675] pcieport 0000:00:03.0: device [8086:d138] error status/
Jul 26 13:28:05 twin kernel: [ 8.837685] pcieport 0000:00:03.0: [ 0] Receiver Error (First)
lspci -nn gives me a match against this:
00:03.0 PCI bridge [0604]: Intel Corporation Core Processor PCI Express Root Port 1 [8086:d138] (rev 11)
and booting with pci=noaer suppresses the messages with no apparent ill effects.
But I don't know what the message is supposed to mean, and I fear that I am suppressing a valid warning and gambling if I use the machine for serious work. Insights more than welcome. The machine is an ASUS G73Jh laptop (Intel Core i7-720QM @ 1.60GHz / Nehalem 45nm). TIA.
David Henningsson (diwic) wrote : | #22 |
Out of curiosity, do all of you have the combination of Skylake + RTL8723BE, and second, do you experience (as I do) that wifi doesn't work very well (often loses connections etc)?
...as the errors seem to indicate some kind of physical error between the Skylake/Sunrise Point host controller and the wifi card.
David Henningsson (diwic) wrote : | #23 |
Btw, I reported mine upstream long ago and got response from upstream that "I've thought about this problem a bit, but realistically I don't have time to do the fix I'd like to do /.../ Anybody else who is interested should feel free to take a crack at it."
See http://
Also some googling finds me a few other reports with very similar symptoms, e g:
description: | updated |
Fabio A. (falemagn) wrote : | #24 |
Yes David, I've got your exact hw combination and indeed wifi sometimes seems to "get stuck".
A
sudo modprobe -r rtl8723be
followed by
sudo modprobe rtl8723be
does the trick of bringing the device to life most of the times, though.
Makda (makdamujji) wrote : | #25 |
WiFi can be fixed by this:
Create a conf file for Wifi:
sudo gedit /etc/modprobe.
Write in it:
options rtl8723be fwlps=N ips=N
Save and reboot. WiFi will work fine now, but the AER error still floods the dmesg.
JujuLand (alain-aupeix) wrote : | #26 |
I built the Dell XPS 8900 with Ubuntu 14.04, and it works fine.
I forgot to disable LTS upgrades, and the owner ran the upgrade.
The bug is still here, and I must redo a 14.04 install.
Grrr ....
Is somebody in charge of this bug, which is very old (since 15.04)?
Thanks
A+
Bjorn Helgaas (bjorn-helgaas) wrote : | #27 |
Related problem report:
https:/
Brief analysis of AER issue:
http://
I did say in that analysis that I was going to work on fixing this, but I haven't had time. It would be great if somebody would jump in and help out.
JujuLand (alain-aupeix) wrote : | #28 |
Hi, I had a look at the link you gave, and saw there is a way to boot using the pci=noaer parameter.
It's a good approach while no other solution has been found, but is this method usable when booting from a live HD to install onto an HD?
Thanks
A+
JujuLand (alain-aupeix) wrote : | #29 |
Humm ... typo : booting on a live DvD, obviously :)
A+
John (jsalatas) wrote : | #30 |
Same here. Also in a Dell XPS 8900 (Skylake + RTL8723BE) using kernel 4.4.0
Eduardo Montes de Oca Sanchez (ed-montesdeoca) wrote : | #31 |
I have the same issue. I have an HP Star Wars Special Edition 15-an050nr:
edrendar@
Nov 8 23:43:47 outrider-
Daniel Jose (danieldsj) wrote : | #32 |
I exhibited similar symptoms when installing Ubuntu 16.04.1 LTS on an Asus x541u VivoBook Max system. When performing the installation, the logs would fill up with these errors and eventually fail because of lack of disk space. I found the following thread helpful...
http://
The workaround for me was to hold left SHIFT, edit the grub menu and add the pcie_aspm=off kernel parameter to suppress the messages during the installation and every subsequent boot. Adding these options to the grub configuration after installing was the long-term workaround.
I'm slightly affected, or maybe actually my kernel is "fixed" to correctly clear the error report even when device is not found internally (referring to the #27 brief analysis), as I do see the AER error in dmesg, periodically showing up, but only about once per couple of minutes.
It's still beyond being acceptable for me, so I used the "pci=noaer" workaround, which stops the messages appearing.
Error log:
[ 487.987496] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 487.987503] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[ 487.987505] pcieport 0000:00:1c.0: device [8086:a110] error status/
[ 487.987507] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
Further errors have the same 1c.0 address (Intel Corporation Wireless 3165) and details.
Kernel version: 4.4.0-59-generic
CPU: Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz
# lspci -vt
-[0000:00]-+-00.0 Intel Corporation Sky Lake Host Bridge/DRAM Registers
+-02.0 Intel Corporation Skylake Integrated Graphics
+-14.0 Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-H Thermal subsystem
+-16.0 Intel Corporation Sunrise Point-H CSME HECI #1
+-17.0 Intel Corporation Sunrise Point-H SATA Controller [AHCI mode]
+-1f.0 Intel Corporation Sunrise Point-H LPC Controller
+-1f.2 Intel Corporation Sunrise Point-H PMC
+-1f.3 Intel Corporation Sunrise Point-H HD Audio
\-1f.4 Intel Corporation Sunrise Point-H SMBus
MSI Notebook GP62 6QF-678XCZ
mohican (mohican) wrote : | #34 |
Hello,
same bug on Asus R556UB-DM217T (live session)
I was able to install using pci=noaer
Also associated with a bug with the sound (no input sound from integrated webcam mic)
sound device : HDA Intel PCH, Realtek ALC256
pakman (phill-phillk) wrote : | #35 |
Not sure if this merits a comment, as I encountered this on a CentOS install with Anaconda. I booted with the flag specified and the errors didn't pile up. Hardware is a Dell XPS. I can provide more info if needed.
PanPetr (javacentrum) wrote : | #36 |
The same issue: Lubuntu 16.04 on an HP ProBook 470 G3 writes to kernel.log and then completely freezes
Mar 24 09:02:09 localhost kernel: [ 6972.305728] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
Mar 24 09:02:09 localhost kernel: [ 6972.305749] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
Mar 24 09:02:09 localhost kernel: [ 6972.305760] pcieport 0000:00:1c.5: device [8086:9d15] error status/
Mar 24 09:02:09 localhost kernel: [ 6972.305768] pcieport 0000:00:1c.5: [ 0] Receiver Error (First)
Mar 24 09:03:12 localhost kernel: [ 7035.298073] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
Mar 24 09:03:12 localhost kernel: [ 7035.298083] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
Mar 24 09:03:12 localhost kernel: [ 7035.298087] pcieport 0000:00:1c.5: device [8086:9d15] error status/
Mar 24 09:03:12 localhost kernel: [ 7035.298089] pcieport 0000:00:1c.5: [ 0] Receiver Error
Mar 24 09:04:15 localhost kernel: [ 7098.238955] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
Mar 24 09:04:15 localhost kernel: [ 7098.238979] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
Mar 24 09:04:15 localhost kernel: [ 7098.238992] pcieport 0000:00:1c.5: device [8086:9d15] error status/
Mar 24 09:04:15 localhost kernel: [ 7098.239001] pcieport 0000:00:1c.5: [ 0] Receiver Error
Davide (davide-maraschio93) wrote : | #37 |
The same issue: Ubuntu 16.04.2 on an Asus N552VW-FY136T writes to kernel.log and then completely freezes
Mar 24 09:02:09 localhost kernel: [ 6972.305728] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
pcieport 0000:00:1c.5 PCIe Bus Error: severity=corrected, type=physical layer, id=00e4(Receiver 12)
pcieport 0000:00:1c.5 device[8086:a112] error status/
The workarounds described here don't work for me.
Davide (davide-maraschio93) wrote : | #38 |
My kernel version is 4.8
Davide (davide-maraschio93) wrote : | #39 |
I've reinstalled Ubuntu and now it starts. I typed dmesg and there's this message anyway:
[ 0.875431] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 0.875438] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 0.875440] pcieport 0000:00:1c.4: device [8086:a114] error status/
[ 0.875442] pcieport 0000:00:1c.4: [ 8] RELAY_NUM Rollover
[ 0.879660] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 0.879667] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 0.879669] pcieport 0000:00:1c.4: device [8086:a114] error status/
[ 0.879670] pcieport 0000:00:1c.4: [ 8] RELAY_NUM Rollover
[ 0.911313] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 0.911319] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 0.911320] pcieport 0000:00:1c.4: device [8086:a114] error status/
[ 0.911321] pcieport 0000:00:1c.4: [ 8] RELAY_NUM Rollover
[ 0.923536] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 0.923542] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 0.923543] pcieport 0000:00:1c.4: device [8086:a114] error status/
[ 0.923544] pcieport 0000:00:1c.4: [ 8] RELAY_NUM Rollover
Changed in linux (Ubuntu): | |
status: | Triaged → New |
Brad Figg (brad-figg) wrote : Status changed to Confirmed | #40 |
This change was made by a bot.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Greg Lutostanski (lutostag) wrote : | #41 |
Hitting this with zesty
Linux doe 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Patrik Wallström (pawal) wrote : | #42 |
I got this today with Zesty as well:
pawal@lakrobot:~$ lspci -vt
-[0000:00]-+-00.0 Intel Corporation Device 5904
+-02.0 Intel Corporation Device 5916
+-04.0 Intel Corporation Skylake Processor Thermal Subsystem
+-14.0 Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-LP Thermal subsystem
+-15.0 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
+-15.1 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
+-16.0 Intel Corporation Sunrise Point-LP CSME HECI #1
| +-01.0-[04-38]--
| \-02.0-[39]----00.0 Intel Corporation DSL6340 USB 3.1 Controller [Alpine Ridge]
+-1f.0 Intel Corporation Device 9d58
+-1f.2 Intel Corporation Sunrise Point-LP PMC
+-1f.3 Intel Corporation Device 9d71
\-1f.4 Intel Corporation Sunrise Point-LP SMBus
pawal@lakrobot:~$ uname -a
Linux lakrobot 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Dmitrii Shcherbakov (dmitriis) wrote : | #43 |
- dmesg_pcie_aspm_rc8.log (173.4 KiB, text/plain)
I have the same issue on the Razer Blade 2017 - the kernel log is flooded with messages.
Disabling PCIe Active State Power Management helps:
GRUB_CMDLINE_
Tested that on 4.11.0-
Eudald (reaven) wrote : | #44 |
Same here with 17.04 Kernel: 4.10.0-20-generic. Computer hangs completely after some random time. It's happened since I updated to 17.04.
I'm going to check whether the workaround prevents the computer from freezing.
Eudald (reaven) wrote : | #45 |
@dmitriis are you sure this flag disables PCIe Active State Power Management? I set it and I still see errors:
May 5 13:09:43 evo kernel: [ 673.810810] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
May 5 13:09:43 evo kernel: [ 673.810821] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
May 5 13:09:43 evo kernel: [ 673.810829] pcieport 0000:00:1c.0: device [8086:a110] error status/
May 5 13:09:43 evo kernel: [ 673.810833] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
Dmitrii Shcherbakov (dmitriis) wrote : | #46 |
Eudald, I am sure. Tested multiple times.
# edit /etc/default/grub
sudo update-grub
sudo shutdown -r now
and you should be good.
https:/
pcie_aspm=  [PCIE] Forcibly enable or disable PCIe Active State Power
            Management.
    off     Disable ASPM.
    force   Enable ASPM even on devices that claim not to support it.
            WARNING: Forcing ASPM on may cause system lockups.
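For anyone who wants to script the edit step, here is a minimal Python sketch. It appends a parameter to every line that starts with GRUB_CMDLINE_ in a text buffer. The helper name add_kernel_param and the variable name GRUB_CMDLINE_EXAMPLE are made up for illustration, and the code works on a string rather than touching /etc/default/grub itself.

```python
import re


def add_kernel_param(grub_text, param):
    """Append `param` inside the quotes of each GRUB_CMDLINE_* line.

    Hypothetical helper: it only transforms text, so the caller decides
    whether and how to write the result back to /etc/default/grub.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_\w*=")([^"]*)(")\s*$')
    out = []
    for line in grub_text.splitlines():
        m = pattern.match(line)
        # Skip lines that already carry the parameter, so reruns are harmless.
        if m and param not in m.group(2).split():
            value = (m.group(2) + " " + param).strip()
            line = m.group(1) + value + m.group(3)
        out.append(line)
    return "\n".join(out)


# GRUB_CMDLINE_EXAMPLE is a made-up variable name used only for this demo.
sample = 'GRUB_TIMEOUT=10\nGRUB_CMDLINE_EXAMPLE="quiet splash"'
print(add_kernel_param(sample, "pcie_aspm=off"))
```

The change still only takes effect after "sudo update-grub" and a reboot, as in the steps above.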
jbeale (jpbeale) wrote : | #47 |
Same problem here. Both Ubuntu Desktop 16.04, and 17.04 with kernel 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
This is on i7-6700TE @ 2.4GHz, 32 GB RAM, 500 GB SSD with Nvidia GeForce GT710 video card and Realtek 8821AE PCI-E wireless adaptor.
The "pci=noaer" in the GRUB file does work for me to eliminate the error log spam. Before doing that fix, I had an extremely high error rate (every few microseconds) so /var/log/kern.log grew over 10 GB in just a few minutes after bootup. A brief excerpt:
[ 283.805239] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
[ 283.805243] pcieport 0000:00:1d.0: can't find device of ID00e8
[ 283.805256] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
[ 283.805260] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e8(Receiver ID)
[ 283.805262] pcieport 0000:00:1d.0: device [8086:a119] error status/
[ 283.805263] pcieport 0000:00:1d.0: [ 0] Receiver Error (First)
[ 283.805281] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
[ 283.805287] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e8(Receiver ID)
[ 283.805288] pcieport 0000:00:1d.0: device [8086:a119] error status/
[ 283.805290] pcieport 0000:00:1d.0: [ 0] Receiver Error (First)
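To put a number on the error rate, a rough Python sketch can compute the gaps between successive "AER: ... error received" lines from their kernel timestamps. The helper name aer_gaps_us is hypothetical, and the sample is just the three AER lines from the excerpt above.

```python
import re

# Kernel timestamp (seconds.microseconds) on an "AER: ... error received" line.
AER_LINE = re.compile(
    r"\[\s*(\d+)\.(\d{6})\] pcieport \S+ AER: .*error received"
)


def aer_gaps_us(log_text):
    """Return the gaps, in microseconds, between consecutive AER lines."""
    # Integer arithmetic avoids floating-point rounding on the timestamps.
    stamps = [
        int(sec) * 1_000_000 + int(usec)
        for sec, usec in AER_LINE.findall(log_text)
    ]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


excerpt = """\
[  283.805239] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
[  283.805243] pcieport 0000:00:1d.0: can't find device of ID00e8
[  283.805256] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
[  283.805281] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
"""
print(aer_gaps_us(excerpt))
```

Only the AER lines are counted; the other lines that belong to the same event are ignored, so each gap is the time between two reported errors.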
jbeale (jpbeale) wrote : | #48 |
Note: before doing the workaround, my Realtek 8821AE wifi module did work and would connect to the network, despite the high rate of errors going to the log. After implementing the workaround, no more errors but the wifi doesn't work (it can see the network but won't connect to it).
Daniel Mulholland (dan-mulholland) wrote : | #49 |
FWIW, running Kubuntu 17.04 with kernel 4.12.0-
[ 9825.550655] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9825.550661] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9825.550664] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9825.550666] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9825.846925] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9825.846951] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9825.846966] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9825.846974] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9825.852701] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9825.852715] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9825.852724] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9825.852730] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9826.680756] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9826.680767] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9826.680774] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9826.680780] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9826.938346] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9826.938362] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9826.938370] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9826.938375] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9828.079556] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9828.079566] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9828.079573] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9828.079577] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 9828.278507] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 9828.278531] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9828.278548] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 9828.278559] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
I think this is likely the same issue as in this thread, however the type is "Data Link Layer" rather than "Physical Layer". Happy to run diagnostics on suggestion.
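To compare reports like this one with the "Physical Layer" ones, a small Python sketch can pair each "PCIe Bus Error" line with the status-bit line that follows it for the same port. The helper name errors_by_layer is made up, and the sample mixes two log fragments quoted in this thread.

```python
import re

LAYER = re.compile(r"pcieport (\S+): PCIe Bus Error: .*type=([^,]+),")
STATUS_BIT = re.compile(
    r"pcieport (\S+): \[\s*(\d+)\] (.+?)(?: \(First\))?\s*$"
)


def errors_by_layer(log_text):
    """Return a sorted list of unique (layer, bit, name) tuples in the log."""
    last_layer = {}  # port -> layer from its most recent Bus Error line
    seen = set()
    for line in log_text.splitlines():
        m = LAYER.search(line)
        if m:
            last_layer[m.group(1)] = m.group(2)
            continue
        m = STATUS_BIT.search(line)
        if m and m.group(1) in last_layer:
            seen.add((last_layer[m.group(1)], int(m.group(2)), m.group(3)))
    return sorted(seen)


sample_log = """\
[ 9825.550661] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 9825.550666] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[  673.810821] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[  673.810833] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
"""
for entry in errors_by_layer(sample_log):
    print(entry)
```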
Vladiszavlyev Gergo (gergo-ruszki) wrote : | #50 |
I have an ASUS N552VW laptop, and updating the BIOS to version 300 helped resolve this issue.
Vladiszavlyev Gergo (gergo-ruszki) wrote : | #51 |
Follow-up: the errors disappeared only for a few reboots. At first only a subset of them came back; today all of the previously observed error messages appeared again during boot.
Stephan Rügamer (sruegamer) wrote : | #52 |
Just saw this message for the first time:
Ubuntu Artful (Devel) latest packages.
Jul 21 10:15:18 sruegamer-xps13 kernel: [ 3949.594315] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
Jul 21 10:15:18 sruegamer-xps13 kernel: [ 3949.594323] pcieport 0000:00:1c.4: device [8086:9d14] error status/
Jul 21 10:15:18 sruegamer-xps13 kernel: [ 3949.594329] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Jul 21 10:15:36 sruegamer-xps13 kernel: [ 3967.818653] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
Jul 21 10:15:36 sruegamer-xps13 kernel: [ 3967.818666] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
Jul 21 10:15:36 sruegamer-xps13 kernel: [ 3967.818671] pcieport 0000:00:1c.4: device [8086:9d14] error status/
Jul 21 10:15:36 sruegamer-xps13 kernel: [ 3967.818674] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Jul 21 10:16:01 sruegamer-xps13 gnome-terminal-
Jul 21 10:16:43 sruegamer-xps13 kernel: [ 4034.583189] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
Jul 21 10:16:43 sruegamer-xps13 kernel: [ 4034.583201] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
Jul 21 10:16:43 sruegamer-xps13 kernel: [ 4034.583205] pcieport 0000:00:1c.4: device [8086:9d14] error status/
Jul 21 10:16:43 sruegamer-xps13 kernel: [ 4034.583207] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Jul 21 10:16:46 sruegamer-xps13 kernel: [ 4037.002195] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
Jul 21 10:16:46 sruegamer-xps13 kernel: [ 4037.002201] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
Jul 21 10:16:46 sruegamer-xps13 kernel: [ 4037.002204] pcieport 0000:00:1c.4: device [8086:9d14] error status/
Jul 21 10:16:46 sruegamer-xps13 kernel: [ 4037.002206] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Laptop is Dell XPS 13 Dev Edition (9360)
Daniel Mulholland (dan-mulholland) wrote : Re: [Bug 1521173] Re: AER: Corrected error received: id=00e0 | #53 |
I have experienced the same issue.
Linux kernel 4.12 (easily installed using Ukuu
http://
completely resolved this issue with the PCIe bus for me.
jon anoter (jon8899888) wrote : | #54 |
I have the same issue on an Asus X550VX laptop (with Nvidia GTX 950M), i7-7700HQ quad-core, on Ubuntu 16.04.3 with Linux kernel 4.10.0-32-generic:
Aug 21 07:45:05 kernel: [170968.303385] pcieport 0000:00:1c.2: AER: Multiple Corrected error received: id=00e2
Aug 21 07:45:05 kernel: [170968.304027] pcieport 0000:00:1c.2: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e2(Receiver ID)
Aug 21 07:45:05 kernel: [170968.304030] pcieport 0000:00:1c.2: device [8086:a112] error status/
Aug 21 07:45:05 kernel: [170968.304032] pcieport 0000:00:1c.2: [ 0] Receiver Error (First)
Aug 21 07:45:05 kernel: [170968.304044] pcieport 0000:00:1c.2: AER: Corrected error received: id=00e2
Aug 21 07:45:05 kernel: [170968.304691] pcieport 0000:00:1c.2: can't find device of ID00e2
:1c.2: AER: Corrected error received: id=00e2
Logi Leifsson (logileifs) wrote : | #55 |
Also affecting me on Ubuntu 14.04 on an Asus UX305CA.
Could this be the reason my computer can no longer resume from suspend?
After a very recent system update my computer never fully resumed from suspend, and after a hard restart I got an apportcheckresume error. The only thing I noticed was the same error described here, so I wonder whether it has been preventing my computer from resuming after suspend.
Julian Alarcon (julian-alarcon) wrote : | #56 |
Still happening with Ubuntu 17.10 kernel Linux P01A30136 4.12.0-12-generic #13-Ubuntu SMP Thu Aug 17 16:13:25 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux and updated BIOS.
Aug 28 10:55:08 LAPTOPNAME kernel: [ 5632.967441] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
Aug 28 10:55:08 LAPTOPNAME kernel: [ 5632.967445] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e5(Receiver ID)
Aug 28 10:55:08 LAPTOPNAME kernel: [ 5632.967447] pcieport 0000:00:1c.5: device [8086:9d15] error status/
Aug 28 10:55:08 LAPTOPNAME kernel: [ 5632.967449] pcieport 0000:00:1c.5: [ 0] Receiver Error (First)
@Spazm (granny-launchpad) wrote : | #57 |
Hit by this today after updating packages on Ubuntu 17.10 running on a Dell 9360.
This upgraded the kernel to '4.13.0-11-generic #12-Ubuntu SMP'.
I see the same pcieport messages that dan-mulholland was seeing:
[ 1423.748011] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 1423.748015] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 1423.748017] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
[ 1428.702571] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 1428.702577] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
...
% lspci | grep 1c.4
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
Will try with a stock kernel.
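The id= value in these messages appears to be the usual 16-bit PCI bus/device/function encoding, which is why id=00e4 goes with port 00:1c.4. A short Python sketch of the decoding, assuming that layout (8 bits bus, 5 bits device, 3 bits function); the function name decode_requester_id is hypothetical:

```python
def decode_requester_id(id_hex):
    """Turn the 16-bit ID from an AER message into bus:device.function."""
    value = int(id_hex, 16)
    bus = (value >> 8) & 0xFF     # upper 8 bits
    device = (value >> 3) & 0x1F  # next 5 bits
    function = value & 0x07       # lowest 3 bits
    return f"{bus:02x}:{device:02x}.{function}"


# IDs that show up in the logs quoted in this thread.
for aer_id in ("00e0", "00e4", "00e8"):
    print(aer_id, "->", decode_requester_id(aer_id))
```

The decoded addresses match the pcieport addresses printed next to those IDs in the logs, so the root port is reporting itself as the source.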
jon anoter (jon8899888) wrote : | #58 |
I just wanted to report that I am on an Asus X550V (Kaby Lake i7-7700HQ CPU with Nvidia GeForce GTX 950M) and the workaround in the first paragraph worked:
"Note: Current workaround is to add pci=noaer to your kernel command line:
1) edit /etc/default/grub and and add pci=noaer to the line starting with GRUB_CMDLINE_
GRUB_CMDLINE_
2) run "sudo update-grub"
3) reboot"
That fixed my problem. No more `PCIe Bus Error` and `AER error` messages now.
(I also used `sudo find /var/log -type f -name "*.gz" -delete` to remove old log files and enabled `logrotate`, because thousands of spam `pcie error` messages had filled those logs to over 100 GB (!).)
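Before deleting anything it can help to see how much space the compressed logs actually take. A minimal Python sketch, assuming you only want a file count and a byte total; the helper name gz_usage is made up, and the demo runs on a temporary directory rather than /var/log:

```python
import tempfile
from pathlib import Path


def gz_usage(log_dir):
    """Return (file_count, total_bytes) for *.gz files under log_dir."""
    files = [p for p in Path(log_dir).rglob("*.gz") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Demo on a throwaway directory so nothing on the real system is read.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "kern.log.1.gz").write_bytes(b"x" * 10)
    (Path(demo_dir) / "kern.log").write_bytes(b"y" * 5)
    print(gz_usage(demo_dir))
```

Pointing it at /var/log would normally need root, since some of the log files there are not readable by ordinary users.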
spike speigel (frail-knight) wrote : | #59 |
Just now experiencing this on Ubuntu 17.10. Never saw this before. Dell XPS 13 DE 9360 w/ Kaby Lake CPU.
tags: | added: artful |
spike speigel (frail-knight) wrote : | #60 |
[ 6283.204650] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 6283.204661] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 6283.204671] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 6283.204677] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Tim Ritberg (xpert-reactos) wrote : | #61 |
Still same here. Updated from 17.04 to 17.10:
pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
pcieport 0000:00:1c.5: device [8086:9d15] error status/
pcieport 0000:00:1c.5: [12] Replay Timer Timeout
Skylake i5-6200U
Aspire E5-574G
Carlos (cjclm7) wrote : | #62 |
Same issue on Linux Ubuntu Server 16.04
Intel Skylake i5-6400
Asus Motherboard Z270 Prime
GPU on PCIe MSI RX 580
Marcos Alano (mhalano) wrote : | #63 |
I'm using Ubuntu 17.10 on an i7 Skylake, and with the "pci=noaer" tip the message goes away. Now I need to find out what this option does to see whether I'm losing something.
Marcos Alano (mhalano) wrote : | #64 |
This message only occurs for me when I set the "Fastboot" option in the BIOS to "Minimal" instead of "Thorough". I think Linux isn't ready for the Fastboot feature yet.
Jinyu LIU (liujinyu) wrote : | #65 |
I got the same issue with a Dell XPS:
* Ubuntu 17.10
* Linux SimonUbuntu 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
* syslog
Nov 7 00:59:08 SimonUbuntu kernel: [ 1701.635733] pcieport 0000:00:1c.2: AER: Corrected error received: id=00e2
Nov 7 00:59:08 SimonUbuntu kernel: [ 1701.635744] pcieport 0000:00:1c.2: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e2(Receiver ID)
Nov 7 00:59:08 SimonUbuntu kernel: [ 1701.635750] pcieport 0000:00:1c.2: device [8086:a292] error status/
Nov 7 00:59:08 SimonUbuntu kernel: [ 1701.635754] pcieport 0000:00:1c.2: [ 0] Receiver Error (First)
* dmesg
[ 1994.498700] pcieport 0000:00:1c.2: AER: Corrected error received: id=00e2
[ 1994.498705] pcieport 0000:00:1c.2: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e2(Receiver ID)
[ 1994.498707] pcieport 0000:00:1c.2: device [8086:a292] error status/
[ 1994.498708] pcieport 0000:00:1c.2: [ 0] Receiver Error (First)
$ lspci -v -s 1c.2
00:1c.2 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
Memory behind bridge: df300000-df3fffff
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp
$ lspci
00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
00:02.0 Display controller: Intel Corporation HD Graphics 630 (rev 04)
00:14.0 USB controller: Intel Corporation 200 Series PCH USB 3.0 xHCI Controller
00:15.0 Signal processing controller: Intel Corporation 200 Series PCH Serial IO I2C Controller #0
00:15.1 Signal processing controller: Intel Corporation 200 Series PCH Serial IO I2C Controller #1
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #2 (rev f0)
00:1c.2 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0)
00:1c.3 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #4 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Signal processing controller: Intel Corporation 200 Series PCH Serial IO UART Controller #0
00:1f.0 ISA bridge: Intel Corporation 200 Series PCH LPC Controller (Z270)
00:1f.2 Memory controller: Intel Corporation 200 Series PCH PMC
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series PCH SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
02:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
03:00.0 Network controller: Intel Corporation Wireless 3165 (rev 79)
04:00.0 Ethernet controller: Qualcomm Atheros QCA8171 Gigabit E...
Kai-Heng Feng (kaihengfeng) wrote : | #66 |
> On 7 Nov 2017, at 1:07 AM, Jinyu LIU <email address hidden> wrote:
>
> I got same issue with DELL XPS
Jinyu LIU,
Can you file a new bug? Thanks.
Kai-Heng
Sasha Stadnik (evrybiont) wrote : | #67 |
I had the same problem on Linux Lubuntu 17.10 (4.13.0-16-generic)
Asus X550VXK Intel i7, Nvidia Geforce GTX 950M
First of all I had a black screen; temporarily adding "nomodeset" in the GRUB menu solved it.
pci=noaer helped me get rid of "PCIe Bus Error: severity=Corrected, type=Physical Layer".
After that I had WiFi problems (frequent connection losses etc.); the posts below helped me fix the WiFi:
https:/
the same in https:/
Marcos Alano (mhalano) wrote : | #68 |
I went into the BIOS and set the "Fastboot" option to "Thorough". I would like people to check which value is selected for this option and change it to see whether the error persists. Could some of you help me with that?
gotcha (pjusto) wrote : | #69 |
Hi Marcos, I was getting this error on a Precision 5520 when I plugged in a TB16 docking station. I am running Xubuntu 16.04, fully updated. No peripheral was functional.
After setting the fast boot option to Auto, it worked!
Cheers...
Dmitrii Shcherbakov (dmitriis) wrote : | #70 |
Marcos,
#68
Regardless of fastboot on/off I get the same behavior without pcie_aspm=off
➜ ~ uname -r
4.13.0-16-generic
Bougron (francis-bougron) wrote : | #71 |
Hello,
Today I saw these 4 lines repeated many, many times in a syslog trace from Ubuntu 17.10:
Nov 20 00:07:53 sat-XPS-15-9560 kernel: [ 590.905484] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
Nov 20 00:07:53 sat-XPS-15-9560 kernel: [ 590.905513] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
Nov 20 00:07:53 sat-XPS-15-9560 kernel: [ 590.905522] pcieport 0000:00:1c.0: device [8086:a110] error status/
Nov 20 00:07:53 sat-XPS-15-9560 kernel: [ 590.905528] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
Bougron (francis-bougron) wrote : | #72 |
Kai-Heng Feng (kaihengfeng) wrote : | #73 |
Bougron, please file a new bug.
Rolands Kusiņš (tower98) wrote : | #74 |
@Bougron did you register a new bug? I failed to find one. If you did, could you please share the new number?
I got a new laptop, and it seems that I'm having the same issue.
Nov 22 10:05:50 tower9-xps15 kernel: [ 110.580978] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e8
Nov 22 10:05:50 tower9-xps15 kernel: [ 110.580990] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e8(Transmitter ID)
Nov 22 10:05:50 tower9-xps15 kernel: [ 110.581008] pcieport 0000:00:1d.0: device [8086:a118] error status/
Nov 22 10:05:50 tower9-xps15 kernel: [ 110.581009] pcieport 0000:00:1d.0: [12] Replay Timer Timeout
$ uname -r
4.13.0-16-generic
ps:
root 4177 0.0 0.0 26804 4740 pts/2 S+ 10:15 0:00 | | \_ /usr/bin/perl /var/lib/
root 4178 0.0 0.0 4468 896 pts/2 S+ 10:15 0:00 | | \_ run-parts --verbose --exit-on-error --arg=4.
root 4179 0.0 0.0 4608 1708 pts/2 S+ 10:15 0:00 | | \_ /bin/sh /usr/lib/
root 4184 0.0 0.0 12936 1024 pts/2 S+ 10:15 0:00 | | \_ plymouth --ping
$ sudo lspci -v
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
Subsystem: Dell Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 16
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: ec000000-ed0fffff
Prefetchable memory behind bridge: 00000000c000000
Capabilities: [88] Subsystem: Dell Skylake PCIe Controller (x16)
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [a0] Express Root Port (Slot+), MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [140] Root Complex Link
Capabilities: [d94] #19
Kernel driver in use: pcieport
Kernel modules: shpchp
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04) (prog-if 00 [VGA controller])
Subsystem: Dell Device 07be
Flags: bus master, fast devsel, latency 0, IRQ 135
Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
Memory at 80000000 (64-bit, prefetchable) [size=256M]
I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel driver in use: i915
Kernel modules: i915
00:0...
Rolands Kusiņš (tower98) wrote : | #75 |
Sorry, not enough coffee this morning. The ps output was meant for a frozen kernel update...
information type: | Public → Public Security |
information type: | Public Security → Public |
information type: | Public → Public Security |
information type: | Public Security → Public |
Bruno Randolf (br1-l) wrote : | #76 |
Setting "Fastboot" to "Thorough" in the BIOS (v2.4.2) of my XPS 13 9360 fixed this error.
spike speigel (frail-knight) wrote : | #77 |
I'm not seeing this spammed in dmesg. Only maybe once per boot, but I'm seeing the following on my 9360 running Ubuntu 17.10:
[ 4649.396767] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 4649.396784] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 4649.396800] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 4649.396811] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
Thorsten Munsch (thorsten-munsch) wrote : | #78 |
Still present in (X)ubuntu 17.10 with kernel 4.13.0-32-generic.
The trigger is the onboard Realtek network chip on my Gigabyte GA-AB350 Gaming 3 (AMD Ryzen) mainboard:
+-01.3-
| +-00.1 Advanced Micro Devices, Inc. [AMD] Device 43b7
| \-00.2-
| +-01.0-[04]--
| \-04.0-[05]--
Thorsten Munsch (thorsten-munsch) wrote : | #79 |
I noticed weird network problems on this system as well when plugging in an external USB 3 hard disk, and on Friday even when I plugged in my mobile phone just to charge its battery.
I don't know if this is connected in some way. Yesterday I updated the UEFI/BIOS and will watch whether it still happens.
angelalberto (flkangel) wrote : | #80 |
I had the same messages. To hide them I use Fastboot and pcie_aspm=off. I think this only hides the messages; WiFi works correctly.
roussel geoffrey (roussel-geoffrey) wrote : | #81 |
I had the same problem and fix #9 worked for me (adding "pci=noaer").
I'm on Ubuntu 17.10 on an HP Pavilion laptop 14-008nf and all hardware seems to be working.
I was flooded with this (it took a lot of disk space because it was logged constantly):
akem@akem-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Mar 20 19:00:03 akem-HP-
Luca (zapduke) wrote : | #82 |
Like Thorsten, I have a Ryzen workstation and, incidentally, the same network chipset but a different motherboard (ASRock AB350M Pro4):
1f:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
I think this error message is generic and different kinds of problems hide behind it. In my case, every once in a while my PC slowly freezes; before it freezes I see the AER error, and after I have seen a lot of them it freezes.
Adding "pcie_aspm=off" makes the error messages disappear, and then it freezes silently or shows "r8169:
Warner (warner-veltman) wrote : | #83 |
I can confirm similar issues on AMD Threadripper.
The issues went away on Ubuntu 17.10 after adding "pcie_aspm=off" to grub, but they were reintroduced by upgrading to 18.04. Strangely, the error now occurs with both the 4.13 and 4.15 kernels.
One solution I found is to set PCIe to 2.0 instead of the default 3.0 in BIOS (but this comes at a slight performance cost).
Do we know if this will be assigned / solved soon?
May 4 13:09:38 TR-Ubuntu kernel: [ 76.552730] pcieport 0000:00:01.1: [12] Replay Timer Timeout
May 4 13:09:38 TR-Ubuntu kernel: [ 76.563746] dpc 0000:00:
May 4 13:09:38 TR-Ubuntu kernel: [ 76.563759] pcieport 0000:00:01.1: AER: Multiple Corrected error received: id=0000
May 4 13:09:38 TR-Ubuntu kernel: [ 76.563788] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)
May 4 13:09:38 TR-Ubuntu kernel: [ 76.563790] pcieport 0000:00:01.1: device [1022:1453] error status/
May 4 13:09:38 TR-Ubuntu kernel: [ 76.563792] pcieport 0000:00:01.1: [ 7] Bad DLLP
M (manudv7) wrote : | #84 |
I have this error on my Asus X541U with Ubuntu 18.04; please fix it.
It generates an endless list of this error:
PCIe Bus Error: severity=Corrected, type=Physical Layer,
id=00e5(Receiver ID) device [8086:9d15] error status/
And I can't log in.
It is related to the Wi-Fi card of my PC, which is the following: Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
Changed in linux (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in linux (Ubuntu): | |
status: | In Progress → Confirmed |
Dave Howson (dave.sohan) wrote : | #85 |
What is the current status of this issue?
I am facing it on my MSI laptop and I'm not sure whether disabling interrupts or turning off active-state power management is the right solution.
Riko Naka (rikonaka) wrote : | #86 |
I have had the same problem on my computer since I upgraded to the latest Linux kernel, 4.4.0-128-generic, and my sound card no longer works as usual.
Jun 18 01:48:00 home kernel: [ 353.142183] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
Jun 18 01:48:00 home kernel: [ 353.142194] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
Jun 18 01:48:00 home kernel: [ 353.142197] pcieport 0000:00:1c.0: device [8086:a115] error status/
Jun 18 01:48:00 home kernel: [ 353.142200] pcieport 0000:00:1c.0: [ 0] Receiver Error
Changed in linux (Ubuntu): | |
status: | Confirmed → Fix Released |
Phillip Sz (phillip-sz) wrote : | #87 |
Where is this fixed?
asusbios (asusbios) wrote : | #88 |
This is not fixed for me. Asus x541u
asusbios (asusbios) wrote : | #89 |
Just an update on this. I tried Fedora Rawhide with the 4.18 kernel and the Ubuntu daily Cosmic Cuttlefish build, and the issue is present there too.
I am unable to change the status of this bug back to confirmed.
Luiz (lmfranco) wrote : | #90 |
Let me tell my story:
I bought a Dell Inspiron 7000 series with Windows 10.
I replaced it with Ubuntu 18.04.
I did some BIOS upgrades; my current BIOS is the newest one.
One day I opened my notebook to do an SSD upgrade.
My mistake was unplugging the battery, because one pin got twisted.
I noticed in Ubuntu that the notebook was not charging the battery to FULL: even with one pin fewer, the battery should charge to LESS than the full design capacity (3684000), and it was charging to "full" (3403000).
Ubuntu showed CHARGING all the time and never reported a complete charge.
The PCI AER corrected error occurs, as I could see in the dmesg log.
With lspci I identified the PCI hardware:
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
Another problem I noticed is that the BIOS date was wrong when I restored the factory BIOS settings. For one BIOS upgrade I downloaded the BIOS file for my notebook model from another country's site. I'm not sure whether this caused the RTC bug.
I am not completely sure in which order I did the things that finally helped, so I am reporting the WHOLE long story.
In my opinion it could be a TIMER bug: one PCI AER dmesg message mentions the replay timer.
What I did:
1 - Downloaded the BIOS from my LOCAL vendor site. Last time I upgraded, I had downloaded it from the vendor's site for another country, through a proxy.
2 - Cleared the setup password (set it to blank). I cleared the only admin password I had defined.
3 - Restarted and flashed the BIOS downloaded for my model and locale.
4 - Reset CMOS/NVRAM to factory defaults.
5 - Adjusted the time in the BIOS and synced it with Ubuntu via the command: timedatectl set-local-rtc 1 --adjust-
6 - Left my notebook switched on, unplugged from power, until the battery was completely drained.
7 - Plugged in AC power and turned on the notebook.
The error still occurs, but no longer at boot time according to the dmesg logs. My /etc/default/grub file: GRUB_CMDLINE_
Conclusion
It's not a Linux bug. In my experience it can be related to: Bios/Time&
The final step was to let my battery drain with the notebook switched on.
Luiz (lmfranco) wrote : | #91 |
The initial error report is (several dmesg log messages):
[ 13.078695] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
[ 13.078697] pcieport 0000:00:1c.4: device [8086:9d14] error status/
[ 13.078698] pcieport 0000:00:1c.4: [12] Replay Timer Timeout
My WiFi card is:
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
All Qualcomm Atheros QCA6174 dmesg log messages:
[ 11.539350] ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002)
[ 11.540180] ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 11.821802] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/
[ 11.821813] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/
[ 11.825550] ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1028:0310
[ 11.825552] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[ 11.825976] ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.
[ 11.896449] ath10k_pci 0000:02:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
[ 12.560585] ath10k_pci 0000:02:00.0: Unknown eventid: 118809
[ 12.563589] ath10k_pci 0000:02:00.0: Unknown eventid: 90118
[ 12.564353] ath10k_pci 0000:02:00.0: htt-ver 3.47 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[ 12.655660] ath10k_pci 0000:02:00.0 wlp2s0: renamed from wlan0
[ 13.448044] ath10k_pci 0000:02:00.0: Unknown eventid: 118809
[ 13.451050] ath10k_pci 0000:02:00.0: Unknown eventid: 90118
My ath10k_pci tried to load:
[ 11.821802] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/
[ 11.821813] ath10k_pci 0000:02:00.0: Direct firmware load for ath10k/
The PCI AER message disappears when I disable the Qualcomm Atheros QCA6174 WiFi. Note that I kept the initial Bluetooth firmware load enabled in the BIOS; it is referenced by the same driver, ath10k_pci.
The WiFi card is connected, I think, to a PCIe slot, maybe PCIe x1.
My "fix" is to disable WiFi firmware loading in the BIOS and use Ethernet.
With the WiFi firmware disabled the PCI AER error is gone, even when playing.
My notebook's battery problem is physical; maybe vendor support can do something.
One possible solution related to the WiFi firmware may be to take the Windows 10 driver/firmware and add it to the Linux kernel driver tree after testing. The firmware is packaged with the WiFi driver and only needs to be extracted. Another fix may be to upgrade the kernel. Mine is the default 4.15.0-23-generic kernel from the Ubuntu 18.04 repository.
Bluetooth works great.
Dimitrios Menounos (dmenounos) wrote : | #92 |
I have a Dell Inspiron 5570 with Intel i7-8550U CPU. I face the same problem with the Ubuntu 16.04 OEM install and a fresh Kubuntu 18.04 install.
22/7/18 3:01 PM kernel pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
22/7/18 3:01 PM kernel pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
22/7/18 3:01 PM kernel pcieport 0000:00:1c.5: device [8086:9d15] error status/
22/7/18 3:01 PM kernel pcieport 0000:00:1c.5: [12] Replay Timer Timeout
$ lspci -tv
-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
+-02.0 Intel Corporation UHD Graphics 620
+-04.0 Intel Corporation Skylake Processor Thermal Subsystem
+-14.0 Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-LP Thermal subsystem
+-15.0 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
+-16.0 Intel Corporation Sunrise Point-LP CSME HECI #1
+-17.0 Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]
+-1f.0 Intel Corporation Device 9d4e
+-1f.2 Intel Corporation Sunrise Point-LP PMC
+-1f.3 Intel Corporation Sunrise Point-LP HD Audio
\-1f.4 Intel Corporation Sunrise Point-LP SMBus
I haven't tried the pci=noaer solution yet. However, judging from (https:/
Leonidas S. Barbosa (leosilvab) wrote : | #93 |
I have the same issue on my Dell Inspiron 5378 (i7).
lspci -vt
-[0000:00]-+-00.0 Intel Corporation Device 5904
+-02.0 Intel Corporation Device 5916
+-04.0 Intel Corporation Skylake Processor Thermal Subsystem
+-13.0 Intel Corporation Device 9d35
+-14.0 Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-LP Thermal subsystem
+-15.0 Intel Corporation Sunrise Point-LP Serial IO I2C Controller
+-15.1 Intel Corporation Sunrise Point-LP Serial IO I2C Controller
+-16.0 Intel Corporation Sunrise Point-LP CSME HECI
+-17.0 Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]
+-1f.0 Intel Corporation Device 9d58
+-1f.2 Intel Corporation Sunrise Point-LP PMC
+-1f.3 Intel Corporation Device 9d71
\-1f.4 Intel Corporation Sunrise Point-LP SMBus
It also seems my WiFi card is struggling, which is quite annoying.
Dmesg info:
[ 7024.543968] acpi INT3400:00: Unsupported event [0x86]
[ 7127.808824] acpi INT3400:00: Unsupported event [0x86]
[ 7527.461667] wlp1s0: deauthenticating from 10:62:d0:9d:dc:b2 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 7532.477432] wlp1s0: authenticate with 10:62:d0:9d:dc:b2
[ 7532.527885] wlp1s0: send auth to 10:62:d0:9d:dc:b2 (try 1/3)
[ 7532.529562] wlp1s0: authenticated
[ 7532.531874] wlp1s0: associate with 10:62:d0:9d:dc:b2 (try 1/3)
[ 7532.535501] wlp1s0: RX AssocResp from 10:62:d0:9d:dc:b2 (capab=0x411 status=0 aid=3)
[ 7532.537945] wlp1s0: associated
[ 7627.582766] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 7627.582785] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[ 7627.582801] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 7627.582815] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[ 7732.910629] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 7732.910649] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[ 7732.910662] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 7732.910670] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[ 7733.522628] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 7733.522648] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[ 7733.522661] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 7733.522669] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[ 7761.990278] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[ 7761.990300] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[ 7761.990313] pcieport 0000:00:1c.0: device [8086:9d14] error status/
[ 7761.990323] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[ 7821.073917] pcieport 0000:00:1c.0: AER: Corrected error received:...
Leonidas S. Barbosa (leosilvab) wrote : | #94 |
I'm on Xenial: Linux 4.15.0-29-generic #31~16.04.1-Ubuntu
Lucas Czepaniki (lucas.czpnk) wrote : | #95 |
I'm also having this issue on my Dell Inspiron 14 7472.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
$ uname -a
Linux bionic 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ dmesg
[ 2196.965458] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 2196.965466] pcieport 0000:00:1c.5: [12] Replay Timer Timeout
[ 2197.399555] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 2197.399566] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
[ 2197.399569] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 2197.399571] pcieport 0000:00:1c.5: [12] Replay Timer Timeout
[ 2197.644496] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 2197.644506] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
[ 2197.644509] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 2197.644511] pcieport 0000:00:1c.5: [12] Replay Timer Timeout
[ 2198.273044] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 2198.273053] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
[ 2198.273056] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 2198.273058] pcieport 0000:00:1c.5: [12] Replay Timer Timeout
[ 2198.274547] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5
[ 2198.274557] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e5(Transmitter ID)
[ 2198.274561] pcieport 0000:00:1c.5: device [8086:9d15] error status/
[ 2198.274564] pcieport 0000:00:1c.5: [12] Replay Timer Timeout
...
and it goes like this for a ton of lines.
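When the log is flooded like this, it can help to count which port and which status bit are actually repeating rather than scrolling through it. A rough sketch, assuming the line layout seen in this thread (with or without the newer `AER:` prefix); the regular expression and the `summarise` helper are illustrative only and not taken from any existing tool:

```python
import re
from collections import Counter

# Matches status lines such as
#   "pcieport 0000:00:1c.5: [12] Replay Timer Timeout"
#   "pcieport 0000:00:1c.0: AER: [12] Timeout"
STATUS_RE = re.compile(
    r"(?P<driver>\S+) (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"(?: AER:)?\s+\[\s*(?P<bit>\d+)\]\s*(?P<name>.+?)\s*$"
)


def summarise(lines):
    """Count (device, status bit, status name) across dmesg/syslog lines."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            counts[(m["dev"], int(m["bit"]), m["name"])] += 1
    return counts


# Sample lines copied from comments in this thread
sample = [
    "[ 2197.399555] pcieport 0000:00:1c.5: AER: Corrected error received: id=00e5",
    "[ 2197.399571] pcieport 0000:00:1c.5: [12] Replay Timer Timeout",
    "[ 2197.644511] pcieport 0000:00:1c.5: [12] Replay Timer Timeout",
    "[ 7627.582815] pcieport 0000:00:1c.0: [ 0] Receiver Error",
]

for (dev, bit, name), n in summarise(sample).most_common():
    print(f"{n:5d}  {dev}  [{bit:2d}] {name}")
```

The same function could be fed the lines of a saved `dmesg` dump instead of the hard-coded sample.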
Amr Elbeleidy (beleidy) wrote : | #96 |
*First linux bug report*
I am also affected by this issue on Bionic (4.15.0-33) on a Dell Aurora R6.
The processor is an i7-7700K (Kaby Lake), and I get the issue on the PCI bus connected to the Intel Corporation Wireless 3165 card.
I see someone said a fix is in place, but I cannot find where the fix has been released.
C de-Avillez (hggdh2) wrote : | #97 |
Reverting to Triaged on the Ubuntu task, and also making clear that there is a workaround (pci=noaer) for it.
Changed in linux (Ubuntu): | |
status: | Fix Released → Triaged |
Joseph Salisbury (jsalisbury) wrote : | #98 |
The mainline kernel is now at v4.19-rc6. It might be worth testing this kernel to see if the bug has been fixed upstream. It can be downloaded from:
description: | updated |
StoatWblr (stoatwblr) wrote : | #99 |
This is also present in later distro versions, right up to Cosmic.
In my case it manifests on Supermicro and Intel 7500/5500/5520/X58-based servers when Qlogic QLE2562 fibre optic cards are used, and _ONLY_ with Qlogic cards; nothing else seems to trigger it.
As with the WiFi cards on laptops, an S3 cycle stops it.
PedroCorreia (pmfernandez) wrote : | #100 |
I'm still having this issue with my Dell Inspiron 5570 and a fully updated Ubuntu 18.04 install.
I have had this issue since I bought it (6 months ago), but now it's even worse. Right now my WiFi keeps disconnecting and a hard reboot is required to make it work again.
Also, the WiFi indicator on the top panel keeps showing a question mark instead of the WiFi icon.
Restarting NetworkManager leaves it unable to detect any WiFi networks.
I have several errors in dmesg when this happens, like ( failed to wake target to writing ... )
jack lemon (mb0087) wrote : | #101 |
I'm also experiencing this bug with the latest debian testing release:
Linux lemon 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64 GNU/Linux
Shaheed Haque (srhaque-i) wrote : | #102 |
I am seeing this on an updated Cosmic with a Dell Inspiron 5570. Kernel version is presently 4.18.0-15.16.
vmc (vmclark) wrote : | #103 |
I get the PCIe errors on all the X, L, K Ubuntu flavours of Disco 19.04
====
[ 3056.549121] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 3056.549136] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 3056.549147] pcieport 0000:00:1c.0: device [8086:a33d] error status/
[ 3056.549154] pcieport 0000:00:1c.0: [12] Timeout
================
00:1c.0 0604: 8086:a33d (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: a2100000-a21fffff
Kernel driver in use: pcieport
=================
lspci -s 000:00:1c.0
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
=================
$ lspci -vt
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
+-02.0 Intel Corporation UHD Graphics 630 (Desktop)
+-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
+-12.0 Intel Corporation Cannon Lake PCH Thermal Controller
+-14.0 Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
+-14.2 Intel Corporation Cannon Lake PCH Shared SRAM
+-14.3 Intel Corporation Wireless-AC 9560 [Jefferson Peak]
+-16.0 Intel Corporation Cannon Lake PCH HECI Controller
+-17.0 Intel Corporation SATA Controller [RAID mode]
+-1f.0 Intel Corporation Device a308
+-1f.3 Intel Corporation Cannon Lake PCH cAVS
+-1f.4 Intel Corporation Cannon Lake PCH SMBus Controller
\-1f.5 Intel Corporation Cannon Lake PCH SPI Controller
Mohamed Salama (gray.hat.enigma) wrote : | #104 |
I have experienced the same bug on my HP Pavilion laptop, and I think the problem is a compatibility issue between the RTL8723BE PCIe wireless card and the Linux kernel,
causing it to log a PCIe error endlessly on every boot/reboot of the system, which leads to huge log files on the disk!
I don't agree with the approach of suppressing the warning at startup using GRUB default parameters (pci=nomsi and pci=noaer). I think this may endanger the system if a serious error/problem arises in the future, and in any case the WiFi card doesn't function properly because of the error.
So I think it might be better to disable it permanently and use another WiFi device (for example, a USB adapter).
# Of course this is a "temp" solution until there is a kernel fix for this matter, but it will do the trick
Steps:
# In /etc/modprobe.
blacklist rtl8723be
# After rebooting, execute this to make sure the driver is disabled
lsmod | grep rtl
# To get the name of the kernel module related to the card, simply execute
lspci -nnk
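The `lsmod | grep rtl` check above can also be scripted, since `lsmod` essentially formats `/proc/modules`. A small sketch under that assumption; the helper names `loaded_modules` and `is_loaded` are invented for this example:

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Return module names from text in /proc/modules format (name is column 1)."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_loaded(name: str, proc_modules_text: str) -> bool:
    # Loaded module names use underscores even if the file name has dashes.
    return name.replace("-", "_") in loaded_modules(proc_modules_text)


proc_modules = Path("/proc/modules")
if proc_modules.exists():  # only present on Linux
    text = proc_modules.read_text()
    state = "still loaded" if is_loaded("rtl8723be", text) else "not loaded"
    print("rtl8723be is", state)
```

If the blacklist took effect, the module should be reported as not loaded after the reboot.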
Pablo Palácios (ppalacios) wrote : | #105 |
I've got the same with the latest Arch Linux kernel on a Dell computer with an i7 and a PCIe network card as well. I found this thread on the Red Hat Bugzilla very helpful:
https:/
I was able to solve my problem by explicitly disabling ASPM in my BIOS. From the factory it was set to auto, which could perhaps result in "the device has no support for ASPM but let's enable ASPM anyway" behavior that confuses the kernel.
tags: | added: cscc |
Willem Hobers (whobers) wrote : | #106 |
Seeing this on Linux LAPTOP 5.0.0-25-generic #26~18.04.1-Ubuntu SMP Thu Aug 1 13:51:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux, running xubuntu 18.04.3.
description: Notebook
product: Aspire A315-53 (0000000000000000)
vendor: Acer
*-pci
product: Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
vendor: Intel Corporation
physical id: 100
bus info: pci@0000:00:00.0
version: 08
width: 32 bits
clock: 33MHz
If there's any other info I can provide, please let me know.
Utkarsh (ashubisht) wrote : | #107 |
I am getting the same issue with Ubuntu 19.04.
I am using HP Probook 440G3 having i7-6500U processor and RTL8723BE network card.
Any ideas on when this issue is planned to be patched?
Getting this info from Windows counterpart, as I am still getting issues after applying nomsi
Name LocationInfo UINumber
---- ------------ --------
Realtek RTL8723BE 802.11 bgn Wi-Fi Adapter PCI bus 3, device 0, function 0 5
Realtek PCIe GBE Family Controller PCI bus 2, device 0, function 0 4
Realtek PCIE CardReader PCI bus 4, device 0, function 0 8
V-Mark (vertesmark) wrote : | #108 |
I have a similar problem, but I get "Timeout".
ACER Nitro 5 - Ubuntu 19.04 fresh install.
Intel(R) Core(TM) i7-8750H
It spams the following every few minutes (for example):
Sep 25 03:09:06 mark-Nitro-AN515-52 kernel: [ 4326.007230] pcieport 0000:00:1d.5: AER: Corrected error received: 0000:00:1d.5
Sep 25 03:09:06 mark-Nitro-AN515-52 kernel: [ 4326.007248] pcieport 0000:00:1d.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Sep 25 03:09:06 mark-Nitro-AN515-52 kernel: [ 4326.007256] pcieport 0000:00:1d.5: device [8086:a335] error status/
Sep 25 03:09:06 mark-Nitro-AN515-52 kernel: [ 4326.007261] pcieport 0000:00:1d.5: [12] Timeout
Spamming means: sometimes one every 2-4 minutes; sometimes I go an hour without any spam.
>lspci -vt
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
| \-00.1 NVIDIA Corporation GP107GL High Definition Audio Controller
+-02.0 Intel Corporation UHD Graphics 630 (Mobile)
+-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
+-12.0 Intel Corporation Cannon Lake PCH Thermal Controller
+-14.0 Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
+-14.2 Intel Corporation Cannon Lake PCH Shared SRAM
+-14.3 Intel Corporation Wireless-AC 9560 [Jefferson Peak]
+-15.0 Intel Corporation Device a368
+-15.1 Intel Corporation Device a369
+-16.0 Intel Corporation Cannon Lake PCH HECI Controller
+-17.0 Intel Corporation Device a353
| \-00.1 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
+-1e.0 Intel Corporation Device a328
+-1f.0 Intel Corporation Device a30d
+-1f.3 Intel Corporation Cannon Lake PCH cAVS
+-1f.4 Intel Corporation Cannon Lake PCH SMBus Controller
\-1f.5 Intel Corporation Cannon Lake PCH SPI Controller
Luigi Calligaris (luigicalligaris) wrote : | #109 |
Dell Inspiron P74G, Kubuntu 19.04 Disco, kernel 5.0.0-13-generic.
I'm affected as well by this bug, with ~50 lines per minute of errors in the syslog.
I noticed only recently the issue on my Kubuntu 18.04 LTS setup (say, this October 2019). Since then I upgraded to 19.04, but with no improvement. My errors in dmesg are of the same form as stated above, with two recurring types of error statuses:
pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
pcieport 0000:00:1c.4: device [8086:9d14] error status/
pcieport 0000:00:1c.4: [12] Timeout
pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
pcieport 0000:00:1c.4: device [8086:9d14] error status/
pcieport 0000:00:1c.4: [12] Timeout
That pcie port is shown to be connected to the Atheros WiFi of the laptop:
+-1c.4-[02]----00.0 Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
The output of lshw for it is:
*-pci:1
bus info: pci@0000:00:1c.4
width: 32 bits
clock: 33MHz
bus info: pci@0000:02:00.0
I cannot find an ASPM disable option in my BIOS setup.
A guy named Dennis E. Mungai dug into the issue last year (link below), and his temporary fix (turning off the report bit for AER Corrected errors) worked for me, without the need to turn off AER for the whole system.
https:/
I find it interesting that for most of us this issue affects laptop WiFi cards from different vendors.
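For reference, "turning off the report bit for AER Corrected errors" presumably means clearing the Correctable Error Reporting Enable bit (bit 0) of the PCIe Device Control register, which sits at offset 0x08 in the PCI Express capability. The linked post is cut off here, so this is an assumption about the method and not a copy of it. The bit arithmetic would look roughly like this (constant and function names are made up, and the register value is a hypothetical example):

```python
# PCIe Device Control register, low four bits (error reporting enables)
CORRECTABLE_ERR_REPORT_EN = 0x0001   # bit 0
NONFATAL_ERR_REPORT_EN = 0x0002      # bit 1
FATAL_ERR_REPORT_EN = 0x0004         # bit 2
UNSUPPORTED_REQ_REPORT_EN = 0x0008   # bit 3


def mask_corrected_reporting(devctl: int) -> int:
    """Clear only the Correctable Error Reporting Enable bit; keep the rest."""
    return devctl & ~CORRECTABLE_ERR_REPORT_EN & 0xFFFF


example = 0x000F  # hypothetical value with all four reporting bits set
print(f"{example:#06x} -> {mask_corrected_reporting(example):#06x}")
```

Non-fatal, fatal and unsupported-request reporting stay enabled, which is the difference from `pci=noaer`. The new value would still have to be written back to the port with a tool such as `setpci`, and a change made that way does not survive a reboot.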
information type: | Public → Public Security |
information type: | Public Security → Public |
Ricardo S O Leite (ricsdeol) wrote : | #110 |
Hi, Dell G3 3579 (086F)
Ubuntu 20.04
LOG:
[72720.138307] pcieport 0000:00:1d.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[72720.138314] pcieport 0000:00:1d.6: AER: device [8086:a336] error status/
[72720.138320] pcieport 0000:00:1d.6: AER: [12] Timeout
PCI INFO:
➜ sudo lspci -v
00:00.0 Host bridge: Intel Corporation 8th Gen Core 4-core Processor Host Bridge/DRAM Registers [Coffee Lake H] (rev 07)
DeviceName: Onboard - Other
Subsystem: Dell 8th Gen Core 4-core Processor Host Bridge/DRAM Registers [Coffee Lake H]
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>
Kernel driver in use: skl_uncore
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00004000-00004fff [size=4K]
Memory behind bridge: a3000000-a40fffff [size=17M]
Prefetchable memory behind bridge: 000000009000000
Capabilities: [88] Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [a0] Express Root Port (Slot+), MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [140] Root Complex Link
Capabilities: [d94] Secondary PCI Express
Kernel driver in use: pcieport
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (prog-if 00 [VGA controller])
DeviceName: Onboard - Video
Subsystem: Dell UHD Graphics 630 (Mobile)
Flags: bus master, fast devsel, latency 0, IRQ 130
Memory at a2000000 (64-bit, non-prefetchable) [size=16M]
Memory at 80000000 (64-bit, prefetchable) [size=256M]
I/O ports at 5000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel driver in use: i915
Kernel modules: i915
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 07)
DeviceName: Onboard - Other
Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
Flags: fast devsel, IRQ 16
Memory at a4610000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 3
Capabilities: [e0] Vendor Specific Information: Len=0c <?>
Kernel driver in use: proc_thermal
Kernel modules: processor_
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
DeviceName:...
Martin Vernay (magean) wrote : | #111 |
I've been getting a similar problem on a Leopard GP73-8RE laptop from MSI, on Ubuntu 20.04 as well as 19.10 and 18.04.
This is the message that spammed in my system journal, causing it to inflate very rapidly to ludicrous proportions:
22:36:51 kernel: alx 0000:03:00.0: AER: [ 7] BadDLLP
22:36:51 kernel: alx 0000:03:00.0: AER: device [1969:e0a1] error status/
22:36:51 kernel: alx 0000:03:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
I tried the following kernel parameters:
-`pci=nomsi`: this also disables removable devices... not an option.
-`pci=noaer` : disables advanced error reporting without fixing the errors themselves. It works insofar as it suppresses the message flood. However, this is akin to "shooting the messenger": it also prevents troubleshooting other, potentially more serious, errors that won't be reported as well. Plus, letting errors occur continuously might not be the optimal solution, even though these errors are apparently getting corrected.
-`pci=nommconf`: this gets rid of the errors, and so far hasn't had any undesirable side effect. I'll report back if I notice any.
Someone on reddit has also suggested `pcie_aspm=off` :
https:/
But I haven't tried it myself.
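When comparing these parameters it is easy to lose track of which ones the running kernel was actually booted with. A minimal sketch that reads them back from `/proc/cmdline`; the function name `active_pci_options` is illustrative, and it only looks at the `pci=` and `pcie_aspm=` options discussed here:

```python
from pathlib import Path


def active_pci_options(cmdline: str) -> dict:
    """Extract pci= (comma-separated list) and pcie_aspm= from a kernel command line."""
    found = {"pci": [], "pcie_aspm": None}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "pci":
            found["pci"].extend(value.split(","))
        elif key == "pcie_aspm":
            found["pcie_aspm"] = value
    return found


cmdline_file = Path("/proc/cmdline")
if cmdline_file.exists():  # only present on Linux
    print(active_pci_options(cmdline_file.read_text()))
```

An empty `pci` list and `pcie_aspm` of `None` would mean none of the workarounds above are in effect for the current boot.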
Martin Vernay (magean) wrote : | #112 |
So, although `pci=nommconf` gets rid of the error flood, it does apparently make some collateral damage. After a few days under this kernel parameter, the person who uses the laptop on a daily basis reported a decrease in responsiveness and stability, with occasional stutters if I understood correctly. Then I was called to help with a black screen. And indeed, there was nothing to be done but a hard power-off. I couldn't even access a tty. At that point I decided to stop the experiment and reverted to `pci=noaer`; the system then returned to its normal behavior.
I am now trying `pcie_aspm=off`. That apparently gets rid of the errors as well. Hopefully the trade-off is limited to less efficient power saving, which doesn't matter as the laptop is nearly always connected to a power source. Besides, if the error messages were any indication, ASPM did not work correctly anyway; so power management potentially won't get worse (what's there to lose by disabling a malfunctioning feature?).
Paul Menzel (paulmenzel) wrote : | #113 |
@magean, I believe you are having a different issue here, so please create a separate bug report, and, as you reproduced this with Linux 5.4 (also try https:/
smiki (micouk) wrote : | #114 |
my investigation came to same conclusion as #109 (but I'm on 20.04 and latest kernel, so this is still relevant)
My configuration is as follows.
If there is a need for more information/logs, please let me know.
--
Dell Latitude 7389
miki@DL-7389:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
miki@DL-7389:~$ uname -a
Linux DL-7389 5.4.0-31-generic #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
dmesg gets spammed by these error reports:
[116522.584941] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[116522.584959] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[116522.584967] pcieport 0000:00:1c.0: AER: device [8086:9d17] error status/
[116522.584973] pcieport 0000:00:1c.0: AER: [12] Timeout
[116533.643718] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[116533.643735] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[116533.643744] pcieport 0000:00:1c.0: AER: device [8086:9d17] error status/
[116533.643751] pcieport 0000:00:1c.0: AER: [12] Timeout
[116559.755644] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[116559.755655] pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[116559.755658] pcieport 0000:00:1c.0: AER: device [8086:9d17] error status/
[116559.755660] pcieport 0000:00:1c.0: AER: [12] Timeout
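To see which port and which error type dominate a flood like the one above, the "PCIe Bus Error" lines can be tallied. This is only a sketch: the regular expression is written against the log lines quoted in this report (with and without the `AER: ` prefix), and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as
#   "... pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected,
#    type=Data Link Layer, (Transmitter ID)"
# The "AER: " prefix is optional because older kernels omit it on this line.
BUS_ERROR = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?:AER: )?PCIe Bus Error: severity=(?P<severity>[^,]+), "
    r"type=(?P<layer>[^,]+),"
)


def summarize(lines):
    """Count bus-error lines per (port, severity, layer)."""
    counts = Counter()
    for line in lines:
        match = BUS_ERROR.search(line)
        if match:
            counts[(match["port"], match["severity"], match["layer"])] += 1
    return counts
```

For example, `summarize(open("dmesg.txt"))` on a saved log returns a `Counter` whose `most_common()` output shows the noisiest port first; the file name here is just a placeholder.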
The device causing it seems to be the Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter:
miki@DL-7389:~$ sudo lspci -t -v
-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
+-02.0 Intel Corporation HD Graphics 620
+-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
+-13.0 Intel Corporation Sunrise Point-LP Integrated Sensor Hub
+-14.0 Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-LP Thermal subsystem
+-15.0 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
+-15.1 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
+-15.2 Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2
+-16.0 Intel Corporation Sunrise Point-LP CSME HECI #1
+-1f.0 Intel Corporation Sunrise Point LPC Controller/eSPI Controller
+-1f.2 Intel Corporation Sunrise Point-LP PMC
+-1f.3 Intel Corporation Sunrise Point-LP HD Audio
\-1f.4 Intel Corporation Sunrise Point-LP SMBus
lspci -v detailed output for this device (strange that the serial number is read as zeros):
01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapte...
piscvau (piscvau) wrote : | #115 |
On an MSI GE73 PC, installation of Xubuntu 18.04 fails, as does Xubuntu 19.10. The PC was returned to MSI under warranty; the hardware is fine and there is no problem with Windows.
With the latest BIOS it is now impossible to boot the PC from an ISO USB key for versions 18.04 and 19.10.
With Xubuntu 20.04 the PC boots, but the system crashes after logging in.
Paul Menzel (paulmenzel) wrote : | #116 |
@piscvau, the original report is not about a crash, so your issue is unrelated. Please create a separate report for the crash on Ubuntu 20.04. (Also mention there whether it is a system crash or a hang: Does the Num Lock key still work? Can you switch to a virtual console with Ctrl + Alt + F4? Can you still ping the system over the network?) Good luck!
Luis A (peppapig123) wrote : | #117 |
This bug still affects the install process today, with Ubuntu 20.04 LTS and likewise with the Kubuntu installer.
fermulator (fermulator) wrote : | #118 |
"me too" - Dell Latitude w/ a Dell WD16 USB-C dock. After recent firmware updates it got significantly worse and nearly never properly re-attaches after suspend/resume.
Dell 5400 Latitude:
(0.1.9.1=same, 0.1.7.4=older, 0.1.6.5=older, 0.1.5.1=older, 0.1.4.2=older)
dock:
```
No upgrades for RTS5413 in Dell dock, current is 01.21: 01.21=same
No upgrades for RTS5487 in Dell dock, current is 01.47: 01.47=same
No upgrades for WD19, current is 01.00.00.00: 01.00.00.00=same
No upgrades for Package level of Dell dock, current is 01.00.04.01: 01.00.04.01=same
No upgrades for VMM5331 in Dell dock, current is 05.03.10: 05.03.10=same
```
spammed by
```
[Tue Sep 15 18:50:06 2020] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
[Tue Sep 15 18:50:06 2020] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[Tue Sep 15 18:50:06 2020] pcieport 0000:00:1d.0: device [8086:9db1] error status/
[Tue Sep 15 18:50:06 2020] pcieport 0000:00:1d.0: [12] Timeout
```
```
lspci -v | grep 9db1
00:1d.0 PCI bridge: Intel Corporation Device 9db1 (rev f0) (prog-if 00 [Normal decode])
```
Bill Duetschler (bikergeek) wrote : | #119 |
Still an issue for me on Ubuntu 20.10 "Groovy".
Wren Turkal (wt-penguintechs-org) wrote : | #120 |
I have this same problem on a Dell XPS 13 9360 that shipped with Ubuntu 16.04 preloaded. My dmesg logs look identical to what I am seeing above.
Wren Turkal (wt-penguintechs-org) wrote : | #121 |
And FWIW, I have fully upgraded all firmware and also tried both Ubuntu 20.10 and Fedora 33. All of these systems show the same behavior.
Wren Turkal (wt-penguintechs-org) wrote : | #122 |
I also tried all LTS Ubuntus back to 16.04. They all get this log message a lot.
Nivedita Singhvi (niveditasinghvi) wrote : | #123 |
Seen this as well -- although I don't believe it's causing any
problems that we know of -- sure does look right now like it's
only noise in the logs.
In Linux Kernel Bug Tracker #109691, rbelli97 (rbelli97-linux-kernel-bugs) wrote : | #137 |
Hello to all. I have the same problem, and this has affected me for a long time now. I described it in detail here, with output, videos, photos etc:
https:/
I hope this adds useful information to draw attention to the bug in question.
Changed in linux: | |
importance: | Unknown → Medium |
status: | Unknown → Confirmed |
Tobias Schönberg (tobias47n9e) wrote : | #125 |
Since upgrading from Ubuntu 20.10 to 21.04 I get this message about every second in journalctl:
Apr 09 13:00:28 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
Apr 09 13:00:28 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Apr 09 13:00:28 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: device [1022:1453] error status/
Apr 09 13:00:28 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: [ 8] Rollover
Apr 09 13:00:28 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: [12] Timeout
Apr 09 13:00:29 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:00.0
Apr 09 13:00:29 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Apr 09 13:00:29 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: device [1022:1453] error status/
Apr 09 13:00:29 tobias-MS-7C37 kernel: pcieport 0000:00:03.1: [12] Timeout
Paul Menzel (paulmenzel) wrote : | #126 |
Everyone affected, please attach at least the output of `lspci -nn` and `dmesg`, and give details about your system.
As this bug report has gotten long, and the causes range from firmware and firmware configuration to hardware issues, it would be better to open a separate report directly upstream, after testing the current Linux kernel from the Ubuntu PPA repository [1].
Tobias Schönberg (tobias47n9e) wrote : | #127 |
lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
20:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad]
21:00.0 PCI bridge [0604]: Advan...
Paul Menzel (paulmenzel) wrote : | #128 |
Please create a separate bug report, as the error type is different from the original report here. Also, in the new report (best upstream), give more information (firmware version, extension cards, …), and also *attach* (not paste) the output of `lspci -tvnn` and `sudo lspci -vvxxx`.
In Linux Kernel Bug Tracker #109691, pmenzel+bugzilla.kernel.org (pmenzel+bugzilla.kernel.org-linux-kernel-bugs) wrote : | #138 |
As the ASUS X541UVK is a different device, please create a new bug report with all the necessary information included/attached.
In Linux Kernel Bug Tracker #109691, bjorn (bjorn-linux-kernel-bugs) wrote : | #139 |
Riccardo, would you mind booting with just "pci=noaer" to see if that works around the problem? Your photo at https:/
Riccardo Belli (rbelli97) wrote : | #140 |
I just created the new bug report as suggested, here:
https:/
In Linux Kernel Bug Tracker #109691, naveennaidu479 (naveennaidu479-linux-kernel-bugs) wrote : | #142 |
Created attachment 299043
Patch for the AER message spew
Hello Folks,
I have been working on a patch for the AER message spew. I have a potential patch ready for the problem, but unfortunately, I do not have a system that outputs the same AER errors so I am unable to test it out.
It would really help if anyone could please test this patch and see if it solved the AER message spew.
Thanks,
Naveen Naidu
In Linux Kernel Bug Tracker #109691, naveennaidu479 (naveennaidu479-linux-kernel-bugs) wrote : | #143 |
(In reply to Naveen Naidu from comment #11)
Forgot to mention! This patch would make the "pci=noaer" unnecessary.
tags: | added: patch |
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #146 |
Created attachment 299047
attachment-
Hi Naveen.
Absolutely, I can test.
I can try it out this weekend.
Chris
In Linux Kernel Bug Tracker #109691, naveennaidu479 (naveennaidu479-linux-kernel-bugs) wrote : | #147 |
Comment on attachment 299043
Patch for the AER message spew
I apologize, please ignore this patch. I realized there is a bug in it. I have fixed it now and will upload the corrected version. I apologize for the inconvenience caused. I do not know how to delete this patch, so I'll upload a new one. Apologies again ^^'
In Linux Kernel Bug Tracker #109691, naveennaidu479 (naveennaidu479-linux-kernel-bugs) wrote : | #148 |
Created attachment 299071
Patch for the AER message spew
This is the correct patch. Please use this and ignore the previous patch.
In Linux Kernel Bug Tracker #109691, naveennaidu479 (naveennaidu479-linux-kernel-bugs) wrote : | #149 |
Created attachment 299073
Patch for the AER message spew
Naveen Naidu (theprophet26) wrote : | #145 |
- 0001-PCI-AER-Clear-error-device-AER-registers-in-aer_irq.patch Edit (15.2 KiB, text/plain)
This is the correct patch for the AER message spew.
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #150 |
Created attachment 299081
attachment-
Okay sounds good.
I will try it soon.
Chris
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #151 |
Are you good with me using kernel: 5.11.0-37-generic or would you
prefer I use a different kernel?
The X555U is currently running Linux Mint 20.2 Cinnamon.
FYI:
I tried removing pci=noaer and it does boot now (without your patch).
It has been a while since I tried removing pci=noaer and new kernels get
installed all the time so not sure what kernel first started allowing it
to boot without needing that line.
However, there are still many errors on boot.
dmesg --level=err,warn
[ 0.105337] x86/cpu: VMX (outside TXT) disabled by BIOS
[ 0.110761] MDS CPU bug present and SMT on, data leak possible. See
https:/
more details.
[ 0.110761] #3
[ 0.114598] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.135583] ACPI BIOS Error (bug): Could not resolve symbol
[\_SB.PCI0.
[ 0.135597] ACPI Error: Skipping While/If block (20201113/
[ 0.527786] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not
cover the entire command/response buffer. [mem 0xfed40000-
flags 0x200] vs fed40080 f80
[ 0.527874] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not
cover the entire command/response buffer. [mem 0xfed40000-
flags 0x200] vs fed40080 f80
[ 0.736009] i8042: PNP: PS/2 appears to have AUX port disabled, if
this is incorrect please boot with i8042.nopnp
[ 0.738042] platform eisa.0: EISA: Cannot allocate resource for mainboard
[ 0.738044] platform eisa.0: Cannot allocate resource for EISA slot 1
[ 0.738045] platform eisa.0: Cannot allocate resource for EISA slot 2
[ 0.738046] platform eisa.0: Cannot allocate resource for EISA slot 3
[ 0.738048] platform eisa.0: Cannot allocate resource for EISA slot 4
[ 0.738049] platform eisa.0: Cannot allocate resource for EISA slot 5
[ 0.738050] platform eisa.0: Cannot allocate resource for EISA slot 6
[ 0.738051] platform eisa.0: Cannot allocate resource for EISA slot 7
[ 0.738052] platform eisa.0: Cannot allocate resource for EISA slot 8
[ 1.268806] r8169 0000:02:00.0: can't disable ASPM; OS doesn't have
ASPM control
[ 1.329939] i2c_hid i2c-ELAN1000:00: supply vdd not found, using
dummy regulator
[ 1.329973] i2c_hid i2c-ELAN1000:00: supply vddl not found, using
dummy regulator
[ 1.611704] ata1.00: supports DRM functions and may not be fully
accessible
[ 1.613394] ata1.00: supports DRM functions and may not be fully
accessible
[ 5.726419] elan_i2c i2c-ELAN1000:00: supply vcc not found, using
dummy regulator
[ 6.376762] nvidia: loading out-of-tree module taints kernel.
[ 6.376775] nvidia: module license 'NVIDIA' taints kernel.
[ 6.376776] Disabling lock debugging due to kernel taint
[ 6.884240] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.63.01
Tue Aug 3 20:44:16 UTC 2021
[ 6.958699] nvidia_uvm: module uses symbols from proprietary module
nvidia, inheriting taint.
[ 8.533945] ACPI Warning: \_SB.PCI0.
mismatch - Found [Buffer], ACPI requires [Package] (20201113/
Chris
In Linux Kernel Bug Tracker #109691, pmenzel+bugzilla.kernel.org (pmenzel+bugzilla.kernel.org-linux-kernel-bugs) wrote : | #152 |
(In reply to cspadijer from comment #18)
> Are you good with me using kernel: 5.11.0-37-generic or would you
> prefer I use a different kernel?
> The X555U is currently running Linux Mint 20.2 Cinnamon.
>
> FYI:
> I tried removing pci=noaer and it does boot now (without your patch).
> It has been a while since I tried removing pci=noaer and new kernels get
> installed all the time so not sure what kernel first started allowing it
> to boot without needing that line.
> However, there are still many errors on boot.
The original bug seems to be solved now. As there are over ten comments already, could you mark it as fixed, and create new issues?
> dmesg --level=err,warn
> [ 0.105337] x86/cpu: VMX (outside TXT) disabled by BIOS
> [ 0.110761] MDS CPU bug present and SMT on, data leak possible. See
> https:/
> more details.
Is GNU/Linux applying the latest microcode updates?
> [ 0.110761] #3
Cosmetic error.
> [ 0.114598] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> [ 0.135583] ACPI BIOS Error (bug): Could not resolve symbol
> [\_SB.PCI0.
> [ 0.135597] ACPI Error: Skipping While/If block (20201113/
> [ 0.527786] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not
> cover the entire command/response buffer. [mem 0xfed40000-
> 0x200] vs fed40080 f80
> [ 0.527874] tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not
> cover the entire command/response buffer. [mem 0xfed40000-
> 0x200] vs fed40080 f80
Firmware issues.
> [ 0.736009] i8042: PNP: PS/2 appears to have AUX port disabled, if this is
> incorrect please boot with i8042.nopnp
Can be ignored.
> [ 0.738042] platform eisa.0: EISA: Cannot allocate resource for mainboard
> [ 0.738044] platform eisa.0: Cannot allocate resource for EISA slot 1
> [ 0.738045] platform eisa.0: Cannot allocate resource for EISA slot 2
> [ 0.738046] platform eisa.0: Cannot allocate resource for EISA slot 3
> [ 0.738048] platform eisa.0: Cannot allocate resource for EISA slot 4
> [ 0.738049] platform eisa.0: Cannot allocate resource for EISA slot 5
> [ 0.738050] platform eisa.0: Cannot allocate resource for EISA slot 6
> [ 0.738051] platform eisa.0: Cannot allocate resource for EISA slot 7
> [ 0.738052] platform eisa.0: Cannot allocate resource for EISA slot 8
Is there an EISA slot?
> [ 1.268806] r8169 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM
> control
Can be ignored.
> [ 1.329939] i2c_hid i2c-ELAN1000:00: supply vdd not found, using dummy
> regulator
> [ 1.329973] i2c_hid i2c-ELAN1000:00: supply vddl not found, using dummy
> regulator
Please contact the Linux folks about this. But first try the latest Linux mainline version.
> [ 1.611704] ata1.00: supports DRM functions and may not be fully
> accessible
> [ 1.613394] ata1.00: supports DRM functions and may not be fully
> accessible
> [ 5.726419] elan_i2c i2c-ELAN1000:00: supply vcc not found, using dummy
> regulator
> [ 6.376762] nvidia: loading out-of-tree module taints kernel.
> [ 6.376...
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #153 |
Created attachment 299107
attachment-
Hi Paul.
Okay, yes. I will mark this as fixed and open new reports for the other issues you identified as Linux issues. Thanks for your help.
For the firmware issues, should I be reaching out to the vendors?
Chris
In Linux Kernel Bug Tracker #109691, pmenzel+bugzilla.kernel.org (pmenzel+bugzilla.kernel.org-linux-kernel-bugs) wrote : | #154 |
[Please remove the quote next time from your reply. If you look at the Web interface, the comments get needlessly long because of that.]
(In reply to cspadijer from comment #20)
[…]
> Okay yes. I will mark as fixed and open up new for other issues you
> clarified as linux. Thanks for your help.
Thank you.
> For the firmware issues should I be reaching out to the vendors?
Yes, only the vendors can fix the firmware, unless you use FLOSS firmware, for example coreboot-based firmware.
Unfortunately, my track record of getting vendors to fix their firmware is not so good, as you are only one customer using this weird operating system and not Microsoft Windows. But fingers crossed.
Additionally you might want to point them to the Firmware Test Suite (FWTS) [1].
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #155 |
Created attachment 299109
attachment-
Okay great.
Thanks for the link to FirmwareTestSuite.
Chris
In Linux Kernel Bug Tracker #109691, cspadijer (cspadijer-linux-kernel-bugs) wrote : | #156 |
An upstream kernel since 4.2.0-22-generic has resolved the issue with this make/model of laptop.
Laptop successfully boots now without the pci=nommconf boot parameter.
Changed in linux: | |
status: | Confirmed → Unknown |
Narcis Garcia (narcisgarcia) wrote : | #157 |
One more case:
- Hardware: Mainboard "Asus Prime B-560M-A"
- Software: Debian GNU/Linux 11 (bullseye); Kernel Linux 5.10.0-10-amd64
- systemd-journald messages:
Jan 12 09:57:53 system systemd-
░░ Subject: Journal messages have been missed
░░ Defined-By: systemd
░░ Support: https:/
░░
░░ Kernel messages have been lost as the journal system has been unable
░░ to process them quickly enough.
- Kernel messages (dmesg) that overwhelm systemd-journald:
[19209.926816] pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[19209.926817] pcieport 0000:00:1c.5: device [8086:43bd] error status/
[19209.926817] pcieport 0000:00:1c.5: [ 0] RxErr
Workaround: add "pcie_aspm=off" to the GRUB_CMDLINE_LINUX parameter in /etc/default/grub,
then run: sudo update-grub
and reboot.
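The edit to /etc/default/grub described in this workaround can also be scripted. Below is a minimal sketch, assuming the variable is written on a single line as `GRUB_CMDLINE_LINUX="..."` with a plain double-quoted value; the function `add_kernel_param` is hypothetical and only transforms one line of text, it does not write the file or run `update-grub`.

```python
import shlex


def add_kernel_param(line: str, param: str, key: str = "GRUB_CMDLINE_LINUX") -> str:
    """Return `line` with `param` appended to the quoted value of `key`.

    Lines that set a different variable, or that already contain `param`,
    are returned unchanged, so applying the function twice is harmless.
    """
    name, sep, value = line.partition("=")
    if not sep or name.strip() != key:
        return line
    # shlex strips the surrounding quotes; an empty value yields no parameters.
    tokens = shlex.split(value)
    current = tokens[0].split() if tokens else []
    if param in current:
        return line
    current.append(param)
    joined = " ".join(current)
    newline = "\n" if line.endswith("\n") else ""
    return f'{key}="{joined}"{newline}'
```

Passing `key="GRUB_CMDLINE_LINUX_DEFAULT"` targets the other common variable instead. After writing the changed line back, `sudo update-grub` and a reboot are still required, as in the manual steps.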
Bjorn Helgaas (bjorn-helgaas) wrote : | #158 |
Is this still an issue? If so, can somebody add a complete dmesg log and "sudo lspci -vv" output from a current kernel?
Noah Bowman (eksistenze) wrote : | #159 |
Xavier (xav46) wrote : | #160 |
Hi there !
Same errors are spamming my logs, and my console...
(Nearly) fresh install of Ubuntu Server 22.04.4 LTS, motherboard Asus Pro Q670M-C-CSM. Kernel is 5.15.0-97-generic
The "pci=noaer" GRUB workaround does the job, but I’d rather not sweep the dust under the carpet ;-)
Output of dmesg and lspci -vv attached in case it helps.
Xavier (xav46) wrote : | #161 |
Bjorn Helgaas (bjorn-helgaas) wrote : | #162 |
Thank you! "pci=noaer" definitely sweeps this dust under the carpet, and it would be much better to avoid that.
Narcis (comment #157) reported that "pcie_aspm=off" is a workaround and is much more specific than "pci=noaer".
If we can collect complete dmesg and "sudo lspci -vv" output when booting with and without "pcie_aspm=off", there might be a clue about something we're doing wrong with ASPM.
Askar Safin (safinaskar) wrote : | #163 |
I have this problem, too.
I recently bought a Dell Precision 7780 laptop, which is a fairly new model. I bought it with one SSD, then added three more SSDs, and then I started to get these messages about corrected errors.
I see these messages with every Linux kernel I tried, in particular with the preinstalled Ubuntu (6.1 kernel)! This means that Dell is unable to ship an OS compatible with its own hardware! (But note that the errors only started after I added the new SSDs.)
Sometimes Linux boots without any errors. Sometimes it boots and displays them many times per second.
I also noticed that the errors are much more likely after a hibernate and resume. (But I didn't read from or write to the faulty SSD in the process!)
I also swapped two of the three new SSDs, and the errors persisted. But the errors stayed with a particular slot/port, not with a particular SSD: after the swap, they are reported for the new SSD inserted into the old slot.
I also get all these errors without any attempt to read from or write to the disk!
(I will attach files in the next message)
Askar Safin (safinaskar) wrote : | #164 |
Askar Safin (safinaskar) wrote : | #165 |
Askar Safin (safinaskar) wrote : | #166 |
Askar Safin (safinaskar) wrote : | #167 |
- dmesg-pcie_aspm Edit (87.1 KiB, text/plain)
Here is dmesg output with "pcie_aspm=off" (and yes, it seems "pcie_aspm=off" fixes the problem)
Askar Safin (safinaskar) wrote : | #168 |
I just ran the following experiment: I booted Linux 6.10 without "pcie_aspm=off", then with "pcie_aspm=off", then again without, and so on. In total I did about 13 boots with "pcie_aspm=off" and about 13 boots without it. All boots were performed automatically by a script using kexec. No hibernation was involved.
After each boot, "sleep 60" was executed, and then kexec booted the next kernel.
The results:
- All boots with "pcie_aspm=off" showed no errors.
- 3 out of 13 boots without "pcie_aspm=off" showed no errors. The other 10 boots showed these "corrected error received" messages at least once, but only a few of them (say, 5 messages per boot, i.e. in 60 seconds). (In my experience, if you really want to see a LOT of these messages, you should hibernate and then resume.)
So it seems that:
- Debugging using kexec works.
- Hibernation is not needed to reproduce this bug (but the number of error messages will be small).
- "pcie_aspm=off" makes the messages disappear.
- There is a reliable way to reproduce the bug: boot 13 times using kexec; in about 10 of the 13 boots you will see at least one message.
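As a rough check that the difference between the two groups of boots is unlikely to be chance, a one-sided Fisher exact test can be computed by hand. This is only a sketch under stated assumptions: it takes the counts as exactly 0 of 13 boots with errors when "pcie_aspm=off" is set and 10 of 13 without it (the comment gives both group sizes as approximate), and the function name is made up for illustration.

```python
from math import comb


def fisher_one_sided(err_a: int, n_a: int, err_b: int, n_b: int) -> float:
    """Probability that group A shows err_a or fewer error boots,
    given the observed totals, if the parameter had no effect."""
    total_err = err_a + err_b
    denominator = comb(n_a + n_b, total_err)
    # Hypergeometric tail: sum over all tables at least as extreme for group A.
    numerator = sum(
        comb(n_a, k) * comb(n_b, total_err - k) for k in range(err_a + 1)
    )
    return numerator / denominator


# Group A: boots with pcie_aspm=off; group B: boots without it (assumed counts).
p_value = fisher_one_sided(err_a=0, n_a=13, err_b=10, n_b=13)
print(f"one-sided p-value: {p_value:.2e}")
```

With these assumed counts the probability comes out at roughly 5e-5, which supports the observation that "pcie_aspm=off" really does suppress the messages on this machine, though it says nothing about why.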
Feel free to ask me any questions. I'm ready to run other experiments, send any kind of logs, etc.
Askar Safin (safinaskar) wrote : | #169 |
Also, it is possible that the messages are genuine. They appeared after I inserted three additional 4 TB SSDs. It is possible that I simply installed too many of them. Yes, Dell officially supports three additional SSDs of 4 TB each, but it is possible that the official Dell documentation is simply wrong, in which case the messages are genuine.
Paul Menzel (paulmenzel) wrote : | #170 |
@safinaskar, thank you for your report. Does `pci=noaer` help? All the errors you get are “Correctable”, right? Then it’s probably only cosmetic.
As this is a new device, can you please report this issue to Dell, and also create a new report in Launchpad?
Askar Safin (safinaskar) wrote : | #171 |
The bug (assuming it is a bug) also reproduces with Debian's 4.9.0-13-amd64 kernel.
Askar Safin (safinaskar) wrote : | #172 |
I still suspect this is a hardware error, so I just removed two of the three new SSDs, including the one connected to the possibly faulty slot. The errors disappeared.
@paulmenzel, if you (or other kernel developers) want, I can put them back. But I cannot do this at home, so it would mean travelling 30 minutes each way, i.e. some time investment on my side. So, if you are REALLY sure that you are ready to work on this problem together with me in a back-and-forth feedback cycle, then I can do this. That is, if it actually helps to fix this bug in Linux for real, then yes, I can do it, provided that you will not disappear and will stay with me in the feedback loop.
> All the errors you get are “Correctable”, right?
Yes
> As this is a new device, can you please report this issue to Dell
How do I do this? I tried to talk to them in the online chat (about another issue), but the chat staff did not seem very knowledgeable. Also, I did not add these three SSDs at an authorized Dell service center, so I have already lost my service support.