AER: Corrected error received: id=00e4 / PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4

Bug #1279699 reported by JSE
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Description: Ubuntu 13.10
Release: 13.10 (Ubuntu 3.11.0-15.25-generic 3.11.10)

On Ubuntu 3.11.0-15.25-generic 3.11.10 I am getting the following error:

[ 11.749109] bonding: bond0: Setting MII monitoring interval to 100.
[ 11.797258] bonding: bond1 is being created...
[ 11.797320] bonding: bond1 is being created...
[ 11.797322] bonding: bond1 is being created...
[ 11.797653] bonding: bond1 already exists.
[ 11.797658] bonding: bond1 already exists.
[ 11.808146] bonding: bond1: Setting MII monitoring interval to 100.
[ 11.824060] bonding: bond0: setting mode to 802.3ad (4).
[ 11.824063] bonding: bond1: setting mode to 802.3ad (4).
[ 11.825177] bonding: bond1: Setting LACP rate to fast (1).
[ 11.825259] bonding: bond0: Setting LACP rate to fast (1).
[ 11.825527] IPv6: ADDRCONF(NETDEV_UP): bond1: link is not ready
[ 11.825674] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ 11.871240] bonding: bond0: Adding slave p2p1.
[ 12.124239] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[ 12.211592] init: avahi-cups-reload main process (823) terminated with status 1
[ 12.714726] bonding: bond0: enslaving p2p1 as a backup interface with a down link.
[ 12.715620] bonding: bond1: Adding slave p4p3.
[ 13.072966] bonding: bond1: enslaving p4p3 as a backup interface with a down link.
[ 13.072970] bonding: bond1: Adding slave p4p2.
[ 13.608930] bonding: bond1: enslaving p4p2 as a backup interface with a down link.
[ 13.608934] bonding: bond0: Adding slave p3p1.
[ 13.976911] netxen_nic: p4p2 NIC Link is up
[ 14.008903] netxen_nic: p4p3 NIC Link is up
[ 14.450454] bonding: bond0: enslaving p3p1 as a backup interface with a down link.
[ 14.450459] bonding: bond1: Adding slave p4p1.
[ 14.544862] bonding: bond1: enslaving p4p1 as a backup interface with a down link.
[ 14.544866] bonding: bond1: Adding slave p4p4.
[ 14.624820] bonding: bond1: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 14.677889] igb: p2p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 14.824864] bonding: bond1: enslaving p4p4 as a backup interface with a down link.
[ 14.824871] IPv6: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
[ 14.844792] bonding: bond0: link status down again after 0 ms for interface p2p1.
[ 14.862650] igb 0000:05:00.0: changing MTU from 1500 to 9000
[ 14.924798] bonding: bond1: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 14.938826] igb 0000:06:00.0: changing MTU from 1500 to 9000
[ 15.024843] bonding: bond1: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 15.148767] bonding: bond1: link status definitely up for interface p4p3, 1000 Mbps full duplex.
[ 15.148770] bonding: bond1: link status definitely up for interface p4p2, 1000 Mbps full duplex.
[ 15.940783] netxen_nic: p4p1 NIC Link is up
[ 15.948787] bonding: bond1: link status definitely up for interface p4p1, 0 Mbps full duplex.
[ 17.657721] igb: p3p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 17.724664] bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ 17.724676] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 17.748653] bonding: bond0: link status definitely up for interface p3p1, 1000 Mbps full duplex.
[ 18.201714] igb: p2p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 18.248622] bonding: bond0: link status definitely up for interface p2p1, 1000 Mbps full duplex.

[ 41.932293] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
[ 41.932337] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 41.937340] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 41.942261] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)

[ 538.342279] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 538.342306] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 538.342763] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 538.343127] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)

These are new Supermicro X10SL7-F motherboards, each with a new HP NC375T in the PCIe x8 (in x4) port.
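
The id=00e4 value in the AER lines above is a 16-bit PCIe requester ID: 8 bits of bus, 5 bits of device, 3 bits of function. The following minimal sketch decodes it so the ID can be matched against the device address the kernel prints. The function name decode_requester_id is hypothetical and only for illustration.

```python
def decode_requester_id(rid_hex: str) -> str:
    """Split a 16-bit PCIe requester ID into bus:device.function."""
    rid = int(rid_hex, 16)
    bus = (rid >> 8) & 0xFF      # upper 8 bits
    device = (rid >> 3) & 0x1F   # next 5 bits
    function = rid & 0x07        # lowest 3 bits
    return f"{bus:02x}:{device:02x}.{function}"


# id=00e4 from the log should map to the root port that reports the error.
print("00e4 ->", decode_requester_id("00e4"))
```

Decoded this way, 00e4 is 00:1c.4, which is the pcieport device named in the log lines.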
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Feb 13 12:00 seq
 crw-rw---- 1 root audio 116, 33 Feb 13 12:00 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.12.5-0ubuntu2.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg:

DistroRelease: Ubuntu 13.10
HibernationDevice: RESUME=UUID=83e4fd84-9f18-4328-ac8e-60c5e2451275
MachineType: Supermicro X10SL7-F
MarkForUpload: True
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.11.0-15-generic root=UUID=4e3cc489-e68d-4a0f-965e-1211a3215e16 ro
ProcVersionSignature: Ubuntu 3.11.0-15.25-generic 3.11.10
RelatedPackageVersions:
 linux-restricted-modules-3.11.0-15-generic N/A
 linux-backports-modules-3.11.0-15-generic N/A
 linux-firmware 1.116.2
RfKill: Error: [Errno 2] No such file or directory
Tags: saucy
Uname: Linux 3.11.0-15-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 07/19/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.1
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X10SL7-F
dmi.board.vendor: Supermicro
dmi.board.version: L.
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.1:bd07/19/2013:svnSupermicro:pnX10SL7-F:pvr0123456789:rvnSupermicro:rnX10SL7-F:rvrL.:cvnSupermicro:ct17:cvr0123456789:
dmi.product.name: X10SL7-F
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Revision history for this message
JSE (jse.nl) wrote :
summary: - AER: Corrected error received: id=00e4
+ AER: Corrected error received: id=00e4 / PCIe Bus Error:
+ severity=Corrected, type=Physical Layer, id=00e4
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1279699

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: saucy
JSE (jse.nl)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: apport-collected
description: updated
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc2-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
JSE (jse.nl)
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
JSE (jse.nl) wrote :

Can be triggered by running 'iperf -c xxx.xxx.xxx.xxx' on this server with 'iperf -s' on another server.
Does not happen when 'iperf -s' runs on this server and 'iperf -c xxx.xxx.xxx.xxx' runs on another server.

On some occasions another error appears as well (Bad DLLP):

[ 302.497045] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
[ 302.497068] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 302.498022] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 302.498493] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
[ 302.498943] pcieport 0000:00:1c.4: Error of this Agent(00e4) is reported first
[ 302.499396] netxen_nic 0000:07:00.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0701(Receiver ID)
[ 302.499865] netxen_nic 0000:07:00.1: device [4040:0100] error status/mask=00002080/00002000
[ 302.500345] netxen_nic 0000:07:00.1: [ 7] Bad DLLP
[ 302.500831] netxen_nic 0000:07:00.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0702(Receiver ID)
[ 302.501281] netxen_nic 0000:07:00.2: device [4040:0100] error status/mask=00002080/00002000
[ 302.501764] netxen_nic 0000:07:00.2: [ 7] Bad DLLP
[ 302.502222] netxen_nic 0000:07:00.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0703(Receiver ID)
[ 302.502689] netxen_nic 0000:07:00.3: device [4040:0100] error status/mask=00002080/00002000
[ 302.503133] netxen_nic 0000:07:00.3: [ 7] Bad DLLP
[ 334.551474] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
[ 334.551502] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 334.551980] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 334.552436] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
[ 469.354652] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 469.354665] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 469.355151] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 469.355634] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
[ 471.814493] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 471.814506] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 471.815003] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 471.815498] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
[ 472.548669] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
[ 472.548683] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 472.549202] pcieport 0000:00:1c.4: device [8086:8c18] error status/mask=00000001/00002000
[ 472.549708] pcieport 0000:00:1c.4: [ 0] Receiver Error (First)
[ 473.141720] pcieport 00...
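
The "error status/mask" pairs in this log can be read bit by bit. Here is a minimal sketch, assuming the two values are the AER Correctable Error Status and Correctable Error Mask registers and using the bit layout from the PCIe specification. The names COR_BITS and decode_status_mask are hypothetical.

```python
# Bit positions of the AER Correctable Error Status register (PCIe spec).
COR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_status_mask(field: str):
    """Return (bit, name, masked) for every status bit set in 'status/mask'."""
    status_hex, mask_hex = field.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    return [
        (bit, name, bool(mask & (1 << bit)))
        for bit, name in COR_BITS.items()
        if status & (1 << bit)
    ]


for field in ("00000001/00002000", "00002080/00002000"):
    print(field)
    for bit, name, masked in decode_status_mask(field):
        print(f"  [{bit:2d}] {name} ({'masked' if masked else 'unmasked'})")
```

Under that assumption, 00002080 has bits 7 (Bad DLLP) and 13 (Advisory Non-Fatal Error) set, and the mask 00002000 covers only bit 13. That would be consistent with the log naming only Bad DLLP for the netxen_nic functions.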

JSE (jse.nl) wrote :

Still present in 3.14.0-031400rc8-generic

Pebas (pebasmisc) wrote :

The bug was gone (fixed, I think) on my PC today after installing the new kernel version "linux-image-4.15.0-50-generic" on Ubuntu 18.04.2 x86_64.

Brad Figg (brad-figg)
tags: added: cscc
Bjorn Helgaas (bjorn-helgaas) wrote :

Generally we should not see reproducible PCIe Correctable Errors in significant numbers. Some have reported that "pcie_aspm=off" avoids the errors. If that's the case for you, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2043665/comments/6 and help me investigate it!
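
To judge whether the errors occur "in significant numbers", it can help to count them per device from a saved dmesg capture. The sketch below does that, assuming the line format shown in this report. The names AER_LINE, tally_corrected and SAMPLE are hypothetical, and the sample lines are copied from the logs in this report.

```python
import re
from collections import Counter

# Matches the first line of each corrected-error event as it appears in this report.
AER_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\S+\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"AER: (?:Multiple )?Corrected error received"
)


def tally_corrected(dmesg_text: str) -> Counter:
    """Count corrected-error events per reporting PCI device."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = AER_LINE.match(line.strip())
        if match:
            counts[match.group("dev")] += 1
    return counts


SAMPLE = """\
[ 41.932293] pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
[ 41.932337] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e4(Receiver ID)
[ 538.342279] pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
"""

for dev, count in tally_corrected(SAMPLE).items():
    print(dev, count)
```

Only the "Corrected error received" line is counted, so each event is tallied once even though the kernel prints several lines for it.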

Bjorn Helgaas (bjorn-helgaas) wrote :

Turns out "pcie_aspm=off" effectively disables AER because Linux doesn't request AER control when it doesn't advertise ASPM support (see ACPI_PCIE_REQ_SUPPORT). That explains why "pcie_aspm=off" would avoid the error reporting.

This report was from a v3.11-based kernel. v3.11 is from Mon Sep 2 13:46:10 2013, so it should contain https://git.kernel.org/linus/dce87b960cf4 ("netxen: mask correctable error"), which should mask all Correctable Errors for this device (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c?id=v3.11#n1420), so I don't know why this flood happened.
