e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang (Intel I219-LM)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-lts-xenial (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I am running Ubuntu 16.04 on a Supermicro X11SAE mainboard.
The machine was put into service several months ago. Ever since then, the NIC eno1 has been causing problems from time to time. When the machine boots, everything seems fine for a while. Then, for no apparent reason (as far as I can tell so far), the device hangs, causing performance drops and lots of log messages.
This is the setup:
root@server2:~# lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
root@server2:~# uname -a
Linux server2 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Two onboard NICs (eno1 and eno2) are combined in bond0, and a third NIC (enp7s0) serves as the gateway interface:
This is my /etc/network/
auto lo
iface lo inet loopback
auto enp7s0
iface enp7s0 inet static
    address 192.168.178.254
    netmask 255.255.255.0
    network 192.168.178.0
    broadcast 192.168.178.255
    gateway 192.168.178.1
auto eno1
iface eno1 inet manual
    bond-master bond0
auto eno2
iface eno2 inet manual
    bond-master bond0
auto bond0
iface bond0 inet manual
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    slaves eno1 eno2
    post-up ifup br0
iface br0 inet static
    address 192.168.3.254
    netmask 255.255.255.0
    network 192.168.3.0
    dns-nameserver 192.168.3.254
    dns-search localdomain
    broadcast 192.168.3.255
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
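Because eno1 and eno2 are enslaved to bond0 with bond-miimon 100, every adapter reset also takes eno1's link down inside the bond, so the bonding driver's per-slave counters give a rough tally of how often eno1 has dropped out since boot. The following is a minimal Python sketch, assuming the usual text layout of /proc/net/bonding/bond0 (one "Slave Interface:" block per slave); parse_bond_slaves and report are hypothetical helper names, not part of any library.

```python
# Sketch: summarise per-slave state from the bonding driver's /proc file.
# Assumes the usual /proc/net/bonding/<bond> layout; parse_bond_slaves and
# report are hypothetical helpers, not part of any library.
from typing import Dict, Optional, Union


def parse_bond_slaves(text: str) -> Dict[str, Dict[str, Union[str, int]]]:
    """Return {slave: {"mii_status": str, "link_failures": int}}."""
    slaves: Dict[str, Dict[str, Union[str, int]]] = {}
    current: Optional[str] = None  # slave block we are currently inside
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and section titles such as "802.3ad info"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is None:
            continue  # bond-level fields that precede the first slave block
        elif key == "MII Status":
            slaves[current]["mii_status"] = value
        elif key == "Link Failure Count":
            slaves[current]["link_failures"] = int(value)
    return slaves


def report(bond: str = "bond0") -> None:
    """Print one line per slave; needs read access to /proc/net/bonding."""
    with open(f"/proc/net/bonding/{bond}") as fh:
        for name, info in parse_bond_slaves(fh.read()).items():
            print(f"{name}: MII {info.get('mii_status', '?')}, "
                  f"link failures {info.get('link_failures', '?')}")
```

Calling report() periodically (for example from cron) and comparing eno1's link failure count with eno2's would show whether only the I219-LM port is flapping.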
The onboard NIC eno1 and the enp7s0 NIC (an Intel Gigabit CT Desktop Adapter) are both driven by e1000e, while eno2 uses the igb driver (see below).
root@server2:~# lspci |grep Ether
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
root@server2:~# ethtool -i eno1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.8-4
expansion-
bus-info: 0000:00:1f.6
supports-
supports-test: yes
supports-
supports-
supports-
root@server2:~# ethtool -i eno2
driver: igb
version: 5.3.0-k
firmware-version: 3.25, 0x800005cc
expansion-
bus-info: 0000:06:00.0
supports-
supports-test: yes
supports-
supports-
supports-
root@server2:~# ethtool -i enp7s0
driver: e1000e
version: 3.2.6-k
firmware-version: 1.8-0
expansion-
bus-info: 0000:07:00.0
supports-
supports-test: yes
supports-
supports-
supports-
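The same interface-to-driver mapping can also be read from sysfs without ethtool: /sys/class/net/<iface>/device points at the PCI device, and that device's driver symlink names the bound driver. A minimal Python sketch, assuming the standard sysfs layout (nic_drivers is a hypothetical helper name); virtual interfaces such as lo, bond0 and br0 have no driver link and are skipped.

```python
# Sketch: map each network interface to its kernel driver and PCI address via
# sysfs, similar to the driver/bus-info fields of "ethtool -i".
# nic_drivers is a hypothetical helper; the sysfs root is a parameter so the
# function can be pointed at a copy of the tree.
import os
from typing import Dict, Tuple


def nic_drivers(sysfs_net: str = "/sys/class/net") -> Dict[str, Tuple[str, str]]:
    """Return {iface: (driver, bus_address)} for interfaces backed by a device."""
    result: Dict[str, Tuple[str, str]] = {}
    for iface in sorted(os.listdir(sysfs_net)):
        device = os.path.join(sysfs_net, iface, "device")
        driver = os.path.join(device, "driver")
        if not os.path.exists(driver):
            continue  # lo, bond0, br0 and other virtual devices have no driver link
        result[iface] = (
            os.path.basename(os.path.realpath(driver)),  # e.g. "e1000e"
            os.path.basename(os.path.realpath(device)),  # e.g. "0000:00:1f.6"
        )
    return result
```

On this machine the result should agree with the ethtool -i output above: eno1 and enp7s0 on e1000e, eno2 on igb.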
After every reboot, it takes anywhere from a few hours to a few days for eno1 to start hanging.
dmesg then shows messages like this every few seconds:
[1874222.304742] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
[1874224.304604] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
[1874224.308396] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[1874224.308447] e1000e 0000:00:1f.6 eno1: speed changed to 0 for port eno1
[1874224.396431] bond0: link status definitely down for interface eno1, disabling it
[1874228.310205] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[1874228.312176] bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
These are the default feature (offload) settings of the affected NIC:
root@server2:~# ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentatio
udp-fragmentati
generic-
generic-
large-receive-
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-
tx-gre-
tx-ipip-
tx-sit-
tx-udp_
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-
rx-vlan-
rx-vlan-
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
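Any ethtool -K workaround can only change features that are not marked [fixed]. A minimal Python sketch that parses this listing and returns the features that are both on and changeable; parse_ethtool_k, changeable_enabled and the Feature tuple are hypothetical names, and lines cut short in the paste above (e.g. tcp-segmentatio) are simply skipped because they have no "name: value" form.

```python
# Sketch: parse "ethtool -k <iface>" output and list features that are on and
# not marked [fixed], i.e. the ones ethtool -K may be able to switch off.
# parse_ethtool_k / changeable_enabled are hypothetical helper names.
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    enabled: bool  # "on" vs "off"
    fixed: bool    # "[fixed]": the driver reports it cannot be changed


def parse_ethtool_k(text: str) -> Dict[str, Feature]:
    features: Dict[str, Feature] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        words = rest.split()
        if not sep or not words or words[0] not in ("on", "off"):
            continue  # "Features for eno1:" header or a truncated line
        features[name.strip()] = Feature(words[0] == "on", "[fixed]" in rest)
    return features


def changeable_enabled(features: Dict[str, Feature]) -> List[str]:
    return sorted(n for n, f in features.items() if f.enabled and not f.fixed)
```

Applied to the complete listing, this narrows the candidates down to entries such as rx-checksumming, scatter-gather and the VLAN offloads, while highdma is excluded because it is fixed.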
root@server2:~# lspci -vv -s 0000:00:1f.6
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
DeviceName: Intel Ethernet i219 #1
Subsystem: Super Micro Computer Inc Ethernet Connection (2) I219-LM
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 147
Region 0: Memory at df800000 (32-bit, non-prefetchable) [size=128K]
Kernel driver in use: e1000e
Kernel modules: e1000e
root@server2:~# lspci -vv -s 0000:06:00.0
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
DeviceName: Intel Ethernet i210 #2
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at df400000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at d000 [size=32]
Region 3: Memory at df480000 (32-bit, non-prefetchable) [size=16K]
Kernel driver in use: igb
Kernel modules: igb
root@server2:~# lspci -vv -s 0000:07:00.0
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: Intel Corporation Gigabit CT Desktop Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at df3c0000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at df300000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at c000 [size=32]
Region 3: Memory at df3e0000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at df380000 [disabled] [size=256K]
Kernel driver in use: e1000e
Kernel modules: e1000e
I looked around for ways to mitigate the problem.
Someone suggested that running "ethtool -K eno1 sg off tso off gro off" should work around it.
Unfortunately, it did not help.
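When testing workarounds like this one, it helps to pass ethtool -K only the features that actually need to change and then re-check the result with ethtool -k. A minimal Python sketch, assuming the current state has already been read from ethtool -k as a dict of the long names ethtool prints mapped to on/off; the mapping covers only the three short keywords used in the command above, and ethtool_k_args is a hypothetical helper.

```python
# Sketch: build an "ethtool -K" command line that only touches features whose
# current state differs from the desired one. The current state is assumed to
# come from "ethtool -k" (long names -> on/off); ethtool_k_args is hypothetical.
from typing import Dict, List

# Short -K keywords from the workaround above and the names "ethtool -k" prints.
SHORT_TO_LONG = {
    "sg": "scatter-gather",
    "tso": "tcp-segmentation-offload",
    "gro": "generic-receive-offload",
}


def ethtool_k_args(iface: str, current: Dict[str, bool],
                   desired: Dict[str, bool]) -> List[str]:
    """Return the argv to run, or [] if everything is already as desired."""
    changes: List[str] = []
    for short, want in desired.items():
        if current.get(SHORT_TO_LONG[short]) != want:
            changes += [short, "on" if want else "off"]
    return ["ethtool", "-K", iface] + changes if changes else []
```

The returned list can be handed to subprocess.run(args, check=True) as root. Settings made with ethtool -K are runtime only and do not survive a reboot, so a test that needs days of uptime has to reapply them on every boot (for example from a post-up line in the eno1 stanza).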
affects: ubuntu → linux-lts-xenial (Ubuntu)
Status changed to 'Confirmed' because the bug affects multiple users.