Sporadic problems with X710 (i40e) and bonding where one interface is shown as "state DOWN" and without LOWER_UP

Bug #1811963 reported by Malte Schmidt
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

After rebooting the physical server, there is roughly a 50/50 chance that all connected interfaces come up; otherwise one of them stays down. This affects Dell EMC R740 and R440 servers equipped with X710 network cards.
As far as I can tell (~20 reboots on different machines), this happens only when bonding is used (here active-backup, i.e. mode 1; I have not tested other modes yet). The network hardware on the other side shows the ports as "connected". tcpdump shows frames being received even while the interface is in "state DOWN".
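
For anyone who wants to check for this symptom after each reboot, here is a minimal sketch (not part of the original report) that scans `ip addr` / `ip link` style output for bond slaves that are administratively UP but lack the LOWER_UP flag. The function name and the sample text are illustrative assumptions; the sample lines are shortened from the excerpts in this report.

```python
import re

# Header line of each interface in `ip addr` / `ip link` output, e.g.
# "4: enp59s0f0: <NO-CARRIER,...,SLAVE,UP> mtu 1500 ... state DOWN ..."
HEADER = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<(?P<flags>[^>]*)>"
    r".*?\bstate\s+(?P<state>\S+)"
)


def slaves_without_carrier(ip_output):
    """Return (name, state) for bond slaves that are UP but lack LOWER_UP.

    Hypothetical helper for illustration; it only inspects the text it is
    given and does not call `ip` itself.
    """
    affected = []
    for line in ip_output.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # address lines and other indented detail lines
        flags = set(match.group("flags").split(","))
        if "SLAVE" in flags and "UP" in flags and "LOWER_UP" not in flags:
            affected.append((match.group("name"), match.group("state")))
    return affected


# Sample shortened from the failure case in this report.
SAMPLE = """\
4: enp59s0f0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000
    link/ether ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
5: enp59s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
"""

affected = slaves_without_carrier(SAMPLE)
```

On a live system the input could come from running `ip addr` and passing its output to the function. The bond itself is skipped because it carries MASTER rather than SLAVE.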

Tried with:

Ubuntu 16.04, kernel 4.4.0-141, driver 2.7.26 (from the Intel-website), firmware 18.8.9
Ubuntu 16.04, kernel 4.4.0-141, driver 1.4.25-k, firmware 18.8.9
Ubuntu 16.04, kernel 4.15.0-43 (hwe), driver 2.1.14-k, firmware 18.8.9

The following excerpts were taken with Intel's driver, version 2.7.26, which taints the kernel, but the same happens with the stock kernel's driver and with the hardware enablement kernel's driver.

Sporadic failure case:

[ 6.319226] i40e: loading out-of-tree module taints kernel.
[ 6.319227] i40e: loading out-of-tree module taints kernel.
[ 6.319422] i40e: module verification failed: signature and/or required key missing - tainting kernel
[ 6.410837] i40e: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver - version 2.7.26
[ 6.410838] i40e: Copyright(c) 2013 - 2018 Intel Corporation.
[ 6.423542] i40e 0000:3b:00.0: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 6.658526] i40e 0000:3b:00.0: MAC address: ff:ff:ff:ff:ff:ff
[ 6.710391] i40e 0000:3b:00.0: PCI-Express: Speed 8.0GT/s Width x8
[ 6.725692] i40e 0000:3b:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA
[ 6.750239] i40e 0000:3b:00.1: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 6.987874] i40e 0000:3b:00.1: MAC address: ff:ff:ff:ff:ff:f1
[ 7.005397] i40e 0000:3b:00.1 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[ 7.024993] i40e 0000:3b:00.1: PCI-Express: Speed 8.0GT/s Width x8
[ 7.040298] i40e 0000:3b:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA
[ 7.054384] i40e 0000:3b:00.1 enp59s0f1: renamed from eth0
[ 7.079613] i40e 0000:3b:00.0 enp59s0f0: renamed from eth1
[ 9.788893] i40e 0000:3b:00.0 enp59s0f0: already using mac address ff:ff:ff:ff:ff:ff
[ 9.819480] i40e 0000:3b:00.1 enp59s0f1: set new mac address ff:ff:ff:ff:ff:ff

[ 9.728194] bond0: Setting MII monitoring interval to 100
[ 9.788690] bond0: Adding slave enp59s0f0
[ 9.805195] bond0: Enslaving enp59s0f0 as a backup interface with a down link
[ 9.819470] bond0: Adding slave enp59s0f1
[ 9.836360] bond0: making interface enp59s0f1 the new active one
[ 9.836614] bond0: Enslaving enp59s0f1 as an active interface with an up link
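
One visible difference in the kernel log above is that 0000:3b:00.0 never logs "NIC Link is Up" during probe in the failure case. The following is a rough sketch, under the assumption that the i40e messages keep the format shown in this report, that lists probed PCI functions without such a line. The helper name is hypothetical, and a missing line is only an observation from these excerpts, not a confirmed cause.

```python
import re

# Matches i40e per-device messages such as
# "i40e 0000:3b:00.1 eth0: NIC Link is Up, ..." or "i40e 0000:3b:00.0: fw ..."
DEVICE_MSG = re.compile(
    r"i40e (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?: \S+)?: (?P<msg>.*)"
)


def ports_without_link_up(dmesg_text):
    """Return PCI addresses that were probed but never logged a link-up."""
    probed = []      # PCI functions in the order their "fw ..." line appears
    linked = set()   # PCI functions that logged "NIC Link is Up"
    for line in dmesg_text.splitlines():
        match = DEVICE_MSG.search(line)
        if not match:
            continue
        pci, msg = match.group("pci"), match.group("msg")
        if msg.startswith("fw ") and pci not in probed:
            probed.append(pci)
        if msg.startswith("NIC Link is Up"):
            linked.add(pci)
    return [pci for pci in probed if pci not in linked]


# Lines copied from the failure case in this report.
FAILURE_LOG = """\
[ 6.423542] i40e 0000:3b:00.0: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 6.750239] i40e 0000:3b:00.1: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 7.005397] i40e 0000:3b:00.1 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[ 7.079613] i40e 0000:3b:00.0 enp59s0f0: renamed from eth1
"""

missing = ports_without_link_up(FAILURE_LOG)
```

Feeding it the output of `dmesg` after boot would give the same kind of list for a live machine.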

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp59s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp59s0f0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ff:ff:ff:ff:ff:ff
Slave queue ID: 0

Slave Interface: enp59s0f1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ff:ff:ff:ff:ff:f1
Slave queue ID: 0
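
The bonding status above has the layout the bonding driver exposes under /proc/net/bonding/<bond>. As a small sketch (the function names are made up for illustration), the per-slave "MII Status" can be extracted like this so that a down slave is easy to spot after boot:

```python
def slave_mii_status(bonding_text):
    """Map each slave interface name to its 'MII Status' value.

    The bond-level 'MII Status' line comes before any 'Slave Interface'
    line and is therefore ignored.
    """
    status = {}
    current = None
    for raw in bonding_text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            status[current] = line.split(":", 1)[1].strip()
    return status


def down_slaves(bonding_text):
    """Return the names of slaves whose MII Status is not 'up'."""
    return [name for name, state in slave_mii_status(bonding_text).items()
            if state != "up"]


# Sample shortened from the failure case in this report.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: enp59s0f1
MII Status: up

Slave Interface: enp59s0f0
MII Status: down
Link Failure Count: 0

Slave Interface: enp59s0f1
MII Status: up
Link Failure Count: 0
"""

down = down_slaves(SAMPLE)
```

On an affected machine the text would be read from /proc/net/bonding/bond0 instead of the inline sample.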

4: enp59s0f0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 portid ffffffffffff state DOWN group default qlen 1000
    link/ether ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
5: enp59s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 portid fffffffffff1 state UP group default qlen 1000
    link/ether ff:ff:ff:ff:ff:f1 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet 123.123.123.123/24 brd 123.123.123.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 ffff::ffff:ffff:ffff:ffff/64 scope link
       valid_lft forever preferred_lft forever

bond0 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:ff
          inet addr:123.123.123.123 Bcast:123.123.123.255 Mask:255.255.255.0
          inet6 addr: ffff::ffff:ffff:ffff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:4392 errors:0 dropped:10 overruns:0 frame:0
          TX packets:3585 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:443958 (443.9 KB) TX bytes:1129305 (1.1 MB)

enp59s0f0 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:ff
          UP BROADCAST SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:10 errors:0 dropped:10 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:684 (684.0 B) TX bytes:0 (0.0 B)

enp59s0f1 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:ff
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:4382 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3585 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:443274 (443.2 KB) TX bytes:1129305 (1.1 MB)

Expectation:

[ 6.012207] i40e: loading out-of-tree module taints kernel.
[ 6.012208] i40e: loading out-of-tree module taints kernel.
[ 6.012391] i40e: module verification failed: signature and/or required key missing - tainting kernel
[ 6.078171] i40e: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver - version 2.7.26
[ 6.078171] i40e: Copyright(c) 2013 - 2018 Intel Corporation.
[ 6.091118] i40e 0000:3b:00.0: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 6.344344] i40e 0000:3b:00.0: MAC address: ff:ff:ff:ff:ff:ff
[ 6.360210] i40e 0000:3b:00.0 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[ 6.371418] i40e 0000:3b:00.0: PCI-Express: Speed 8.0GT/s Width x8
[ 6.380524] i40e 0000:3b:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA
[ 6.393099] i40e 0000:3b:00.1: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9
[ 6.456772] i40e 0000:3b:00.1: MAC address: ff:ff:ff:ff:ff:f1
[ 6.468907] i40e 0000:3b:00.1 eth1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[ 6.480078] i40e 0000:3b:00.1: PCI-Express: Speed 8.0GT/s Width x8
[ 6.489135] i40e 0000:3b:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA
[ 6.508587] i40e 0000:3b:00.1 enp59s0f1: renamed from eth1
[ 6.559689] i40e 0000:3b:00.0 enp59s0f0: renamed from eth0
[ 9.063270] i40e 0000:3b:00.1 enp59s0f1: already using mac address ff:ff:ff:ff:ff:f1
[ 9.081577] i40e 0000:3b:00.0 enp59s0f0: set new mac address ff:ff:ff:ff:ff:f1

[ 9.160033] bond0: Setting MII monitoring interval to 100
[ 9.170981] bond0: Adding slave enp59s0f1
[ 9.187281] bond0: making interface enp59s0f1 the new active one
[ 9.187496] bond0: Enslaving enp59s0f1 as an active interface with an up link
[ 9.251473] bond0: Adding slave enp59s0f0
[ 9.267908] bond0: Enslaving enp59s0f0 as a backup interface with an up link

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp59s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp59s0f1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ff:ff:ff:ff:ff:f1
Slave queue ID: 0

Slave Interface: enp59s0f0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ff:ff:ff:ff:ff:ff
Slave queue ID: 0

2: enp59s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 portid ffffffffffff state UP group default qlen 1000
    link/ether ff:ff:ff:ff:ff:f1 brd ff:ff:ff:ff:ff:ff
3: enp59s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 portid fffffffffff1 state UP group default qlen 1000
    link/ether ff:ff:ff:ff:ff:f1 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ff:ff:ff:ff:ff:f1 brd ff:ff:ff:ff:ff:ff
    inet 123.123.123.123/24 brd 123.123.123.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 ffff::ffff:ffff:ffff:ffff/64 scope link
       valid_lft forever preferred_lft forever

bond0 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:f1
          inet addr:123.123.123.123 Bcast:123.123.123.255 Mask:255.255.255.0
          inet6 addr: ffff::ffff:ffff:ffff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:2638 errors:0 dropped:12 overruns:0 frame:0
          TX packets:2218 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:272069 (272.0 KB) TX bytes:694737 (694.7 KB)

enp59s0f0 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:f1
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:12 errors:0 dropped:12 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:860 (860.0 B) TX bytes:3525 (3.5 KB)

enp59s0f1 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:f1
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:2626 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2203 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:271209 (271.2 KB) TX bytes:691212 (691.2 KB)

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1811963

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Malte Schmidt (maltris) wrote :

I am unable to provide the logs via apport-collect due to policy restrictions.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Malte Schmidt (maltris) wrote :

Can anyone follow up on this?

I am ready to provide specific logs and try alternative methods/workarounds on demand.

Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v5.0 kernel [0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0/

Malte Schmidt (maltris)
tags: added: kernel-bug-exists-upstream
Malte Schmidt (maltris) wrote :

I tested and can reproduce the problem with this combination:

Ubuntu 16.04, kernel 5.0.0-050000.201903032031, driver 2.7.6-k, firmware 18.8.9

I set the tag accordingly.

Kai-Heng Feng (kaihengfeng) wrote :

Please file a bug with the Intel Ethernet developers.

Commits 3647cd6eaf83d7f6145a3ccac73f5286496490d2 and 3f8af41262697a4d6742f030fbe0ceb9e1a048a6 in linux-next may be worth trying.

Nivedita Singhvi (niveditasinghvi) wrote :

Hi Malte,

Was this issue resolved for you?

There are several other possible causes, so if it is still
a problem with current mainline, please let us know.

Malte Schmidt (maltris) wrote :

This was fixed for me by updating the GBICs to the latest firmware.
