[Regression] 5.4 is not identifying all ports on Intel x710-TM4 10GbE controller

Bug #1887703 reported by Jeff Lane 
This bug report is a duplicate of: Bug #1893956: Intel x710 LOMs do not work on Focal.
This bug affects 5 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  High        Jeff Lane
Focal            Confirmed  High        Jeff Lane

Bug Description

[IMPACT]

The Intel x710-TM4 is one of the latest 10GbE controllers from Intel using the i40e driver. This particular 4-port card comes in a 2x2 arrangement: 2x SFP+ and 2x RJ-45. The card is enabled in 5.4 via the inbox version of the i40e driver, and hwinfo does show both sides of the card, but the kernel only sees the two SFP+ ports and cannot address or use the two copper ports.

This is currently blocking certification for one of our hardware partners.

After some investigation we see this works in a more recent version of the driver. Intel suggests the commit in FIXES from 5.5 should make this work.

This is a regression from the i40e driver in Bionic (5.3 HWE) that, per the tester, does show all four ports.

[FIXES]

3df5b9a6a9ec3c1e4431bf1db3426b54dc92dd91 i40e: enable X710 support

I have a branch here:
https://code.launchpad.net/~bladernr/ubuntu/+source/linux/+git/focal/+ref/1887703-i40e-enable-x710

[TESTING]
Boot system, verify four ports are visible and can be addressed and pass data.
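
The visibility half of this check could be scripted by counting the network interfaces that sysfs reports as bound to i40e. The following is a minimal sketch, not part of the original test plan: the function name interfaces_bound_to is hypothetical, it assumes the usual /sys/class/net/<iface>/device/driver symlink layout, and it does not verify that the ports can pass data.

```python
import os

def interfaces_bound_to(driver, sysfs_net="/sys/class/net"):
    """Return sorted interface names whose device is bound to `driver`.

    Hypothetical helper for illustration; `sysfs_net` is a parameter so the
    function can be pointed at a test directory instead of the real sysfs.
    """
    bound = []
    for name in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, name, "device", "driver")
        # Virtual interfaces such as lo have no device/driver symlink.
        if os.path.islink(link) and os.path.basename(os.readlink(link)) == driver:
            bound.append(name)
    return bound
```

On this card, interfaces_bound_to("i40e") would be expected to list four interfaces once the fix is in place, assuming no other i40e devices are installed in the system.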

Jeff Lane  (bladernr)
tags: added: blocks-hwcert-server
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1887703

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Focal):
status: New → Incomplete
Jeff Lane  (bladernr) wrote : Re: 5.4 is not identifying all ports on Intex x710-TM4 10GbE controller

Attached is a sosreport from the failing system, taken from the duplicate bug.

Jeff Lane  (bladernr) wrote :

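Also, from syslog in the sosreport, it looks like something goes wrong when the driver loads: the probes of the first two functions (0000:33:00.0 and 0000:33:00.1) fail with error -11, while 0000:33:00.2 and 0000:33:00.3 come up normally: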

Jul 15 15:18:05 usable-sloth kernel: [ 26.841704] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k
Jul 15 15:18:05 usable-sloth kernel: [ 26.844779] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
Jul 15 15:18:05 usable-sloth kernel: [ 26.928593] i40e 0000:33:00.0: unidentified MAC or BLANK NVM: -11
Jul 15 15:18:05 usable-sloth kernel: [ 26.928704] i40e: probe of 0000:33:00.0 failed with error -11
Jul 15 15:18:05 usable-sloth kernel: [ 26.950964] i40e 0000:33:00.1: unidentified MAC or BLANK NVM: -11
Jul 15 15:18:05 usable-sloth kernel: [ 26.951057] i40e: probe of 0000:33:00.1 failed with error -11
Jul 15 15:18:05 usable-sloth kernel: [ 26.973459] i40e 0000:33:00.2: fw 7.2.60285 api 1.9 nvm 7.21 0x80007959 1.2585.0 [8086:104e] [15d9:0000]
Jul 15 15:18:05 usable-sloth kernel: [ 27.297613] i40e 0000:33:00.2: MAC address: 3c:ec:ef:3f:b2:16
Jul 15 15:18:05 usable-sloth kernel: [ 27.298878] i40e 0000:33:00.2: FW LLDP is enabled
Jul 15 15:18:05 usable-sloth kernel: [ 27.314610] i40e 0000:33:00.2 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jul 15 15:18:05 usable-sloth kernel: [ 27.344689] i40e 0000:33:00.2: PCI-Express: Speed 8.0GT/s Width x8
Jul 15 15:18:05 usable-sloth kernel: [ 27.353837] i40e 0000:33:00.2: Features: PF-id[2] VFs: 32 VSIs: 34 QP: 119 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
Jul 15 15:18:05 usable-sloth kernel: [ 27.375077] i40e 0000:33:00.3: fw 7.2.60285 api 1.9 nvm 7.21 0x80007959 1.2585.0 [8086:104e] [15d9:0000]
Jul 15 15:18:05 usable-sloth kernel: [ 27.627129] i40e 0000:33:00.3: MAC address: 3c:ec:ef:3f:b2:17
Jul 15 15:18:05 usable-sloth kernel: [ 27.627800] i40e 0000:33:00.3: FW LLDP is enabled
Jul 15 15:18:05 usable-sloth kernel: [ 27.628476] i40e 0000:33:00.3: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
Jul 15 15:18:05 usable-sloth kernel: [ 27.628955] i40e 0000:33:00.3: DCB init failed -63, disabled
Jul 15 15:18:05 usable-sloth kernel: [ 27.639670] i40e 0000:33:00.3 eth1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jul 15 15:18:05 usable-sloth kernel: [ 27.659332] i40e 0000:33:00.3: PCI-Express: Speed 8.0GT/s Width x8
Jul 15 15:18:05 usable-sloth kernel: [ 27.670480] i40e 0000:33:00.3: Features: PF-id[3] VFs: 32 VSIs: 34 QP: 119 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA
Jul 15 15:18:05 usable-sloth kernel: [ 27.672306] i40e 0000:33:00.2 eno3: renamed from eth0
Jul 15 15:18:05 usable-sloth kernel: [ 27.702841] i40e 0000:33:00.3 eno4: renamed from eth1
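
For anyone triaging similar reports, the failed probes can be pulled out of a syslog excerpt mechanically. This is a minimal sketch under stated assumptions: the function name failed_probes is hypothetical, and the pattern is based only on the "probe of ... failed with error" lines quoted above, so other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
#   "... i40e: probe of 0000:33:00.0 failed with error -11"
PROBE_FAILED = re.compile(
    r"i40e: probe of (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error (?P<err>-?\d+)"
)

def failed_probes(log_lines):
    """Return (PCI address, error code) pairs for each failed i40e probe."""
    results = []
    for line in log_lines:
        match = PROBE_FAILED.search(line)
        if match:
            results.append((match.group("bdf"), int(match.group("err"))))
    return results
```

Fed the log excerpt above, this should report the two functions 0000:33:00.0 and 0000:33:00.1, both with error -11.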

Jeff Lane  (bladernr) wrote :

All the necessary logs should be in the attached sosreport. If not, I can ask them to also run apport-collect, but that seemed unnecessary.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Focal):
status: Incomplete → Confirmed
summary: - 5.4 is not identifying all ports on Intex x710-TM4 10GbE controller
+ 5.4 is not identifying all ports on Intel x710-TM4 10GbE controller
Alex Hung (alexhung)
description: updated
Jeff Lane  (bladernr) wrote : Re: 5.4 is not identifying all ports on Intel x710-TM4 10GbE controller

Further update:

According to the engineer at SMC, they did try "18.04.3" (not sure if that means 4.15, or 5.3, hopefully not 5.2) and all four ports did work for them, so this appears to be a regression in the i40e driver.

I've asked him to provide more info about what kernels they have tried specifically to help narrow down where it breaks.

Jeff Lane  (bladernr) wrote :

An additional update: according to the person who reported this to me, the last known working kernel is 5.3:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

ubuntu@Server:~$ uname -a
Linux MyServer 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

So at least we know now that this breaks somewhere between 5.3 and 5.4.

Jeff Lane  (bladernr) wrote :

The tester updated the i40e driver to version 2.11.29 and says that he can now see all four ports.

So something changed between the driver version in 5.3 (works), 2.8.20 in Focal (the broken version), and 2.11.29 from Intel (works again).

summary: - 5.4 is not identifying all ports on Intel x710-TM4 10GbE controller
+ [Regression] 5.4 is not identifying all ports on Intel x710-TM4 10GbE
+ controller
Jeff Lane  (bladernr) wrote :

Asked them to start bisecting the driver versions from Intel to figure out the earliest update that resolves the issue.

Jeff Lane  (bladernr) wrote :

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/drivers/net/ethernet/intel/i40e?id=3df5b9a6a9ec3c1e4431bf1db3426b54dc92dd91

This seems to be the commit missing in 5.4. It landed in 5.5.

Cherry-pick incoming, with a test kernel (hopefully), to see what happens.

Changed in linux (Ubuntu Focal):
importance: Undecided → Critical
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Focal):
assignee: nobody → Jeff Lane (bladernr)
Changed in linux (Ubuntu):
assignee: nobody → Jeff Lane (bladernr)
Jeff Lane  (bladernr)
description: updated
Changed in linux (Ubuntu Focal):
importance: Critical → High
Jeff Lane  (bladernr) wrote :

Per tester, the patched kernel does not resolve the issue.

At least we know the diff.

Also, did some bisecting with the Intel drivers upstream and this is the earliest version where the card starts working again:

 2.10.19.82

So it breaks sometime around 2.8.20 and then starts working again around 2.10.19.82

Agecon Support (are-support) wrote :

For what it's worth, this doesn't just affect the X710-TM4 controller. The X710-T2L card is rendered completely useless by this bug as that card only has copper 10 GbE ports. As with the X710-TM4, the latest version of the Intel driver fixes the issue.

Are there plans to release a fixed i40e driver for Focal soon? It looks like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1890988 is another duplicate of this bug as well.

Jeff Lane  (bladernr)
tags: removed: blocks-hwcert-server
thomas955 (thoehlig) wrote :

We also got new servers with the X710-T2L. The default i40e driver is not working with the current kernel version:

Linux someServer 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

"modinfo i40e" for the driver delivered with the kernel:

modinfo i40e
filename: /lib/modules/4.15.0-135-generic/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
version: 2.1.14-k
license: GPL
description: Intel(R) Ethernet Connection XL710 Network Driver
author: Intel Corporation, <email address hidden>
srcversion: 0C15843B4479E58A1E07114
alias: pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias: pci:v00008086d0000158Asv*sd*bc*sc*i*
alias: pci:v00008086d00001588sv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d000037D3sv*sd*bc*sc*i*
alias: pci:v00008086d000037D2sv*sd*bc*sc*i*
alias: pci:v00008086d000037D1sv*sd*bc*sc*i*
alias: pci:v00008086d000037D0sv*sd*bc*sc*i*
alias: pci:v00008086d000037CFsv*sd*bc*sc*i*
alias: pci:v00008086d000037CEsv*sd*bc*sc*i*
alias: pci:v00008086d00001589sv*sd*bc*sc*i*
alias: pci:v00008086d00001586sv*sd*bc*sc*i*
alias: pci:v00008086d00001585sv*sd*bc*sc*i*
alias: pci:v00008086d00001584sv*sd*bc*sc*i*
alias: pci:v00008086d00001583sv*sd*bc*sc*i*
alias: pci:v00008086d00001581sv*sd*bc*sc*i*
alias: pci:v00008086d00001580sv*sd*bc*sc*i*
alias: pci:v00008086d00001574sv*sd*bc*sc*i*
alias: pci:v00008086d00001572sv*sd*bc*sc*i*
depends: ptp
retpoline: Y
intree: Y
name: i40e
vermagic: 4.15.0-135-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)

I compiled version i40e-2.14.13 of the driver myself, and with this one the card works.

"modinfo i40e" for the self-built driver:

modinfo i40e
filename: /lib/modules/4.15.0-135-generic/updates/drivers/net/ethernet/intel/i40e/i40e.ko
version: 2.14.13
license: GPL
description: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
author: Intel Corporation, <email address hidden>
srcversion: 1A821F3D488396B967F338E
alias: pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias: pci:v00008086d0000158Asv*sd*bc*sc*i*
alias: pci:v00008086d000037D3sv*sd*bc*sc*i*
alias: pci:v00008086d000037D2sv*sd*bc*sc*i*
alias: pci:v00008086d000037D1sv*sd*bc*sc*i*
alias: pci:v00008086d000037D0sv*sd*bc*sc*i*
alias: pci:v00008086d000037CFsv*sd*bc*sc*i*
alias: pci:v00008086d000037CEsv*sd*bc*sc*i*
alias: pci:v00008086d00000D58sv*sd*bc*sc*i*
alias: pci:v00008086d00000CF8sv*sd*bc*sc*i*
alias: pci:v00008086d00001588sv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d0000104Fsv*sd*bc*sc*i*
alias: pci:v00008086d0000104Esv*sd*bc*sc*i*
alias: pci:v00008086d000015FFsv*sd*bc*sc*i*
alias: pci:v00008086d000015...
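
The two alias lists can be compared mechanically to see which PCI device IDs only the newer driver claims. The following is a rough sketch: the function name pci_ids is hypothetical, and it assumes the "alias: pci:v0000VVVVd0000DDDD..." layout shown in the modinfo output above. Truncated entries such as the last alias line are skipped rather than guessed at.

```python
import re

# Captures the 4-digit vendor and device IDs from a modinfo PCI alias line.
ALIAS = re.compile(
    r"alias:\s*pci:v0000(?P<vendor>[0-9A-Fa-f]{4})d0000(?P<device>[0-9A-Fa-f]{4})"
)

def pci_ids(modinfo_output):
    """Return the set of 'vendor:device' IDs listed in modinfo alias lines."""
    return {
        f"{m.group('vendor')}:{m.group('device')}".lower()
        for m in ALIAS.finditer(modinfo_output)
    }

def ids_only_in_new(old_modinfo, new_modinfo):
    """Return the IDs claimed by the new driver but not by the old one."""
    return sorted(pci_ids(new_modinfo) - pci_ids(old_modinfo))
```

In the portion of the output visible here, the device IDs 0D58, 0CF8, 104F, 104E and 15FF appear only in the 2.14.13 listing. Because that listing is cut off, the reverse comparison (IDs present only in 2.1.14-k) cannot be drawn from this excerpt.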


J (lemonkoala) wrote :

This appears to still be an issue on the latest 5.4.0-99-generic kernel.

I'm using a 2 port 10GSFP+ card. The PCI driver isn't bound according to lspci, and the interfaces don't show up in dmesg or "ip link".

# lspci -nnk
31:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 [8086:37cc] (rev 09)
 DeviceName: 10Gb Ethernet1
 Subsystem: Lenovo Ethernet Connection X722 [17aa:4021]

# modinfo i40e
...
version: 2.8.20-k
description: Intel(R) Ethernet Connection XL710 Network Driver
...
vermagic: 5.4.0-99-generic SMP mod_unload modversions

Strangely the ports don't even come up with the latest 5.13.0-28 kernel.

# modinfo i40e
(no version)
vermagic: 5.13.0-28-generic SMP mod_unload modversions

The card works perfectly in another distribution with the i40e driver, but that's running a 4.18 kernel.
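
The "no driver bound" check in this comment can be automated by scanning lspci -nnk output for devices that lack a "Kernel driver in use" line. This is a minimal sketch, assuming the usual layout in which each device starts on a non-indented line followed by indented detail lines; the function name unbound_devices is hypothetical.

```python
def unbound_devices(lspci_output):
    """Return header lines of devices with no 'Kernel driver in use' entry."""
    unbound = []
    current, has_driver = None, False
    for line in lspci_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A non-indented line starts a new device record.
            if current is not None and not has_driver:
                unbound.append(current)
            current, has_driver = line.strip(), False
        elif line.strip().startswith("Kernel driver in use:"):
            has_driver = True
    # Flush the last device in the output.
    if current is not None and not has_driver:
        unbound.append(current)
    return unbound
```

Applied to the lspci excerpt above, this would return the X722 entry at 31:00.0, since no driver line follows it.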

Jeff Lane  (bladernr) wrote :

@thomas955 - the i40e driver in 4.15 will not be updated. You may want to try the 5.4 HWE kernel for 18.04, which I think should include the patches to enable the x710-T2L. We only pulled the support back as far as 5.4.

@lemonkoala - the x722 is a different card from the x710, and it's likely additional patches are necessary to enable that version of the 700 series card. Please file a new bug for your issue (and feel free to add a comment here with the bug link, and I'll see if I can work on it).
