Predictable interface names partially broken with igb driver

Bug #1578141 reported by Dr. Jens Harbott on 2016-05-04
This bug affects 6 people
Affects            Importance   Assigned to
linux (Ubuntu)     Medium       Unassigned
systemd (Ubuntu)   Medium       Unassigned

Bug Description

Note: I'm not sure whether this is really a kernel bug or something within systemd/udev. Please advise how to debug this further.

On a system with two GE ports, instead of the interfaces being named eno1 and eno2, I am getting eno1 and renameN, where N starts at 3 and increases by 2 on every iteration of "rmmod igb; modprobe igb". The corresponding lines in dmesg look like this:

[ 2.748429] igb 0000:07:00.0: added PHC on eth0
[ 2.748431] igb 0000:07:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 2.748433] igb 0000:07:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 00:25:90:d7:60:8e
[ 2.748505] igb 0000:07:00.0: eth0: PBA No: 106100-000
[ 2.748506] igb 0000:07:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 2.802555] igb 0000:07:00.1: added PHC on eth1
[ 2.802557] igb 0000:07:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 2.802559] igb 0000:07:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 00:25:90:d7:60:8f
[ 2.802631] igb 0000:07:00.1: eth1: PBA No: 106100-000
[ 2.802632] igb 0000:07:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 2.803618] igb 0000:07:00.0 eno1: renamed from eth0
[ 2.833208] igb 0000:07:00.1 rename3: renamed from eth1
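
To spot this condition automatically, the "renamed from" lines can be pulled out of dmesg and checked for interfaces left with a temporary renameN name. The following is only a minimal sketch: the function names are made up for illustration, and the regular expression assumes the line format shown above.

```python
import re

# Matches kernel lines such as:
#   [ 2.833208] igb 0000:07:00.1 rename3: renamed from eth1
RENAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<driver>\S+)\s+(?P<pci>[0-9a-f:.]+)\s+"
    r"(?P<new>\S+): renamed from (?P<old>\S+)$"
)


def find_renames(dmesg_text):
    """Return a list of (pci_address, old_name, new_name) tuples."""
    renames = []
    for line in dmesg_text.splitlines():
        match = RENAME_RE.match(line.strip())
        if match:
            renames.append((match["pci"], match["old"], match["new"]))
    return renames


def stuck_interfaces(renames):
    """Return PCI addresses whose new name is a temporary renameN name."""
    return [pci for pci, _old, new in renames if re.fullmatch(r"rename\d+", new)]


# Sample input copied from the dmesg excerpt in this report.
sample = """\
[ 2.803618] igb 0000:07:00.0 eno1: renamed from eth0
[ 2.833208] igb 0000:07:00.1 rename3: renamed from eth1
"""
renames = find_renames(sample)
print(renames)
print(stuck_interfaces(renames))
```

On a live system the input would come from the output of dmesg instead of the sample string.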

What is even worse: sometimes the naming changes, and the second interface ends up as eno1 while the first interface is named renameN with an even N. When this happens while the installer is running, the installer will set up "rename2" as the primary network interface. That works fine for the installation itself, but on the first boot of the installed system that interface is gone, leaving the system without network connectivity.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-21-generic 4.4.0-21.37
ProcVersionSignature: Ubuntu 4.4.0-21.37-generic 4.4.6
Uname: Linux 4.4.0-21-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 4 09:48 seq
 crw-rw---- 1 root audio 116, 33 May 4 09:48 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed May 4 10:00:39 2016
HibernationDevice: RESUME=/dev/mapper/compute--node37--vg-swap
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Supermicro X9DRT-HF+
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-21-generic root=/dev/mapper/compute--node37--vg-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-21-generic N/A
 linux-backports-modules-4.4.0-21-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/21/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3.0c
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X9DRT-HF+
dmi.board.vendor: Supermicro
dmi.board.version: 0123456789
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.0c:bd05/21/2014:svnSupermicro:pnX9DRT-HF+:pvr0123456789:rvnSupermicro:rnX9DRT-HF+:rvr0123456789:cvnSupermicro:ct17:cvr0123456789:
dmi.product.name: X9DRT-HF+
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Dr. Jens Harbott (j-harbott) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.6 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-rc6-wily/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Dr. Jens Harbott (j-harbott) wrote :

We have seen the issue only when deploying Xenial; installations running Precise or Trusty do not seem to be affected.

Running with 4.6.0-040600rc6-generic has shown the same behaviour.

After doing the rmmod/modprobe cycle, I have now found the attached set of messages in the output of "journalctl -x", so this seems to confirm my suspicion of some bad interaction between kernel and systemd going on.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Dr. Jens Harbott (j-harbott) wrote :

Looking further, it seems that the BIOS is providing broken information, see this snippet from dmidecode output:

Handle 0x007E, DMI type 41, 11 bytes
Onboard Device
        Reference Designation: Onboard Intel Ethernet 1
        Type: Ethernet
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:07:00.0

Handle 0x007F, DMI type 41, 11 bytes
Onboard Device
        Reference Designation: Onboard Intel Ethernet 2
        Type: Ethernet
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:07:00.1

and as a result we get ID_NET_NAME_ONBOARD=eno1 for both devices. So udev tries to rename both interfaces to eno1; only one succeeds, and the other fails due to a name collision. It would be nice to implement a workaround for this broken BIOS data.
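
The broken data can be detected by looking for onboard devices of the same type that share a Type Instance. This is only a rough sketch of such a check: it assumes the dmidecode text layout shown above, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_onboard_devices(dmidecode_text):
    """Parse 'Onboard Device' (DMI type 41) records into dictionaries."""
    devices = []
    current = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = None  # a new record starts; wait for its title line
        elif line == "Onboard Device":
            current = {}
            devices.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


def duplicate_instances(devices):
    """Map (Type, Type Instance) to bus addresses when more than one device shares it."""
    groups = defaultdict(list)
    for dev in devices:
        key = (dev.get("Type"), dev.get("Type Instance"))
        groups[key].append(dev.get("Bus Address"))
    return {key: addrs for key, addrs in groups.items() if len(addrs) > 1}


# Sample input copied from the dmidecode snippet in this comment.
sample = """\
Handle 0x007E, DMI type 41, 11 bytes
Onboard Device
        Reference Designation: Onboard Intel Ethernet 1
        Type: Ethernet
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:07:00.0

Handle 0x007F, DMI type 41, 11 bytes
Onboard Device
        Reference Designation: Onboard Intel Ethernet 2
        Type: Ethernet
        Status: Enabled
        Type Instance: 1
        Bus Address: 0000:07:00.1
"""
print(duplicate_instances(parse_onboard_devices(sample)))
```

Any non-empty result means that at least two devices would be offered the same onboard name.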

Martin Pitt (pitti) wrote :

To fix your system locally you can set some custom names for these two devices, or disable the ifnames schema completely. Please see /usr/share/doc/udev/README.Debian.gz for details.

Can you please copy&paste the output of

   SYSTEMD_LOG_LEVEL=debug udevadm test-builtin net_id /sys/class/net/eno1

and again for the other /sys/class/net/rename* device?
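
For the custom-names route, one option is a systemd .link file per interface, matched on the MAC address. The helper below is a hedged sketch that only renders the file text: the names lan0 and lan1 and the file names are made-up examples, the MAC addresses are the ones from the dmesg output in the description, and the README mentioned above remains the reference for where the files go and what else is needed.

```python
def render_link_file(mac_address, name):
    """Return the text of a systemd .link file that pins `name` to `mac_address`."""
    return (
        "[Match]\n"
        f"MACAddress={mac_address.lower()}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


# Hypothetical interface names; MAC addresses taken from the dmesg output above.
for mac, name in [("00:25:90:d7:60:8e", "lan0"), ("00:25:90:d7:60:8f", "lan1")]:
    print(f"# /etc/systemd/network/10-{name}.link")
    print(render_link_file(mac, name))
```

Names outside the kernel's ethN namespace and outside the eno/enp schemes avoid clashing with either.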

Dr. Jens Harbott (j-harbott) wrote :

@Martin: Fixing the installed system is easy; the bad case happens when the installer sets up the system with "rename2" as the first interface, which makes console access necessary in order to recover.

root@compute-node37:~# SYSTEMD_LOG_LEVEL=debug udevadm test-builtin net_id /sys/class/net/eno1
calling: test-builtin
=== trie on-disk ===
tool version: 229
file size: 6841778 bytes
header size 80 bytes
strings 1755242 bytes
nodes 5086456 bytes
Load module index
timestamp of '/etc/systemd/network' changed
timestamp of '/lib/systemd/network' changed
Parsed configuration file /lib/systemd/network/99-default.link
Parsed configuration file /lib/systemd/network/90-mac-for-usb.link
Created link configuration context.
ID_NET_NAME_MAC=enx002590d8975a
ID_OUI_FROM_DATABASE=Super Micro Computer, Inc.
ID_NET_NAME_ONBOARD=eno1
ID_NET_NAME_PATH=enp7s0f0
Unload module index
Unloaded link configuration context.
root@compute-node37:~# SYSTEMD_LOG_LEVEL=debug udevadm test-builtin net_id /sys/class/net/rename3
calling: test-builtin
=== trie on-disk ===
tool version: 229
file size: 6841778 bytes
header size 80 bytes
strings 1755242 bytes
nodes 5086456 bytes
Load module index
timestamp of '/etc/systemd/network' changed
timestamp of '/lib/systemd/network' changed
Parsed configuration file /lib/systemd/network/99-default.link
Parsed configuration file /lib/systemd/network/90-mac-for-usb.link
Created link configuration context.
ID_NET_NAME_MAC=enx002590d8975b
ID_OUI_FROM_DATABASE=Super Micro Computer, Inc.
ID_NET_NAME_ONBOARD=eno1
ID_NET_NAME_PATH=enp7s0f1
Unload module index
Unloaded link configuration context.

Also note that the whole renaming process seems to come from an Ubuntu-specific patch, Revert-udev-network-device-renaming-immediately-give.patch. IIUC, plain systemd would fail the renaming process and end up with either (eno1, eth1) or (eth0, eno1) as interface tuples.

If there was a way to get rid of the race condition and make sure that the system always ends up with the same tuple, that would already be a large step forward.
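
To illustrate what an order-independent result could look like, here is a sketch of one possible policy: use the onboard name only when no other device claims it, and otherwise fall back to the path-based name. This is purely hypothetical and is not what udev does today (udev handles each device on its own and does not see the other device's properties); the function name and the policy are assumptions for illustration.

```python
from collections import Counter


def pick_names(candidates):
    """Pick one name per device from its net_id properties.

    candidates: list of dicts with ID_NET_NAME_ONBOARD and ID_NET_NAME_PATH keys.
    The onboard name is used only if exactly one device claims it.
    """
    claims = Counter(
        c["ID_NET_NAME_ONBOARD"] for c in candidates if c.get("ID_NET_NAME_ONBOARD")
    )
    names = []
    for c in candidates:
        onboard = c.get("ID_NET_NAME_ONBOARD")
        if onboard and claims[onboard] == 1:
            names.append(onboard)
        else:
            names.append(c["ID_NET_NAME_PATH"])
    return names


# Property values copied from the two udevadm test-builtin runs above.
devices = [
    {"ID_NET_NAME_ONBOARD": "eno1", "ID_NET_NAME_PATH": "enp7s0f0"},
    {"ID_NET_NAME_ONBOARD": "eno1", "ID_NET_NAME_PATH": "enp7s0f1"},
]
print(pick_names(devices))
```

With this policy each port gets the same name no matter which one is initialized first, because the result depends only on the set of claimed names and not on the order of events.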

Anthony Carlson (pdxvoyd) wrote :

I recently ran into this while upgrading 76 systems. In my research I have concluded that this bug is the result of bad information coming from BIOS versions 3.0b, 3.0c, and 3.2. This bug is not present with BIOS 3.00.

If you run biosdevname, you will see that the BIOS device name is em1 for both of the Intel i350 interfaces, resulting (as mentioned above by Dr. Jens Rosenboom (j-rosenboom-j)) in dracut successfully naming only the first device to initialize as eno1. If you reboot the system and check udevadm, you can see that the order in which the ports are initialized can alternate, so eno1 moves from port 1 to port 2. This is causing problems primarily with bonding our interfaces.

sudo biosdevname -d
BIOS device: em1
Kernel name: eno1
Permanent MAC: 00:00:00:00:00:00
Assigned MAC : 00:00:00:00:00:00
ifIndex: 2
Driver: igb
Driver version: 5.2.15-k
Firmware version: 1.61, 0x8000090e
Bus Info: 0000:06:00.0
PCI name : 0000:06:00.0
PCI Slot : embedded
SMBIOS Device Type: Ethernet
SMBIOS Instance: 1
SMBIOS Label: Onboard Intel Ethernet 1
sysfs Index: 1
sysfs Label: Onboard Intel Ethernet 1
Embedded Index: 1

Duplicate: True
BIOS device: em1
Kernel name: eth1
Permanent MAC: 00:00:00:00:00:00
Assigned MAC : 00:00:00:00:00:00
ifIndex: 3
Driver: igb
Driver version: 5.2.15-k
Firmware version: 1.61, 0x8000090e
Bus Info: 0000:06:00.1
PCI name : 0000:06:00.1
PCI Slot : embedded
SMBIOS Device Type: Ethernet
SMBIOS Instance: 1
SMBIOS Label: Onboard Intel Ethernet 2
sysfs Index: 1
sysfs Label: Onboard Intel Ethernet 2
Embedded Index: 2

Duplicate: True

I have submitted my findings for all of my nodes to Supermicro, seeking a bug fix and the release of an updated BIOS.
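
Checking many nodes for this by hand is tedious, so here is a rough sketch of a script that finds clashing BIOS device names in the output of "biosdevname -d". It assumes the record layout shown above, where each record has a "BIOS device" line followed by a "Kernel name" line; the function names are made up for illustration.

```python
from collections import Counter


def bios_device_names(biosdevname_text):
    """Return (BIOS device, Kernel name) pairs, one per record in the output."""
    pairs = []
    bios_name = None
    for raw in biosdevname_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" layout
        key, value = key.strip(), value.strip()
        if key == "BIOS device":
            bios_name = value
        elif key == "Kernel name" and bios_name is not None:
            pairs.append((bios_name, value))
            bios_name = None
    return pairs


def clashing_names(pairs):
    """Map each BIOS device name used more than once to its kernel names."""
    counts = Counter(bios for bios, _kernel in pairs)
    return {
        bios: [kernel for b, kernel in pairs if b == bios]
        for bios, n in counts.items()
        if n > 1
    }


# Shortened sample based on the biosdevname output above.
sample = """\
BIOS device: em1
Kernel name: eno1
Permanent MAC: 00:00:00:00:00:00
ifIndex: 2

Duplicate: True
BIOS device: em1
Kernel name: eth1
Permanent MAC: 00:00:00:00:00:00
ifIndex: 3
"""
print(clashing_names(bios_device_names(sample)))
```

An empty result would mean every interface has a distinct BIOS device name.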

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Spike (spike-4) wrote :

Dear all,

I have a Supermicro X9DRI-4LNF and am experiencing the same issue. @pdxvoyd mentioned this does not happen on BIOS 3.0, but it does for us.

Running latest xenial server with kernel 4.4.0-66.

sudo biosdevname -d | grep device
BIOS device: em1
BIOS device: em1
BIOS device: em1
BIOS device: em1

So confirming the problem with conflicting info coming from the BIOS.

@pdxvoyd any news on your report to SM? It has been 8 months since your comment. I've opened a ticket myself just in case, and would love to get this resolved.

best,

Spike

Spike (spike-4) wrote :

Quick update: FWIW, I just got an updated BIOS from an SM tech and it worked:

$ sudo biosdevname -d | grep "BIOS dev"
BIOS device: em1
BIOS device: em2
BIOS device: em3
BIOS device: em4

The BIOS version reported on the BIOS screen is 3.2, which @pdxvoyd said he had problems with. For me it is the complete opposite: I had problems on 3.0, which he said worked for him, and 3.2, which did not work for him, fixes it for me.

In any case something should probably happen with systemd to handle clashing names because these are servers, not desktop boxes, and automation is pretty much impossible if network interfaces fail to come back.

best,

Spike

Changed in systemd (Ubuntu):
importance: Undecided → Medium
Wladimir Mutel (mwg) wrote :

I observe the same effect on an Asus P10S-M WS motherboard with Ubuntu 18.04, kernel 4.15.0-22-generic.

(from lshw :)
product: P10S-M WS Series
vendor: ASUSTeK COMPUTER INC.
version: Rev 1.xx

description: BIOS
vendor: American Megatrends Inc.
version: 4401
date: 03/05/2018

'biosdevname -d' prints both (2) onboard interfaces with BIOS device: em1

BIOS device: em1
Kernel name: eno1
Permanent MAC: 2C:FD:A1:C6:F3:FA
Assigned MAC : 2C:FD:A1:C6:F3:FA
ifIndex: 2
Driver: igb
Driver version: 5.4.0-k
Firmware version: 3.16, 0x800004d7
Bus Info: 0000:01:00.0
PCI name : 0000:01:00.0
PCI Slot : embedded
SMBIOS Device Type: Ethernet
SMBIOS Instance: 1
SMBIOS Label: Onboard LAN
sysfs Index: 1
sysfs Label: Onboard LAN
Embedded Index: 1

Duplicate: True
BIOS device: em1
Kernel name: rename3
Permanent MAC: 2C:FD:A1:C6:F3:FB
Assigned MAC : 2C:FD:A1:C6:F3:FB
ifIndex: 3
Driver: igb
Driver version: 5.4.0-k
Firmware version: 3.16, 0x80000513
Bus Info: 0000:02:00.0
PCI name : 0000:02:00.0
PCI Slot : embedded
SMBIOS Device Type: Other
SMBIOS Instance: 1
SMBIOS Label: Onboard 1394
sysfs Index: 1
sysfs Label: Onboard 1394
Embedded Index: 2

Duplicate: True

So this is not limited to Supermicro boards.

Chiva (srchiva) wrote :

I just talked to a Supermicro tech agent, who sent me an unreleased BIOS update for the X9DRI-4LNF motherboard. It fixes Spectre and several other issues, including the enumeration problem with the Ethernet ports on this board.

This BIOS update is v3.3 and dated May 2018, so it is pretty recent, but as the board is EOL there won't be any further updates and this one will be the most up to date.

You can find the BIOS update file here:
https://drive.google.com/open?id=1PfWZXd-0DWuJ6MJh4OneeHdD1FLFXmx8

In case you don't trust me, please ask the SM tech guy to send you the latest BIOS update, in my case it was "x9dr3p8.523.zip"
