Mellanox NIC interface names change between 5.4 and 5.8

Bug #1940860 reported by dann frazier
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Release Notes for Ubuntu | New | Undecided | Unassigned | |
| debian-installer | New | Undecided | Unassigned | |
| subiquity | New | Undecided | Unassigned | |
| linux (Ubuntu) | Won't Fix | Undecided | Unassigned | |
| systemd (Ubuntu) | Invalid | Undecided | Unassigned | |

Bug Description

I noticed on a couple of systems that my network interface names change when upgrading from the focal LTS (5.4) kernel to the focal HWE (both 5.8 and 5.11) kernels. Both systems have Mellanox ConnectX-5 NICs.

dannf@bizzy:~$ uname -a
Linux bizzy 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:10:30 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
dannf@bizzy:~$ ls /sys/class/net
enp1s0f0 enp1s0f1 enx3e8734bc294f lo

dannf@bizzy:~$ uname -a
Linux bizzy 5.8.0-63-generic #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:44 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
dannf@bizzy:~$ ls /sys/class/net
enp1s0f0np0 enp1s0f1np1 enx3e8734bc294f lo

dannf@bizzy:~$ uname -a
Linux bizzy 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:08 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
dannf@bizzy:~$ ls /sys/class/net
enp1s0f0np0 enp1s0f1np1 enx3e8734bc294f lo

I bisected this down to a kernel change:
# first bad commit: [c6acd629eec754a9679f922d51f90e44c769b80c] net/mlx5e: Add support for devlink-port in non-representors mode

The impact is that your network can fail to come up after moving from the LTS kernel to the HWE kernel. This isn't a huge problem for MAAS installs, because MAAS configures netplan to always use the same names that were in use at commissioning time. It does, however, impact subiquity-based installs, which do not.

dann frazier (dannf)
Changed in linux (Ubuntu):
status: New → Confirmed
Dan Streetman (ddstreet) wrote :

systemd/udevd appears to be working exactly as advertised, using the phys_port_name when it's provided by the device's kernel driver; should this be marked invalid for systemd, or is there actually some change needed there?
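To make the mechanism concrete, here is a simplified Python sketch of how a PCI-path style name is composed, following the template documented in systemd.net-naming-scheme(7): `en[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>]`. It ignores other parts of the naming scheme (such as `d<dev_port>`), the function `predictable_pci_name` is hypothetical rather than a udev API, and the `p0`/`p1` phys_port_name values are assumptions inferred from the 5.8 names shown above.

```python
from typing import Optional


def predictable_pci_name(bus: int, slot: int, function: int,
                         multifunction: bool = True,
                         phys_port_name: Optional[str] = None,
                         domain: int = 0) -> str:
    """Hypothetical, simplified composition of an 'en' PCI-path name.

    Template (simplified from systemd.net-naming-scheme(7)):
    en[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>]
    """
    name = "en"
    # The PCI domain is only spelled out when it is non-zero.
    if domain:
        name += f"P{domain}"
    name += f"p{bus}s{slot}"
    # The function number appears for multi-function devices
    # (or whenever it is non-zero).
    if function > 0 or multifunction:
        name += f"f{function}"
    # If the driver exposes phys_port_name (assumed here to be "p0"/"p1"
    # for mlx5 on 5.8+), udev appends it after an "n".
    if phys_port_name:
        name += f"n{phys_port_name}"
    return name


# The same two ports, without and with phys_port_name exposed.
for func in (0, 1):
    old = predictable_pci_name(1, 0, func)
    new = predictable_pci_name(1, 0, func, phys_port_name=f"p{func}")
    print(old, "->", new)
```

Run as-is, this prints `enp1s0f0 -> enp1s0f0np0` and `enp1s0f1 -> enp1s0f1np1`, matching the 5.4 and 5.8 listings in the description: the PCI path is unchanged, and only the newly exposed phys_port_name adds the `np0`/`np1` suffix.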

dann frazier (dannf) wrote :

@ddstreet I agree that systemd is behaving as designed. But I'm not sure what the proper fix for this is, and therefore where changes would be required.

My initial thought is that perhaps subiquity installs should do what MAAS installs do and configure netplan to always use the install-time names. I've added a subiquity task for that consideration. Note that if that is the chosen solution, we should be careful to ignore NICs with randomly generated MACs (see bug 1936972).
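A minimal Python sketch of that idea, assuming the installer already has a mapping of install-time interface names to MAC addresses. The helper names are hypothetical, and the two `00:00:5e:00:53:xx` MACs are documentation-range placeholders, not the real addresses of these ports. It emits one netplan `match`/`set-name` stanza per NIC and skips MACs with the locally administered (U/L, 0x02) bit set, which covers randomly generated addresses such as the one embedded in enx3e8734bc294f (first octet 0x3e).

```python
from typing import Dict


def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def netplan_pin_names(nics: Dict[str, str]) -> str:
    """Hypothetical helper: render a netplan stanza that pins each NIC's
    install-time name to its MAC, skipping locally administered MACs."""
    lines = ["network:", "  version: 2", "  ethernets:"]
    for name, mac in sorted(nics.items()):
        if is_locally_administered(mac):
            # Possibly random per boot, so it cannot be matched reliably.
            continue
        lines += [
            f"    {name}:",
            "      match:",
            f"        macaddress: {mac}",
            f"      set-name: {name}",
        ]
    return "\n".join(lines)


# Install-time names; the first two MACs are placeholders.
print(netplan_pin_names({
    "enp1s0f0": "00:00:5e:00:53:01",
    "enp1s0f1": "00:00:5e:00:53:02",
    "enx3e8734bc294f": "3e:87:34:bc:29:4f",
}))
```

Matching on MAC alone can still misfire when MACs are cloned onto bonds, bridges or VLANs, so a real implementation would probably need additional `match` keys (netplan also accepts `driver` and `name`).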

Dan Streetman (ddstreet) wrote :

> we should be careful to ignore NICs with randomly generated MACs (see bug 1936972).

ugh nics with LAA?

yeah, it can be hard to 'uniquely' identify a nic, especially since it's so common to clone macs for bonds, bridges, vlans, and in some cases even duplicate hw devices with the same nic (e.g. bug 1843381).

> My initial thought is that perhaps subiquity installs should do what MAAS installs do and configure netplan to always use the install-time names.

I'm generally not a fan of how MAAS configures netplan to force-rename interfaces; that sounds to me like it's destined for interface naming collisions. But I don't have any better immediate suggestion for 'foolproof' matching of all possible nics that might exist across all kernel driver versions, either.

Michael Hudson-Doyle (mwhudson) wrote :

I should say that I don't really know what changes to subiquity are desirable here. I don't really like the idea of subiquity always using set-name; that just feels wrong, but I can see the problem here too.

dann frazier (dannf) wrote :

Understood, thanks for considering the issue. Perhaps this just needs to be release noted, warning users it may happen and how to avoid it (i.e. implement their own set-name config)?

dann frazier (dannf) wrote :

And here's some proposed text. I assume this would be applicable to the release notes from 20.04->22.04

= Known Issues =
== Network Interface Names ==
Ubuntu generates [predictable interface names](https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/) by default. These names are derived from information exposed by kernel drivers, and can therefore vary between kernel releases. For example, Mellanox ConnectX-5 adapters are known to be assigned names such as enp1s0f0 with Linux 5.4, but names such as enp1s0f0np0 with Linux >= 5.8 (bug 1940860). If your system is affected by such a name change after a kernel upgrade, you will need to update your network configuration files. If you would like to retain the same network interface names when switching between kernels, [netplan](https://netplan.io/reference/) provides a "set-name" field (used together with a "match" section) that you can add to your interface configuration. When set, Ubuntu uses the defined name instead of the default.

Changed in systemd (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Javier Diaz Jr (javierdiazcharles) wrote :

This also affects the ubuntu-installer when `d-i base-installer/kernel/override-image string linux-generic-hwe-20.04` is added to the preseed. The installer boots the 5.4 kernel and configures the NICs using the 5.4 names; once the host reboots into the 5.11 kernel, the network configuration fails.

affects: linux → debian-installer
Javier Diaz Jr (javierdiazcharles) wrote :

Note that this is for the latest focal debian-installer. Since there is no hwe-netboot image, the current focal installer is affected. I think an hwe-netboot image for focal would resolve this issue for the installer.
