udev produces unpredictable net names when PCI device is a bridge
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | dann frazier |
Hirsute | Fix Released | Undecided | Unassigned |
Impish | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
udev can produce unpredictable network interface names by default when multiple devices map to the same slot due to an intermediate bridge. On an Nvidia DGX2 system, I see the following when booting a system with udev 245.4-4ubuntu3.13:
ubuntu@akis:~$ ls /sys/class/net
enp134s0f0 enp6s0 ens103 ens107 eth3 eth9
enp134s0f1 ens102 ens106 eth1 eth7 lo
For each ens* device, there is a sibling eth* device that maps to the same slot because both devices are behind the same bridge.
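One way to see this collision on an affected machine is to ask udev's net_id builtin which slot-based name (ID_NET_NAME_SLOT) it computes for every interface, then look for names claimed more than once. This is a minimal sketch, assuming `udevadm test-builtin net_id` prints the ID_NET_NAME_* properties on stdout as it does on systemd 245. The function names are hypothetical helpers, not part of udev.

```sh
#!/bin/sh
# Sketch: find slot-based names (ID_NET_NAME_SLOT) that more than one
# network interface would claim. Helper names are hypothetical.

# Reads "iface slotname" pairs on stdin and prints every slot name that
# appears more than once, followed by the interfaces that claim it.
find_slot_collisions() {
    sort -k2,2 | awk '
        { ifs[$2] = ifs[$2] " " $1; n[$2]++ }
        END { for (s in n) if (n[s] > 1) print s ":" ifs[s] }'
}

# Emits "iface slotname" for each interface that net_id gives a slot name.
collect_slot_names() {
    for dev in /sys/class/net/*; do
        iface=${dev##*/}
        slot=$(udevadm test-builtin net_id "$dev" 2>/dev/null |
               sed -n 's/^ID_NET_NAME_SLOT=//p')
        [ -n "$slot" ] && echo "$iface $slot"
    done
}

# Only query the live system when explicitly asked to.
if [ "${1:-}" = run ]; then
    collect_slot_names | find_slot_collisions
fi
```

Running it as `sh slot-collisions.sh run` (possibly as root) on a system like the one above would be expected to list each colliding slot name together with the ens* device and its eth* sibling.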
Unpredictable names present well-known problems, but I'll describe a specific issue I'm having. We currently run automated network tests in which MAAS deploys a system and then configures 2 specific NICs on it. While MAAS does take care to always restore the names used during commissioning (eth3 will always be the same NIC on every deploy), these names can change each time the system is commissioned. So today we need to go in and edit the NIC names manually in MAAS any time the system is re-commissioned.
[Test Case]
Boot with kernel option net.naming-
[Fix]
This issue was addressed upstream by adding a new v247 naming scheme that detects this scenario and disables usage of slot-based names for these devices. Obviously changing the default naming scheme in a released LTS series could break users. However, we could introduce the v247 scheme in a focal SRU, and keep the default scheme of v245 (via -Ddefault-
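Opting in to the newer scheme is a kernel command line change. A minimal sketch, assuming an Ubuntu-style /etc/default/grub that holds GRUB_CMDLINE_LINUX="..." on a single line, and a udev build that recognises the v247 scheme name added by this SRU. `add_kernel_param` is a hypothetical helper, not an existing tool.

```sh
#!/bin/sh
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX in a grub defaults
# file. Hypothetical helper; assumes the value is on one line in double quotes.
add_kernel_param() {
    file=$1
    param=$2
    # No GRUB_CMDLINE_LINUX line at all: add one holding just the parameter.
    if ! grep -q '^GRUB_CMDLINE_LINUX=' "$file"; then
        printf 'GRUB_CMDLINE_LINUX="%s"\n' "$param" >> "$file"
        return 0
    fi
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$file")
    # Already present as a whole word: leave the file untouched.
    case " $current " in
        *" $param "*) return 0 ;;
    esac
    if [ -z "$current" ]; then
        new=$param
    else
        new="$current $param"
    fi
    # Rewrite the line in place (GNU sed -i, as shipped on Ubuntu).
    sed -i "s|^GRUB_CMDLINE_LINUX=\".*\"\$|GRUB_CMDLINE_LINUX=\"$new\"|" "$file"
}
```

For example, `add_kernel_param /etc/default/grub net.naming-scheme=v247` (as root), followed by `update-grub` and a reboot; /proc/cmdline should then contain the parameter. Because the function checks for the parameter first, running it twice does not duplicate it.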
[Regression Risk]
This would change the behavior of any users who select net.naming-
Related branches
- Canonical Foundations Team: Pending requested
Diff: 396 lines (+331/-1), 8 files modified:
debian/changelog (+21/-0)
debian/patches/lp1945225/0001-udev-net_id-parse-_SUN-ACPI-index-as-a-signed-intege.patch (+47/-0)
debian/patches/lp1945225/0002-udev-net_id-don-t-generate-slot-based-names-if-multi.patch (+126/-0)
debian/patches/lp1945225/0003-net_id-fix-newly-added-naming-scheme-name.patch (+65/-0)
debian/patches/lp1945225/0004-Add-remaining-supported-schemes-as-options-for-defau.patch (+25/-0)
debian/patches/series (+5/-0)
debian/patches/test-make-test-execute-pass-on-Linux-5.15.patch (+40/-0)
debian/rules (+2/-1)
- Canonical Foundations Team: Pending requested
- Dimitri John Ledkov: Pending requested
Diff: 313 lines (+269/-1), 6 files modified:
debian/patches/lp1945225/0001-udev-net_id-parse-_SUN-ACPI-index-as-a-signed-intege.patch (+47/-0)
debian/patches/lp1945225/0002-udev-net_id-don-t-generate-slot-based-names-if-multi.patch (+126/-0)
debian/patches/lp1945225/0003-net_id-fix-newly-added-naming-scheme-name.patch (+65/-0)
debian/patches/lp1945225/0004-Add-remaining-supported-schemes-as-options-for-defau.patch (+25/-0)
debian/patches/series (+4/-0)
debian/rules (+2/-1)
description: | updated |
Changed in systemd (Ubuntu Impish): | |
status: | New → Fix Released |
Changed in systemd (Ubuntu Hirsute): | |
status: | New → Fix Released |
Changed in systemd (Ubuntu Focal): | |
status: | New → In Progress |
assignee: | nobody → dann frazier (dannf) |
description: | updated |
tags: | added: rls-ff-incoming |
> udev can produce unpredictable network interface names by default when multiple devices map to the same slot due to an intermediate bridge.
so, if I understand it right, the MR won't actually fix this for anyone without additional per-system work, right? specifically, any system with this problem will need to also add the 'net.naming-scheme=latest' boot parameter (or set it via a systemd-udevd environment variable).
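To confirm whether a given machine has actually opted in, one can compare what the kernel command line requests with what net_id reports. A sketch with hypothetical helper names; it assumes net_id exposes the scheme in use as the ID_NET_NAMING_SCHEME property (systemd 245 and later). If net.naming-scheme= appears more than once, this sketch simply keeps the last occurrence, which is a simplification rather than a statement about udev's parsing.

```sh
#!/bin/sh
# Sketch: compare the requested naming scheme with the one net_id uses.
# Helper names are hypothetical.

# Prints the value of the last net.naming-scheme= word in a command line
# string, or "<default>" if there is none.
requested_scheme() {
    scheme=
    set -f                      # don't glob-expand command line words
    for word in $1; do
        case $word in
            net.naming-scheme=*) scheme=${word#net.naming-scheme=} ;;
        esac
    done
    set +f
    printf '%s\n' "${scheme:-<default>}"
}

# Prints the scheme udev's net_id builtin reports for one interface.
effective_scheme() {
    udevadm test-builtin net_id "/sys/class/net/$1" 2>/dev/null |
        sed -n 's/^ID_NET_NAMING_SCHEME=//p'
}

if [ "${1:-}" = run ] && [ -n "${2:-}" ]; then
    echo "requested: $(requested_scheme "$(cat /proc/cmdline)")"
    echo "effective ($2): $(effective_scheme "$2")"
fi
```

`sh check-scheme.sh run eth3` prints both values; `<default>` on the requested side suggests the per-system opt-in has not been applied yet.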
If that's the case, then it seems like a much simpler manual workaround for this would be to just avoid slot naming for the problematic nics, for example by dropping a link file into /etc/systemd/network/ with content like:
[Match]
Driver="whatever driver the DGX nics use, or use some other specific match"

[Link]
NamePolicy=keep kernel database onboard path
AlternativeNamesPolicy=database onboard path
MACAddressPolicy=persistent
essentially, override the 99-default.link to remove 'slot' naming.
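The same override can be installed from a script. A sketch, assuming the file name 10-noslot.link (hypothetical; it only has to sort before 99-default.link, because the first matching .link file in lexical order is the one applied) and a driver name supplied by the caller, standing in for the placeholder Driver= match suggested above. `write_noslot_link` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: write a .link file that mirrors the stock 99-default.link policies
# minus "slot", matched by driver. Hypothetical helper and file name.
write_noslot_link() {
    dir=$1
    driver=$2
    # Must sort before 99-default.link: the first matching .link file wins.
    file="$dir/10-noslot.link"
    mkdir -p "$dir"
    cat > "$file" <<EOF
[Match]
Driver=$driver

[Link]
NamePolicy=keep kernel database onboard path
AlternativeNamesPolicy=database onboard path
MACAddressPolicy=persistent
EOF
    printf '%s\n' "$file"
}
```

After running `write_noslot_link /etc/systemd/network <driver>` as root, `udevadm test-builtin net_setup_link /sys/class/net/<iface>` should log which .link file was selected. Names are assigned when the device is added, so the new names typically appear on the next boot; if the initramfs carries its own copy of .link files, regenerating it with `update-initramfs -u` may also be needed.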
> While MAAS does take care to always restore the names used during commissioning (eth3 will always be the same NIC on every deploy), these names can change each time the system is commissioned.
if the only change needed is during maas commissioning, that seems like the perfect time to use a link file to override the specific problematic nics by whatever matching logic is best.
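If only a couple of NICs matter, as in the MAAS test setup, a commissioning step could instead pin them by MAC address. A sketch that reads "MAC NAME" pairs and writes one .link file per NIC; the helper name and the 05-pin- prefix are assumptions, the prefix chosen so these files sort before a broader override such as the driver-matched one suggested above. Per systemd.link(5), the chosen names should stay out of the kernel's own ethN namespace to avoid racing with kernel naming.

```sh
#!/bin/sh
# Sketch: pin specific NICs to fixed names by MAC address.
# Reads "MAC NAME" pairs on stdin; writes one .link file per pair into DIR.
# Helper name and file prefix are hypothetical.
pin_names_by_mac() {
    dir=$1
    mkdir -p "$dir"
    while read -r mac name; do
        # Skip blank or incomplete lines.
        [ -n "$mac" ] && [ -n "$name" ] || continue
        file="$dir/05-pin-$name.link"
        printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$mac" "$name" > "$file"
        printf '%s\n' "$file"
    done
}
```

For example, `printf '00:11:22:33:44:55 data0\n' | pin_names_by_mac /etc/systemd/network` would write 05-pin-data0.link; the MAC and name here are purely illustrative.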