udev NIC renaming race with mlx5_core driver
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd (Ubuntu) | Fix Released | Undecided | Nick Rosbrook |
Focal | Fix Released | Medium | Nick Rosbrook |
Jammy | Fix Released | Medium | Nick Rosbrook |
Kinetic | Fix Released | Undecided | Nick Rosbrook |
Lunar | Fix Released | Undecided | Nick Rosbrook |
Bug Description
[Impact]
On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core driver's own configuration of subordinate interfaces. When the kernel wins this race, udev's rename attempt fails, and this causes systemd-
[Test Plan]
Repeated launches of the Standard_D8ds_v5 instance type generally hit this race in around 1 in 10 runs. Create a VM snapshot with updated systemd from ppa:enr0n/
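As a rough guide to how many launches a test run needs, the short calculation below estimates the chance of seeing the race at least once. It is only a sketch: it assumes each launch hits the race independently at the roughly 1-in-10 rate quoted above, which the report does not claim, and the function name is made up for illustration.

```python
def chance_of_at_least_one_hit(per_launch_rate, launches):
    """Probability that at least one of `launches` boots hits the race.

    Assumes every launch is independent and has the same hit rate.
    """
    return 1.0 - (1.0 - per_launch_rate) ** launches


# 0.1 is the approximate rate from the test plan; the launch counts are examples.
for n in (10, 50, 100):
    p = chance_of_at_least_one_hit(0.1, n)
    print(f"{n:3d} launches: P(at least one race) = {p:.4f}")
```

Under these assumptions a 100-launch run is very likely to exercise the race several times, which is why a batch of that size is a reasonable verification target.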
To check for the failure symptom:
- Assert that network-
To assert the success condition when the rename hits the busy race:
- Assert that, when "eth1" is still the primary device name, two altnames are listed (the altname is preserved because the primary NIC rename hit the race).
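The success check above can be sketched as a small classifier over the JSON that `ip -json addr` (or `ip -json link`) prints. This is a minimal illustration, not the check used in the sample script: the function name, the return labels, and the sample interface names are all hypothetical, and it assumes the JSON exposes `ifname` and `altnames` keys.

```python
import json


def classify_eth1(ip_json, expected_altnames=2):
    """Classify the rename outcome from `ip -json` output.

    "renamed": no interface is still called eth1.
    "altnames-preserved": eth1 remains and lists the expected number of altnames.
    "altnames-missing": eth1 remains without the expected altnames.
    """
    for link in json.loads(ip_json):
        if link.get("ifname") == "eth1":
            altnames = link.get("altnames", [])
            if len(altnames) == expected_altnames:
                return "altnames-preserved"
            return "altnames-missing"
    return "renamed"


# Hypothetical sample data, not captured from a real instance.
sample = json.dumps(
    [
        {"ifname": "lo"},
        {"ifname": "eth0"},
        {"ifname": "eth1", "altnames": ["enP1p0s2", "enP1s1"]},
    ]
)
print(classify_eth1(sample))
```

On a live system the input would come from running the `ip` command on the instance; the sample list here only stands in for that output.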
The sample script uses pycloudlib to create a modified base image for the test and launches 100 VMs of type Standard_D8ds_v5, counting both successes and any failures seen.
#!/usr/bin/env python3
# This file is part of pycloudlib. See LICENSE file for license information.
"""Basic examples of various lifecycle with an Azure instance."""
import json
import logging
import os
import sys
from enum import Enum
import pycloudlib
LOG = logging.getLogger()
base_cfg = """#cloud-config
ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
"""
# source: "deb [allow-
# - apt install systemd udev -y --allow-
apt_cfg = """
# Add developer PPA
apt:
sources:
systemd-testing:
source: {source}
# upgrade systemd after cloud-init is nearly done
runcmd:
- apt install systemd udev -y --allow-
"""
debug_systemd_cfg = """
# Create systemd-udev debug override.conf in base image
write_files:
- path: /etc/systemd/
owner: root:root
defer: {defer}
content: |
[Service]
Environment
- path: /etc/systemd/
owner: root:root
defer: {defer}
content: |
[Service]
Environment
LogRateLimi
"""
cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
cloud_config2 = base_cfg + debug_systemd_cfg
class BootCondition(
SUCCESS_
SUCCESS_
ERROR_
def batch_launch_vm(
client, instance_type, image_id, user_data, instance_count=5
):
instances = []
while len(instances) < instance_count:
)
)
return instances
def get_boot_
blame = instance.
try:
LOG.info(
f"--- Attempt {test_idx} ssh ubuntu@
)
except IndexError:
blame = [""]
altnames_
ip_addr = json.loads(
rename_
for d in ip_addr:
if d["ifname"] == "eth1":
if len(d.get(
)
else:
)
)
LOG.info(
)
if "systemd-
if rename_
return BootCondition.
else:
return (
)
else:
LOG.info(
f"--- Attempt {attempt} found delayed instance boot: {blame[0]}: ssh ubuntu@
)
r = instance.execute(
)
LOG.info(r)
if "Failure to rename" in str(r):
return BootCondition.
def debug_systemd_
release=
):
"""Test overlake v5 timeouts
test procedure:
- Launch base jammy image
- enable ppa:enr0n/
- cloud-init clean --logs && deconfigure waalinux agent before shutdown
- snapshot a base image
- launch v5 system from snapshot
- check systemd-analyze for expected timeout
"""
apt_source = (
'"deb http://
)
if with_ppa:
apt_source = '"deb [allow-
ppas = {
}
apt_source = apt_source.
client = pycloudlib.
image_id = client.
pub_path = "/home/
priv_path = "/home/
client.
base_instance = client.launch(
)
LOG.info(f"base instance: ssh ubuntu@
base_
LOG.
snapshotted
reproducer = False
success_
success_
failure_
failure_
tests_launched = 0
TEST_
----- Test run complete: {tests_launched} attempted -----
Successes without rename race: {success_
Successes with rename race and preserved altname: {success_
Failures due to network delay: {failure_
Failures due to no altnames persisted: {failure_
===
"""
instances = [base_instance]
for batch_count in [10] * 10:
)
for test_idx, instance in enumerate(
)
if boot_condition == BootCondition.
if not altnames_persisted:
elif boot_condition == BootCondition.
if not altnames_persisted:
elif boot_condition == BootCondition.
if not altnames_persisted:
else:
LOG.info(
)
)
base_
if __name__ == "__main__":
# Avoid polluting the log with azure info
logging.
logging.
logging.
logging.
release = "jammy" if len(sys.argv) < 2 else sys.argv[1]
with_ppa = os.environ.
prefix = "ppa" if with_ppa else "sru"
logging.
)
debug_
[Where problems could occur]
The patches effectively make it so that if an interface cannot be renamed by udev, the new name is left in place as an alternative name as a fallback. If problems occur, they would be related to device renaming, and in particular to the devices' alternative names.
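The fallback behaviour described above can be pictured with a toy model. The sketch below is not systemd or kernel code: `Link` and `rename_with_fallback` are hypothetical names, the "busy" flag merely stands in for the driver still configuring the device, and the real patches operate over netlink rather than on a Python object.

```python
class Link:
    """Toy in-memory stand-in for a network link (not a real netlink API)."""

    def __init__(self, name, busy=False):
        self.name = name
        self.altnames = set()
        self.busy = busy  # models the driver still holding the device

    def rename(self, new_name):
        if self.busy:
            raise OSError(16, "Device or resource busy")
        # On success the new name becomes primary, so it is no longer an altname.
        self.altnames.discard(new_name)
        self.name = new_name


def rename_with_fallback(link, new_name):
    """Try to rename; if that fails, keep new_name reachable as an altname."""
    link.altnames.add(new_name)  # register the wanted name before renaming
    try:
        link.rename(new_name)
    except OSError:
        return False  # rename lost the race; the altname stays as the fallback
    return True


racing = Link("eth1", busy=True)
print(rename_with_fallback(racing, "enP1s1"), racing.name, sorted(racing.altnames))

idle = Link("eth1")
print(rename_with_fallback(idle, "enP1s1"), idle.name, sorted(idle.altnames))
```

The point of the model is that in both outcomes the desired name resolves to the device, either as the primary name or as an alternative name.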
For Jammy and Kinetic, there are additional patches in udev. These patches clean up/revert device properties that were changed as a part of the rename attempt. If there were regressions due to these patches, we would likely see erroneous device properties (e.g. shown by udevadm info) on network devices after a rename failure.
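The clean-up on failure can be illustrated as a snapshot-and-restore pattern. This is a sketch of the idea only, under the assumption that device properties behave like a simple key/value map; the function name and the exact property keys used here are illustrative and are not taken from the patches.

```python
import copy


def rename_or_rollback(props, new_name, kernel_rename):
    """Update props for a rename; restore the snapshot if the rename fails."""
    snapshot = copy.deepcopy(props)
    # Illustrative property changes made as part of the rename attempt.
    props["INTERFACE_OLD"] = props.get("INTERFACE")
    props["INTERFACE"] = new_name
    try:
        kernel_rename(new_name)
    except OSError:
        # Revert every property touched above so no stale values remain.
        props.clear()
        props.update(snapshot)
        return False
    return True


def always_busy(_name):
    """Stand-in for a rename call that loses the race."""
    raise OSError(16, "Device or resource busy")


props = {"INTERFACE": "eth1", "ID_NET_DRIVER": "mlx5_core"}
print(rename_or_rollback(props, "enP1s1", always_busy), props)
```

A regression in this area would show up as the opposite of what the model guarantees: properties that still reflect the attempted name after the rename has failed.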
Related branches
- Lukas Märdian: Approve
- Diff: 351 lines (+311/-0), 6 files modified
debian/changelog (+12/-0)
debian/patches/lp2002445/core-device-ignore-failed-uevents.patch (+51/-0)
debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch (+56/-0)
debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch (+29/-0)
debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch (+159/-0)
debian/patches/series (+4/-0)
- Lukas Märdian: Approve
- Diff: 363 lines (+323/-0), 6 files modified
debian/changelog (+12/-0)
debian/patches/lp2002445/core-device-ignore-failed-uevents.patch (+43/-0)
debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch (+56/-0)
debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch (+45/-0)
debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch (+163/-0)
debian/patches/series (+4/-0)
- Lukas Märdian: Approve
- Diff: 360 lines (+320/-0), 6 files modified
debian/changelog (+12/-0)
debian/patches/lp2002445/core-device-ignore-failed-uevents.patch (+43/-0)
debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch (+55/-0)
debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch (+45/-0)
debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch (+161/-0)
debian/patches/series (+4/-0)
- Lukas Märdian: Approve
- Diff: 508 lines (+444/-10), 5 files modified
debian/changelog (+18/-6)
debian/patches/CVE-2022-3821.patch (+37/-0)
debian/patches/CVE-2022-4415.patch (+386/-0)
debian/patches/series (+2/-0)
debian/tests/boot-and-services (+1/-4)
- Lukas Märdian: Approve
- Diff: 983 lines (+913/-0), 11 files modified
debian/changelog (+31/-0)
debian/patches/CVE-2022-4415.patch (+380/-0)
debian/patches/CVE-2022-45873.patch (+115/-0)
debian/patches/backport-for-CVE-2022-45873.patch (+45/-0)
debian/patches/lp2002445/sd-netlink-add-a-test-for-rtnl_set_link_name.patch (+66/-0)
debian/patches/lp2002445/sd-netlink-do-not-swap-old-name-and-alternative-name.patch (+54/-0)
debian/patches/lp2002445/sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch (+64/-0)
debian/patches/lp2002445/udev-attempt-device-rename-even-if-interface-is-up.patch (+61/-0)
debian/patches/lp2002445/udev-net-allow-new-link-name-as-an-altname-before-renamin.patch (+34/-0)
debian/patches/lp2004478-network-dhcp4-accept-local-subnet-routes-from-DHCP.patch (+54/-0)
debian/patches/series (+9/-0)
- Lukas Märdian: Approve
- Diff: 506 lines (+442/-0), 10 files modified
debian/changelog (+22/-0)
debian/patches/lp2000880-network-create-stacked-netdevs-after-the-underlying-link-.patch (+33/-0)
debian/patches/lp2002445/sd-netlink-add-a-test-for-rtnl_set_link_name.patch (+81/-0)
debian/patches/lp2002445/sd-netlink-do-not-swap-old-name-and-alternative-name.patch (+54/-0)
debian/patches/lp2002445/sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch (+64/-0)
debian/patches/lp2002445/udev-attempt-device-rename-even-if-interface-is-up.patch (+63/-0)
debian/patches/lp2002445/udev-net-allow-new-link-name-as-an-altname-before-renamin.patch (+36/-0)
debian/patches/lp2004478-network-dhcp4-accept-local-subnet-routes-from-DHCP.patch (+54/-0)
debian/patches/lp2009502-Enable-dev-sgx_vepc-access-for-the-group-sgx.patch (+27/-0)
debian/patches/series (+8/-0)
- Lukas Märdian: Approve
- Diff: 409 lines (+346/-0), 9 files modified
debian/changelog (+20/-0)
debian/patches/lp1933090-test-seccomp-accept-ENOSYS-from-sysctl-2-too.patch (+25/-0)
debian/patches/lp2002445-netlink-do-not-fail-when-new-interface-name-is-already-us.patch (+50/-0)
debian/patches/lp2002445-netlink-introduce-rtnl_get-delete_link_alternative_names.patch (+102/-0)
debian/patches/lp2002445-sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch (+64/-0)
debian/patches/lp2002445-udev-attempt-device-rename-even-if-interface-is-up.patch (+38/-0)
debian/patches/lp2002445-udev-net-allow-new-link-name-as-an-altname-before-renamin.patch (+36/-0)
debian/patches/series (+6/-0)
debian/tests/boot-and-services (+5/-0)
- Lukas Märdian: Approve
- Diff: 5254 lines (+1754/-807), 121 files modified
.packit.yml (+5/-0)
debian/changelog (+105/-0)
debian/control (+4/-15)
debian/patches/Deny-list-TEST-74-AUX-UTILS-on-s390x.patch (+16/-0)
debian/patches/debian/Downgrade-a-couple-of-warnings-to-debug.patch (+3/-3)
debian/patches/debian/Make-run-lock-tmpfs-an-API-fs.patch (+2/-0)
debian/patches/debian/Revert-core-one-step-back-again-for-nspawn-we-actual.patch (+1/-1)
debian/patches/lp2002445-sd-netlink-add-a-test-for-rtnl_set_link_name.patch (+72/-0)
debian/patches/lp2002445-sd-netlink-do-not-swap-old-name-and-alternative-name.patch (+62/-0)
debian/patches/lp2002445-sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch (+64/-0)
debian/patches/lp2002445-test-network-add-a-test-for-renaming-device-to-current-al.patch (+48/-0)
debian/patches/lp2002445-udev-attempt-device-rename-even-if-interface-is-up.patch (+63/-0)
debian/patches/lp2002445-udev-net-allow-new-link-name-as-an-altname-before-renamin.patch (+34/-0)
debian/patches/p11kit-switch-to-dlopen.patch (+3/-3)
debian/patches/series (+7/-3)
debian/rules (+1/-0)
debian/tests/boot-and-services (+2/-2)
debian/tests/control (+13/-8)
dev/null (+0/-323)
man/org.freedesktop.systemd1.xml (+6/-0)
man/systemd.mount.xml (+3/-1)
man/systemd.scope.xml (+2/-0)
man/systemd.service.xml (+9/-11)
src/basic/alloc-util.c (+4/-0)
src/basic/alloc-util.h (+29/-10)
src/basic/cgroup-util.c (+1/-1)
src/basic/hashmap.c (+1/-1)
src/basic/linux/README (+1/-0)
src/basic/linux/btrfs.h (+50/-12)
src/basic/linux/btrfs_tree.h (+240/-1)
src/basic/linux/genetlink.h (+3/-2)
src/basic/linux/if_bridge.h (+21/-0)
src/basic/linux/if_ether.h (+2/-0)
src/basic/linux/if_link.h (+16/-0)
src/basic/linux/if_macsec.h (+2/-0)
src/basic/linux/if_tun.h (+5/-1)
src/basic/linux/in.h (+9/-14)
src/basic/linux/l2tp.h (+0/-2)
src/basic/linux/netfilter/nf_tables.h (+29/-0)
src/basic/linux/netlink.h (+24/-7)
src/basic/linux/nl80211.h (+128/-7)
src/basic/linux/pkt_sched.h (+11/-0)
src/basic/linux/rtnetlink.h (+1/-1)
src/basic/linux/stddef.h (+46/-0)
src/basic/linux/update.sh (+1/-1)
src/basic/virt.c (+1/-1)
src/boot/efi/boot.c (+5/-2)
src/boot/efi/console.c (+0/-16)
src/boot/efi/cpio.c (+1/-1)
src/boot/efi/meson.build (+11/-2)
src/boot/efi/missing_efi.h (+0/-19)
src/boot/efi/secure-boot.c (+1/-1)
src/boot/efi/util.c (+5/-3)
src/busctl/busctl.c (+19/-2)
src/core/cgroup.c (+1/-1)
src/core/cgroup.h (+1/-0)
src/core/dbus-scope.c (+6/-0)
src/core/execute.c (+17/-0)
src/core/execute.h (+1/-0)
src/core/import-creds.c (+7/-0)
src/core/load-fragment-gperf.gperf.in (+1/-0)
src/core/mount.c (+18/-3)
src/core/scope.c (+20/-3)
src/core/scope.h (+2/-0)
src/core/slice.c (+3/-0)
src/core/swap.c (+1/-1)
src/core/unit.c (+1/-0)
src/cryptsetup/cryptsetup-fido2.c (+72/-57)
src/cryptsetup/cryptsetup-fido2.h (+24/-16)
src/cryptsetup/cryptsetup.c (+27/-42)
src/fundamental/macro-fundamental.h (+1/-0)
src/gpt-auto-generator/gpt-auto-generator.c (+5/-5)
src/import/curl-util.c (+4/-0)
src/import/pull-job.c (+5/-5)
src/journal-remote/microhttpd-util.h (+2/-2)
src/kernel-install/50-depmod.install (+2/-0)
src/libsystemd-network/sd-dhcp-client.c (+18/-20)
src/libsystemd-network/sd-dhcp-lease.c (+4/-4)
src/libsystemd-network/test-ndisc-ra.c (+6/-14)
src/libsystemd-network/test-ndisc-rs.c (+8/-13)
src/libsystemd/sd-device/test-sd-device.c (+8/-7)
src/libsystemd/sd-event/sd-event.c (+6/-1)
src/locale/localed.c (+8/-12)
src/login/logind-dbus.c (+6/-0)
src/network/netdev/l2tp-tunnel.c (+5/-5)
src/network/networkd-address.c (+5/-1)
src/network/networkd-ndisc.c (+11/-10)
src/network/networkd-route.c (+5/-1)
src/nspawn/nspawn-patch-uid.c (+3/-1)
src/partition/growfs.c (+6/-1)
src/resolve/resolvectl.c (+3/-3)
src/resolve/resolved-dns-scope.c (+2/-1)
src/resolve/resolved-dns-search-domain.c (+1/-1)
src/resolve/resolved-dns-server.h (+2/-2)
src/resolve/resolved-varlink.c (+2/-2)
src/shared/bootspec.c (+5/-3)
src/shared/bus-unit-util.c (+3/-0)
src/shared/creds-util.c (+16/-20)
src/shared/generator.c (+10/-1)
src/shared/install.c (+8/-3)
src/shared/install.h (+2/-2)
src/shared/mount-setup.c (+2/-0)
src/shared/sleep-config.c (+15/-17)
src/sleep/sleep.c (+6/-2)
src/test/test-execute.c (+3/-0)
src/test/test-unit-name.c (+3/-1)
src/tmpfiles/tmpfiles.c (+5/-2)
test/TEST-55-OOMD/test.sh (+6/-0)
test/fuzz/fuzz-unit-file/directives.scope (+1/-0)
test/test-functions (+16/-0)
test/test-network/conf/23-bond199.network (+0/-3)
test/test-network/systemd-networkd-tests.py (+19/-3)
test/test-shutdown.py (+1/-1)
test/units/testsuite-26.sh (+1/-1)
test/units/testsuite-55.sh (+3/-0)
test/units/testsuite-64.sh (+6/-5)
test/units/testsuite-65.sh (+10/-0)
test/units/testsuite-73.sh (+14/-3)
test/units/testsuite-74.firstboot.sh (+54/-15)
test/units/testsuite-75.sh (+22/-13)
units/systemd-userdbd.service.in (+1/-1)
description: updated
Changed in systemd (Ubuntu Focal):
status: New → Triaged
importance: Undecided → Medium
Changed in systemd (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → Medium
Changed in systemd (Ubuntu Lunar):
status: New → Fix Committed
description: updated
description: updated
description: updated
Changed in systemd (Ubuntu Kinetic):
status: New → Triaged
description: updated
Changed in systemd (Ubuntu Lunar):
status: In Progress → Fix Committed
Changed in systemd (Ubuntu Focal):
assignee: nobody → Mustafa Kemal Gilor (mustafakemalgilor)
status: Triaged → In Progress
Changed in systemd (Ubuntu Focal):
assignee: Mustafa Kemal Gilor (mustafakemalgilor) → Nick Rosbrook (enr0n)
tags: added: verification-done-jammy verification-done-kinetic; removed: verification-needed-jammy verification-needed-kinetic
Fixed upstream in https://github.com/systemd/systemd/pull/25221