systemd-udevd regression: some renamed network interfaces stuck in "pending" state

Bug #1888726 reported by Basic on 2020-07-23
This bug affects 2 people
Affects              Importance  Assigned to
netplan.io (Ubuntu)  Undecided   Unassigned
  Focal              Undecided   Unassigned
systemd (Ubuntu)     Undecided   Unassigned
  Focal              Undecided   Unassigned

Bug Description

Summary:

On Ubuntu Server 20.04 LTS and newer with netplan.io and systemd-networkd, certain network configurations leave renamed network interfaces stuck in the "pending" state and not configured properly. On boot, the system is stuck on "Wait for Network to be Configured" for 2 minutes.

How to reproduce:

1. Configure a machine (for example, a virtual machine) with two Ethernet network cards. Make note of MAC addresses of these network cards.
2. Set up netplan with a single configuration file; its contents are in the attached "00-static.yaml" file. Replace the MAC addresses to match your setup. The IP address configuration is omitted because it is not necessary to reproduce the bug.
3. Reboot the system.

Expected outcome:

1. System boots in a reasonable time
2. First network interface (wan) is brought up, is not renamed, and is marked as configured by networkd
3. Second network interface (lan) is brought up, renamed, configured for MTU=9000, and is marked as configured by networkd
4. VLAN interface (vlan20) is brought up, renamed, configured for MTU=9000, and is marked as configured by networkd

Actual outcome:

1. System boot is delayed by 2 minutes
2. First network interface (wan) is configured as expected
3. Second network interface (lan) is configured as expected
4. VLAN interface (vlan20) seems to be configured as expected, but is stuck in "pending" state according to networkctl list
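
To spot the stuck interface without reading the table by eye, the SETUP column of "networkctl list" can be filtered. Below is a minimal sketch, assuming the usual five-column layout (IDX, LINK, TYPE, OPERATIONAL, SETUP). The sample text is illustrative only and is not copied from the attached logs; pending_links is a hypothetical helper name.

```python
from typing import List


def pending_links(networkctl_output: str) -> List[str]:
    """Return the names of links whose SETUP column reads 'pending'."""
    pending = []
    for line in networkctl_output.splitlines():
        fields = line.split()
        # Data rows have exactly five columns and start with a numeric index;
        # this skips the header, blank lines and the "N links listed." footer.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _idx, link, _type, _operational, setup = fields
        if setup == "pending":
            pending.append(link)
    return pending


# Illustrative sample (assumption), shaped like `networkctl list` output.
SAMPLE = """\
IDX LINK   TYPE     OPERATIONAL SETUP
  1 lo     loopback carrier     unmanaged
  2 wan    ether    degraded    configured
  3 lan    ether    degraded    configured
  4 vlan20 vlan     degraded    pending

4 links listed.
"""

print(pending_links(SAMPLE))
```

On a live system the same function could be fed the captured output of "networkctl list --no-pager".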

Test environment:

  Hardware:

  Virtual machine with the following configuration

  * 4 amd64 CPU cores (also tested with a single core)
  * 1 GB RAM
  * 8 GB disk
  * 2 network cards (vmxnet3 in VMware, virtio in Parallels)

  Working as expected:

  * [1] Ubuntu Server 19.10, kernel 5.3.0-62, netplan.io 0.99-0ubuntu3~19.10.2, systemd+udev 242-7ubuntu3.11

  Broken:

  * [2] Ubuntu Server 19.10, kernel 5.3.0-62, netplan.io 0.99-0ubuntu3~19.10.2, systemd 242-7ubuntu3.11, udev 245.4-4ubuntu3.1
  * [3] Ubuntu Server 20.04 LTS, kernel 5.4.0-42, netplan.io 0.99-0ubuntu3~20.04.2, systemd+udev 245.4-4ubuntu3.1
  * [4] Ubuntu Server 20.04 LTS, kernel 5.4.0-42, netplan.io 0.99-0ubuntu3~20.04.2, systemd+udev vanilla upstream git e9769453
  * [5] Ubuntu Server 20.04 LTS, kernel 5.4.0-42, netplan.io 0.99-0ubuntu3~20.04.2, systemd+udev vanilla upstream git v242
  * [6] Ubuntu Server 20.10 daily-live, kernel 5.4.0-26, netplan.io 0.99-0ubuntu5, systemd+udev 245.6-3ubuntu3

[2] As noted above, issue was reproduced in 19.10 by upgrading ONLY udev and libudev1 to ones shipped in focal-updates.

[4] It was also reproduced in vanilla upstream systemd, git master commit e9769453, which was installed on top of the existing systemd using "sudo ninja -C build install".

[5] Interestingly enough, issue also seems to exist in vanilla v242. Either that, or the installation didn't replace the packaged systemd properly. This may hint at some distribution-specific patch that got removed before 20.04.

This issue was reproduced in VMware ESXi 6.7U3, VMware Fusion 11.5.5 and Parallels Desktop 15.1.4. This leads me to believe that network card drivers or virtualisation engines do not play a part in the issue.

Extra observations:

To make the example configuration (00-static.yaml) not get stuck in "pending" state, any one of the following options helps:

* Remove "set-name" parameter for "lan" interface
* Remove "mtu" parameter for "lan" interface
* Remove "wan" interface entirely

I got some data/logs for each of these scenarios for eoan [1] and focal [3], as well as the original broken config, and put them together in the attached "parallels.tar.gz".

Note about Apport:

Attached apport report was generated for test environment [2] above.

Attachments:

* 00-static.yaml: minimalistic broken netplan configuration example
* parallels.tar.gz: various logs for eoan [1] and focal [3], as described in "Extra observations" above

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: udev 245.4-4ubuntu3.2
ProcVersionSignature: Ubuntu 5.3.0-62.56-generic 5.3.18
Uname: Linux 5.3.0-62-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.9
Architecture: amd64
CustomUdevRuleFiles: 70-snap.core.rules
Date: Thu Jul 23 18:54:58 2020
InstallationDate: Installed on 2020-07-22 (1 days ago)
InstallationMedia: Ubuntu-Server 19.10 "Eoan Ermine" - Release amd64 (20191017)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Parallels Software International Inc. Parallels Virtual Platform
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-62-generic root=UUID=c42113c4-0f7a-44bc-a6ae-b27e2b146723 ro
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/13/2020
dmi.bios.vendor: Parallels Software International Inc.
dmi.bios.version: 15.1.4 (47270)
dmi.board.name: Parallels Virtual Platform
dmi.board.vendor: Parallels Software International Inc.
dmi.board.version: None
dmi.chassis.type: 10
dmi.chassis.vendor: Parallels Software International Inc.
dmi.modalias: dmi:bvnParallelsSoftwareInternationalInc.:bvr15.1.4(47270):bd04/13/2020:svnParallelsSoftwareInternationalInc.:pnParallelsVirtualPlatform:pvrNone:rvnParallelsSoftwareInternationalInc.:rnParallelsVirtualPlatform:rvrNone:cvnParallelsSoftwareInternationalInc.:ct10:cvr:
dmi.product.family: Parallels VM
dmi.product.name: Parallels Virtual Platform
dmi.product.sku: Undefined
dmi.product.version: None
dmi.sys.vendor: Parallels Software International Inc.

Dan Streetman (ddstreet) on 2020-08-26
Changed in systemd (Ubuntu):
status: New → Invalid
Dan Streetman (ddstreet) wrote :

The example config is:

network:
  version: 2
  renderer: networkd

  ethernets:
    wan:
      match:
        macaddress: "00:1c:42:eb:ee:bb"

    lan:
      match:
        macaddress: "00:1c:42:72:d3:a2"
      set-name: lan
      mtu: 9000

  vlans:
    vlan20:
      link: lan
      id: 20

This fails because 'lan' matches on the macaddr, but vlans always inherit their macaddr from their parent interface. 'lan' and 'vlan20' will therefore *both* match the 'lan' section (since their macaddrs are identical), and systemd-networkd will attempt to rename vlan20 to 'lan', which fails because the interface 'lan' already exists.
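
The collision described above can be illustrated with a toy model. This is only a sketch of the reasoning, not systemd code: apply_set_name is a hypothetical helper, and eth0/eth1 are assumed kernel names for the two cards before any rename.

```python
def apply_set_name(links, mac, new_name):
    """Try to rename every link whose MAC equals `mac` to `new_name`.

    Returns the names of links whose rename failed because another link
    already holds `new_name`.
    """
    failed = []
    for link in links:
        if link["mac"] != mac:
            continue  # the match section does not apply to this link
        if link["name"] == new_name:
            continue  # already carries the requested name
        if any(other["name"] == new_name for other in links):
            failed.append(link["name"])  # name is taken, rename fails
        else:
            link["name"] = new_name
    return failed


links = [
    {"name": "eth0", "mac": "00:1c:42:eb:ee:bb"},    # wan, no set-name
    {"name": "eth1", "mac": "00:1c:42:72:d3:a2"},    # physical card for 'lan'
    {"name": "vlan20", "mac": "00:1c:42:72:d3:a2"},  # inherits the parent's MAC
]

failed = apply_set_name(links, "00:1c:42:72:d3:a2", "lan")
print([link["name"] for link in links])  # ['eth0', 'lan', 'vlan20']
print(failed)                            # ['vlan20']
```

The physical card takes the name first, so the second match on the same MAC has nowhere to go.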

From the systemd-networkd perspective, this is clearly a misconfiguration. When matching on macaddr, if other interfaces in the system might have the same MAC (e.g. bridge, vlan, etc.), the match must also narrow down which interface is meant; besides the MACAddress= match it needs, for example, one or more of:

Driver=ixgbe
Type=!vlan
Type=!bridge

For this specific example, just adding the match filter 'Type=!vlan' (again, to the systemd-networkd config, not the netplan config) will work. However, I don't think netplan provides any way to configure type matching. I think it does allow driver matching, so you might be able to edit the 'lan' section of your netplan config to:

    lan:
      match:
        macaddress: "00:1c:42:72:d3:a2"
        driver: "virtio_net"
      set-name: lan
      mtu: 9000
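
As a rough sketch of why either extra filter helps, the model below treats a match section as a set of conditions that must all hold. It is a simplification (no globbing, a single negated type), section_matches and matched_names are hypothetical names, and the driver string given for the VLAN is an assumption; the only thing that matters here is that it differs from virtio_net.

```python
def section_matches(link, match):
    """Return True if every condition present in `match` accepts `link`."""
    if "mac" in match and link["mac"].lower() != match["mac"].lower():
        return False
    if "driver" in match and link["driver"] != match["driver"]:
        return False
    if "type" in match:
        pattern = match["type"]
        if pattern.startswith("!"):
            # Negated form, as in Type=!vlan: reject links of that type.
            if link["type"] == pattern[1:]:
                return False
        elif link["type"] != pattern:
            return False
    return True


LINKS = [
    {"name": "lan", "mac": "00:1c:42:72:d3:a2",
     "type": "ether", "driver": "virtio_net"},
    {"name": "vlan20", "mac": "00:1c:42:72:d3:a2",
     "type": "vlan", "driver": "802.1Q VLAN Support"},  # assumed driver string
]


def matched_names(match):
    """Names of all modelled links accepted by the given match section."""
    return [link["name"] for link in LINKS if section_matches(link, match)]


MAC = "00:1c:42:72:d3:a2"
print(matched_names({"mac": MAC}))                          # ['lan', 'vlan20']
print(matched_names({"mac": MAC, "type": "!vlan"}))         # ['lan']
print(matched_names({"mac": MAC, "driver": "virtio_net"}))  # ['lan']
```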

Dan Streetman (ddstreet) wrote :

Marking invalid for systemd, and adding netplan.io, though I'm not sure how exactly netplan.io might want to handle the situation. The netplan man page seems to indicate that the ethernets: section should never match any interface that isn't "physical", though that distinction is unique to netplan (systemd-networkd doesn't limit its link matching like that). netplan might therefore need to fix up the systemd-networkd config it generates to support that documented promise.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu):
status: New → Confirmed
Ryan Harper (raharper) wrote :

@Dan

Just ran into this issue in curtin's vmtest for network_vlan on groovy. I can confirm that if the netplan config includes the driver match on the physical interfaces, then the vlans come up just fine.

I was thinking that netplan could inject the driver of the underlying device into the match section. That would help in these situations; however, I know of a few scenarios where this would need to be disabled (on Azure, for example, Advanced Networking auto-bonds an SRIOV interface and a HyperV NIC together as eth0).
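
For illustration only, this is one way the driver of an underlying device could be looked up, assuming the usual sysfs layout where /sys/class/net/<iface>/device/driver is a symlink to the bound driver. It is not how netplan works; driver_of is a hypothetical helper, and the sysfs_root parameter exists only so the lookup can be pointed at a test directory.

```python
import os


def driver_of(iface, sysfs_root="/sys"):
    """Return the kernel driver name bound to `iface`, or None.

    Virtual links such as VLANs normally have no `device` entry in sysfs,
    so they yield None.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink target's last path component is the driver name.
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    net_dir = "/sys/class/net"
    if os.path.isdir(net_dir):
        for iface in sorted(os.listdir(net_dir)):
            print(iface, driver_of(iface))
```

A generator doing this would still need an opt-out for setups like the Azure one mentioned above.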

Also, this scenario does *NOT* fail for me on Focal, so I'm wondering what changed in either netplan or systemd?

Focal:
netplan.io 0.99-0ubuntu3~20.04.2
systemd 245.4-4ubuntu3.2

Groovy:
netplan.io 0.99-0ubuntu6
systemd 246-2ubuntu1

tags: added: rls-ff-incomng
tags: added: rls-ff-incoming
removed: rls-ff-incomng
tags: removed: rls-ff-incoming
Changed in systemd (Ubuntu Focal):
status: New → Invalid
tags: added: fr-741
Lukas Märdian (slyon) wrote :

I am able to reproduce the problem on Groovy (netplan.io 0.100-0ubuntu5).

Using the "Type=!vlan" filter, which netplan already applies for some bridge/bond scenarios, I was able to get the vlan20 interface from 'degraded / pending' into the 'degraded / configured' state, with systemd-networkd-wait-online.service succeeding instead of timing out.

I'm working on an upstream solution here:
https://github.com/CanonicalLtd/netplan/pull/166

@Foster: would you be able to test if netplan.io from this PPA fixes the problem for you, too?
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4315

Lukas Märdian (slyon) on 2020-10-22
Changed in netplan.io (Ubuntu):
status: Confirmed → In Progress
Changed in netplan.io (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package netplan.io - 0.101-0ubuntu1

---------------
netplan.io (0.101-0ubuntu1) hirsute; urgency=medium

  * New upstream release: 0.101
    - Documentation improvements
    - Improved integration tests
    - Add more examples for Wireguard, Open vSwitch, DBus
    - Improve test stability
    - Implementation of DBus Config/Get/Set/Try APIs
    - Add per-route MTU option (LP: #1860201)
    Bug fixes:
    - Fix MAAS OVS first boot (LP: #1898997)
    - Fix match of duplicate MAC on VLANs (LP: #1888726)
    - Fix crash in Python parser (LP: #1904633) (LP: #1905156)
    - Fix rename of matched interfaces at runtime (LP: #1904662)
  * Drop all distro patches, which have been integrated upstream
  * Update symbols file

 -- Lukas Märdian <email address hidden> Wed, 09 Dec 2020 09:41:50 +0100

Changed in netplan.io (Ubuntu):
status: Fix Committed → Fix Released
Basic (basicxp) wrote :

Finally got to check this. On latest Hirsute with netplan 0.101-0ubuntu3, my initial sample configuration seems to work as expected.

From my understanding, this hasn't been backported to Focal yet. However specifying the "driver" for the "lan" interface helps, as a workaround.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package netplan.io - 0.101-0ubuntu3~20.04.2

---------------
netplan.io (0.101-0ubuntu3~20.04.2) focal; urgency=medium

  * Backport netplan.io 0.101-0ubuntu3 to 20.04 (LP: #1908509)
    - Includes DBus Config/Get/Set/Try API
    - Includes fixes for NetworkManager integration
    - Includes Documentation improvements
    - Compatibility with systemd v247
  * Improve test stability, by adding two patches from upstream:
    - debian/patches/0004-tests-tunnels-improve-test-reliability.patch
    - debian/patches/0005-tests-dbus-improve-test-stability-of-timeouts.patch

 -- Lukas Märdian <email address hidden> Fri, 08 Jan 2021 15:17:07 +0100

Changed in netplan.io (Ubuntu Focal):
status: New → Fix Released