Networking failures after NIC reordering

Bug #1958280 reported by Chris Patterson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Netplan
Incomplete
Undecided
Unassigned
cloud-init
Expired
High
Unassigned

Bug Description

We can reliably reproduce a case where network configuration changes for an Ubuntu 20.04 VM results in a networkd hanging on "pending" interfaces. The interfaces are pending because of conflicts in naming from the current boot and that found in /etc/netplan/50-cloud-init.yaml from previous boot

Specifically, the netplan generator applies the previous configuration's names prior to running cloud-init local. We'll see something like `systemd-udevd[228]:
eth0: Failed to process device, ignoring: File exists`.

In one scenario, the data source is able to fetch updated network configuration, and cloud-init updates the config & udev rules just fine. However, networking stays offline ("pending") indefinitely. It can be forced to resolve by executing `sudo udevadm trigger --attr-match=subsystem=net`.

Example: Create a VM on Azure with two NICs, re-order them, then restart.

az vm create --name test-x1 --image Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest --nics test-nic-01 test-nic-02
az vm deallocate --name test-x1
az vm nics set --vm-name test-x1 --nics test-nic-02 test-nic-01
az vm start --name test-x1

Upon doing that I am unable to login via serial console for 20 minutes until cloud init times out. In this case, Azure is trying to report ready but cannot because system networking never came up. We can remove /lib/systemd/system/cloud-init-local.service.d/50-azure-clear-persistent-obj-pkl.conf, cloud-init doesn't hang the boot, but networking still fails to initialize for the guest.

The behavior for 18.04 is a bit different. On 18.04, the renaming of the interfaces succeeds at early boot, which instead results in the Azure data source failing the local phase because the fallback_interface is no longer the primary NIC (eth1 secondary was renamed to eth0 to match previous boot's config).

Revision history for this message
Chris Patterson (cjp256) wrote :
Revision history for this message
Chris Patterson (cjp256) wrote :
Revision history for this message
James Falcon (falcojr) wrote :

Thanks for the thorough bug report. I have confirmed the 20.04 behavior and the root cause.

Changed in cloud-init:
status: New → Triaged
importance: Undecided → High
Revision history for this message
James Falcon (falcojr) wrote :

Adding netplan here as cloud-init is generating the netplan config correctly before network comes up.

SamKenXStream (samkenx)
Changed in netplan:
status: New → Incomplete
Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.