Comment 5 for bug 1766287

Ryan Harper (raharper) wrote :

Here's what I think is happening.

In the success case, the virtio nic is renamed by the kernel to a "stable" name prior to cloud-init local enumerating the system nics and picking a fallback device.

$ journalctl -o short-precise | egrep "(Cloud-init|rename)"
Apr 23 16:19:45.517627 ubuntu kernel: virtio_net virtio1 ens4: renamed from eth0
Apr 23 16:19:47.427137 ubuntu cloud-init[163]: Cloud-init v. 18.2 running 'init-local' at Mon, 23 Apr 2018 16:19:47 +0000. Up 6.12 seconds.

In the failing case, we see that the rename happens *after* cloud-init-local has started:

Apr 23 10:33:24 ubuntu kernel: [ 3.334493] virtio_net virtio1 ens4: renamed from eth0
Apr 23 10:33:24 ubuntu cloud-init[165]: Cloud-init v. 18.2 running 'init-local' at Mon, 23 Apr 2018 10:33:21 +0000. Up 3.19 seconds.

Note that cloud-init's uptime value, 3.19 seconds, is *before* the kernel's rename timestamp of 3.33 seconds, by about 140 milliseconds.
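For anyone who wants to check other boots for the same ordering, here is a rough Python sketch that pulls the two timestamps out of log lines in the format shown above. The function names and regexes are my own, not anything from cloud-init, and they assume the journal format includes the kernel's bracketed timestamp as in the failing-case output.

```python
import re

# Log lines copied from the failing boot quoted above.
KERNEL_LINE = (
    "Apr 23 10:33:24 ubuntu kernel: [ 3.334493] "
    "virtio_net virtio1 ens4: renamed from eth0"
)
CLOUD_INIT_LINE = (
    "Apr 23 10:33:24 ubuntu cloud-init[165]: Cloud-init v. 18.2 running "
    "'init-local' at Mon, 23 Apr 2018 10:33:21 +0000. Up 3.19 seconds."
)


def kernel_rename_time(line):
    """Return the bracketed kernel timestamp (seconds since boot) of a rename message."""
    m = re.search(r"kernel: \[\s*(\d+\.\d+)\].*renamed from", line)
    return float(m.group(1)) if m else None


def cloud_init_uptime(line):
    """Return the 'Up N seconds' value that cloud-init init-local reports."""
    m = re.search(r"running 'init-local'.*Up (\d+(?:\.\d+)?) seconds", line)
    return float(m.group(1)) if m else None


def rename_lag(kernel_line, cloud_init_line):
    """Seconds by which the rename followed cloud-init-local's start.

    A positive value means cloud-init-local was already running when the
    kernel renamed the interface, i.e. the race described above.
    """
    renamed_at = kernel_rename_time(kernel_line)
    started_at = cloud_init_uptime(cloud_init_line)
    if renamed_at is None or started_at is None:
        raise ValueError("could not parse both timestamps")
    return renamed_at - started_at


lag = rename_lag(KERNEL_LINE, CLOUD_INIT_LINE)
print(f"rename happened {lag * 1000:.0f} ms after init-local started")
```

On the failing boot this gives a positive lag, which is the signature of the race.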

When this race happens, cloud-init local reads /sys/class/net for interfaces and picks eth0, as it has not yet been renamed. It then generates a config for eth0, and when that config is rendered through netplan, the output contains Name=eth0 in the match section. networkd therefore does not apply the config, because the interface is actually named ens4 by this time.

It is also possible that systemd-networkd isn't doing the rename properly. In the failure path, the generated files look like this:

% cat /run/systemd/network/10-netplan-ens4.link
[Match]
MACAddress=42:01:0a:80:00:03

[Link]
Name=eth0
WakeOnLan=off

% cat 10-netplan-ens4.network
[Match]
MACAddress=42:01:0a:80:00:03
Name=eth0

[Network]
DHCP=ipv4

[DHCP]
UseMTU=true

The .link file should have forced ens4 back to eth0, and judging by these log messages it looks like that was happening:

Apr 23 10:33:24 ubuntu systemd-networkd[359]: ens4: Interface name change detected, ens4 has been renamed to eth0.
Apr 23 10:33:24 ubuntu systemd-networkd[359]: eth0: Interface name change detected, eth0 has been renamed to ens4.

But somehow it's moved back, which then means the .network config won't apply.
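To illustrate why the config is skipped, here is a simplified Python sketch of the [Match] check. It is not networkd's actual code. It assumes that every key in [Match] has to agree with the interface, and it ignores globs and multi-value lists that real Name= entries may use. The helper names are hypothetical; the unit text is the .network file quoted above.

```python
import configparser

# Contents of the .network file from the failure path above.
NETWORK_UNIT = """\
[Match]
MACAddress=42:01:0a:80:00:03
Name=eth0

[Network]
DHCP=ipv4

[DHCP]
UseMTU=true
"""


def parse_match(unit_text):
    """Return the [Match] section of a networkd-style unit as a dict."""
    # Only '=' is a delimiter, so the colons in the MAC address stay in the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key case (MACAddress, Name)
    parser.read_string(unit_text)
    return dict(parser["Match"]) if parser.has_section("Match") else {}


def config_applies(match, ifname, mac):
    """Simplified check: every key present in [Match] must agree.

    Assumption: exact string comparison only, no glob or list handling.
    """
    if "Name" in match and match["Name"] != ifname:
        return False
    if "MACAddress" in match and match["MACAddress"].lower() != mac.lower():
        return False
    return True


match = parse_match(NETWORK_UNIT)
mac = "42:01:0a:80:00:03"
print("applies to ens4:", config_applies(match, "ens4", mac))
print("applies to eth0:", config_applies(match, "eth0", mac))
```

The MAC address matches in both cases, but with the interface named ens4 the Name=eth0 condition fails, so the DHCP config is never applied.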