Here's what I think is happening.
In the success case, the virtio nic is renamed by the kernel to a "stable" name prior to cloud-init local enumerating the system nics and picking a fallback device.
$ journalctl -o short-precise | egrep "(Cloud-init|rename)"
Apr 23 16:19:45.517627 ubuntu kernel: virtio_net virtio1 ens4: renamed from eth0
Apr 23 16:19:47.427137 ubuntu cloud-init[163]: Cloud-init v. 18.2 running 'init-local' at Mon, 23 Apr 2018 16:19:47 +0000. Up 6.12 seconds.
In the failing case, we see that the rename happens *after* cloud-init-local has started:
Apr 23 10:33:24 ubuntu kernel: [ 3.334493] virtio_net virtio1 ens4: renamed from eth0
Apr 23 10:33:24 ubuntu cloud-init[165]: Cloud-init v. 18.2 running 'init-local' at Mon, 23 Apr 2018 10:33:21 +0000. Up 3.19 seconds.
Note that cloud-init's uptime value, 3.19 seconds, is *before* the kernel's rename timestamp of 3.33 seconds, by about 144 milliseconds.
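The gap can be checked directly from the two log lines above. This is a minimal sketch, assuming the kernel's bracketed timestamp and cloud-init's "Up N seconds" value are both measured from boot; the helper name rename_gap_ms and the regular expressions are illustrative and not part of cloud-init:

```python
import re

KERNEL_LINE = (
    "Apr 23 10:33:24 ubuntu kernel: [ 3.334493] "
    "virtio_net virtio1 ens4: renamed from eth0"
)
CLOUD_INIT_LINE = (
    "Apr 23 10:33:24 ubuntu cloud-init[165]: Cloud-init v. 18.2 running "
    "'init-local' at Mon, 23 Apr 2018 10:33:21 +0000. Up 3.19 seconds."
)


def rename_gap_ms(kernel_line: str, cloud_init_line: str) -> float:
    """Return (rename time - cloud-init start) in milliseconds.

    A positive result means cloud-init-local started before the rename.
    """
    # Kernel timestamp: seconds since boot, inside square brackets.
    kernel = re.search(r"\[\s*(\d+\.\d+)\]", kernel_line)
    # cloud-init reports its own uptime at the end of the banner line.
    uptime = re.search(r"Up (\d+(?:\.\d+)?) seconds", cloud_init_line)
    if kernel is None or uptime is None:
        raise ValueError("could not find a timestamp in one of the lines")
    return (float(kernel.group(1)) - float(uptime.group(1))) * 1000.0


gap = rename_gap_ms(KERNEL_LINE, CLOUD_INIT_LINE)
print(f"rename happened {gap:.0f} ms after cloud-init-local started")
```

A positive gap is the signature of the race: cloud-init-local was already running when the kernel renamed the device.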
When this race happens, cloud-init local reads /sys/class/net for interfaces and picks eth0, as it has not yet been renamed. It then generates a config for eth0, and when that config is rendered to netplan, it contains a Name=eth0 as part of the match section, so networkd does not apply the config because the interface is actually ens4 at this time.
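To illustrate why the timing matters, here is a rough sketch of a point-in-time read of /sys/class/net. It is not cloud-init's actual fallback logic; the function snapshot_interfaces and the fake sysfs tree are assumptions made for the demo. It only shows that whatever name is present at the moment of the read is the name that ends up in the generated config:

```python
import os
import tempfile


def snapshot_interfaces(sysfs_net: str = "/sys/class/net") -> dict:
    """Map interface name -> MAC address as seen at this instant."""
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        address_file = os.path.join(sysfs_net, name, "address")
        try:
            with open(address_file) as fh:
                result[name] = fh.read().strip()
        except OSError:
            # Not an interface, or it vanished/was renamed mid-scan.
            continue
    return result


# Simulate the race with a fake sysfs tree (hypothetical layout for the demo).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "eth0"))
    with open(os.path.join(root, "eth0", "address"), "w") as fh:
        fh.write("42:01:0a:80:00:03\n")

    before = snapshot_interfaces(root)  # what an early reader would see
    os.rename(os.path.join(root, "eth0"), os.path.join(root, "ens4"))  # late rename
    after = snapshot_interfaces(root)

print(before)
print(after)
```

The MAC address is the same in both snapshots; only the name differs, which is why a config keyed on the name read too early goes stale.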
There is a possibility that systemd-networkd isn't doing the rename properly; that is, in the failure path, the files will look like:
% cat /run/systemd/network$ cat 10-netplan-ens4.link
[Match]
MACAddress=42:01:0a:80:00:03
[Link]
Name=eth0
WakeOnLan=off
% cat 10-netplan-ens4.network
[Match]
MACAddress=42:01:0a:80:00:03
Name=eth0
[Network]
DHCP=ipv4
[DHCP]
UseMTU=true
The .link file should have forced ens4 back to eth0, and it looks like this was happening, judging by these log messages:
Apr 23 10:33:24 ubuntu systemd-networkd[359]: ens4: Interface name change detected, ens4 has been renamed to eth0.
Apr 23 10:33:24 ubuntu systemd-networkd[359]: eth0: Interface name change detected, eth0 has been renamed to ens4.
But somehow it was renamed back, which then means the .network config won't apply.
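The reason the .network file stops applying can be sketched as follows. In systemd.network, a file matches only if every condition in its [Match] section is satisfied, so a correct MACAddress is not enough when Name disagrees. The classes below are a simplified model for illustration, not systemd's implementation; in particular, real Name= matching accepts shell-style globs, while this sketch assumes an exact comparison:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    name: str
    mac: str


@dataclass
class Match:
    """Simplified model of a [Match] section (exact-match only)."""
    mac: Optional[str] = None
    name: Optional[str] = None

    def applies_to(self, iface: Interface) -> bool:
        """True only if every condition that is set matches the interface."""
        if self.mac is not None and self.mac.lower() != iface.mac.lower():
            return False
        if self.name is not None and self.name != iface.name:
            return False
        return True


# The [Match] section from 10-netplan-ens4.network.
network_match = Match(mac="42:01:0a:80:00:03", name="eth0")

as_eth0 = network_match.applies_to(Interface("eth0", "42:01:0a:80:00:03"))
as_ens4 = network_match.applies_to(Interface("ens4", "42:01:0a:80:00:03"))
print(as_eth0)  # True: both MAC and name match
print(as_ens4)  # False: MAC matches, but the name is ens4
```

Under this model, the config applies only while the device is named eth0, which is consistent with networkd ignoring it once the name flips back to ens4.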