Comment 8 for bug 1997124

Chad Smith (chad.smith) wrote :

The dbus race here is caused by `netplan apply` running `networkctl reconfigure`[1], which fails to talk to dbus. netplan then restarts systemd-networkd[2] at a point when systemd-networkd may itself still be coming up and is in an indeterminate state.

[1] https://github.com/canonical/netplan/blob/main/netplan/cli/utils.py#L116
[2] https://github.com/canonical/netplan/blob/main/netplan/cli/commands/apply.py#L277

I'm guessing the restart from netplan apply is what triggers the occasional failure case where not all network config (such as IP addresses) is applied in systemd-networkd. It doesn't happen every time, but it is racy: systemd-networkd is mid-startup and we restart it again via netplan apply.

After discussion with waldi (Bastian Blank) in Debian land about the systemd dependency chain, it seems my suggestion about adding dbus.socket to cloud-init.service will actually introduce an ordering cycle, because dbus.socket is After=sysinit.target, yet cloud-init.service is Before=sysinit.target.

So, trying to shoehorn cloud-init into the dependency chain After=dbus.socket is impossible for systemd to schedule.
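
To make the cycle concrete, here is a rough Python sketch that models only the three ordering constraints mentioned above as a "starts before" graph and looks for a loop. The `find_cycle` helper is hypothetical and is not how systemd itself resolves ordering; it only illustrates why the extra After=dbus.socket cannot be satisfied.

```python
def find_cycle(before):
    """Return a list of units forming an ordering cycle, or None.

    `before` maps a unit to the units that must start after it.
    """
    visiting, done = [], set()

    def visit(unit):
        if unit in done:
            return None
        if unit in visiting:
            # We came back to a unit on the current path: that is a cycle.
            return visiting[visiting.index(unit):] + [unit]
        visiting.append(unit)
        for nxt in before.get(unit, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(unit)
        return None

    for unit in list(before):
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


# Orderings described in this comment, written as "key starts before values".
existing = {
    "cloud-init.service": ["sysinit.target"],  # cloud-init.service is Before=sysinit.target
    "sysinit.target": ["dbus.socket"],         # dbus.socket is After=sysinit.target
}
# The suggestion under discussion: cloud-init.service After=dbus.socket.
proposed = {**existing, "dbus.socket": ["cloud-init.service"]}

print(find_cycle(existing))
print(find_cycle(proposed))
```

With only the existing constraints no cycle is found; adding the proposed edge yields the loop cloud-init.service, sysinit.target, dbus.socket, cloud-init.service.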

Maybe we'd want one of the following instead:
 1. `netplan apply` provides an option to avoid falling back to `networkctl reconfigure` and to exit non-zero instead, so cloud-init can do something better or retry where necessary.
 2. `netplan apply` defers or blocks/retries until dbus.socket/service is ready, so that this only affects cases where netplan apply is called.
 3. cloud-init defers calling netplan apply on systemd-networkd environments until a later boot stage (cloud-config.service), which comes after sysinit.target and can therefore expect dbus.socket to be started at that point in boot.
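
As a rough illustration of the block/retry idea in option 2 (it could equally sit on the cloud-init side for option 1), here is a minimal Python sketch. Both `wait_until_ready` and `dbus_socket_present` are hypothetical helpers, not existing netplan or cloud-init code, and treating the presence of /run/dbus/system_bus_socket as "dbus is ready" is an assumption that a real implementation might need to strengthen.

```python
import os
import time


def wait_until_ready(is_ready, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll is_ready() up to `attempts` times; return True once it succeeds."""
    for attempt in range(attempts):
        if is_ready():
            return True
        if attempt < attempts - 1:
            # Only sleep when another attempt will follow.
            sleep(delay)
    return False


def dbus_socket_present(path="/run/dbus/system_bus_socket"):
    # Assumption: the socket file existing is a good-enough readiness signal.
    return os.path.exists(path)
```

A caller would check `wait_until_ready(dbus_socket_present)` before invoking netplan apply, and report a failure (rather than restarting systemd-networkd) if it returns False.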

I'll add netplan here to see if there are any thoughts or counter-suggestions.