Comment 6 for bug 1949893

Lukas Märdian (slyon) wrote:

Okay, let me try to unpack this. I think we're running into multiple issues at the same time here.

1/ The reproducer in #3 works for me, and IMO it exposes the root cause: the sleep(0.1) vs. sleep(2) is what makes the difference here. I think that is because the 'netplan-dbus' daemon just starts the 'netplan try' process and returns from the .Try() dbus method immediately. It does not wait for that process to actually move files around and trigger the relevant systemd-networkd actions. If we .Apply()/accept (SIGUSR1) this process too early, it might stop before all the changes have been applied. A 'netplan apply' (which is also executed within every 'netplan try' call) takes 0.6 sec on average. So if we wait for 1-2 sec (instead of 0.1 sec), the try command has enough time to do its thing and the 'br54' device appears.
This is clearly an issue in netplan, as the dbus .Try() method should not return before that process is ready to be interrupted again. I will try to implement some polling to make sure we wait long enough before returning from that dbus method. As a simple/quick workaround, you could add a 'sleep(2)' after the io.netplan.Netplan.Config.Try() call.
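On the client side, a slightly more robust alternative to a fixed sleep(2) is to poll for the expected result with a timeout. The following is only a minimal sketch under stated assumptions: `wait_until` and `interface_exists` are hypothetical helpers (not part of netplan), and using the appearance of the 'br54' device under /sys/class/net as the readiness signal is an assumption taken from this reproducer, not necessarily what the netplan-side fix will poll for.

```python
import os
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Call `predicate` every `interval` seconds until it returns True
    or `timeout` seconds have passed. Returns True on success, False on
    timeout. (Hypothetical helper, not a netplan API.)"""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def interface_exists(name: str) -> bool:
    # Assumed readiness check: a kernel network device is listed
    # under /sys/class/net once it has been created.
    return os.path.exists(os.path.join("/sys/class/net", name))
```

Usage idea: right after the io.netplan.Netplan.Config.Try() call, use `wait_until(lambda: interface_exists("br54"), timeout=5.0)` instead of sleep(2), and only call .Apply() once it returns True. This waits no longer than needed on a fast system while still tolerating the ~0.6 sec that 'netplan apply' usually takes.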

Also, I think all of this is pretty much unrelated to the potential race condition that @mardy mentions in #5, as that could only occur after the .Try() timeout (i.e. after 30 sec in this case), while our issue happens within the first second of calling .Try().

2/ The netplan.go implementation that you linked in the description sets "network=null" at the beginning, which is currently broken (LP: #1942930). That's probably something we need to fix independently.

3/ The workaround you implemented in #2 runs into a different issue: it calls 'netplan apply' without passing a '--state' argument (the dbus methods do that automatically for you). Therefore netplan is not aware of the old interface configuration and thus cannot delete/clear the br54 interface. (See https://github.com/canonical/netplan/commit/730fbbd5a59e94b365546024a23e05584d91411d, which was recently SRUed down to Focal.)
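If you keep calling 'netplan apply' yourself, the command line needs to include '--state'. A minimal sketch, assuming (as we understand the linked commit) that '--state' takes a directory containing the *previous* YAML configuration; `netplan_apply_argv` is a hypothetical helper, and the backup directory you pass is up to your own code.

```python
from typing import List, Optional


def netplan_apply_argv(state_dir: Optional[str] = None) -> List[str]:
    """Build the argv for `netplan apply` (hypothetical helper).

    Assumption: --state points netplan at a directory holding the previous
    YAML configuration, so it can see which virtual devices (e.g. br54)
    were dropped and clean them up."""
    argv = ["netplan", "apply"]
    if state_dir is not None:
        argv += ["--state", state_dir]
    return argv
```

Usage idea: copy the old /etc/netplan contents to a backup directory before writing the new YAML, then run `subprocess.run(netplan_apply_argv(backup_dir), check=True)`. Without a state directory, netplan only sees the new configuration and has nothing to compare against.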

Let's focus on (1) for now.