"optional: true" flag introduces problem it's meant to fix in certain circumstances

Bug #2039083 reported by Adam Vest
This bug affects 3 people
Affects           Status   Importance  Assigned to  Milestone
Netplan           Invalid  High        Unassigned
systemd (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Hello!

This bug relates to the situation where "systemd-networkd-wait-online.service" hangs for several minutes on boot before eventually failing. I don't know whether this flag was introduced specifically for this situation, but I do know that one of the fixes for this issue is to add "optional: true" to any non-critical interfaces (as per the docs[1]). However, adding this flag to an interface when it's the only configured interface in netplan can actually INTRODUCE the issue as well. Example:

---
:~# grep -Ev "^#" /etc/netplan/50-cloud-init.yaml
network:
    version: 2
    ethernets:
        enp5s0:
            dhcp4: true
            optional: true
---

The above config will cause the service hang/failure, and removing the flag resolves the issue. I primarily opened this bug report with the idea that we might update the aforementioned documentation to include a caveat that you want to avoid adding this flag to the only configured interface. However, it was also discussed that we might consider having the netplan config parser complain about such a setup and consider it invalid, which it kinda is. I believe that in a situation where you have a server that should have NO network connectivity, you would simply leave netplan unconfigured and/or stop any relevant services, rather than try to configure all interfaces as optional.
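
To illustrate the kind of parser check discussed above, here is a minimal sketch in Python. It is hypothetical and not part of netplan: the function name, the list of device sections, and the dict-based input (what a YAML loader would produce) are all assumptions made for the example.

```python
from typing import Any

# Netplan device-type keys that hold interface definitions.
# This is a subset chosen for illustration, not an exhaustive list.
DEVICE_SECTIONS = ("ethernets", "wifis", "bridges", "bonds", "vlans")


def all_interfaces_optional(config: dict[str, Any]) -> bool:
    """Return True when at least one interface is configured and all are optional."""
    network = config.get("network") or {}
    interfaces = [
        settings or {}
        for section in DEVICE_SECTIONS
        for settings in (network.get(section) or {}).values()
    ]
    # An empty configuration is not flagged; only "everything optional" is.
    return bool(interfaces) and all(i.get("optional", False) for i in interfaces)


# The configuration from this report, as the dict a YAML loader would produce.
reported = {
    "network": {
        "version": 2,
        "ethernets": {"enp5s0": {"dhcp4": True, "optional": True}},
    }
}

if all_interfaces_optional(reported):
    print("warning: every configured interface is marked optional")
```

A check like this could only warn about the setup; whether it should be treated as invalid is the open question raised in this report.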

My original test was on Jammy, though I also tested this on Focal and Bionic, and neither of those appears to be affected: setting the only interface as optional on either of them does not cause the "systemd-networkd-wait-online" service to hang, and the system boots normally.

Let me know if you'd like/need any more info from me! Thank you!

[1] https://netplan.io/faq#prevent-waiting-for-interface

Lukas Märdian (slyon) wrote :

I think this is related to a recent change in behavior in systemd.

It is supposed to be fixed by implementing the "network-online.target" specification, which defines what should be waited for: https://discourse.ubuntu.com/t/spec-definition-of-an-online-system/27838

But we first need to get some of the groundwork landed in upstream systemd-networkd.

Changed in netplan:
status: New → Triaged
importance: Undecided → High
tags: added: network-online-ordering
Birgit Edel (biredel) wrote :

Closely related: LP: #2036358
Whatever Ubuntu and systemd decide to do, the manual systemd-networkd-wait-online(8) also needs clarification:

"[..] wait for all links it is aware of and which are
 managed by systemd-networkd.service(8) to be fully configured or
 failed, and for at least one link to be online."

The "and at least one" phrase clarifies neither the behavior of the previous, arguably preferable, Ubuntu patch nor how the alternative is supposed to be used, where networkd knows it is not managing anything and goes to sleep(120) anyway, while some other network-online.target dependency might have already finished.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Isaac True (itrue) wrote :

Copying from my comment https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2036358/comments/43

systemd 249.11-0ubuntu3.11 doesn't resolve the issue I'm facing (this one). The long delay seems to be caused by systemd-networkd-wait-online not respecting "RequiredForOnline=no".

ubuntu@ubuntu:~$ cat /run/systemd/network/10-netplan-eth0.network
[Match]
Name=eth0

[Link]
RequiredForOnline=no

[Network]
DHCP=yes
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

Isaac True (itrue) wrote :

From https://www.man7.org/linux/man-pages/man5/systemd.network.5.html

If RequiredForOnline=no is set, systemd-networkd-wait-online should skip the interface:

    The network will be brought up normally (as configured by
    ActivationPolicy=), but in the event that there is no address
    being assigned by DHCP or the cable is not plugged in, the
    link will simply remain offline and be skipped automatically
    by systemd-networkd-wait-online if "RequiredForOnline=no".

Isaac True (itrue) wrote :

As far as I can tell we are facing this issue: https://github.com/systemd/systemd/issues/25813

This was fixed in systemd 253 with the following patch https://github.com/systemd/systemd/commit/ab3aed4a0349bbaa26f53340770c1b59b463e05d (and https://github.com/systemd/systemd/commit/2f96a29c2c55bdd67cdd8e0b0cfd6971968e4bca to fix a regression introduced by the first patch).

If all managed interfaces have RequiredForOnline=no, and the rest are unmanaged, this issue pops up.
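
As a rough way to spot this condition, the sketch below parses .network file contents with Python's configparser and reports whether every managed link has RequiredForOnline set to a false value. It is an illustration only: the function names are made up for the example, the "yes" fallback simplifies systemd's real default (which also depends on ActivationPolicy=), and any non-boolean value such as an operational state is simply treated as "required".

```python
import configparser

# Boolean "false" spellings accepted by systemd.
FALSE_VALUES = {"no", "false", "off", "0"}


def required_for_online(network_file_text: str) -> bool:
    """Return False only when [Link] RequiredForOnline= is set to a false value."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(network_file_text)
    # Simplification: assume "yes" when the key is absent.
    value = parser.get("Link", "RequiredForOnline", fallback="yes")
    return value.strip().lower() not in FALSE_VALUES


def nothing_required(files: dict[str, str]) -> bool:
    """True when there are managed links and none of them is required for online."""
    return bool(files) and not any(required_for_online(t) for t in files.values())


# The file quoted earlier in this thread.
ETH0 = """\
[Match]
Name=eth0

[Link]
RequiredForOnline=no

[Network]
DHCP=yes
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true
"""

if nothing_required({"10-netplan-eth0.network": ETH0}):
    print("all managed links have RequiredForOnline=no")
```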

Lukas Märdian (slyon) wrote :

So if I understand correctly, this does not affect Focal or Bionic.
It also does not affect Mantic or Noble.

We're just hitting this issue on Jammy = systemd v249.11 (and probably Lunar = systemd v252.5).

Netplan's behavior seems to be correct, here. It writes sensible configuration for systemd-networkd. The issue is in (upstream) systemd, which blocks on an empty list, which it shouldn't. That behavior is apparently fixed in systemd v253.

I think we should not try to work around this systemd behaviour in Netplan, but either live with the upstream behavior for v249 or backport the fixes from systemd v253 into Jammy.

tags: added: rls-jj-incoming
Changed in netplan:
status: Triaged → Invalid
Nick Rosbrook (enr0n) wrote :

Lukas - upstream is actually broken in v253 until v253.6, and the reason it appears "OK" in v253.5 (which we have in Mantic) is that [1] introduces a bug upstream that makes systemd-networkd-wait-online behave similarly to Ubuntu's patched systemd-networkd-wait-online prior to systemd 249.11-0ubuntu3.10. Since (a) there has been lots of churn in this area already with SRUs and bug reports, and (b) we really need to implement our network-online spec to *really* fix this, I decided to leave it alone for Mantic.

Isaac - what happens if you add the --any flag to systemd-networkd-wait-online.service (best to do this with an override config), e.g.

# /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf
[Service]
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online --any

That should make it so that it does not wait on all the other unmanaged interfaces. I realize this is a change in behavior in Jammy, but the old behavior of systemd-networkd-wait-online was worse in my opinion. It's "wrong" to run systemd-networkd-wait-online without arguments on a system where not everything is managed by systemd-networkd, so I think the best solution for users in Isaac's situation is to add overrides that work for their setup.

[1] https://github.com/systemd/systemd/commit/ab3aed4a0349bbaa26f53340770c1b59b463e05d

Changed in systemd (Ubuntu):
status: Confirmed → Incomplete
tags: removed: rls-jj-incoming
Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu):
status: Incomplete → Invalid
Brett Holman (holmanb) wrote :

Is this behavior supposed to be fixed in 24.04? I see a similar report here[1] based on 24.04 where the apparent fix involves removing optional: true to avoid blocking boot.

[1] https://github.com/Joshua-Riek/ubuntu-rockchip/issues/757

Lukas Märdian (slyon) wrote :

Yes, it's supposed to be fixed via bug #2060311

Netplan now generates a systemd-networkd-wait-online.service.d/10-netplan.conf override config that explicitly lists every configured network interface, thus making the system wait on all of them.

If an interface is marked "optional: true" it will not be listed and the system will not block on it.
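
The idea can be sketched in Python as follows. This is not Netplan's generator and the exact contents of the real 10-netplan.conf are not shown in this thread; the function name, the input shape, and the choice to emit nothing when every interface is optional are assumptions for illustration. The only systemd-networkd-wait-online option used is -i (--interface=), and the binary path is the one from the override quoted earlier in this thread.

```python
from typing import Optional

# Binary path as used in the override example earlier in this thread.
WAIT_ONLINE = "/lib/systemd/systemd-networkd-wait-online"


def build_override(interfaces: dict[str, bool]) -> Optional[str]:
    """Build drop-in text from a mapping of interface name -> "optional" flag.

    Returns None when no interface needs to be waited for (an assumption of
    this sketch, not a statement about what Netplan does in that case).
    """
    required = sorted(name for name, optional in interfaces.items() if not optional)
    if not required:
        return None
    args = " ".join(f"-i {name}" for name in required)
    # The empty ExecStart= clears the unit's original command before replacing it.
    return f"[Service]\nExecStart=\nExecStart={WAIT_ONLINE} {args}\n"


# Hypothetical setup: eth0 is required, eth1 is marked "optional: true".
print(build_override({"eth0": False, "eth1": True}))
```

With this approach the optional interface never appears on the command line, so the service has no reason to block on it.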
