systemd-networkd-wait-online.service runs into a timeout during boot

Bug #2063331 reported by Benjamin Drung
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Release Notes for Ubuntu | Fix Released | Undecided | Paride Legovini | |
| subiquity | Triaged | High | Unassigned | |
| netplan.io (Ubuntu) | Invalid | Undecided | Unassigned | |

Bug Description

I did an installation of http://cdimage.ubuntu.com/ubuntu-server/daily-live/20240423/noble-live-server-riscv64.img.gz on a Vision Five 2 board. The Vision Five 2 board has two LAN ports. I have one connected.
systemd-networkd-wait-online.service runs into a timeout during boot. subiquity configured the network config to require both ports to be active:

```
$ sudo cat /etc/cloud/cloud.cfg.d/90-installer-network.cfg
# This is the network config written by 'subiquity'
network:
  ethernets:
    end0:
      dhcp4: true
    end1:
      dhcp4: true
  version: 2
```

```
$ systemd-analyze
Startup finished in 12.290s (kernel) + 2min 7.647s (userspace) = 2min 19.937s
graphical.target reached after 2min 7.546s in userspace.
```

This might be related to bug #2060311.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: subiquity (unknown)
ProcVersionSignature: Ubuntu 6.8.0-31.31.1-generic 6.8.1
Uname: Linux 6.8.0-31-generic riscv64
ApportVersion: 2.28.1-0ubuntu2
Architecture: riscv64
CasperMD5json:
 {
   "result": "skip"
 }
Date: Wed Apr 24 11:52:09 2024
InstallationDate: Installed on 2024-04-24 (0 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Release riscv64 (20240423)
SourcePackage: subiquity
Symptom: installer
UpgradeStatus: No upgrade log present (probably fresh install)

Benjamin Drung (bdrung) wrote :
Danilo Egea Gondolfo (danilogondolfo) wrote :

This configuration will create a wait-online drop-in config that waits for carrier and link-local addresses on both ports. Because one port is disconnected, the service keeps waiting for it until it times out.

Since users might connect only one of the ports, and it could be either one, the installer should probably set "optional: true" for all the interfaces.
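
To illustrate the reasoning above, here is a minimal Python sketch that works on a netplan-style config held as a plain dict. The helper names `blocking_interfaces` and `mark_all_optional` are hypothetical, and the choice of `end0` as the connected port is an assumption (the report only says one port is connected). It is a simplified model of "required but no carrier", not what netplan or systemd-networkd-wait-online actually execute.

```python
from copy import deepcopy


def blocking_interfaces(config, connected):
    """Return ethernet names that are required (not optional) but have no carrier."""
    ethernets = config.get("network", {}).get("ethernets", {})
    return sorted(
        name
        for name, settings in ethernets.items()
        if not settings.get("optional", False) and name not in connected
    )


def mark_all_optional(config):
    """Return a copy of the config with `optional: true` set on every ethernet."""
    result = deepcopy(config)
    for settings in result.get("network", {}).get("ethernets", {}).values():
        settings["optional"] = True
    return result


# Same content as the 90-installer-network.cfg written by subiquity.
installer_config = {
    "network": {
        "version": 2,
        "ethernets": {
            "end0": {"dhcp4": True},
            "end1": {"dhcp4": True},
        },
    }
}

connected = {"end0"}  # assumption: only end0 has a cable plugged in

print(blocking_interfaces(installer_config, connected))
print(blocking_interfaces(mark_all_optional(installer_config), connected))
```

With the installer's config the unplugged port is reported as blocking; once every interface is marked optional, nothing is.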

Benjamin Drung (bdrung) wrote :

Setting `optional: true` solves the boot delay.

```
$ sudo cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        end0:
            dhcp4: true
            optional: true
        end1:
            dhcp4: true
            optional: true
    version: 2
```

The boot is fast now:
```
$ systemd-analyze
Startup finished in 12.297s (kernel) + 8.067s (userspace) = 20.365s
graphical.target reached after 7.962s in userspace.
```
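
As a rough check of the two `systemd-analyze` results, the userspace times can be converted to seconds and compared. The sketch below uses a hypothetical helper, `parse_duration`, that only understands the `min`, `s` and `ms` units seen in this report. The comparison with a 120 second default timeout for systemd-networkd-wait-online is my understanding of the tool's default, not something stated in this thread.

```python
import re


def parse_duration(text):
    """Convert a duration such as '2min 7.647s' to seconds (min, s, ms only)."""
    units = {"min": 60.0, "s": 1.0, "ms": 0.001}
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(min|ms|s)\b", text):
        total += float(value) * units[unit]
    return total


before = parse_duration("2min 7.647s")  # userspace time with the original config
after = parse_duration("8.067s")        # userspace time with optional: true
saved = before - after

print(f"userspace time saved: {saved:.1f}s")
```

The saving comes out at just under two minutes, which is consistent with a single wait-online timeout being removed from the boot path.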

description: updated
Lukas Märdian (slyon) wrote :

I guess we need a policy decision in subiquity: "Should it configure server NICs as <optional: true> or not?" rather than a technical fix here.

A longer-term solution would be to find a way in cloud-init not to block the boot process, and instead let systemd-networkd-wait-online.service time out and fail while the rest of the boot process continues in parallel.

Changed in ubuntu-release-notes:
assignee: nobody → Paride Legovini (paride)
Paride Legovini (paride) wrote (last edit ):

Confirmed on a laptop (amd64) where I plugged in some extra USB-to-Ethernet adapters.

* To clarify: this happens when the user reboots to the installed system, not at install time. And it is not specific to riscv64 in any way.

* Interfaces marked as 'disabled' at install time get configured as 'dhcp4' on the installed system. The 'optional' setting is not set, and it defaults to 'false'.

* systemd-networkd-wait-online.service is triggered twice during the boot, so the total wait time is 4 minutes.

* `systemctl is-system-running` returns `degraded` because the unit failed to start.

* Users can fix this by manually editing the netplan config and dropping the undesired network interfaces, or marking them as optional. A reboot after this works fine (no wait time, is-system-running returning `running`).

* Subiquity should mark install-time unconfigured interfaces optional, or avoid configuring them at all if they are set as "disabled" at install time. (This would be my preference: if an interface is "disabled" I don't want it to be enabled and actively sending DHCP requests, but I'm sure there are good user stories that call for enabling dhcp4 by default.)

* This can be mitigated by a subiquity refresh, however fully offline installs (where at least 1 network interface is present) will still hit the bug.

* Is it too late to attempt a respin for 24.04 images.

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/2063331

tags: added: iso-testing
Lukas Märdian (slyon) wrote (last edit ):

After some sparring with Steve, Dave and others, I think we should apply this policy in our installers:

#1 - Any interface that was connected and dynamically autoconfigured at installation-time should be configured the same way in the target system, as a normal/non-optional interface.

#2 - Any interface that was explicitly configured by the user in the installer UI, should be configured in that explicit way in the target system, as a normal/non-optional interface.

#3 - Any interface that was disabled at installation-time, should not be configured at all in the target system, to avoid unintended behaviour by plugging-in Ethernet cables into random network ports.
#3a - This applies to interfaces that were unplugged at installation-time or could not be autoconfigured by the installer.
#3b - This applies to interfaces that were explicitly disabled by the user through the installer UI

Optionally:
Interfaces from #3a might be listed in the target system's Netplan configuration and configured as "dhcp4: true" and "optional: true". But the network definitions describing such interfaces should be commented out in the Netplan configuration. They should merely function as an example to the user, should they wish to manually change their configuration afterwards.
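
The policy above can be sketched in Python as a filter over the interfaces seen at install time. Every name here (`InstallState`, `Interface`, `target_ethernets`) is hypothetical and chosen for illustration. This is not subiquity's actual code, and it leaves out the optional commented-out examples for #3a.

```python
from dataclasses import dataclass
from enum import Enum


class InstallState(Enum):
    AUTOCONFIGURED = "autoconfigured"          # rule #1
    USER_CONFIGURED = "user-configured"        # rule #2
    NOT_AUTOCONFIGURED = "not-autoconfigured"  # rule #3a (unplugged or failed)
    USER_DISABLED = "user-disabled"            # rule #3b


@dataclass
class Interface:
    name: str
    state: InstallState
    settings: dict  # netplan settings as seen or entered at install time


def target_ethernets(interfaces):
    """Build the 'ethernets' mapping for the target system.

    Rules #1 and #2: keep the interface as configured (non-optional).
    Rule #3: leave disabled or unconfigured interfaces out entirely.
    """
    kept_states = (InstallState.AUTOCONFIGURED, InstallState.USER_CONFIGURED)
    return {
        iface.name: dict(iface.settings)
        for iface in interfaces
        if iface.state in kept_states
    }


# Assumed scenario: end0 got a DHCP lease at install time, end1 was unplugged.
seen_at_install = [
    Interface("end0", InstallState.AUTOCONFIGURED, {"dhcp4": True}),
    Interface("end1", InstallState.NOT_AUTOCONFIGURED, {}),
]

print({"network": {"version": 2, "ethernets": target_ethernets(seen_at_install)}})
```

Under this policy only `end0` would end up in the target system's configuration, so there is no unplugged port left for wait-online to block on.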

Lukas Märdian (slyon) wrote :

As an appendix to comment #7:

The situation gets interesting when we consider Desktop images. According to our network-online.target [spec] we want to "implement a common behavior across the Distro", so we should apply the same policy as described in comment #7.

An interesting case is a desktop system (e.g. laptop) that gets installed with the eth0 interface connected and autoconfigured. But the cable gets unplugged afterwards and the system starts roaming around using the internal wlan0 WiFi interface. Once the system reboots (cable still unplugged) this would (potentially) block "network-online.target".

Which is totally fine!
- As long as the normal boot process (reaching "default.target") can continue in parallel. Only applications that really depend on connectivity should wait for "network-online.target" and thus be delayed (for a reason).

- As long as no application blocks "default.target" ("graphical.target"), while at the same time waiting on "network-online.target" itself. Should we detect such applications, those applications (their systemd dependencies) need to be fixed.

Note:
On desktop systems the "wait-online" policy is currently not enforced at all. This is because "NetworkManager-wait-online.service" handles the situation more leniently, and "systemd-networkd.service" and "systemd-networkd-wait-online.service" are disabled by default on desktop images.

[spec] https://discourse.ubuntu.com/t/spec-definition-of-an-online-system/27838

Dan Bungert (dbungert)
Changed in subiquity:
status: New → Triaged
importance: Undecided → High
tags: added: foundations-todo
Paride Legovini (paride)
Changed in ubuntu-release-notes:
status: New → Fix Released
Changed in netplan.io (Ubuntu):
milestone: none → noble-updates
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu):
status: New → Confirmed
Lukas Märdian (slyon) wrote :

This is to be fixed in the installer, not Netplan.

Changed in netplan.io (Ubuntu):
status: Confirmed → Invalid
Don Solaris (dsolaris) wrote :

Just a quick heads up (GA-790FXTA-UD5-rev-10 motherboard), adding the

...
optional: true
...

to the 50-cloud-init.yaml does indeed solve the bug.

I had the exact same situation with the twin Ethernet ports on my motherboard. It is now solved.

OT:
Can't believe in a long time Google actually found an answer for me from the first try. Tags: "networkd-wait-online.service" two ports
