network configuration failed on reboot

Bug #1930738 reported by Deepika Maharjan
This bug affects 2 people
Affects            Status      Importance   Assigned to   Milestone
systemd (Ubuntu)   Invalid     Undecided    Unassigned
Bionic             Won't Fix   Undecided    Unassigned
Focal              Won't Fix   Undecided    Unassigned

Bug Description

[impact]

number of statically defined addresses for an interface in systemd-networkd is limited

[test case]

Note: this only occurs in a container; it is not reproducible in a VM or on bare metal.

Configure netplan with the attached yaml file (10-test.yaml)

enable debug for systemd-networkd

reboot the system and check the journalctl output to see if any errors were reported for systemd-networkd, e.g.:

$ journalctl -b -u systemd-networkd | grep 'could not set'
Jul 23 13:16:52 lp1930738-b systemd-networkd[189]: eth0: could not set address: Connection timed out
...

Note that systemd-networkd may actually set all addresses correctly but still fail to confirm over netlink that they are set, so checking the output of 'ip a' is not enough; the systemd-networkd debug log should be checked.
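
The two logs quoted in this report spell the failure differently ('could not set address' in the test case, 'Could not set address' in the original description), so a case-sensitive grep can miss one of them. A minimal sketch of a case-insensitive filter follows; the message layout is assumed from those two log lines only, and the function and script names are hypothetical.

```python
import re
import sys

# Layout assumed from the two log lines quoted in this report:
#   systemd-networkd[PID]: IFACE: [Cc]ould not set address: REASON
PATTERN = re.compile(
    r"systemd-networkd\[\d+\]: (?P<iface>[^:\s]+): "
    r"could not set address: (?P<reason>.+)$",
    re.IGNORECASE,
)


def find_address_failures(lines):
    """Return (interface, reason) pairs for each address failure line."""
    failures = []
    for line in lines:
        match = PATTERN.search(line.rstrip("\n"))
        if match:
            failures.append((match.group("iface"), match.group("reason")))
    return failures


if __name__ == "__main__":
    # Reads journal text from stdin and prints one line per failure.
    for iface, reason in find_address_failures(sys.stdin):
        print(f"{iface}: {reason}")
```

Saved as, say, check_failures.py, it could be fed with `journalctl -b -u systemd-networkd | python3 check_failures.py`.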

[regression potential]

possible failure to correctly apply all statically defined interfaces

[scope]

TBD, not fully fixed upstream

this is needed in Focal (f) and Bionic (b)

this is fixed upstream with commit 628f08b66d43d1947b03419409d817d28eb47321 and PR 16982, which are included in v246 and later, so this is fixed in Hirsute (h) and later

[other info]

I elided upstream commit d31f33e3c9f6ea3bdc873ee52f4398edbec74527 as that changes the udev-related behavior of networkd-manager inside a container, which is not appropriate for SRU for this bug, as I don't see any clear bug-related reason to change that behavior.

Additionally this requires the typo fix from commit 4934ba2121d76229659939e19ab7d70a89446629

[original description]

This issue was reported at https://github.com/systemd/systemd/issues/17012

**Used distribution**
 > Ubuntu 20.04.1 LTS

**systemd version the issue has been seen with**
> 245.4-4ubuntu3.2

**Issue details**
I configured 255 IPv4 addresses (including the primary IP) using netplan, but when the server restarts, it times out while configuring the interface. If I limit the total number of IPv4 addresses to 181 or fewer, it works; anything more than 181 fails.

Below are my configurations and error logs.

**/etc/netplan/10-ens3.yaml**
```
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      dhcp4: no
      addresses:
        - 140.XX.XX.XX/23
        - 103.XXX.XX.1/24
        - 103.XXX.XX.2/24
        - CONTINUED IP ADDRESS UPTO BELOW ...
        - 103.XXX.XX.254/24
      gateway4: 140.XX.XX.X
      nameservers:
        addresses: [1.1.1.1, 1.0.0.1]
      routes:
        - to: 169.254.0.0/16
          via: 140.XX.XX.X
          metric: 100
```
The above config works if I run `netplan apply` but when I reboot, it does not work.
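
A config of this shape is easier to generate than to type by hand. The following is a minimal sketch, not the reporter's actual file: the addresses are placeholders from the RFC 5737 documentation ranges, the function name is hypothetical, and the gateway, nameserver and route entries are left out.

```python
import ipaddress
from itertools import islice


def build_netplan(interface: str, primary: str, extra_net: str, count: int) -> str:
    """Return netplan YAML with `primary` plus `count` hosts from `extra_net`."""
    net = ipaddress.ip_network(extra_net)
    # Take the first `count` usable host addresses of the extra subnet.
    hosts = list(islice(net.hosts(), count))
    if len(hosts) < count:
        raise ValueError(f"{extra_net} has only {len(hosts)} usable hosts")
    lines = [
        "network:",
        "  version: 2",
        "  renderer: networkd",
        "  ethernets:",
        f"    {interface}:",
        "      dhcp4: no",
        "      addresses:",
        f"        - {primary}",
    ]
    lines += [f"        - {host}/{net.prefixlen}" for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses only: one primary plus 254 extras, 255 in total.
    print(build_netplan("ens3", "198.51.100.10/24", "192.0.2.0/24", 254), end="")
```

Lowering the last argument gives a quick way to probe the threshold described above (181 addresses in total working, more failing).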

**networkctl**
```
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 ens3 ether routable failed

2 links listed.
```

**/etc/systemd/system/systemd-networkd.service.d/override.conf**
```
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
```

**systemctl status systemd-networkd.service**
```
● systemd-networkd.service - Network Service
     Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled-runtime; vendor preset: enabled)
    Drop-In: /etc/systemd/system/systemd-networkd.service.d
             └─override.conf
     Active: active (running) since Thu 2020-09-10 19:46:58 UTC; 1min 36s ago
       Docs: man:systemd-networkd.service(8)
   Main PID: 346 (systemd-network)
     Status: "Processing requests..."
      Tasks: 1 (limit: 1074)
     Memory: 3.8M
     CGroup: /system.slice/systemd-networkd.service
             └─346 /lib/systemd/systemd-networkd

Sep 10 19:47:03 test-server systemd-networkd[346]: NDISC: Sent Router Solicitation, next solicitation in 7s
Sep 10 19:47:11 test-server systemd-networkd[346]: NDISC: No RA received before link confirmation timeout
Sep 10 19:47:11 test-server systemd-networkd[346]: NDISC: Invoking callback for 'timeout' event.
Sep 10 19:47:11 test-server systemd-networkd[346]: NDISC: Sent Router Solicitation, next solicitation in 15s
Sep 10 19:47:23 test-server systemd-networkd[346]: Assertion 'm->sealed' failed at src/libsystemd/sd-netlink/netlink-message.c:582, function netlink_message_read_internal(). Ignoring.
Sep 10 19:47:23 test-server systemd-networkd[346]: ens3: Could not set address: Connection timed out
Sep 10 19:47:23 test-server systemd-networkd[346]: ens3: Failed
Sep 10 19:47:23 test-server systemd-networkd[346]: ens3: State changed: configuring -> failed
Sep 10 19:47:23 test-server systemd-networkd[346]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_32 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=13 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Sep 10 19:47:23 test-server systemd-networkd[346]: NDISC: Stopping IPv6 Router Solicitation client
```

Tags: patch
information type: Public → Public Security
information type: Public Security → Public
Dan Streetman (ddstreet) wrote :

I can't reproduce this, and your log output looks like you are having problems with your IPv6 configuration, not IPv4. You'll need to attach the full journal log if you still have this problem.

Changed in systemd (Ubuntu):
status: New → Incomplete
Frank Villaro (f-ran-k) wrote :

Hi,

In our case a backport of the specific commit fixed the issue. I really do think that this is a (rather difficult to reproduce) problem that the above commit fixes.

Cheers

Dan Streetman (ddstreet) wrote :

can you provide steps to reproduce the problem?

Deepika Maharjan (deepika-maj) wrote :

From a fresh installation of Ubuntu, I simply configured one main and 255 IPv4 addresses from a /24 subnet in /etc/netplan/10-ens3.yaml, as mentioned in the original post, and this issue occurs. I even tried Ubuntu 20.04.2 LTS, with the same result.

Dan Streetman (ddstreet)
description: updated
Changed in systemd (Ubuntu Bionic):
status: New → In Progress
Changed in systemd (Ubuntu Focal):
status: New → In Progress
Changed in systemd (Ubuntu):
status: Incomplete → Fix Released
Changed in systemd (Ubuntu Focal):
importance: Undecided → Low
Changed in systemd (Ubuntu Bionic):
importance: Undecided → Low
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet)
description: updated
Changed in systemd (Ubuntu Bionic):
status: In Progress → New
assignee: Dan Streetman (ddstreet) → nobody
Dan Streetman (ddstreet)
description: updated
Dan Streetman (ddstreet) wrote :

Deepika, Frank, is this happening for you inside an unprivileged container, or some other environment?

I'm able to reproduce this in a container, but it appears still broken with upstream systemd, so I'm confused by Frank's comment 2. Also, it isn't intermittent at all for me, so maybe you're seeing some other problem.

Changed in systemd (Ubuntu):
status: Fix Released → Incomplete
Changed in systemd (Ubuntu Bionic):
status: New → Incomplete
Changed in systemd (Ubuntu Focal):
status: In Progress → Incomplete
Dan Streetman (ddstreet) wrote :
description: updated
description: updated
Deepika Maharjan (deepika-maj) wrote :

I got this issue on a VPS from a cloud provider. For me, it occurs every time I restart the VPS. To access the server after a restart, I run `netplan apply` from the web console/KVM.

Furthermore, this issue does not occur in Ubuntu 21.04.

There are more details regarding this issue, and possibly commit references, at https://github.com/systemd/systemd/issues/17012

Dan Streetman (ddstreet) wrote :

> There are more details regarding this issue and may be commit refs at
> https://github.com/systemd/systemd/issues/17012

I'm aware of that as noted in my updates to the description, but that doesn't actually fix this inside unprivileged containers, which is the only place I can reproduce this.

Please provide a full journal log from a boot on your VPS that reproduces this

Frank Villaro (f-ran-k) wrote :

Hi, sorry for the late reply; I didn't receive the notifications.

This problem happens on a "bare" environment: it's a simple server (IPv6 RA + v4 DHCP) with some IPv6 ranges assigned to the `lo` interface.

Don't hesitate to ask if you need more info, although we can no longer reproduce the bug because of our backport.

Cheers !

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Focal):
assignee: Dan Streetman (ddstreet) → nobody
importance: Low → Undecided
Changed in systemd (Ubuntu Bionic):
importance: Low → Undecided
Frank Villaro (f-ran-k) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "sd-netlink-make-timeout-message-sealed.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Nick Rosbrook (enr0n) wrote :

According to the reporter, this issue was not observed in Ubuntu 21.04 and later. I do not think there is a strong case for addressing this in focal or bionic.

Changed in systemd (Ubuntu Bionic):
status: Incomplete → Won't Fix
Changed in systemd (Ubuntu Focal):
status: Incomplete → Won't Fix
Changed in systemd (Ubuntu):
status: Incomplete → Invalid