systemd-networkd core dumps in bionic-proposed

Bug #1818340 reported by Mark Shuttleworth
This bug affects 2 people
Affects           Status        Importance  Assigned to    Milestone
systemd (Ubuntu)  Fix Released  High        Unassigned
Bionic            Fix Released  Critical    Dan Streetman
Cosmic            Fix Released  Critical    Dan Streetman
Disco             Fix Released  High        Unassigned

Bug Description

[Impact]

During a restart, systemd-networkd fails an assertion and aborts, leaving the system's networking partially configured, if at all. Further restarts continue to fail.

[Test Case]

Install a bionic system (cosmic is also affected) that uses only systemd-networkd for networking (i.e. uninstall netplan or leave it unconfigured). Ensure no networkd conf files are in /run/systemd/network. Stop networkd (sudo systemctl stop systemd-networkd). The interface to be tested with networkd (e.g. ens3) should have no address assigned and should be down.

Create a file similar to below, adjusting for interface name:

$ cat /etc/systemd/network/10-netplan-ens3.network
[Match]
Name=ens3

[Network]
Address=192.168.122.68/24

Start networkd:

ubuntu@lp1818340-b:~$ sudo systemctl start systemd-networkd
ubuntu@lp1818340-b:~$ ip a show ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6e:8c:9f brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.68/24 brd 192.168.122.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe6e:8c9f/64 scope link
       valid_lft forever preferred_lft forever

Stop networkd; ens3 should retain its address:

ubuntu@lp1818340-b:~$ sudo systemctl stop systemd-networkd
ubuntu@lp1818340-b:~$ ip a show ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:6e:8c:9f brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.68/24 brd 192.168.122.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe6e:8c9f/64 scope link
       valid_lft forever preferred_lft forever

Start networkd again; the bug is triggered:

ubuntu@lp1818340-b:~$ sudo systemctl start systemd-networkd
Job for systemd-networkd.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status systemd-networkd.service" and "journalctl -xe" for details.

Alternatively, instead of stopping and then starting networkd separately, the failure can be reproduced with a single restart.

Note that the failure only happens with statically assigned addresses; interfaces configured with DHCP do not trigger this bug.
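
For repeating the test case on several interfaces, the .network file above could be generated instead of typed by hand. This is only a minimal sketch: the function name render_static_network is hypothetical, and it produces nothing beyond the [Match] and [Network] keys shown in the test case.

```python
import ipaddress


def render_static_network(interface, address):
    """Return the text of a minimal static-address .network file.

    Hypothetical helper for this test case only; it emits the same
    [Match]/[Network] keys as the example file and nothing else.
    """
    # Raises ValueError if the address/prefix is not valid (e.g. "192.168.122.68/24").
    ipaddress.ip_interface(address)
    return f"[Match]\nName={interface}\n\n[Network]\nAddress={address}\n"


if __name__ == "__main__":
    print(render_static_network("ens3", "192.168.122.68/24"), end="")
```

The output would still need to be written to a file under /etc/systemd/network, as in the test case.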

[Regression Potential]

TBD

[Other Info]

The SRU for bug 1812760 introduced both the new behavior of networkd not removing managed addresses/routes from managed interfaces and the assertion failure. This does not fail in disco; I believe additional commit(s) from upstream need to be backported.

Original description:

---

I run a number of servers with -proposed enabled and have seen a bunch of this today:

Mar 02 16:20:58 4-ridge-fw1 systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.
Mar 02 16:20:58 4-ridge-fw1 systemd[1]: Failed to start Network Service.

These machines have numerous bonds, so I suspect that's a factor.

So far I have only observed the issue on machines with -proposed enabled, so I suspect it is a problem with systemd 237-3ubuntu10.14.

Example netplan.yaml attached.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Needless to say I suggest not promoting systemd from -proposed until this is figured out :)

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Further digging in journalctl shows:

-- Unit systemd-networkd.service has begun starting up.
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-lan: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-net: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-peer: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-plein: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-protea-2: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-protea: netdev ready
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-peer: Gained IPv6LL
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-plein: Gained IPv6LL
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-protea-2: Gained IPv6LL
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: bond-protea: Gained IPv6LL
Mar 02 20:37:00 4-ridge-fw1 systemd-networkd[6851]: Assertion 'link->state == LINK_STATE_SETTING_ADDRESSES' failed at ../src/network/networkd-link
Mar 02 20:37:00 4-ridge-fw1 systemd[1]: systemd-networkd.service: Main process exited, code=dumped, status=6/ABRT
Mar 02 20:37:00 4-ridge-fw1 systemd[1]: systemd-networkd.service: Failed with result 'core-dump'.
Mar 02 20:37:00 4-ridge-fw1 systemd[1]: Failed to start Network Service.
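
When checking many machines for the same symptom, journal text like the above can be scanned mechanically. The following is a rough sketch, not a systemd tool: the helper names are hypothetical, and the patterns are taken only from the messages quoted in this report.

```python
import re

# Pattern based on the assertion message quoted in this report.
ASSERTION_RE = re.compile(r"Assertion '(?P<expr>[^']+)' failed at (?P<where>\S+)")


def find_assertion_failures(journal_lines):
    """Return (expression, location) pairs for each assertion failure line."""
    failures = []
    for line in journal_lines:
        match = ASSERTION_RE.search(line)
        if match:
            # Drop a trailing comma, as seen in the full form of the message.
            failures.append((match.group("expr"), match.group("where").rstrip(",")))
    return failures


def dumped_core(journal_lines):
    """True if any line reports the unit failing with result 'core-dump'."""
    return any("Failed with result 'core-dump'" in line for line in journal_lines)
```

The input could be the lines of saved output from journalctl -u systemd-networkd.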

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

I found that the bond seems not to have come up properly:

$ sudo cat /proc/net/bonding/bond-lan
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 5e:07:b7:b3:1d:bd
bond bond-lan has no active aggregator

Slave Interface: p3p4
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:fd:fe:77:32:cb
Slave queue ID: 0
Aggregator ID: N/A

Slave Interface: p3p3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 3c:fd:fe:77:32:ca
Slave queue ID: 0
Aggregator ID: N/A
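
The combination seen above (bond MII status down and no active aggregator while both slaves report up) can be picked out of such a dump with a short script. This is a sketch under assumptions: summarize_bond is a hypothetical helper, and it relies only on the field labels visible in the output above.

```python
def summarize_bond(text):
    """Summarize key fields of a /proc/net/bonding/<bond> dump.

    Hypothetical helper; it only understands the labels shown in this report.
    """
    summary = {"bond_mii_status": None, "no_active_aggregator": False, "slaves": {}}
    current_slave = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("has no active aggregator"):
            summary["no_active_aggregator"] = True
        elif line.startswith("Slave Interface:"):
            current_slave = line.split(":", 1)[1].strip()
            summary["slaves"][current_slave] = None
        elif line.startswith("MII Status:"):
            status = line.split(":", 1)[1].strip()
            if current_slave is None:
                # Before the first slave section, the status belongs to the bond.
                summary["bond_mii_status"] = status
            else:
                summary["slaves"][current_slave] = status
    return summary
```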

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Taking the bond down and then bringing it back up again seemed to sort out the aggregation. I was able to assign an IP address to the bond and ping it. However, even with the bond sorted, restarting systemd-networkd dumps core.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Apport info from an affected system is in #1818487

Revision history for this message
Sebastien Bacher (seb128) wrote :

I'm marking this bug regression-proposed and have tagged bug #1812760 from the SRU as verification-failed to make sure the SRU is not promoted. I'm also Ccing the people who have been involved in the SRU.

tags: added: regression-proposed
Changed in systemd (Ubuntu):
importance: Undecided → High
Dan Streetman (ddstreet)
tags: added: sts
Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Disco):
status: New → Fix Released
Changed in systemd (Ubuntu Cosmic):
status: New → In Progress
Changed in systemd (Ubuntu Bionic):
status: New → In Progress
Changed in systemd (Ubuntu Cosmic):
importance: Undecided → Critical
Changed in systemd (Ubuntu Bionic):
importance: Undecided → Critical
Changed in systemd (Ubuntu Cosmic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Cosmic):
importance: Critical → High
Changed in systemd (Ubuntu Bionic):
importance: Critical → High
Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Bionic):
importance: High → Critical
Changed in systemd (Ubuntu Cosmic):
importance: High → Critical
Dan Streetman (ddstreet)
description: updated
Revision history for this message
Dan Streetman (ddstreet) wrote :

Note: I verified this regression is not related to the kernel; I installed and rebooted into 4.15.0-46-generic from bionic-proposed, but still with systemd 237-3ubuntu10.13 from bionic-updates, and could not reproduce the regression; this is definitely caused by the systemd in -proposed.

Dan Streetman (ddstreet)
description: updated
Revision history for this message
Dan Streetman (ddstreet) wrote :

I looked into the networkd-link.c code that bug 1812760 changed, and more patches are definitely needed than just what was included in that bug's MP; unfortunately it looks quite a bit more complex.

For now, I've uploaded new systemd SRUs to bionic and cosmic that simply revert the patches from bug 1812760, until I can look more closely at the patches and at possible other/simpler ways to correct the problem in that bug.

The SRU uploads are in the queues.

Revision history for this message
Daniel Axtens (daxtens) wrote :

Oof, sorry! It's not clear to me from the bug report and subsequent comments - is it just Bionic that's affected, or is it also Cosmic?

Revision history for this message
Daniel Axtens (daxtens) wrote :

Never mind, I can reproduce on Cosmic.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote : Re: [Bug 1818340] Re: systemd-networkd core dumps in bionic-proposed

Just to let you know, I still see 237-3ubuntu10.14 in -proposed and I
think that's the one that has the issue.

Mark

Revision history for this message
Daniel Axtens (daxtens) wrote :

OK, so with the magic of debug symbols and gdb on Cosmic:

(gdb) run
...
ens8: Gained IPv6LL
Assertion 'link->state == LINK_STATE_SETTING_ADDRESSES' failed at ../src/network/networkd-link.c:803, function link_enter_set_routes(). Aborting.
...
(gdb) up
#3 0x000055555566b194 in link_enter_set_routes (link=0x55555571d050) at ../src/network/networkd-link.c:803
803 ../src/network/networkd-link.c: No such file or directory.
(gdb) p link->state
$3 = LINK_STATE_PENDING

Looking at the code, it seems we are hitting link_enter_set_routes() before link_enter_set_addresses() which is where the state is set. We're hitting link_enter_set_routes() because link_check_ready() now calls it straight off the bat.

I think the backport just needs to add a check to not flow through to setting the routes until after we've gone through the process of setting the addresses; we can do that with the attached patch. (It applies to the cosmic version; I haven't tested it against Bionic.)
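
The ordering problem described here can be illustrated with a toy model. This is not systemd's code and not the attached patch: the class, the method names and the guarded flag are hypothetical stand-ins that only mirror the function names and the assertion quoted from gdb.

```python
from enum import Enum, auto


class LinkState(Enum):
    PENDING = auto()
    SETTING_ADDRESSES = auto()
    SETTING_ROUTES = auto()


class Link:
    """Toy stand-in for a networkd link; 'guarded' models the proposed check."""

    def __init__(self, guarded):
        self.state = LinkState.PENDING
        self.guarded = guarded

    def enter_set_addresses(self):
        # In the description above, this is where the state gets set.
        self.state = LinkState.SETTING_ADDRESSES

    def enter_set_routes(self):
        # Mirrors the failing assertion at networkd-link.c:803.
        if self.state != LinkState.SETTING_ADDRESSES:
            raise AssertionError("link->state == LINK_STATE_SETTING_ADDRESSES")
        self.state = LinkState.SETTING_ROUTES

    def check_ready(self):
        # Assumed shape of the guard: do nothing until addresses are being set.
        if self.guarded and self.state != LinkState.SETTING_ADDRESSES:
            return
        self.enter_set_routes()
```

Calling check_ready() on a fresh Link(guarded=False) raises the assertion while the state is still PENDING, which corresponds to the gdb output; with guarded=True the call returns without touching the state until enter_set_addresses() has run.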

Having said that, Dan, you've obviously had a closer look at the code, and more recently; what patches did you think were needed? It looks like perhaps you could solve this by backporting c42ff3a1a7bf ("networkd: Track address configuration") and 289e6774d0da ("networkd: Use only a generic CONFIGURING state") - is that what you had in mind?

Revision history for this message
Dan Streetman (ddstreet) wrote :

> I think the backport just needs to add a check to not flow through to setting the
> routes until after we've gone through the process of setting the addresses; we
> can do that with the attached patch

Yeah, maybe. I was thinking more along the lines of needing c42ff3a1a7bfea66dc4655096c79bd481159091b and maybe e4a71bf36f422c3728b902aaa5846add7bbc0eb9, and we might also need 2428613f854f46b6624199c2dc58d02617320133 to actually initialize our flags to false.

In short, the backport is much more complex than a quick patch; even if a 1-liner really is all that's needed, I need to look *very* closely at all 4 patches from bug 1812760 to make 100% sure that's the case.

Hence, I'm removing those patches from the current systemd upload, which will fix this bug. Then I can take more time in the original bug to evaluate the patches.

Revision history for this message
Dan Streetman (ddstreet) wrote :

With the patches from bug 1812670 reverted/removed, this should be now fixed in systemd at versions:

bionic-proposed: https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.15

cosmic-proposed: https://launchpad.net/ubuntu/+source/systemd/239-7ubuntu10.10

Revision history for this message
Dan Streetman (ddstreet) wrote :

Bug number typo in the last comment - I meant that the patches from bug 1812760 have been removed.

Changed in systemd (Ubuntu Bionic):
status: In Progress → Fix Released
Changed in systemd (Ubuntu Cosmic):
status: In Progress → Fix Released
Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Confirmed that 3ubuntu10.15 fixes the core-dump in systemd-networkd.
