Network stops working after systemd update

Bug #1824806 reported by Matt Walters
This bug affects 3 people
Affects: systemd (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have four 18.04 servers that are regularly updated and have a bond/bridge configured via netplan. Every so often the kernel logs "speed changed to 0" for the interfaces and the server becomes unresponsive over the network. The servers are still usable via the console, though. I think I can confidently associate it with systemd updates. If I'm updating the servers manually, e.g., `apt dist-upgrade`, and there's a systemd update, the ssh session becomes unresponsive while systemd is being updated. The timing of automatic security updates seems to coincide with it happening as well. Rebooting fixes the issue until the next systemd update. I've included some log excerpts and one of my netplan configs below. Thanks!

syslog:
`Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f3: Lost carrier
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f3: IPv6 successfully disabled
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: reading /etc/resolv.conf
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host kernel: [1058497.661292] igb 0000:04:00.3 enp4s0f3: speed changed to 0 for port enp4s0f3
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f2: Lost carrier
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f2: IPv6 successfully disabled
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: reading /etc/resolv.conf
Apr 9 06:39:01 stn-vm-host systemd[1]: Started Network Time Synchronization.
Apr 9 06:39:01 stn-vm-host systemd[1]: Stopped Flush Journal to Persistent Storage.
Apr 9 06:39:01 stn-vm-host systemd[1]: Stopping Flush Journal to Persistent Storage...
Apr 9 06:39:01 stn-vm-host kernel: [1058497.732487] systemd[1]: Stopping Journal Service...
Apr 9 06:39:01 stn-vm-host kernel: [1058497.732903] systemd-journald[672]: Received SIGTERM from PID 1 (systemd).
Apr 9 06:39:01 stn-vm-host kernel: [1058497.823634] igb 0000:04:00.2 enp4s0f2: speed changed to 0 for port enp4s0f2
Apr 9 06:39:01 stn-vm-host kernel: [1058497.833718] systemd[1]: Stopped Journal Service.
Apr 9 06:39:01 stn-vm-host kernel: [1058497.837652] systemd[1]: Starting Journal Service...
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f1: Lost carrier
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f1: IPv6 successfully disabled
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: reading /etc/resolv.conf
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: reading /etc/resolv.conf
Apr 9 06:39:01 stn-vm-host systemd[1]: Starting Flush Journal to Persistent Storage...
Apr 9 06:39:01 stn-vm-host kernel: [1058497.912976] systemd[1]: Started Journal Service.
Apr 9 06:39:01 stn-vm-host systemd[1]: Started Flush Journal to Persistent Storage.
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host kernel: [1058497.971223] igb 0000:04:00.1 enp4s0f1: speed changed to 0 for port enp4s0f1
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f0: Lost carrier
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f0: IPv6 successfully disabled
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: reading /etc/resolv.conf
Apr 9 06:39:01 stn-vm-host dnsmasq[2557]: using nameserver 127.0.0.53#53
Apr 9 06:39:01 stn-vm-host kernel: [1058498.109063] igb 0000:04:00.0 enp4s0f0: speed changed to 0 for port enp4s0f0
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: bond0: Gained carrier
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: bond0: Configured
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: br0: Configured`

Apt history, noting the systemd update just before the interfaces went down:
`Start-Date: 2019-04-09 06:38:55
Commandline: /usr/bin/unattended-upgrade
Upgrade: libsystemd0:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), libpam-systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), libnss-systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19)
End-Date: 2019-04-09 06:39:06`
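
The correlation described above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the apt history and syslog formats shown in the excerpts, assumes the syslog lines fall in the same year as the apt transaction (syslog timestamps carry no year), and relies on an English locale for parsing month names. The function names are hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the excerpts in this report.
APT_HISTORY = """\
Start-Date: 2019-04-09 06:38:55
Commandline: /usr/bin/unattended-upgrade
Upgrade: libsystemd0:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), libpam-systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19), libnss-systemd:amd64 (237-3ubuntu10.15, 237-3ubuntu10.19)
End-Date: 2019-04-09 06:39:06
"""

SYSLOG = """\
Apr 9 06:39:01 stn-vm-host systemd-networkd[13409]: enp4s0f3: Lost carrier
Apr 9 06:39:01 stn-vm-host kernel: [1058497.661292] igb 0000:04:00.3 enp4s0f3: speed changed to 0 for port enp4s0f3
Apr 9 06:39:01 stn-vm-host kernel: [1058497.823634] igb 0000:04:00.2 enp4s0f2: speed changed to 0 for port enp4s0f2
"""

APT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def apt_window(history):
    """Return (start, end) datetimes of the apt transaction."""
    def field(name):
        value = re.search(rf"^{name}: (.+)$", history, re.M).group(1)
        # Collapse repeated spaces so either spacing variant parses.
        return datetime.strptime(" ".join(value.split()), APT_TIME_FORMAT)
    return field("Start-Date"), field("End-Date")


def systemd_versions(history):
    """Return (old, new) versions of the systemd package, or None."""
    match = re.search(
        r"(?:^Upgrade: |, )systemd:\S+ \(([^,]+), ([^)]+)\)", history, re.M
    )
    return match.groups() if match else None


def link_drops_during_upgrade(history, syslog):
    """List (port, timestamp) for 'speed changed to 0' events inside the apt window."""
    start, end = apt_window(history)
    hits = []
    for line in syslog.splitlines():
        match = re.search(r"(\S+): speed changed to 0 for port", line)
        if not match:
            continue
        # Assumption: the syslog entry belongs to the same year as the apt run.
        stamp_text = " ".join(line.split()[:3])
        stamp = datetime.strptime(f"{start.year} {stamp_text}", "%Y %b %d %H:%M:%S")
        if start <= stamp <= end:
            hits.append((match.group(1), stamp))
    return hits


if __name__ == "__main__":
    print("systemd upgrade:", systemd_versions(APT_HISTORY))
    for port, stamp in link_drops_during_upgrade(APT_HISTORY, SYSLOG):
        print(f"{port} lost speed at {stamp} during the apt transaction")
```

On a real host the two strings would be read from `/var/log/apt/history.log` and `/var/log/syslog` instead of being embedded.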

The netplan config on this server:
`network:
        ethernets:
                enp4s0f0:
                        addresses: []
                        dhcp4: false
                        dhcp6: false
                enp4s0f1:
                        addresses: []
                        dhcp4: false
                        dhcp6: false
                enp4s0f2:
                        addresses: []
                        dhcp4: false
                        dhcp6: false
                enp4s0f3:
                        addresses: []
                        dhcp4: false
                        dhcp6: false
        bonds:
                bond0:
                        dhcp4: false
                        dhcp6: false
                        interfaces:
                        - enp4s0f0
                        - enp4s0f1
                        - enp4s0f2
                        - enp4s0f3
                        parameters:
                                lacp-rate: slow
                                mode: 802.3ad
                                transmit-hash-policy: layer2
        bridges:
                br0:
                        interfaces: [bond0]
                        dhcp4: no
                        dhcp6: no
                        addresses: [10.0.0.1/16]
                        gateway4: 10.0.0.10
                        nameservers:
                                addresses: [10.0.0.10,8.8.8.8]
                                search:
                                - lan
        version: 2`

Dan Streetman (ddstreet) wrote :

do you notice similar (problematic) behavior when just restarting networkd, e.g.:

$ sudo systemctl restart systemd-networkd

or, stopping systemd-networkd, e.g.:

$ sudo systemctl stop systemd-networkd
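
To make the before/after comparison for either test repeatable, link state can be read from sysfs. This is a rough sketch under stated assumptions: it reads the standard `operstate`, `carrier` and `speed` attributes under `/sys/class/net/<iface>`, the helper names are hypothetical, and it does not run `systemctl` itself.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Read operstate, carrier and speed for one interface from sysfs."""
    base = Path(sysfs_root) / iface
    state = {}
    for attr in ("operstate", "carrier", "speed"):
        try:
            state[attr] = (base / attr).read_text().strip()
        except OSError:
            # Missing interface, or the kernel refuses the read (e.g. link is down).
            state[attr] = None
    return state


def snapshot(ifaces, sysfs_root="/sys/class/net"):
    """Capture the link state of several interfaces at once."""
    return {name: link_state(name, sysfs_root) for name in ifaces}


def changed_links(before, after):
    """Return {iface: (before, after)} for interfaces whose state differs."""
    return {
        name: (before[name], after.get(name))
        for name in before
        if before[name] != after.get(name)
    }
```

Intended use: call `snapshot([...])` with the bond members from the console, run one of the commands above, call `snapshot` again, and pass both results to `changed_links`. An empty result means the restart or stop did not disturb the links.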

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed

Matt Walters (matt.walters) wrote :

Dan, thanks for the comment. I will test this evening and let you know. Sorry for the delay!

Sean Bright (sbright) wrote :

Not the reporter but having a similar issue. When the systemd package is updated my bond dies (with the "speed changed to 0 for port" message). Nothing that I have tried to restore the network connection worked short of a full reboot. I have attached my syslog showing what occurs during the upgrade.

I also see the following during this time:

  bond1: Warning: No 802.3ad response from the link partner for any adapters in the bond

The LACP configuration on the switch is identical for all servers with bonds, but the only ones having this problem are the ones running bionic (trusty & xenial are unaffected).

I will try to find some time to see if I can reproduce with a simple 'systemctl restart systemd-networkd'.

Sean Bright (sbright) wrote :

Sorry - realized I needed to redact some stuff in the previous attachment.

Sean Bright (sbright) wrote :

And my netplan config:

network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: no
    eno2:
      dhcp4: no
  bonds:
    bond1:
      addresses: [10.100.10.111/24]
      gateway4: 10.100.10.1
      nameservers:
        addresses: [192.168.168.168, 192.168.168.169]
      interfaces: [eno1, eno2]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        transmit-hash-policy: layer2+3

Sean Bright (sbright) wrote :

So I tested...

$ sudo systemctl restart systemd-networkd

... and got the same results - the "speed changed to 0 for port" messages and loss of connectivity requiring a reboot.

Afterwards, however, I noticed that I was not completely up to date so I ran an `apt full-upgrade` which brought in the latest systemd packages (237-3ubuntu10.20). During the upgrade I again saw the "speed changed to 0 for port" message and I had to reboot in order to restore connectivity.

Now when I run either test (`systemctl restart systemd-networkd` or `systemctl stop systemd-networkd`), the message does not appear and I do not lose connectivity.

To rule out that it only appears to occur during an upgrade, I noticed that systemd has an update in proposed (237-3ubuntu10.21), so I enabled that repository, did an `apt update` followed by an `apt full-upgrade` and the install went perfectly. So this may very well have been fixed by 237-3ubuntu10.20, but I don't want to speak for the original reporter.

Side note: after running `systemctl stop systemd-networkd` the systemd-networkd process disappears from my process list, but the network does not get brought down. I don't know if that is intended behavior, but in either case it is not directly related to this report.

Dan Streetman (ddstreet) wrote :

> So this may very well have been fixed by 237-3ubuntu10.20, but I don't want
> to speak for the original reporter

that's why I asked, because 10.20 included patches to prevent systemd from removing all addresses/routes that it has config for, during startup. So I hoped that would fix your issue as well, which it sounds like it has.
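
For anyone checking a fleet of hosts against that version, here is a small illustrative sketch. It is a simplification and not the Debian version-comparison algorithm (no epoch or `~` handling); `dpkg --compare-versions` remains the authoritative tool. `FIXED_IN` and the function names are assumptions made for this example, based only on the versions mentioned in this thread.

```python
import re

# Version this thread reports as no longer reproducing the problem.
FIXED_IN = "237-3ubuntu10.20"


def version_key(version):
    """Split a version into text and number runs so that 10.9 sorts before 10.20."""
    key = []
    for token in re.findall(r"\d+|\D+", version):
        if token.isdigit():
            key.append((0, int(token), ""))
        else:
            key.append((1, 0, token))
    return key


def has_networkd_fix(installed):
    """True if the installed systemd version is at least FIXED_IN (simplified compare)."""
    return version_key(installed) >= version_key(FIXED_IN)


if __name__ == "__main__":
    for candidate in ("237-3ubuntu10.15", "237-3ubuntu10.19",
                      "237-3ubuntu10.20", "237-3ubuntu10.21"):
        print(candidate, has_networkd_fix(candidate))
```

The installed version can be taken from `dpkg-query -W systemd` and passed to `has_networkd_fix`.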

Dan Streetman (ddstreet) wrote :

> after running `systemctl stop systemd-networkd` the systemd-networkd process disappears
> from my process list, but the network does not get brought down. I don't know if that
> is intended behavior, but in either case it is not directly related to this report.

that is the intended systemd-networkd behavior.

Changed in systemd (Ubuntu):
status: Confirmed → Fix Released

Dan Streetman (ddstreet) wrote :

I marked this as 'Fix Released' because I believe (see last comments) the latest systemd upgrade included patches to fix this - but if it's still reproducible with the latest systemd release please feel free to comment and move this back to Confirmed.
