high packet loss on 18.04 with systemd

Bug #1886826 reported by Luke Alexander
This bug affects 1 person
Affects            Status    Importance   Assigned to   Milestone
systemd (Ubuntu)   Expired   Undecided    Unassigned    (none)

Bug Description

Description: Ubuntu 18.04.4 LTS
Release: 18.04

systemctl --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

Our issue: we have a k8s (1.18, kube-router CNI) cluster made up of a number of Ubuntu 16.04 nodes, which we are in the process of upgrading to Ubuntu 18.04. After upgrading the first node and adding it back to the cluster, we observed high packet loss from the other cluster nodes to the 18.04 node, as well as ping errors when running a continuous ping from the 18.04 node to another node in the cluster, e.g.:

64 bytes from 10.8.11.1: icmp_seq=91 ttl=64 time=0.088 ms
64 bytes from 10.8.11.1: icmp_seq=92 ttl=64 time=0.076 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
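To put a number on the local send failures, a small Python sketch like the following could tally reply lines against "sendmsg: No buffer space available" lines (the strerror text for ENOBUFS) in captured ping output. It assumes the iputils ping output format shown above and counts each error line as one failed send attempt. It cannot see probes that were sent but never answered, since ping prints nothing per packet for those by default. The names summarize_ping and PingSummary are hypothetical.

```python
import re
from typing import Iterable, NamedTuple


class PingSummary(NamedTuple):
    replies: int   # "64 bytes from ...: icmp_seq=N ..." lines
    enobufs: int   # "sendmsg: No buffer space available" lines

    @property
    def error_ratio(self) -> float:
        # Fraction of visible send attempts that failed locally with ENOBUFS.
        attempts = self.replies + self.enobufs
        return self.enobufs / attempts if attempts else 0.0


# Assumes the iputils ping reply format seen in this report.
REPLY_RE = re.compile(r"^\d+ bytes from \S+: icmp_seq=\d+")
ENOBUFS_MSG = "sendmsg: No buffer space available"


def summarize_ping(lines: Iterable[str]) -> PingSummary:
    """Count reply lines and ENOBUFS error lines in captured ping output."""
    replies = 0
    enobufs = 0
    for line in lines:
        line = line.strip()
        if REPLY_RE.match(line):
            replies += 1
        elif ENOBUFS_MSG in line:
            enobufs += 1
    return PingSummary(replies, enobufs)
```

Fed the five lines above, this would report 2 replies and 3 ENOBUFS errors. Silent loss from the other nodes' side would need to be measured separately, for example from ping's own summary line on those nodes.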

Another observation is that one of our daemonset pods (promtail) could not start.

After many days of troubleshooting various possibilities (changing the network cable and NIC, trying a different server, checking and changing various sysctl values), I eventually tried switching the netplan renderer from systemd-networkd to NetworkManager. After rebooting the server with NetworkManager in place and re-adding the node to the k8s cluster, the packet loss and the ping errors disappeared, and the promtail daemonset pod also started normally.
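Because the 16.04 nodes behaved and the 18.04 node did not, one way to make the sysctl comparison systematic is to snapshot the same keys from /proc/sys on both and diff them. The sketch below assumes the usual mapping of dotted sysctl names onto /proc/sys paths. The key list (neighbour-table gc_thresh limits and socket write-buffer sizes) is an illustrative assumption, not something identified in this report, and CANDIDATE_KEYS, read_sysctls and diff_sysctls are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Illustrative keys only (assumption): neighbour-table and socket-buffer
# limits are common suspects when sendmsg() fails with ENOBUFS.
CANDIDATE_KEYS = [
    "net.ipv4.neigh.default.gc_thresh1",
    "net.ipv4.neigh.default.gc_thresh2",
    "net.ipv4.neigh.default.gc_thresh3",
    "net.core.wmem_default",
    "net.core.wmem_max",
]


def sysctl_path(key: str, root: Path = Path("/proc/sys")) -> Path:
    # "net.core.wmem_max" -> /proc/sys/net/core/wmem_max
    # Does not handle components that contain dots themselves (e.g. VLAN names).
    return root.joinpath(*key.split("."))


def read_sysctls(keys: Iterable[str],
                 root: Path = Path("/proc/sys")) -> Dict[str, Optional[str]]:
    """Read each key's current value; None if absent on this kernel or unreadable."""
    values: Dict[str, Optional[str]] = {}
    for key in keys:
        try:
            values[key] = sysctl_path(key, root).read_text().strip()
        except OSError:
            values[key] = None
    return values


def diff_sysctls(old: Dict[str, Optional[str]],
                 new: Dict[str, Optional[str]]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the keys whose values differ, as (old, new) pairs."""
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(set(old) | set(new))
        if old.get(k) != new.get(k)
    }
```

For example, read_sysctls(CANDIDATE_KEYS) could be run on one node of each release, the dictionaries saved (e.g. with json.dump), and then passed to diff_sysctls to list only the keys whose values differ.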

I would like to use Ubuntu's default of systemd-networkd to handle our NIC/route config, but cannot until I've figured out what in systemd-networkd is causing this.

Any help is much appreciated on this matter.

Thanks!
Luke

Dan Streetman (ddstreet) wrote :

You should probably provide your network configuration.

Changed in systemd (Ubuntu):
status: New → Incomplete
Luke Alexander (lukealexander) wrote :

Yes, here is an example of the netplan config. I have tried this with 3 different servers: 2 with configs very similar to the one below and 1 with bonded interfaces, and all 3 show the same packet loss. All 3 had the same or equivalent configs and worked without packet loss when running Ubuntu 16.04, or when using netplan with the renderer set to NetworkManager instead of networkd.

network:
  version: 2
  renderer: networkd
  ethernets:
    ens15f0:
      dhcp4: false
      dhcp6: false
      addresses:
        - 10.0.0.1/24
      gateway4: 10.0.0.254
      nameservers:
        addresses:
          - 8.8.8.8
          - 1.1.1.1
    ens15f1:
      dhcp4: false
      dhcp6: false
      addresses:
        - 192.168.1.12/16

Luke Alexander (lukealexander) wrote :

An update on this: it appears that this is NOT a bug in systemd-networkd. I managed to trigger the same behaviour with NetworkManager too; it just took a little longer to appear.

I have also tested a number of kernels (4.15, 4.21, 5.0, 5.1, 5.2, 5.3, 5.4) and Ubuntu 20.04 but each results in connection issues between the cluster and the new node.

I'm unsure what my next steps should be; maybe trying Debian or CentOS. Any help troubleshooting this problem is greatly appreciated!

summary: - high packet loss on 18.04 with systemd-networkd
+ high packet loss on 18.04 with systemd
Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired