high packet loss on 18.04 with systemd
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| systemd (Ubuntu) | Expired | Undecided | Unassigned | |
Bug Description
```
Description: Ubuntu 18.04.4 LTS
Release: 18.04
```

```
$ systemctl --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-
```
Our setup is a k8s (1.18, kube-router CNI) cluster made up of a number of Ubuntu 16.04 nodes, which we are in the process of upgrading to Ubuntu 18.04. After upgrading the first node and adding it back to the cluster, we observed high packet loss from other cluster nodes to the 18.04 node. We also saw ping errors when running a continuous ping from the 18.04 node to another node in the cluster, e.g.:
```
64 bytes from 10.8.11.1: icmp_seq=91 ttl=64 time=0.088 ms
64 bytes from 10.8.11.1: icmp_seq=92 ttl=64 time=0.076 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
```
We also noticed that one of our daemonset pods (promtail) could not start.
After many days of troubleshooting (changing the network cable and the NIC, trying a different server, checking and changing various sysctl values), I eventually tried switching the netplan backend from systemd-networkd to NetworkManager. After rebooting the server with NetworkManager in place and re-adding the node to the k8s cluster, the packet loss and the ping errors disappeared, and the daemonset pod (promtail) started normally.
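For readers unfamiliar with the switch described above, a netplan configuration that selects NetworkManager as the backend might look roughly like the sketch below. The file name, interface name, and addresses are hypothetical placeholders, not the reporter's actual config. When `renderer` is omitted, netplan defaults to `networkd` (systemd-networkd), and the NetworkManager renderer only works if the network-manager package is installed.

```yaml
# /etc/netplan/01-netcfg.yaml  (hypothetical file name)
network:
  version: 2
  # Omitting "renderer" (or setting it to "networkd") hands the interfaces
  # to systemd-networkd; "NetworkManager" hands them to NetworkManager.
  renderer: NetworkManager
  ethernets:
    eno1:                          # hypothetical interface name
      addresses: [192.0.2.10/24]   # hypothetical static address
      gateway4: 192.0.2.1          # hypothetical default gateway
```

The change can then be applied with `sudo netplan apply`, or by rebooting as was done here.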
I would like to keep Ubuntu's default, systemd-networkd, for our NIC and route configuration, but cannot until I've figured out what in systemd-networkd is causing this.
Any help is much appreciated on this matter.
Thanks!
Luke

You should probably provide your network configuration.