Comment 2 for bug 1780348

Dmitrii Shcherbakov (dmitriis) wrote :

Just to add some more context.

The default ARP-related sysctl settings are the kernel defaults (described in arp(7): http://man7.org/linux/man-pages/man7/arp.7.html).

net.ipv4.neigh.default.gc_interval = 30
net.ipv4.neigh.default.gc_stale_time = 60
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
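
To check what a given host is actually running with, the values can be read straight from procfs. This is a minimal sketch, assuming the usual /proc/sys layout for these keys; the function name and the `base` parameter are made up for illustration.

```python
from pathlib import Path

# Keys listed above; each one is a file under /proc/sys/net/ipv4/neigh/default/.
KEYS = ("gc_interval", "gc_stale_time", "gc_thresh1", "gc_thresh2", "gc_thresh3")


def read_neigh_defaults(base="/proc/sys/net/ipv4/neigh/default"):
    """Return {key: int} for the ARP gc sysctls found under `base`."""
    values = {}
    for key in KEYS:
        path = Path(base) / key
        if path.exists():  # skip keys that a given kernel does not expose
            values[key] = int(path.read_text().strip())
    return values
```

Calling `read_neigh_defaults()` on a hypervisor before and after applying a sysctl drop-in is a quick way to confirm the new thresholds took effect.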

As soon as gc_thresh3 is hit, MAC learning stops and it is up to gc to clear stale entries, if it can (static entries, for example, are never deleted). The kernel then logs:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/tree/net/core/neighbour.c#n314
   net_info_ratelimited("%s: neighbor table overflow!\n",
          tbl->id);
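
The check that leads to this message can be modelled roughly as below. This is a simplified Python rendering of my reading of neigh_alloc() in that file, not kernel code: the function and parameter names are invented, `forced_gc_freed` stands in for the result of the forced gc pass, and the 5 second window is how I understand the flush rate limit there.

```python
def alloc_outcome(entries, thresh2, thresh3, secs_since_flush, forced_gc_freed):
    """Simplified model of the neighbour allocation check.

    entries          -- entries already in the table
    secs_since_flush -- seconds since the last forced gc run
    forced_gc_freed  -- whether a forced gc pass would free anything
                        (static entries are never freed)
    """
    over_hard = entries >= thresh3
    over_soft = entries >= thresh2 and secs_since_flush > 5
    if over_hard or over_soft:
        # A forced gc pass is attempted here. Allocation fails only if
        # nothing could be freed AND the hard limit has been reached.
        if not forced_gc_freed and over_hard:
            return "overflow"
    return "allocated"
```

In this model gc_thresh2 only triggers an early cleanup attempt, while gc_thresh3 is the point at which new entries are refused.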

The hash table used for ARP table lookups in the kernel grows with the number of neighbor table entries:

https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/tree/net/core/neighbour.c?id=aa07f7dcb959603e1e6d56db7281b1d36bce9928#n395

https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/tree/net/core/neighbour.c?id=aa07f7dcb959603e1e6d56db7281b1d36bce9928#n532

 if (atomic_read(&tbl->entries) > (1 << nht->hash_shift))
  nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
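
To get a feel for how far the hash table grows for a given threshold, the condition above can be replayed in a few lines. This is a sketch under two assumptions: the table starts at hash_shift = 3 (8 buckets), which is my reading of the table init code, and it never shrinks. The function name is illustrative.

```python
def final_hash_shift(n_entries, initial_shift=3):
    """Replay n_entries insertions and return the resulting hash_shift."""
    shift = initial_shift
    for entries in range(1, n_entries + 1):
        # Same condition as the kernel snippet above: grow by one bit
        # (doubling the bucket count) once entries exceed the bucket count.
        if entries > (1 << shift):
            shift += 1
    return shift
```

Under these assumptions a table filled up to the default gc_thresh3 of 1024 ends at hash_shift 10, and one filled up to 32768 ends at hash_shift 15, so the bucket count tracks whatever gc_thresh3 allows.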

* ARP table thresholds are not namespaced and can only be modified system-wide (per kernel). While ARP table entries have namespace affinity (`ip neigh` returns only the entries relevant to a particular namespace), they share the same storage (the same global kernel neighbor table). It is therefore important to tune the global thresholds, since we have many namespaces, each with its own entries (fip, qrouter, snat, dhcp);
* ARP table entries for floating IPs are only added to the source hypervisor host's FIP namespace ARP table (not to the destination hypervisor host's FIP namespace ARP table unless you ping a FIP from that namespace specifically);
* ARP table entries for remote DVR ports are added to destination FIP namespaces when ARP responses for FIPs are made (i.e. there may be as many entries as there are hypervisor hosts in the extreme case where VMs on every other hypervisor ping a FIP on one specific hypervisor);
* ARP table size will matter if you have a lot of east-west FIP to FIP communication.
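
Since the thresholds apply to the sum over all namespaces, it helps to total the entries per host rather than look at one namespace at a time. Below is a rough sketch that shells out to iproute2 (`ip netns list`, `ip -n <ns> -4 neigh show`); the helper names are invented, it needs root, and the result is only an approximation because `ip neigh show` does not list every entry state by default.

```python
import subprocess


def parse_netns_names(netns_list_output):
    """Extract namespace names from `ip netns list` output.

    Lines look like 'qrouter-<uuid> (id: 3)' or just the bare name.
    """
    return [line.split()[0] for line in netns_list_output.splitlines() if line.strip()]


def count_neigh_entries(neigh_show_output):
    """Count entries in `ip neigh show` output (one entry per line)."""
    return sum(1 for line in neigh_show_output.splitlines() if line.strip())


def total_ipv4_neigh_entries():
    """Sum IPv4 neighbour entries over the root and all named namespaces."""
    def run(*args):
        return subprocess.run(args, check=True, capture_output=True, text=True).stdout

    total = count_neigh_entries(run("ip", "-4", "neigh", "show"))
    for ns in parse_netns_names(run("ip", "netns", "list")):
        total += count_neigh_entries(run("ip", "-n", ns, "-4", "neigh", "show"))
    return total
```

Comparing `total_ipv4_neigh_entries()` against gc_thresh2 and gc_thresh3 gives an idea of how much headroom a hypervisor has left.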

Upstream kernel discussions around per-namespace tables and bumping up default limits:
https://lkml.org/lkml/2018/7/17/550

Example cloudinit-userdata config that raises the thresholds via a sysctl drop-in:

$ juju model-config userdata-sysctl-conf.yaml

$ cat userdata-sysctl-conf.yaml
cloudinit-userdata: |
  write_files:
  - content: |
      net.ipv4.neigh.default.gc_thresh1 = 16384
      net.ipv4.neigh.default.gc_thresh2 = 28672
      net.ipv4.neigh.default.gc_thresh3 = 32768
    owner: "root:root"
    path: /etc/sysctl.d/network-tuning.conf
    permissions: '0644'