Comment 9 for bug 1837580

Jeff Dileo (jtdileo) wrote:

This is currently an issue in 19.10's systemd (version 242). By default, unless a service is configured with LimitMEMLOCK, it gets 64k as its memlock limit (though, oddly, systemd now bumps its own memlock limit higher than previous versions did). The only processes not affected are those that raise their own memlock rlimits at runtime, such as `systemd --user`.

```
# for pid in $(ps --ppid 1 | awk 'NR!=1 {print $1}'); do echo -n "${pid}: "; cat "/proc/${pid}/limits" | grep locked ; done
400: Max locked memory 65536 65536 bytes
480: Max locked memory 65536 65536 bytes
514: Max locked memory 65536 65536 bytes
559: Max locked memory 65536 65536 bytes
561: Max locked memory 65536 65536 bytes
596: Max locked memory 65536 65536 bytes
657: Max locked memory 65536 65536 bytes
658: Max locked memory 65536 65536 bytes
659: Max locked memory 65536 65536 bytes
661: Max locked memory 65536 65536 bytes
662: Max locked memory 65536 65536 bytes
665: Max locked memory 65536 65536 bytes
681: Max locked memory 65536 65536 bytes
685: Max locked memory 65536 65536 bytes
688: Max locked memory 65536 65536 bytes
704: Max locked memory 65536 65536 bytes
710: Max locked memory 65536 65536 bytes
711: Max locked memory 65536 65536 bytes
732: Max locked memory 65536 65536 bytes
939: Max locked memory 65536 65536 bytes
6673: Max locked memory 67108864 67108864 bytes
7310: Max locked memory 65536 65536 bytes
# ps aux | grep 6673
root 6673 0.0 0.8 18132 8348 ? Ss 00:07 0:00 /lib/systemd/systemd --user
root 10442 0.0 0.0 8020 864 pts/2 S+ 03:32 0:00 grep --color=auto 6673
```
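To illustrate the point above about processes escaping the default only by raising their own rlimits at runtime (as `systemd --user` does for itself), here is a minimal sketch, assuming a root shell and util-linux's prlimit; the 64 MiB value is simply chosen to mirror what pid 6673 shows above:

```
# Illustrative only: raise the memlock rlimit of the current shell at runtime,
# then confirm it took effect. The 67108864 (64 MiB) value is an assumption
# matching the systemd --user instance listed above.
prlimit --pid "$$" --memlock=67108864:67108864
grep locked "/proc/$$/limits"
```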

This includes sshd, but the forked (still `sshd`) children of sshd appear to have their memlock limit increased, so commands run directly from an ssh shell get realistic limits. However, processes "kicked off" by an ssh shell session but not actually parented under it end up with the austere 64k memlock limit. This is the case with docker (the Ubuntu docker.io package) containers, as containerd's systemd unit (/lib/systemd/system/containerd.service) does not set LimitMEMLOCK. And it should not have to (a drop-in that would paper over this is sketched below).
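For completeness, a hypothetical per-unit workaround would be a drop-in (e.g. via `systemctl edit containerd.service`); the path and value below are illustrative assumptions, and this is exactly the kind of special-casing argued against below:

```
# Hypothetical workaround only, not the fix argued for here: raise containerd's
# memlock limit with a systemd drop-in. Paths and values are illustrative.
mkdir -p /etc/systemd/system/containerd.service.d
cat > /etc/systemd/system/containerd.service.d/memlock.conf <<'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
systemctl daemon-reload
systemctl restart containerd.service
```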

Per this thread (https://twitter.com/ChaosDatumz/status/1198075570921394177), this is causing problems for eBPF-related functionality running under docker, because the kernel charges eBPF map memory against the memlock limit and accounts it per-user. That is a problem because root in a non-user-namespaced container is technically root on the outside, so on top of this paltry memlock limit, locked memory from existing host processes running as root counts towards the container's memlock accounting. This likely has cascading effects for anything eBPF-related that isn't started by a user's shell, and the per-user memlock accounting will likely cause further issues for anything running in a container that performs such checks, given that on a typical system root-owned host processes may well already have more than 64k of locked kernel memory allocated. I don't think the solution is to special-case containerd (or docker.io) with a configuration, but to fix this at its heart: systemd.
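To make the container side concrete, a hedged sketch of observing the inherited limit from inside a default container, plus docker's per-container --ulimit escape hatch (which, again, is the kind of special-casing I'd rather avoid); the image name is an arbitrary assumption:

```
# Illustrative only: inspect the memlock limit a default container inherits
# from containerd (image name is an assumption).
docker run --rm ubuntu:19.10 grep locked /proc/self/limits

# Per-container workaround via docker's --ulimit flag (-1 means unlimited);
# this only papers over the systemd default rather than fixing it.
docker run --rm --ulimit memlock=-1 ubuntu:19.10 grep locked /proc/self/limits
```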