corosync locks all its current and future memory
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
corosync (Ubuntu) | Fix Released | Medium | Dan Streetman |
Bionic | Fix Released | Medium | Dan Streetman |
Focal | Fix Released | Medium | Dan Streetman |
Groovy | Fix Released | Medium | Dan Streetman |
Hirsute | Fix Released | Medium | Dan Streetman |
Bug Description
[impact]
As with several other programs, corosync appears to think it's special and needs to have all of its memory permanently locked so that nothing is ever swapped out. Before it does so, it attempts to raise its MEMLOCK rlimit to infinity. That step is necessary, because otherwise future memory allocations will fail once the application's memory usage reaches its rlimit.
Unfortunately, while it tries to raise its rlimit, it doesn't check whether the setrlimit() call succeeded. When it fails, corosync still locks all of its future memory *without* an infinite rlimit, which is essentially guaranteed to make corosync fail to allocate memory at some point in the future.
[test case]
This causes autopkgtest failures such as bug 1828228, because corosync crashes on memory allocation failure. It can be reproduced in an unprivileged container by just starting and using corosync:
$ lxc launch ubuntu:groovy lp1911904-g
$ lxc config set lp1911904-g limits.
$ lxc stop lp1911904-g
$ lxc start lp1911904-g
$ lxc shell lp1911904-g
root@lp1911904-g:~# prlimit | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# apt install -y corosync
corosync will fail to start due to bug 1918735, so edit /etc/corosync/
root@lp1911904-g:~# pidof corosync
1153
root@lp1911904-g:~# prlimit -p 1153 | grep MEMLOCK
MEMLOCK max locked-in-memory address space 64000000 64000000 bytes
root@lp1911904-g:~# grep VmLck /proc/1153/status
VmLck: 36396 kB
Note that memory is locked, but the rlimit has not been raised to infinity.
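The two checks in the transcript (`prlimit -p ... | grep MEMLOCK` and `grep VmLck`) can also be combined into a single programmatic test for this bug's symptom: memory is locked while the soft MEMLOCK limit is still finite. This is a minimal sketch assuming Linux with a mounted `/proc`; the helper names `read_vmlck_kb()`, `memlock_soft_unlimited()` and `locked_with_finite_memlock()` are hypothetical and not part of corosync. It reads the limit from `/proc/<pid>/limits`, the same data `prlimit -p` reports.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h> /* pid_t, getpid() for callers */

/* Return VmLck (locked memory, in kB) from /proc/<pid>/status, or -1 on error. */
long read_vmlck_kb(pid_t pid)
{
    char path[64], line[256];
    long kb = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    if ((f = fopen(path, "r")) == NULL)
        return -1;
    /* The line looks like "VmLck:\t   36396 kB". */
    while (fgets(line, sizeof(line), f) != NULL)
        if (sscanf(line, "VmLck: %ld", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

/*
 * Return 1 if the soft "Max locked memory" limit in /proc/<pid>/limits is
 * "unlimited", 0 if it is a finite number of bytes, -1 on error.
 */
int memlock_soft_unlimited(pid_t pid)
{
    static const char key[] = "Max locked memory";
    char path[64], line[256], soft[32];
    int result = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/limits", (long)pid);
    if ((f = fopen(path, "r")) == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* The first field after the key is the soft limit. */
        if (strncmp(line, key, sizeof(key) - 1) == 0 &&
            sscanf(line + sizeof(key) - 1, "%31s", soft) == 1) {
            result = strcmp(soft, "unlimited") == 0;
            break;
        }
    }
    fclose(f);
    return result;
}

/* 1 if memory is locked under a finite MEMLOCK limit, 0 if not, -1 on error. */
int locked_with_finite_memlock(pid_t pid)
{
    long kb = read_vmlck_kb(pid);
    int unlimited = memlock_soft_unlimited(pid);

    if (kb < 0 || unlimited < 0)
        return -1;
    return kb > 0 && !unlimited;
}
```

Run as root inside the container above, `locked_with_finite_memlock(1153)` would be expected to return 1, since VmLck is non-zero and the MEMLOCK limit is 64000000 bytes rather than unlimited.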
[regression potential]
Any regression would likely involve a failure to start corosync, or a memory allocation failure during operation.
[scope]
This is still broken upstream, so it needs to be fixed upstream as well as in all releases.
[other info]
This "worked" before because systemd enforced a very low rlimit. In that case, corosync's call to raise its rlimit still failed, but since the rlimit was lower than corosync's initial memory size, the mlockall() call also failed; corosync ignored that failure (it only logged it) and continued just fine without any of its memory locked.
Then systemd's rlimit was raised by bug 1830746. corosync (and several other applications that think they are 'special' and should never swap: bug 1890394 and bug 1890394) still fails to raise its rlimit, but now its mlockall() call succeeds. As a result, corosync fails a short time later, once its memory usage reaches its rlimit and all of its memory allocations fail.
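To put a number on "a short time later": in the test case above, the MEMLOCK limit is 64000000 bytes (62500 kB) and VmLck is already 36396 kB, so roughly 26104 kB (about 25 MiB) of additional memory can be mapped before new allocations start failing. The hypothetical helper below does that arithmetic; it is only an estimate, since the kernel accounts locked memory in whole pages.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: approximate kB that can still be locked before the
 * process hits its MEMLOCK limit.
 *   limit_bytes: soft RLIMIT_MEMLOCK, in bytes
 *   vmlck_kb:    current VmLck from /proc/<pid>/status, in kB
 * Returns -1 if the limit is RLIM_INFINITY (no ceiling), 0 if the process
 * is already at or above the limit.
 */
long long memlock_headroom_kb(rlim_t limit_bytes, long long vmlck_kb)
{
    long long limit_kb;

    if (limit_bytes == RLIM_INFINITY)
        return -1;
    limit_kb = (long long)(limit_bytes / 1024);
    return limit_kb > vmlck_kb ? limit_kb - vmlck_kb : 0;
}
```

With the values from the transcript, `memlock_headroom_kb(64000000, 36396)` returns 26104.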
As should be entirely clear from the fact that corosync has never actually been able to lock its memory until now, yet seemed to work fine, virtually no application out there *actually* needs to lock all of its memory (qemu is a notable exception when SR-IOV is involved, where it *does* need to lock all memory).
Upstream corosync should completely remove the call to raise its rlimit and the mlockall() call. They are not needed and only cause problems. However, if upstream corosync insists that it is special and needs to lock all memory, it *at least* needs to check the result of setrlimit() and skip mlockall() if it is unable to raise its rlimit.
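A minimal sketch of that minimum fix in C, assuming the intended fallback is simply to keep running unlocked when the limit cannot be raised. `lock_memory_if_unlimited()` is a hypothetical name, and this is not the code from the upstream PR.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Lock all current and future memory only if RLIMIT_MEMLOCK was successfully
 * raised to unlimited. Returns 0 if memory was locked, -1 if locking was
 * skipped or failed; in both -1 cases the caller just continues unlocked.
 */
int lock_memory_if_unlimited(void)
{
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        /* Without an unlimited MEMLOCK, MCL_FUTURE would eventually make
         * allocations fail, so don't lock anything. */
        fprintf(stderr, "setrlimit(RLIMIT_MEMLOCK) failed: %s; not locking memory\n",
                strerror(errno));
        return -1;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall() failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

In the unprivileged container above, setrlimit() fails with EPERM because raising the hard limit requires CAP_SYS_RESOURCE, so this function returns before mlockall() and corosync would run with its memory unlocked, as it effectively did before bug 1830746.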
This is also related to bug 1918735 and bug 1828228.
Related branches
- Bryce Harrington (community): Approve
- git-ubuntu developers: Pending requested

Diff: 368 lines (+240/-9), 7 files modified
- debian/changelog (+197/-0)
- debian/control (+5/-2)
- debian/patches/Make-the-example-config-valid.patch (+13/-5)
- debian/patches/lp1918735/0001-allow_knet_handle_fallback_default_yes.patch (+22/-0)
- debian/patches/series (+1/-0)
- debian/tests/control (+1/-1)
- debian/tests/quorumtool (+1/-1)
no longer affects: auto-package-testing
Changed in corosync (Ubuntu Hirsute):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Groovy):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in corosync (Ubuntu Groovy):
importance: Undecided → Medium
Changed in corosync (Ubuntu Hirsute):
importance: Undecided → Medium
status: New → In Progress
Changed in corosync (Ubuntu Groovy):
status: New → In Progress
Changed in corosync (Ubuntu Bionic):
status: New → In Progress
Changed in corosync (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
Changed in corosync (Ubuntu Bionic):
importance: Undecided → Medium
description: updated
Opened upstream PR https://github.com/corosync/corosync/pull/620