corosync fails to start in unprivileged containers - autopkgtest failure

Bug #1828228 reported by Dimitri John Ledkov on 2019-05-08
This bug affects 2 people
Affects                 Importance  Assigned to
Auto Package Testing    Undecided   Unassigned
corosync (Ubuntu)       High        Unassigned
pacemaker (Ubuntu)      High        Unassigned
pcs (Ubuntu)            Undecided   Unassigned

Bug Description

Currently pacemaker v2 fails to start in armhf containers (and by extension corosync too).

I found that it is reproducible locally, and that I had to bump a few limits to get it going.

Specifically I did:

1) bump memlock limits
2) bump rmem_max limits

= 1) Bump memlock limits =

I have no idea which one of these finally worked, or whether any one of them is sufficient on its own. It was a bit of whack-a-mole.

cat >>/etc/security/limits.conf <<EOF
* soft memlock unlimited
* hard memlock unlimited
EOF

lxc config set nice-mako limits.kernel.memlock 33554432

mkdir -p /etc/systemd/system/snap.lxd.daemon.service.d/
cat >/etc/systemd/system/snap.lxd.daemon.service.d/override.conf <<EOF
[Service]
LimitMEMLOCK=6553600000
EOF
systemctl daemon-reload
systemctl restart snap.lxd.daemon.service

= 2) Bump rmem_max values =

Observed:
# strace -s99999 -f /usr/sbin/corosync 2>&1 | grep sockop
[pid 447] setsockopt(12, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
[pid 447] getsockopt(12, SOL_SOCKET, SO_RCVBUF, [425984], [4]) = 0
[pid 447] setsockopt(12, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)

Bumped rmem_max and wmem_max using:
sudo sysctl -w net.core.wmem_max=8388608
sudo sysctl -w net.core.rmem_max=8388608

(Not sure if the desired size depends on the machine/container I am running on.)
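
The strace output above can be read as simple arithmetic. What follows is a minimal sketch, assuming the kernel behaviour described in socket(7): an unprivileged SO_RCVBUF request is capped at net.core.rmem_max, and the stored value is doubled to allow for bookkeeping overhead. The function name is hypothetical, and 212992 is an assumed rmem_max that would explain the 425984 returned by getsockopt; it is not a value taken from this report.

```shell
# effective_rcvbuf REQUESTED RMEM_MAX
# Prints the value getsockopt(SO_RCVBUF) would be expected to report after an
# unprivileged setsockopt(SO_RCVBUF, REQUESTED): the request is capped at
# RMEM_MAX and then doubled.
effective_rcvbuf() {
    requested=$1
    rmem_max=$2
    if [ "$requested" -gt "$rmem_max" ]; then
        requested=$rmem_max
    fi
    echo $((requested * 2))
}

effective_rcvbuf 8388608 212992     # assumed rmem_max; prints 425984
effective_rcvbuf 8388608 8388608    # after the sysctl bump; prints 16777216
```

Under these assumptions the plain SO_RCVBUF request is capped silently, so the caller falls back to SO_RCVBUFFORCE, which is the call that returns EPERM in the trace.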

Can we check these values on our armhf containers and/or bump them? Or can we mark the pacemaker v2.0 autopkgtest as ignored on armhf?


Robie Basak (racb) wrote :

Am I right in thinking that the limits being too low are causing false positives in autopkgtests?

If so, we could check the limits in the tests themselves and skip (exit 77 and declare "skippable") if on armhf and the limits aren't high enough. That's a reasonable action for the packages, I think.
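
A minimal sketch of such a guard, assuming the test needs net.core.rmem_max to be at least the 8388608 bytes that corosync requests in the strace output above. The function name and its optional file argument are hypothetical. Exit status 77 is only treated as a skip when the test declares the "skippable" restriction.

```shell
# rmem_sufficient REQUIRED [FILE]
# Succeeds when the limit stored in FILE (default: the rmem_max sysctl file)
# is at least REQUIRED bytes.
rmem_sufficient() {
    required=$1
    file=${2:-/proc/sys/net/core/rmem_max}
    current=$(cat "$file" 2>/dev/null) || return 1
    [ "$current" -ge "$required" ]
}

# Intended use at the top of a test script:
#   rmem_sufficient 8388608 || { echo "SKIP: rmem_max too low"; exit 77; }
```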

Changed in corosync (Ubuntu):
status: New → Triaged
Changed in pacemaker (Ubuntu):
status: New → Triaged
Changed in corosync (Ubuntu):
importance: Undecided → Medium
Changed in pacemaker (Ubuntu):
importance: Undecided → Medium
tags: added: ubuntu-ha
Changed in corosync (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in pacemaker (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in pacemaker (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
tags: removed: ubuntu-ha

I assigned this to myself to address comment #1 from Robie and to try to bump the needed values from the test itself. I'll test in an armhf environment just to make sure it's good. This will unblock:

https://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses_by_team.html#ubuntu-server

pacemaker (1.1.18-2ubuntu1 to 2.0.1-4ubuntu1) in proposed for 56 days
- pacemaker/2.0.1-4ubuntu1: armhf (log, history)

And as soon as corosync is unblocked because of libknet1 MIR, we will be good for corosync and pacemaker.

Changed in pacemaker (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

I flagged this as High since it is impacting the pacemaker migration. Once this is fixed, corosync (which depends on libknet1) will still block migration, but that has already been addressed in the MIR (https://bugs.launchpad.net/ubuntu/+source/kronosnet/+bug/1811139).

I'm working on this now.

Changed in corosync (Ubuntu):
importance: Medium → High
Changed in pacemaker (Ubuntu):
importance: Medium → High
Changed in corosync (Ubuntu):
status: Triaged → In Progress
Changed in pacemaker (Ubuntu):
status: Triaged → In Progress

Hello Dimitri,

I tried to reproduce the same behaviour using default LXC containers on real hardware (ARMv8 host, armhf containers) and wasn't able to.

Nevertheless, I was able to cause corosync not to start due to failed mlock() calls:

main.log:Jul 15 18:27:57 [2386] hasid01 corosync warning [MAIN ] main.c:corosync_mlockall:481 Could not lock memory of service to avoid page faults: Operation not permitted (1)
main.log:Jul 15 18:27:57 [2386] hasid01 corosync error [MAIN ] main.c:corosync_flock:1087 Corosync Executive couldn't create lock file.

when I set the memlock soft/hard limit to 0 for the "hacluster/haclient" user/group, like you said.

hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
mlockall(MCL_CURRENT|MCL_FUTURE) = -1 EPERM (Operation not permitted)

and both calls, prlimit64() and mlockall() failed with EPERM.

When testing with 1MB soft/hard limit:

hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
mlockall(MCL_CURRENT|MCL_FUTURE) = 0

only prlimit64() fails with EPERM.
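
To see which limits a process starts with before it attempts the prlimit64() call, the soft and hard values can be read from /proc. This is a minimal sketch; the function name is hypothetical, and it assumes the usual column layout of /proc/<pid>/limits on Linux.

```shell
# memlock_limits [FILE]
# Prints "soft hard" from the "Max locked memory" row of a /proc/<pid>/limits
# file. With no argument it reads /proc/self/limits, i.e. the limits that
# child processes of this shell inherit.
memlock_limits() {
    awk '/^Max locked memory/ { print $4, $5 }' "${1:-/proc/self/limits}"
}

# Example, as the hacluster user:
#   memlock_limits
```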

It tries to set RLIMIT_MEMLOCK soft and hard limits to RLIM64_INFINITY, which is defined as:

#define RLIM64_INFINITY (~0ULL)

And it is, possibly, the "unlimited" value.

Since it failed with EPERM, checking the documented meaning of that return value: EPERM = An unprivileged process tried to raise the hard limit; the CAP_SYS_RESOURCE capability is required to do this.

It looks like, unless your container has "sys_resource" in its lxc.cap.keep= value AND you give the corosync binary CAP_SYS_RESOURCE:

sudo setcap 'CAP_SYS_RESOURCE=+ep' /usr/sbin/corosync

the prlimit64() call will fail, UNLESS the memlock limit is already set to unlimited, in which case it works:

(c)inaddy@hasid01:~$ sudo su - hacluster
hacluster@hasid01:~$ ulimit -H -l
unlimited
hacluster@hasid01:~$ strace -f /usr/sbin/corosync -f 2>&1 | grep -i mlock
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = 0
mlockall(MCL_CURRENT|MCL_FUTURE) = 0

And, despite failing in other parts:

sched_setscheduler(0, SCHED_RR, [99]) = -1 EPERM (Operation not permitted)
setpriority(PRIO_PGRP, 0, -2147483648) = -1 EACCES (Permission denied)

It works:

(c)inaddy@hasid02:~$ sudo crm status
Stack: corosync
Current DC: hasid02 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Mon Jul 15 19:40:25 2019
Last change: Mon Jul 15 17:51:53 2019 by root via cibadmin on hasid01

3 nodes configured
0 resources configured

Node hasid01: pending
Online: [ hasid02 hasid03 ]

And

(c)inaddy@hasid02:~$ sudo corosync-quorumtool
Quorum information
------------------
Date: Mon Jul 15 19:40:41 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 2
Ring ID: 1/136
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: ...


Quick clarifications on next steps:

- corosync runs as root, so it's unclear to me whether prlimit64() would fail inside a container if sys_resource is denied. prlimit64() definitely fails in two conditions: (1) not root and no "cap_sys_resource" configured for the binary (CAP_SYS_RESOURCE=+ep); (2) not root and the memlock ulimit is not unlimited. Neither applies here, since corosync runs as root.

- I'm going to test the LXD defaults, since I was using a vanilla LXC setup. The intention is to check whether sys_resource is kept by default, and what impact a missing sys_resource has on root's prlimit64() calls when the memlock ulimit is not unlimited.

- I will check anything else that might be getting in our way.

Note that if this turns out to be challenging, a "force-badtest" is likely to be acceptable to get the package migrated for now.

Thanks Robie, I totally agree. I'll take a quick look at the LXD cases and comment back here so we can make a decision.

This "bug" happens because of "unprivileged" containers:

root@corosync:~# corosync -f
Jul 20 21:26:32 notice [MAIN ] Corosync Cluster Engine 3.0.1 starting up
Jul 20 21:26:32 info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 20 21:26:32 warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 20 21:26:32 warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
Jul 20 21:26:32 notice [TOTEM ] Initializing transport (Kronosnet).
Jul 20 21:26:33 crit [TOTEM ] knet_handle_new failed: File name too long (36)
Jul 20 21:26:33 error [KNET ] transport: Failed to set socket buffer via force option 33: Operation not permitted
Jul 20 21:26:33 error [KNET ] transport: Unable to set local socketpair receive buffer: File name too long
Jul 20 21:26:33 error [KNET ] handle: Unable to initialize internal hostsockpair: File name too long
Jul 20 21:26:33 error [MAIN ] Can't initialize TOTEM layer
Jul 20 21:26:33 error [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/fs/cgroup/cpu/cpu.rt_runtime_us", O_RDONLY) = -1 ENOENT (No such file or directory)
sched_setscheduler(0, SCHED_RR, [99]) = -1 EPERM (Operation not permitted)
setpriority(PRIO_PGRP, 0, -2147483648) = -1 EACCES (Permission denied)
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = -1 EPERM (Operation not permitted)
[pid 694] setsockopt(11, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
[pid 694] epoll_ctl(0, EPOLL_CTL_DEL, 11, 0xff968fb8) = -1 EINVAL (Invalid argument)
[pid 694] epoll_ctl(0, EPOLL_CTL_DEL, 0, 0xff968fb8) = -1 EINVAL (Invalid argument)
[pid 694] close(0) = -1 EBADF (Bad file descriptor)
[pid 694] close(0) = -1 EBADF (Bad file descriptor)
[pid 695] madvise(0xf6055000, 8368128, MADV_DONTNEED) = -1 EINVAL (Invalid argument)

----

I was able to reproduce the exact same issue by using LXD on armhf with unprivileged containers. The issue is easy to see by running:

root@corosync:~# ulimit -l unlimited
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

as root, which shows that "root" does not have the "cap_sys_resource" capability. There is also the Kronosnet initialization failure because of low {r,w}mem_max values.
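
Another way to tell whether a shell is running inside a user namespace, as unprivileged LXD containers do, is to look at its UID mapping. This is not the check used in this report. It is a sketch that assumes the identity mapping "0 0 4294967295" which the initial user namespace normally shows in /proc/self/uid_map; the function name is hypothetical.

```shell
# uid_map_is_identity [FILE]
# Succeeds when FILE (default: /proc/self/uid_map) holds exactly one line that
# maps the full UID range onto itself, which is what a process outside any
# user namespace is assumed to see.
uid_map_is_identity() {
    awk 'NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { ok = 1 }
         END { exit !(ok && NR == 1) }' "${1:-/proc/self/uid_map}"
}

# Example:
#   uid_map_is_identity || echo "UIDs are remapped: probably unprivileged"
```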

## unprivileged x64:

root@corosync:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Eoan Ermine (development branch)
Release: 19.10
Codename: eoan
root@corosync:~# uname -a
Linux corosync 5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@corosync:~# corosync -f
Jul 21 04:20:38 notice [MAIN ] Corosync Cluster Engine 3.0.1 starting up
Jul 21 04:20:38 info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 21 04:20:38 warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 21 04:20:38 warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
Jul 21 04:20:38 notice [TOTEM ] Initializing transport (Kronosnet).
Jul 21 04:20:38 crit [TOTEM ] knet_handle_new failed: Cannot allocate memory (12)
Jul 21 04:20:38 error [MAIN ] Can't initialize TOTEM layer
Jul 21 04:20:38 error [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

## unprivileged armhf

root@corosync:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Eoan Ermine (development branch)
Release: 19.10
Codename: eoan

root@corosync:~# uname -a
Linux corosync 5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:45 UTC 2019 armv8l armv8l armv8l GNU/Linux

root@corosync:~# corosync -f
Jul 21 04:21:35 notice [MAIN ] Corosync Cluster Engine 3.0.1 starting up
Jul 21 04:21:35 info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pierelro bindnow
Jul 21 04:21:35 warning [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Jul 21 04:21:35 warning [MAIN ] Could not set priority -2147483648: Permission denied (13)
Jul 21 04:21:35 notice [TOTEM ] Initializing transport (Kronosnet).
Jul 21 04:21:35 crit [TOTEM ] knet_handle_new failed: Resource temporarily unavailable (11)
Jul 21 04:21:35 error [KNET ] handle: Unable to allocate memory for link to datafd buffer: Resource temporarily unavailable
Jul 21 04:21:35 error [MAIN ] Can't initialize TOTEM layer
Jul 21 04:21:35 error [MAIN ] Corosync Cluster Engine exiting with status 15 at main.c:1529.

Somehow the LXD containers being used for autopkgtest are likely different. x64 seems to be running privileged containers for needs-root tests, while armhf is not (or else the x64 autopkgtests wouldn't pass either, as demonstrated in the previous comment).

I'll suggest a hints-ubuntu test marking this as bad-test, but it seems that the environment is bad, and not the test.

no longer affects: pacemaker (Ubuntu)
Changed in corosync-qdevice (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

corosync-qdevice autopkgtest is also failing because of the same reason (armhf architecture).

On Sun, Jul 21, 2019 at 04:24:08AM -0000, Rafael David Tinoco wrote:
> Somehow the LXD containers being used for autopkgtest are likely
> different. x64 seems to be running privileged containers for needs-root
> tests, while armhf is not (or else the x64 autopkgtests wouldn't pass
> either, as demonstrated in the previous comment).

armhf is the only architecture that runs tests in containers. All other
architectures run them in VMs. (The only reason armhf doesn't use VMs is
because we can't deploy an armhf VM in openstack.)

> I'll suggest a hints-ubuntu test marking this as bad-test, but it seems
> that the environment is bad, and not the test.

Hard disagree. If the test can detect that it's running in an unprivileged
container, it should skip any tests which require privileges. If the tests
as a whole can't be run in an unprivileged container, then they should
declare Restrictions: isolation-machine instead of Restrictions:
isolation-container.

Changed in corosync (Ubuntu):
status: In Progress → Invalid
Changed in corosync-qdevice (Ubuntu):
status: In Progress → Invalid

Reopening per my preceding comment

Changed in corosync (Ubuntu):
status: Invalid → Triaged
Changed in corosync-qdevice (Ubuntu):
status: Invalid → Triaged
Changed in corosync (Ubuntu):
status: Triaged → In Progress
summary: - corosync fails to start in container (armhf) bump some limits
+ corosync fails to start in unprivileged containers - autopkgtest failure

Steve, you are right. I was preparing this comment a few minutes ago:

"""Speaking with Andreas, we had the idea to just exit 2 (the "at least one test was skipped" return code) from the test when running in an unprivileged environment. That can be easily tested by changing the memlock size limit as root (needs-root is required in the test) and checking for a returned error""" and it goes in the same direction as you pointed out.

I'll add isolation-machine and skip the test if ulimit -H -l can't be changed (since with needs-root that will indicate an unprivileged namespace).
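
A minimal sketch of that probe, assuming the test runs as root (needs-root) so that a failure to raise the limit points at an unprivileged namespace and not at an ordinary user account. Both function names are hypothetical, and the 77 skip code assumes the test declares the "skippable" restriction; it is not necessarily what the uploaded packages do.

```shell
# can_raise_memlock
# Succeeds when the hard memlock limit can be set to unlimited. The attempt
# runs in a subshell so the limits of the calling shell stay unchanged.
can_raise_memlock() {
    ( ulimit -H -l unlimited ) 2>/dev/null
}

# skip_code_for STATUS
# Maps the probe result to an exit code: 0 to carry on, 77 to skip.
skip_code_for() {
    if [ "$1" -eq 0 ]; then echo 0; else echo 77; fi
}

if can_raise_memlock; then status=0; else status=1; fi
echo "exit code the test would use: $(skip_code_for "$status")"
```

In a real test script the last two lines would become an early exit with the computed code when it is non-zero.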

Tks!

Changed in auto-package-testing:
status: New → Invalid
Changed in corosync-qdevice (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
no longer affects: corosync-qdevice (Ubuntu)

Pacemaker also depends on corosync, =), and its autopkgtests can't run on armhf in an unprivileged container. The same change we made for corosync has to be made in pacemaker.

Changed in pacemaker (Ubuntu):
status: New → In Progress
importance: Undecided → High
Changed in pacemaker (Ubuntu):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in corosync (Ubuntu):
status: In Progress → Fix Released

This issue is fixed in both pacemaker and corosync. Other regressions are being investigated at:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1838024

Changed in pacemaker (Ubuntu):
status: In Progress → Fix Released
Changed in pcs (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
tags: added: update-excuse
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pcs - 0.10.4-3

---------------
pcs (0.10.4-3) unstable; urgency=medium

  [ Rafael David Tinoco ]
  * d/p/Fix-python-tornado-5.patch: bring back workaround that fixes
    python-tornado until v6 becomes available.
  * Skip autopkgtest for unprivileged containers (LP: #1828228)

  [ Valentin Vidic ]
  * d/patches: fix warnings in ruby testsuite
  * d/control: update Standards-Version to 4.5.0
  * d/tests: show verbose progress for python tests

 -- Valentin Vidic <email address hidden> Sun, 05 Apr 2020 19:40:03 +0200

Changed in pcs (Ubuntu):
status: In Progress → Fix Released
Changed in pcs (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in pacemaker (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in corosync (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody