lxc fails autopkgtests on (pure) cgroups v2 enabled system

Bug #1943704 reported by Lukas Märdian
This bug affects 1 person
Affects: lxc (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

lxc fails 4 autopkgtests when run on a system with the cgroups v2 enabled systemd (248.3-1ubuntu7), which uses a pure unified hierarchy instead of the hybrid hierarchy used before.

https://autopkgtest.ubuntu.com/packages/lxc

FAIL: lxc-tests: lxc-test-apparmor-mount (0s)
FAIL: lxc-tests: lxc-test-autostart (360s)
FAIL: lxc-tests: lxc-test-no-new-privs (361s)
FAIL: lxc-tests: lxc-test-unpriv (0s)

I needed to skip the "lxc-test-exit-code" test to keep my local autopkgtest from hanging, but that test seems to work on the Ubuntu infrastructure, so it's probably related to my local environment:
diff --git a/debian/tests/exercise b/debian/tests/exercise
index 4a22f33..70231ee 100755
--- a/debian/tests/exercise
+++ b/debian/tests/exercise
@@ -88,6 +88,10 @@ for testbin in lxc-test-*; do
     echo "${testbin}" | grep -qv "\.in$" || continue
     STRING="lxc-tests: $testbin"

+ # Skip some tests because for testing
+ [ "$testbin" = "lxc-test-exit-code" ] && \
+ ignore "$STRING" && continue
+
     # Some tests can't be run standalone
     [ "$testbin" = "lxc-test-may-control" ] && continue

Reproducer (while connected to the Canonical VPN, or after setting up another squid proxy):
$ autopkgtest-buildvm-ubuntu-cloud -v -r impish
$ autopkgtest lxc -s -U --apt-pocket=proposed=src:systemd --env "http_proxy=http://squid.internal:3128" --env "https_proxy=http://squid.internal:3128" --env "no_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,launchpad.net,10.24.0.0/24" -- qemu autopkgtest-impish-amd64.img

I used "../lxc_4.0.10-0ubuntu4+wip0_amd64.changes" instead of the "lxc" SRCPKG name, in order to test a custom package that additionally skips the "lxc-test-exit-code" test.

Interestingly, the same set of tests fails if I run the test using the old (non cgroups v2) systemd (248.3-1ubuntu3), i.e. by leaving out the "--apt-pocket=proposed=src:systemd" parameter, although they fail in a slightly different way (see attached lxc-vs-old-systemd.log). A baseline test using the old systemd passed on the Ubuntu infrastructure. I cannot really explain this difference between the infra baseline and my local autopkgtest run, but it doesn't matter too much either, as we need to fix the situation for the new (cgroups v2 enabled) systemd.

Logs (full logs attached):

FAIL: lxc-tests: lxc-test-apparmor-mount (0s)
---
/usr/sbin/deluser: The user `lxcunpriv' does not exist.
./lxc-test-apparmor-mount: 152: cannot create /sys/fs/cgroup/-.mount/lxctest/tasks: Permission denied
lxc-destroy: tmp.6hX6BylHCU: tools/lxc_destroy.c: main: 242 Container is not defined
umount: /sys/kernel/security/apparmor/features/mount: not mounted.
sed: can't read /run/lxc/nics: No such file or directory
---
=> "./lxc-test-apparmor-mount: 152: cannot create /sys/fs/cgroup/-.mount/lxctest/tasks: Permission denied" seems to be relevant/related to unified cgroup hierarchy here.
=> fails in a different way with old (non cgroup v2) systemd, locally

FAIL: lxc-tests: lxc-test-autostart (21s)
---
Setting up the GPG keyring
Downloading the image index
ERROR: Failed to download http://images.linuxcontainers.org//meta/1.0/index-system
lxc-create: lxc-test-auto: lxccontainer.c: create_run_template: 1621 Failed to create container from template
lxc-create: lxc-test-auto: tools/lxc_create.c: main: 319 Failed to create container lxc-test-auto
FAIL
---
=> fails in the same way with old (non cgroup v2) systemd, locally.

FAIL: lxc-tests: lxc-test-no-new-privs (22s)
---
+ DONE=0
+ trap cleanup EXIT SIGHUP SIGINT SIGTERM
+ '[' '!' -d /etc/lxc ']'
+ ARCH=i386
+ type dpkg
++ dpkg --print-architecture
+ ARCH=amd64
+ lxc-create -t download -n c1 -- -d ubuntu -r xenial -a amd64
Setting up the GPG keyring
Downloading the image index
ERROR: Failed to download http://images.linuxcontainers.org//meta/1.0/index-system
lxc-create: c1: lxccontainer.c: create_run_template: 1621 Failed to create container from template
lxc-create: c1: tools/lxc_create.c: main: 319 Failed to create container c1
+ cleanup
+ cd /
+ lxc-destroy -n c1 -f
lxc-destroy: c1: tools/lxc_destroy.c: main: 242 Container is not defined
+ true
+ '[' 0 -eq 0 ']'
+ echo FAIL
FAIL
+ exit 1
---
=> fails in the same way with old (non cgroup v2) systemd, locally.

FAIL: lxc-tests: lxc-test-unpriv (0s)
---
./lxc-test-unpriv: line 163: /sys/fs/cgroup/-.mount/lxctest/tasks: Permission denied
cat: /tmp/tmp.w4zIOZHyAA: No such file or directory
---
=> "./lxc-test-unpriv: line 163: /sys/fs/cgroup/-.mount/lxctest/tasks: Permission denied" seems to be relevant/related to unified cgroup hierarchy here.
=> fails in a different way with old (non cgroup v2) systemd, locally.

description: updated
Lukas Märdian (slyon)
tags: added: update-excuse
Lukas Märdian (slyon) wrote :

From a side-channel discussion:

<cbrauner> all the fully unprivileged tests need to be disabled on cgroup v2. You can't run fully unprivileged containers on pure cgroup2 layouts. The delegation model doesn't allow it. Not without systemd making a fully empty delegated cgroup hierarchy available.

<stgraber> I think I'd prefer if those tests could figure out they're running on a cgroup2 systemd system and then do the systemd-run wrapper around lxc-start (or whatever it is) as that's realistically what users of unpriv LXC would do. Something along those lines: `systemd-run --unit=myshell --user --scope -p "Delegate=yes" lxc-start`
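
A minimal sketch of what stgraber describes, assuming that the presence of `cgroup.controllers` at the top of the cgroup mount point is an acceptable way to detect a pure cgroup v2 layout (on a hybrid layout `/sys/fs/cgroup` is a tmpfs and that file does not exist there). The function names `is_pure_cgroup2` and `unpriv_prefix` are made up for illustration and are not part of the lxc test suite:

```shell
#!/bin/sh
# Return 0 if the given mount point (default: /sys/fs/cgroup) is the root of
# a pure cgroup v2 (unified) hierarchy. Assumption: checking for
# cgroup.controllers is a sufficient test here.
is_pure_cgroup2() {
    [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]
}

# Print the command prefix for an unprivileged lxc-start. On pure cgroup v2
# this is the systemd-run wrapper suggested above; otherwise nothing is printed.
unpriv_prefix() {
    if is_pure_cgroup2 "$1"; then
        echo 'systemd-run --user --scope -p Delegate=yes'
    fi
}
```

A test could then run something like `$(unpriv_prefix) lxc-start -n c1`, which expands to a plain `lxc-start` call on hybrid systems. Whether the delegated scope is enough for every unprivileged test is not established in this report.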

Lukas Märdian (slyon) wrote :

Some more findings after testing with systemd-run:

FAIL: lxc-tests: lxc-test-autostart (360s)
FAIL: lxc-tests: lxc-test-no-new-privs (361s)

These two tests fail during the (local) autopkgtest run. But after logging into the local autopkgtest VM via its debug shell (--shell-fail|-s parameter) and executing them manually (`sudo src/tests/lxc-test-autostart` / `sudo src/tests/lxc-test-no-new-privs`), they pass just fine.
(Well, I needed to switch from a "xenial" to a "focal" container for the "no-new-privs" test, as the xenial container fails with some apt fetch errors during `apt update`, maybe due to xenial EOL?)
So this is probably an intermittent networking failure or a wrong/different proxy environment setting, and it seems to be unrelated to cgroupsv2.

FAIL: lxc-tests: lxc-test-apparmor-mount (0s)
FAIL: lxc-tests: lxc-test-unpriv (0s)

The other two tests seem to need some more porting work to make them compatible with cgroupsv2, as they are still making use of some deprecated cgroupsv1 functionality, such as the `cgroup.clone_children` or `tasks` files (see: https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features).
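
As a rough illustration of that porting work: on cgroup v2 there is no `tasks` file, and processes are moved by writing a PID to `cgroup.procs`, which also exists on v1. The helper below is a hypothetical sketch (the name `cgroup_attach_file` is not from the lxc tests) that picks the file to write to for a given cgroup directory:

```shell
#!/bin/sh
# Print the file a PID should be written to in order to move a process
# into the cgroup directory given as $1.
cgroup_attach_file() {
    if [ -e "$1/cgroup.procs" ]; then
        echo "$1/cgroup.procs"   # present on cgroup v2 (and on v1)
    else
        echo "$1/tasks"          # fallback for a v1-style directory
    fi
}
```

Switching the file alone may not be sufficient: as noted in the side-channel discussion above, the cgroup v2 delegation model also restricts what an unprivileged user may do in the hierarchy.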

So in order to unblock the systemd 248.3-1ubuntu7 release (in impish-proposed) we could move forward with cbrauner's suggestion of skipping these tests. Patch/Debdiff attached.
In the long run the tests should be fixed and made compatible with cgroupsv2, though.
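
For illustration only, a skip of this kind could look roughly like the sketch below. It is not the attached debdiff; the list of tests is an assumption based on the two cgroup-related failures above, and the helper name `needs_cgroup1` is hypothetical:

```shell
#!/bin/sh
# Return 0 if the named test is assumed to depend on cgroup v1 behaviour.
needs_cgroup1() {
    case "$1" in
        lxc-test-apparmor-mount|lxc-test-unpriv) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use inside the loop of debian/tests/exercise, next to the
# existing skip logic ("ignore" is the helper that script already uses):
#
#   if [ -f /sys/fs/cgroup/cgroup.controllers ] && needs_cgroup1 "$testbin"; then
#       ignore "$STRING" && continue
#   fi
```

Keeping the list in one function makes it easy to shrink again once individual tests have been ported to cgroupsv2.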

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 1:4.0.10-0ubuntu5

---------------
lxc (1:4.0.10-0ubuntu5) impish; urgency=medium

  * d/t/exercise: Skip tests that are incompatible with cgroups v2
    (LP: #1943704)

 -- Lukas Märdian <email address hidden> Fri, 17 Sep 2021 15:00:26 +0200

Changed in lxc (Ubuntu):
status: New → Fix Released