Unprivileged LXC containers don't work under systemd
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd (Ubuntu) | Fix Released | Medium | Martin Pitt | ubuntu-14.12
Bug Description
With systemd 208, unprivileged containers stop working when running under systemd (working fine under upstart with cgmanager). Quoting Stephane Graber:
In this setup, things don't work nearly as well. On login I'm only
placed into the name=systemd cgroup and not in any of the others, which
means that unprivileged LXC isn't usable.
Martin suggested setting JoinControllers in /etc/systemd/
upon closer inspection, this isn't at all what we want. This setting is
used to tell systemd what controllers to co-mount, by default this is
set to cpu,cpuset (which caused the earlier cgmanager breakage).
Even though this option isn't helpful for what we want (i.e. setting the
list of cgroup controllers the first PID of a user session should be
added to), we should nonetheless set it to an empty string which should
instruct systemd not to co-mount any controller, therefore giving us a
more reliable behavior (identical to what we have in the upstart world
and unlikely to confuse lxc and other stuff doing direct cgroup access).
Additionally, we need to find an equivalent to our good old
"Controllers" logind.conf option, or re-introduce it or just patch
logind so that it will always join all the controllers (similar to what
the shim does).
== Actions ==
* Update systemd.conf to set JoinControllers to an empty value.
* Make it so new user sessions are joined to all the available
controllers by doing one of the following:
- Find the magic undocumented config variable
- Re-introduce the "Controllers" option in logind.conf
- Patch logind to have it always join all available controllers
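To see which controllers are currently co-mounted (the effect JoinControllers controls), one option is to look for comma-separated controller lists in /proc/<pid>/cgroup. The following is only a minimal sketch: the function name and the sample text are illustrative assumptions, not taken from systemd or LXC.

```python
def comounted_groups(proc_cgroup_text):
    """Return the controller groups that share a single hierarchy."""
    groups = []
    for line in proc_cgroup_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-id:controller-list:path". Split at most
        # twice, because the path part may itself contain colons.
        _hier_id, controllers, _path = line.split(":", 2)
        names = [name for name in controllers.split(",") if name]
        if len(names) > 1:
            groups.append(names)
    return groups


# Hypothetical sample in the /proc/<pid>/cgroup format (cgroup v1).
SAMPLE = """\
4:memory:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice
"""

print(comounted_groups(SAMPLE))  # [['cpu', 'cpuacct']]
```

With no co-mounting in effect, the function would be expected to return an empty list.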
tags: | added: systemd-boot |
Changed in systemd (Ubuntu): | |
status: | New → Triaged |
Martin Pitt (pitti) wrote : | #1 |
Martin Pitt (pitti) wrote : | #2 |
Asked upstream about this: http://
Changed in systemd (Ubuntu): | |
importance: | Undecided → Medium |
Martin Pitt (pitti) wrote : | #3 |
For my own notes: No hints from upstream; my current theory is that the best place to hook this in would be in src/core/service.c service_spawn(): After a successful exec_spawn(), if the unit is a *.scope, also put it into all other cgroup controllers (cg_create() and cg_attach()).
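The create-then-attach idea can be illustrated outside systemd with the plain cgroup v1 filesystem interface: make the same directory in every controller hierarchy, then write the PID to its cgroup.procs file. This is a rough sketch and not systemd's code; the function name, the root-directory parameter, and the "user.slice/demo.scope" path are hypothetical, and the demo runs against a temporary directory rather than a real cgroup mount.

```python
import os
import tempfile


def attach_to_all_controllers(cgroup_root, relative_path, pid):
    """Create relative_path in each hierarchy under cgroup_root, move pid there."""
    joined = []
    for controller in sorted(os.listdir(cgroup_root)):
        hierarchy = os.path.join(cgroup_root, controller)
        # Skip plain files and symlinks (aliases that point at a co-mounted
        # hierarchy) so that each hierarchy is handled only once.
        if os.path.islink(hierarchy) or not os.path.isdir(hierarchy):
            continue
        target = os.path.join(hierarchy, relative_path.lstrip("/"))
        os.makedirs(target, exist_ok=True)
        # Writing a PID to cgroup.procs moves that process into the cgroup.
        with open(os.path.join(target, "cgroup.procs"), "w") as procs:
            procs.write(str(pid))
        joined.append(controller)
    return joined


# Demo against a fake tree, so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as fake_root:
    for name in ("memory", "freezer"):
        os.mkdir(os.path.join(fake_root, name))
    print(attach_to_all_controllers(fake_root, "user.slice/demo.scope", 1234))
    # ['freezer', 'memory']
```

On a real system this needs write access to each hierarchy, which is why the ownership of the per-user cgroup directories matters for unprivileged use.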
Changed in systemd (Ubuntu): | |
milestone: | none → ubuntu-14.12 |
assignee: | nobody → Martin Pitt (pitti) |
Martin Pitt (pitti) wrote : | #4 |
I created a per-user container "t1", and confirm that it does start under upstart/cgmanager and doesn't under systemd. I now have a preliminary patch for putting the user slices into all cgroup controllers, plus some hand-crafted "chown ubuntu" for all the user-1000.slice cgroup directories so that they become writable (this part still needs to be added to the patch). I understand that this should now be sufficient:
ubuntu@ulxc$ cat /proc/$$/cgroup
10:devices:
9:memory:
8:cpuset:/
7:hugetlb:
6:blkio:
5:cpu,cpuacct:
4:freezer:
3:perf_
2:net_cls,
1:name=
ubuntu@ulxc:~$ ls -ld /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 2 ubuntu root 0 Nov 26 10:41 /sys/fs/
drwxr-xr-x 4 root root 0 Nov 26 10:33 /sys/fs/
I'm not sure why my login shell isn't in "cpuset", I'll debug that still. But I chown'ed /sys/fs/
But still lxc-start fails:
$ lxc-start -n t1 -F
lxc-start: cgfs.c: lxc_cgroupfs_
lxc-start: cgfs.c: cgroup_rmdir: 207 Permission denied - cgroup_rmdir: failed to delete /sys/fs/
lxc-start: cgfs.c: cgroup_rmdir: 207 Permission denied - cgroup_rmdir: failed to delete /sys/fs/
lxc-start: cgfs.c: cgroup_rmdir: 207 Permission denied - cgroup_rmdir: failed to delete /sys/fs/
lxc-start: cgfs.c: cgroup_rmdir: 207 Permission denied - cgroup_rmdir: failed to delete /sys/fs/
lxc-start: cgfs.c: cgroup_rmdir: 207 Read-only file system - cgroup_rmdir: failed to delete /sys/fs/
lxc-start: cgfs.c: cgroup_rmdir: 207 Permission denied -...
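The manual "ls -ld" ownership check in this comment could be scripted along the following lines. This is a sketch under assumptions: the helper name and its parameters are made up for illustration, the demo uses a temporary directory instead of a real cgroup mount, and owning a directory is only a proxy for being able to create child cgroups in it.

```python
import os
import tempfile


def cgroup_dirs_not_owned_by(cgroup_root, relative_path, uid):
    """Report hierarchies where relative_path is missing or not owned by uid."""
    problems = []
    for controller in sorted(os.listdir(cgroup_root)):
        hierarchy = os.path.join(cgroup_root, controller)
        # Skip plain files and symlinked aliases of co-mounted hierarchies.
        if os.path.islink(hierarchy) or not os.path.isdir(hierarchy):
            continue
        target = os.path.join(hierarchy, relative_path.lstrip("/"))
        try:
            owner = os.stat(target).st_uid
        except FileNotFoundError:
            problems.append((controller, "missing"))
            continue
        if owner != uid:
            problems.append((controller, "owned by uid %d" % owner))
    return problems


# Demo: "freezer" has the per-user directory, "memory" does not.
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "freezer", "user-demo.slice"))
    os.mkdir(os.path.join(fake_root, "memory"))
    print(cgroup_dirs_not_owned_by(fake_root, "user-demo.slice", os.getuid()))
    # [('memory', 'missing')]
```

An empty result would suggest that every hierarchy has the directory and that it belongs to the given user.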
no longer affects: | lxc (Ubuntu) |
Martin Pitt (pitti) wrote : | #5 |
Ah, nevermind; it wanted to write /sys/fs/
Changed in systemd (Ubuntu): | |
status: | Triaged → In Progress |
Martin Pitt (pitti) wrote : | #6 |
Got it working now, with the patch set on http://
Martin Pitt (pitti) wrote : | #7 |
The above patches are included in https:/
Martin Pitt (pitti) wrote : | #8 |
Changed in systemd (Ubuntu): | |
status: | In Progress → Fix Committed |
Launchpad Janitor (janitor) wrote : | #9 |
This bug was fixed in the package systemd - 217-2ubuntu1
---------------
systemd (217-2ubuntu1) vivid; urgency=medium
* Merge with Debian unstable. See 217-1ubuntu1 for remaining Ubuntu changes.
* Put session scopes into all cgroup controllers instead of their parent
user slices. This works better with killing sessions and is consistent
with the "systemd" controller.
* Do not realize and migrate cgroups multiple times, in particular
"-.slice". This fixes PIDs in non-systemd cgroup controllers to be
randomly migrated back to /. (LP: #1346734)
* boot-and-services autopkgtest: Give test apparmor job some time to
actually finish.
systemd (217-2) experimental; urgency=medium
* Re-enable journal forwarding to syslog, until Debian's sysloggers
can/do all read from the journal directly.
* Fix hostnamectl exit code on success.
* Fix "diff failed with error code 1" spew with systemd-delta.
(Closes: #771397)
* Re-enable systemd-resolved. This wasn't meant to break the entire
networkd, just disable the new NSS module. Remove that one manually
instead. (Closes: #771423, LP: #1397361)
* Import v217-stable patches (up to commit bfb4c47 from 2014-11-07).
* Disable AppArmor again. This first requires moving libapparmor to /lib
(see #771667). (Closes: #771652)
* systemd.bug-script: Capture stderr of systemd-
(Closes: #771498)
-- Martin Pitt <email address hidden> Mon, 01 Dec 2014 17:17:30 +0100
Changed in systemd (Ubuntu): | |
status: | Fix Committed → Fix Released |
I have an unprivileged container set up in my test VM now, and it continues to work with 208. However, LXC under systemd currently requires some work (bug 1312532 and bug 1350947), so this should land first so that system-level containers work under systemd. Then I'll look into the cgroups issue.
Stéphane, can I check this without LXC somehow? I think my session processes already are in all cgroups:
$ cat /proc/$$/cgroup
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls,net_prio:/
6:freezer:/
5:devices:/
4:memory:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
$ grep $$ /sys/fs/cgroup/*/cgroup.procs
/sys/fs/cgroup/blkio/cgroup.procs:2898
/sys/fs/cgroup/cpuacct/cgroup.procs:2898
/sys/fs/cgroup/cpu/cgroup.procs:2898
/sys/fs/cgroup/cpu,cpuacct/cgroup.procs:2898
/sys/fs/cgroup/cpuset/cgroup.procs:2898
/sys/fs/cgroup/devices/cgroup.procs:2898
/sys/fs/cgroup/freezer/cgroup.procs:2898
/sys/fs/cgroup/hugetlb/cgroup.procs:2898
/sys/fs/cgroup/memory/cgroup.procs:2898
/sys/fs/cgroup/net_cls/cgroup.procs:2898
/sys/fs/cgroup/net_cls,net_prio/cgroup.procs:2898
/sys/fs/cgroup/net_prio/cgroup.procs:2898
/sys/fs/cgroup/perf_event/cgroup.procs:2898
Or do I misunderstand this?
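One way to check this without LXC: a PID that appears in the top-level cgroup.procs of a hierarchy is in that hierarchy's root cgroup, which /proc/<pid>/cgroup shows as the path "/". That is different from being placed in a per-session cgroup, and appears to be the situation the bug description complains about. The sketch below lists the controllers for which a process is still at the root; the function name and the sample text are illustrative assumptions.

```python
def controllers_in_root_cgroup(proc_cgroup_text):
    """Return the controller lists whose cgroup path is the hierarchy root."""
    at_root = []
    for line in proc_cgroup_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Format: "hierarchy-id:controller-list:path" (cgroup v1).
        _hier_id, controllers, path = line.split(":", 2)
        if path == "/":
            at_root.append(controllers)
    return at_root


# Illustrative sample: only name=systemd has a per-session path.
SAMPLE = """\
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
"""

print(controllers_in_root_cgroup(SAMPLE))  # ['cpu,cpuacct', 'cpuset']
```

For a live check, the text could be read from /proc/self/cgroup; an empty result would mean the process has a non-root cgroup in every hierarchy.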