mounts cgroups unconditionally which causes undesired effects with cpu hotplug

Bug #1392176 reported by bugproxy
This bug affects 6 people
Affects             Status        Importance  Assigned to  Milestone
cgmanager (Ubuntu)  Fix Released  Medium      Unassigned
linux (Ubuntu)      Fix Released  Medium      Unassigned
systemd (Ubuntu)    Fix Released  Undecided   Unassigned

Bug Description

== Comment: #0 - Preeti U. Murthy <email address hidden> - 2014-10-20 04:40:12 ==
---Problem Description---
Systemd mounts cgroups explicitly on every boot. Since the user has no say in this, undesired consequences are observed in reaction to cpu hotplug operations. Here is how.

Systemd moves tasks into the cgroup it mounts. This cgroup automatically becomes a child of the root cgroup, which is present by default. In the kernel, child cgroups are not expected to remember their configured cpusets across hotplug operations. Hence when cpus are taken offline and brought back online, they are no longer used for load balancing of tasks and remain unused.
   This is an undesired consequence because the user never even asked for cgroups to be mounted, yet is unable to use the full capacity of the system.
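
For illustration, the placement and the stale child cpuset can be observed as follows (a minimal sketch; it assumes the legacy cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset, and the session-scope path shown is only an example):

 $ grep cpuset /proc/self/cgroup
 2:cpuset:/user.slice/user-1000.slice/session-1.scope
 $ CG=$(grep cpuset /proc/self/cgroup | cut -d: -f3)
 $ cat /sys/fs/cgroup/cpuset/cpuset.cpus      # root cpuset: tracks online cpus
 $ cat /sys/fs/cgroup/cpuset$CG/cpuset.cpus   # child cpuset: does not regain offlined cpus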

Only when the user himself creates cgroup hierarchies should he be exposed to the side effects of cpu hotplug on cpusets. Otherwise, all online cpus must be made available to him, which is not happening, since systemd mounts cgroups on every boot.

Hence please revert this feature or provide an explanation as to why this is being done.

---uname output---
Linux tul181p1 3.16.0-18-generic #25-Ubuntu SMP Fri Sep 26 02:39:53 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Tuleta 8286-42A
 ---Debugger---
A debugger was configured, however the system did not enter into the debugger

---Steps to Reproduce---
 $ taskset -p $$
 0-127
 $ echo 0 > /sys/devices/system/cpu/cpu7/online
 $ taskset -p $$
 0-6,8-127
 $ echo 1 > /sys/devices/system/cpu/cpu7/online
 $ taskset -p $$
 0-6,8-127

Userspace tool common name: systemd

The userspace tool has the following bit modes: 64-bit

Userspace rpm: systemd_208-8ubuntu8_ppc64el.deb

Userspace tool obtained from project website: 208-8ubuntu8

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-117800 severity-medium targetmilestone-inin---
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1392176/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → systemd (Ubuntu)
Revision history for this message
Martin Pitt (pitti) wrote :

systemd (in the sense of pid 1) doesn't do that, i.e. if you boot with init=/bin/systemd the only cgroup controller it puts tasks into (by default) is the "systemd" one, for that very reason. But if you boot with upstart (still Ubuntu's default), cgmanager creates cgroups. cgmanager puts tasks into *all* controllers (including "cpu"); as far as I know, this is so that user LXC containers work. So from cgmanager's POV this might be a design decision which can't otherwise be accomplished with the current kernel, but I'll let the cgmanager maintainers decide whether this is a "wontfix" or whether there is a more elegant way to make user containers work.

summary: - Systemd mounts cgroups unconditionally which causes undesired effects
- with cpu hotplug
+ mounts cgroups unconditionally which causes undesired effects with cpu
+ hotplug
affects: systemd (Ubuntu) → cgmanager (Ubuntu)
Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

I'm definitely open to making this more flexible.

The question is how best to allow the configuration. We could add a
/etc/cgmanager.conf, or we could do it through command line options
specified in /etc/default/cgmanager.

Changed in cgmanager (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Preeti (preeti) wrote :

Hi,

Is there any update on this front?

We are seeing the effect of this bug in several places on IBM PowerPC platforms
and would like to see it resolved soon. Can the cgroup mounting be made *only*
when the user explicitly asks for it?

Regards
Preeti U Murthy

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@preeti,

if it suffices for you to simply not run cgmanager at all, then you can disable it by doing

echo manual | sudo tee /etc/init/cgmanager.override

Is it specifically only the cpuset cgroup which you do not want mounted on your systems?

We could add a '-M' option to cgmanager so that "-M cpuset" would mean do not mount the cpuset controller. I would not, however, want that set by default, so the question is where the best place to specify it would be. It sounds like you would need it set for all powerpc platforms, or is this only on a specific cloud you control?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hold on, the actual mounting of the fs is not the problem, right? It's the movement of tasks into groups on login? So this should perhaps be fixed in systemd-shim instead?

Revision history for this message
Martin Pitt (pitti) wrote :

Serge Hallyn [2015-02-03 17:01 -0000]:
> Hold on, the actual mounting of the fs is not the problem, right? It's
> the movement of tasks into groups on login? So this should perhaps be
> fixed in systemd-shim instead?

Note that upstream systemd does not touch any cgroups other than
"systemd". We specifically do that in Ubuntu (with both cgmanager and
systemd itself) to support user LXC containers, which will fail if
they can't put the containers into all controllers.

Revision history for this message
Preeti (preeti) wrote :

The movement of existing tasks to the child cgroups created by cgmanager/systemd must be avoided, as far as I can see.
If the additional cgroups are for LXC containers, the containers and the tasks spawned within them alone can reside under the child cgroups. Why move the existing tasks into them, when they are not going to benefit from it? If this can be done, there would be no need to avoid mounting cpuset controllers; they can very well be there.

Regards
Preeti U Murthy

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2014-12-19 02:40 EDT-------
(In reply to comment #6)
> I'm definitely open to making this more flexible.
>
> The question is how best to allow the configuration. We could add a
> /etc/cgmanager.conf, or we could do it through command line options
> specified in /etc/default/cgmanager.

Would you be able to give some background on why cgroups are mounted
in the first place? This is so that we have some clarity on this front. I understand
that it is done for LXC containers, but why so?

So if you can make this cgroup mounting tunable, what would be the default?
It would be best if cgroups are not mounted after boot up and the user
explicitly asks for this if required.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

It seems that what you really want is for all or some tasks to have a cpu automatically added back to their cpuset when it is on-lined? Would that suffice?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-02-06 12:26 EDT-------
(In reply to comment #11)
> It seems that what you really want is for all or some tasks to have a cpu
> automatically added back to their cpuset when it is on-lined? Would that
> suffice?

Yes, for those tasks which had the offlined cpu in their cpusets before hotplug,
the cpu should be added back to their respective cpusets when it comes online.

Regards
Preeti U Murthy

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cgmanager - 0.35-1ubuntu1

---------------
cgmanager (0.35-1ubuntu1) vivid; urgency=medium

  * 0001-implement-M-to-support-skip-mounting-certain-control.patch:
    This doesn't change the default, so may not suffice for powerpc,
    but at least offers a workaround. (LP: #1392176)
 -- Serge Hallyn <email address hidden> Tue, 10 Feb 2015 13:57:03 -0600

Changed in cgmanager (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Preeti (preeti) wrote :

The patch does not work for us.
We tried out the below test:

 root@ubuntu1504:~# cat /etc/issue
 Ubuntu Vivid Vervet (development branch) \n \l
 root@ubuntu1504:~# cgmanager --version
 cgmanager 0.36
 root@ubuntu1504:~# uname -a
 Linux ubuntu1504 3.18.0-13-generic #14-Ubuntu SMP Fri Feb 6 09:57:41 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
 root@ubuntu1504:~#
 root@ubuntu1504:/sys/devices/system/cpu# ps aux | grep bash
 root 955 0.2 0.3 8576 7104 hvc0 S 01:52 0:00 -bash
 root 1001 0.0 0.1 4992 3072 hvc0 S+ 01:53 0:00 grep --color=auto bash
 root@ubuntu1504:/sys/devices/system/cpu# taskset -p 955
 pid 955's current affinity mask: ffff
 root@ubuntu1504:/sys/devices/system/cpu# echo 0 > cpu15/online
 root@ubuntu1504:/sys/devices/system/cpu# taskset -p 955
 pid 955's current affinity mask: 7fff
 root@ubuntu1504:/sys/devices/system/cpu# echo 1 > cpu15/online
 root@ubuntu1504:/sys/devices/system/cpu# taskset -p 955
 pid 955's current affinity mask: 7fff

You can see that the cpumask of the task does not include the cpu that was brought back online,
even though we are using the version of cgmanager that has the fix in.

Can you explain what the patch does? This will help us figure out why it's not working.
On another note, to make things clearer, there are two requirements to take care of:

a. When a cpu goes offline and comes back online, the cpuset of tasks must get updated both times to reflect the online mask
b. When a cpu in the possible_mask is brought online any time after boot, that too should be reflected in the cpusets of the tasks.

Regards
Preeti U Murthy
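
As a concrete check of requirement (a) above, something like the following could be scripted (a sketch; it assumes bash, a root shell, and that cpu7 is hotpluggable; requirement (b) would additionally need a possible-but-offline cpu to bring up):

 #!/bin/bash
 # Offline and re-online cpu7, then verify the shell's affinity
 # mask returns to its original value (requirement (a)).
 before=$(taskset -p $$ | awk '{print $NF}')
 echo 0 > /sys/devices/system/cpu/cpu7/online
 echo 1 > /sys/devices/system/cpu/cpu7/online
 after=$(taskset -p $$ | awk '{print $NF}')
 [ "$before" = "$after" ] && echo "OK: $after" || echo "FAIL: $before -> $after"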

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

> Can you explain what the patch does ? This will help us figure out why its not working.

The patch is: implement -M to support skip mounting certain controllers

So you need to start cgmanager with "-M cpuset" to get the behavior you
are looking for.

The changelog entry said:

 This doesn't change the default, so may not suffice for powerpc,
 but at least offers a workaround. (LP: #1392176)

So it is a starting point and gives you a workaround. It seems to me
that this is something that should be configurable in the kernel.

It also seems worthwhile for cgmanager to watch for cpu hotplug events
and do something when it gets those. But exactly what it should do
and how this is best implemented is not clear to me.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-03-23 17:37 EDT-------
Making my own comment external:

Preeti, is this fixed upstream with the default hierarchy and the "effective_cpus" file?

be4c9dd7aee5ecf3e748da68c27b38bdca70d444

e2b9a3d7d8f4ab2f3491b8ed2ac6af692a2269b2

It seems like with the new default hierarchy upstream and the effective_cpus file, we now will be able to distinguish between configured cpuset and effective cpusets, which is the root cause of this bug, afaict.

Does 15.04 ship with the legacy hierarchy on by default? I'm assuming it does, to minimize regressions. Sort of annoying to have a cgmanager flag that should only apply if legacy is in use?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Comment bridged from LTC Bugzilla

> Does 15.04 ship with the legacy hierarchy on by default? I'm assuming it
> does, to minimize regressions. Sort of annoying to have a cgmanager flag
> that should only apply if legacy is in use?

Legacy will be in use for a long time, because the unified hierarchy
breaks a great deal of existing software. Had unified hierarchy tried
harder to be backward compatible, its adoption would be much faster.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-03-23 18:25 EDT-------
(In reply to comment #19)
> > Does 15.04 ship with the legacy hierarchy on by default? I'm assuming it
> > does, to minimize regressions. Sort of annoying to have a cgmanager flag
> > that should only apply if legacy is in use?
>
> Legacy will be in use for a long time, because the unified hierarchy
> breaks a great deal of existing software. Had unified hierarchy tried
> harder to be backward compatible, its adoption would be much faster.

Yep, so I've noticed. I just wanted clarity on the upstream status -- as there won't be any attempt to fix the underlying issue upstream for legacy.

I think you've answered my question(s) indirectly, though, thanks!

-Nish

Revision history for this message
Bharata B Rao (bharata-rao) wrote :

Serge,

What's the recommended way to start cgmanager with -M cpuset? I added cgmanager_opts="-M cpuset" in /etc/default/cgmanager but it still starts without the -M option.

root@ubuntu1504:~# ps aux | grep cgm
root 624 0.0 0.1 4096 3264 ? Ss 04:58 0:00 /sbin/cgmanager -m name=systemd

root@ubuntu1504:~# cat /etc/default/cgmanager
cgmanager_opts="-M cpuset"

I stopped the service myself and started it manually:

root@ubuntu1504:~# service cgmanager stop
root@ubuntu1504:~# /sbin/cgmanager -m name=systemd -M cpuset &

root@ubuntu1504:~# ps aux | grep cgm
root 863 0.0 0.1 4096 3328 pts/0 S 05:00 0:00 /sbin/cgmanager -m name=systemd -M cpuset

Now if I offline and online a CPU and try to taskset a process to that CPU, it fails. Expected?

root@ubuntu1504:~# cgmanager --version
cgmanager 0.36

Revision history for this message
Bharata B Rao (bharata-rao) wrote :

Any update on comment #19?

Also, any plans to make CPU hotplug work seamlessly? I see that CPU hotplug is affected by this bug, but I haven't been able to use the -M cpuset option to verify whether it helps the CPU hotplug case too.

Revision history for this message
Breno Leitão (breno-leitao) wrote :

It seems this fix was reverted when cgmanager was upgraded to 0.36.

Looking into cgmanager for Ubuntu vivid, I don't see the patch 0001-implement-M-to-support-skip-mounting-certain-control.patch anymore; also, cgmanager is not being loaded with the -M option, as shown:

ubuntu@ubuntu1504:~/source/cgmanager-0.36$ ls debian/patches/ -la
total 40
drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 23 22:32 .
drwxrwxr-x 5 ubuntu ubuntu 4096 Mar 31 09:42 ..
-rw-rw-r-- 1 ubuntu ubuntu 1690 Feb 13 15:32 0001-pivot_root-bind-mount-the-old-rather-than-starting-w.patch
-rw-rw-r-- 1 ubuntu ubuntu 1742 Feb 13 15:32 0002-bind-mount-run-from-host-into-cgmanager-s-fs-as-well.patch
-rw-rw-r-- 1 ubuntu ubuntu 842 Mar 23 17:20 0004-prune_from_string-handle-a-corner-case.patch
-rw-rw-r-- 1 ubuntu ubuntu 863 Mar 23 19:21 0005-Fix-the-last-commit.patch
-rw-rw-r-- 1 ubuntu ubuntu 7286 Mar 23 22:29 0006-cgmanager-make-exception-for-proxys-placed-in-system.patch
-rw-rw-r-- 1 ubuntu ubuntu 1196 Mar 10 12:20 fix-tests-on-systemd
-rw-rw-r-- 1 ubuntu ubuntu 294 Mar 23 22:27 series

After the package is installed, I see:
$ ps aux | grep cgmanager
root 2347 0.0 0.0 4288 3392 ? Ss 09:39 0:00 /sbin/cgmanager -m name=systemd

So, I understand that this bug should be reopened.

Revision history for this message
John Paul Adrian Glaubitz (glaubitz) wrote :

> Looking into cgmanager for Ubuntu vivid,

Correct me if I'm wrong, but the sole reason why cgmanager was conceived was to have something to manage CGroups when systemd is not running. And, as Ubuntu 15.04 (vivid) is settling on systemd by default, you can just uninstall cgmanager, as systemd does all the necessary CGroups management.

Unless there is something I am overlooking here?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

Quoting Bharata B Rao (<email address hidden>):
> Serge,
>
> What's the recommended way to start cgmanager with -M cpuset ? I added

When running systemd you must edit the file /lib/systemd/system/cgmanager.service
so that the ExecStart line reads

ExecStart=/sbin/cgmanager -m name=systemd -M cpuset

After making that change you may need to do

sudo systemctl daemon-reload

and then you may need to

sudo mount -o remount,rw /sys/fs/cgroup

to allow cgmanager to create a new socket for itself.
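
On systemd versions that support drop-ins (vivid's does), an equivalent override that survives package upgrades can live under /etc/systemd/system instead; a sketch, run as root, with the drop-in file name assumed:

mkdir -p /etc/systemd/system/cgmanager.service.d
cat > /etc/systemd/system/cgmanager.service.d/override.conf <<EOF
[Service]
# a blank ExecStart clears the unit's original ExecStart before replacing it
ExecStart=
ExecStart=/sbin/cgmanager -m name=systemd -M cpuset
EOF
systemctl daemon-reload
systemctl restart cgmanager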

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Quoting Breno Leitão (<email address hidden>):
> It seems this fix was reverted when cgmanager was upgraded to 0.36.

No, I've verified that this still works in 15.04. The patch
"implement -M to support skip mounting certain controllers"
is a part of the 0.36 release.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@glaubitz,

cgmanager is not for managing but for delegating cgroups, which systemd does not yet provide. (I'd like to work toward that with the systemd community.) When that is not needed, cgmanager can indeed be removed; however, cpusets still end up being mounted by systemd itself.

@bharata-rao

My reading of the kernel/cpuset.c comments is that the new cpuset.effective_cpus is supposed to give you what you want. If you've written 0-64 into cpuset.cpus, and some cpus are removed, then cpuset.cpus won't be changed, only cpuset.effective_cpus. When you plug those cpus back in, they should show back up in cpuset.effective_cpus.

I don't have any hardware to test on, and couldn't get libvirt setvcpus to do this for me, but could you please test on an Ubuntu 15.04 host (which should have a new enough kernel to have cpuset.effective_cpus)?
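
A sketch of that test (assuming a legacy cpuset mount at /sys/fs/cgroup/cpuset, 8 cpus and a root shell; the expected values follow the reading above, and the follow-up tests later in this bug show the legacy hierarchy does not actually behave this way):

mkdir /sys/fs/cgroup/cpuset/test
echo 0-7 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/test/cpuset.mems
echo 0 > /sys/devices/system/cpu/cpu7/online
cat /sys/fs/cgroup/cpuset/test/cpuset.cpus            # expected: still 0-7 (configured)
cat /sys/fs/cgroup/cpuset/test/cpuset.effective_cpus  # expected: 0-6 (effective)
echo 1 > /sys/devices/system/cpu/cpu7/online
cat /sys/fs/cgroup/cpuset/test/cpuset.effective_cpus  # expected: 0-7 again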

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-04-06 09:26 EDT-------
On the legacy hierarchy, cpuset.cpus changes with hotplug. It does not on the unified/default hierarchy. The issue arises because cpuset.cpus changes in the legacy hierarchy and cpuset.effective_cpus is equivalent to it.

Regards
Preeti U Murthy

Revision history for this message
Breno Leitão (breno-leitao) wrote :

I understand that this problem is still not fixed. It should be reopened:

# taskset -p $$
pid 2523's current affinity mask: ff

# echo 0 > /sys/devices/system/cpu/cpu7/online

# taskset -p $$
pid 12787's current affinity mask: 7f

# echo 1 > /sys/devices/system/cpu/cpu7/online

# cat /sys/devices/system/cpu/cpu7/online
1

# taskset -p $$
pid 12787's current affinity mask: 7f

So, it seems that the mask doesn't get back to ff after the CPU is back online.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

The hope had been that the kernel's new support for cpuset.effective_cpus would fix this. Removing a cpu from a parent cgroup or offlining a cpu would remove it from effective_cpus, but not from cpuset.cpus. Apparently that's not the case (kernel 3.19.0-10-generic was used for the test in comment #27).

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Changed in cgmanager (Ubuntu):
status: Fix Released → Confirmed
Revision history for this message
Nish Aravamudan (nacc-p) wrote :

Serge,

That should only be true for the unified hierarchy. In the legacy hierarchy, effective_cpus follows cpus, I think?

Revision history for this message
Nish Aravamudan (nacc-p) wrote :

Breno,

Was your test done with a cgmanager with the -M flag passed?

Revision history for this message
Breno Leitão (breno-leitao) wrote :

Yes Nish, take a look at the full example:

root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat user.slice/cpuset.cpus
0-7
0-7
root@ubuntu1504:/sys/fs/cgroup/cpuset# echo 0 > /sys/devices/system/cpu/cpu7/online
root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat user.slice/cpuset.cpus
0-6
0-6
root@ubuntu1504:/sys/fs/cgroup/cpuset# echo 1 > /sys/devices/system/cpu/cpu7/online
root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat user.slice/cpuset.cpus
0-7
0-6
root@ubuntu1504:/sys/fs/cgroup/cpuset# ps aux | grep cgmanager
root 5761 0.0 0.0 5120 3072 pts/1 S+ 10:35 0:00 grep --color=auto cgmanager
root 28368 0.0 0.0 4288 3392 ? Ss 10:31 0:00 /sbin/cgmanager -m name=systemd -M cpuset

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

When doing that check, please also show the results of

cgm listcontrollers
sudo cat /proc/$(pidof cgmanager)/mountinfo
cat /proc/self/mountinfo

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

(In particular I'm looking to confirm that cgmanager didn't mount cpuset)

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-04-07 15:56 EDT-------
(In reply to comment #33)
> Yes Nish, take a look at the full example:
>
> root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat
> user.slice/cpuset.cpus
> 0-7
> 0-7
> root@ubuntu1504:/sys/fs/cgroup/cpuset# echo 0 >
> /sys/devices/system/cpu/cpu7/online
> root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat
> user.slice/cpuset.cpus
> 0-6
> 0-6
> root@ubuntu1504:/sys/fs/cgroup/cpuset# echo 1 >
> /sys/devices/system/cpu/cpu7/online
> root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat
> user.slice/cpuset.cpus
> 0-7
> 0-6
> root@ubuntu1504:/sys/fs/cgroup/cpuset# ps aux | grep cgmanager
> root 5761 0.0 0.0 5120 3072 pts/1 S+ 10:35 0:00 grep
> --color=auto cgmanager
> root 28368 0.0 0.0 4288 3392 ? Ss 10:31 0:00
> /sbin/cgmanager -m name=systemd -M cpuset

I *think* you'd need to have cgmanager's configuration file be correct at boot-time, and have started your system fresh.

The workaround provided by Serge is to simply not mount the cpuset cgroup.

So if you have /sys/fs/cgroup/cpuset (or really, `mount | grep cpuset`, as you can mount it wherever you want) upon boot, then the workaround is not working. Perhaps something else is mounting cpuset.
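
That can be checked concretely along these lines (a sketch; cgm and the mountinfo files are used the same way elsewhere in this bug):

cgm listcontrollers | grep -w cpuset || echo "cgmanager: cpuset not managed"
grep cpuset /proc/self/mountinfo || echo "cpuset not mounted anywhere"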

But I'm a bit worried, doesn't not mounting cpuset mean that containers, for instance, wouldn't work so well?

That is, even if cgmanager doesn't mount the cpuset cgroup, if *anything* mounts it, processes in that cgroup tree will experience the underlying issue, no?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Comment bridged from LTC Bugzilla

> But I'm a bit worried, doesn't not mounting cpuset mean that containers,
> for instance, wouldn't work so well?

You just won't be able to lock containers to cpusets.

> That is, even if cgmanager doesn't mount the cpuset cgroup, if
> *anything* mounts it, processes in that cgroup tree will experience the
> underlying issue, no?

Yes.

And I still think that systemd is currently mounting it regardless
of cgmanager.

So ideally the effective_cpus thing would be fixed to work for
non-unified hierarchies.

Revision history for this message
Breno Leitão (breno-leitao) wrote :

Serge,

This is the output of what you requested.

root@ubuntu1504:/sys/fs/cgroup/cpuset# cat cpuset.cpus ; cat user.slice/cpuset.cpus
0-7
0-6

root@ubuntu1504:/sys/fs/cgroup/cpuset# cgm listcontrollers
blkio
cpu,cpuacct
devices
freezer
hugetlb
memory
net_cls,net_prio
perf_event
name=systemd

root@ubuntu1504:/sys/fs/cgroup/cpuset# ps aux | grep cgmanager
root 28368 0.0 0.1 5120 4352 ? Ss Apr07 0:00 /sbin/cgmanager -m name=systemd -M cpuset

root@ubuntu1504:/sys/fs/cgroup/cpuset# cat /proc/28368/mountinfo
55 88 0:36 / /run/cgmanager/fs rw,relatime - tmpfs cgmfs rw,size=128k,mode=755
73 87 0:4 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
74 73 0:31 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,fd=21,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
76 55 0:23 / /run/cgmanager/fs/blkio rw,relatime - cgroup blkio rw,blkio
77 55 0:25 / /run/cgmanager/fs/cpu rw,relatime - cgroup cpu rw,cpu,cpuacct
78 55 0:25 / /run/cgmanager/fs/cpuacct rw,relatime - cgroup cpuacct rw,cpu,cpuacct
79 55 0:24 / /run/cgmanager/fs/devices rw,relatime - cgroup devices rw,devices
80 55 0:30 / /run/cgmanager/fs/freezer rw,relatime - cgroup freezer rw,freezer
81 55 0:29 / /run/cgmanager/fs/hugetlb rw,relatime - cgroup hugetlb rw,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb
82 55 0:26 / /run/cgmanager/fs/memory rw,relatime - cgroup memory rw,memory
83 55 0:22 / /run/cgmanager/fs/net_cls rw,relatime - cgroup net_cls rw,net_cls,net_prio
84 55 0:22 / /run/cgmanager/fs/net_prio rw,relatime - cgroup net_prio rw,net_cls,net_prio
85 55 0:28 / /run/cgmanager/fs/perf_event rw,relatime - cgroup perf_event rw,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event
86 55 0:20 / /run/cgmanager/fs/none,name=systemd rw,relatime - cgroup none,name=systemd rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd
87 45 253:2 / / rw,relatime - ext4 /dev/disk/by-uuid/1ce3e5ed-71cf-4682-91f5-261804741e81 rw,errors=remount-ro,data=ordered
88 87 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=403904k,mode=755

root@ubuntu1504:/sys/fs/cgroup/cpuset# cat /proc/self/mountinfo
16 21 0:15 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
17 21 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
18 21 0:6 / /dev rw,relatime shared:2 - devtmpfs udev rw,size=1972160k,nr_inodes=30815,mode=755
19 18 0:13 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
20 21 0:16 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=403904k,mode=755
21 0 253:2 / / rw,relatime shared:1 - ext4 /dev/disk/by-uuid/1ce3e5ed-71cf-4682-91f5-261804741e81 rw,errors=remount-ro,data=ordered
22 16 0:11 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
23 18 0:17 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
24 20 0:18 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
25 16 0:19 / /sys/fs/cgroup rw shared:9 - tmpfs tmpfs rw,mode=755
26 25 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,xattr,release_agent=/lib/syst...

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-04-09 02:55 EDT-------
(In reply to comment #36)
> > But I'm a bit worried, doesn't not mounting cpuset mean that containers,
> > for instance, wouldn't work so well?
>
> You just won't be able to lock containers to cpusets.
>
> > That is, even if cgmanager doesn't mount the cpuset cgroup, if
> > *anything* mounts it, processes in that cgroup tree will experience the
> > underlying issue, no?
>
> Yes.
>
> And I still think that systemd is currently mounting it regardless
> of cgmanager.
>
> So ideally the effective_cpus thing would be fixed to work for
> non-unified hierarchies.

Ok, so given the situation, I suggest the following:

Fixing this in the kernel will be an ugly hack. Moreover,
userspace must take care of updating cpusets after hotplug
operations. Therefore I see two ways forward:

1. Can systemd/cgmanager (whoever is mounting cgroups) mount
cpuset controllers under the unified hierarchy, while mounting the
rest under the legacy hierarchy? Here is the suggestion from the
community: https://lkml.org/lkml/2015/4/6/196.

2. Systemd/cgmanager must have a daemon listening to hotplug
events. On hotplug, the parent cgroup's cpuset must be percolated
down to the children. This is a better solution because the situation
where cpus are hotplugged in for the first time (i.e. from the
cpu_possible_mask to cpu_online_mask) will be handled too.

Can either of the above be done in systemd/cgmanager?

Regards
Preeti U Murthy
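
For concreteness, a minimal sketch of option 2 at the udev level; the rule file, helper path and the blanket top-down copy are illustrative assumptions (a real daemon would remember each cgroup's previously configured mask instead of overwriting it):

# /etc/udev/rules.d/99-cpuset-hotplug.rules (hypothetical)
SUBSYSTEM=="cpu", ACTION=="online", RUN+="/usr/local/sbin/cpuset-refresh"

# /usr/local/sbin/cpuset-refresh (hypothetical)
#!/bin/sh
# Percolate the root cpuset down to every child cgroup, parents
# before children, so freshly onlined cpus become usable again.
root=/sys/fs/cgroup/cpuset
cpus=$(cat "$root/cpuset.cpus")
find "$root" -mindepth 2 -name cpuset.cpus | sort | while read -r f; do
    echo "$cpus" > "$f"
done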

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-04-09 09:58 EDT-------
*** Bug 121220 has been marked as a duplicate of this bug. ***

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@Preeti,

(I replied to the lkml thread, but a separate comment here)

This sort of functionality was always intended as the next step for cgmanager (or whatever cgmanager becomes). Now (as soon as 15.04 is released) is the right time to discuss where to do it.

I do believe systemd is the right place for it (just as I hope all of cgmanager's functionality can move to systemd at some point).

However, while a quick short-term daemon could be written in a few days, it'll probably take at least one 6-month cycle to properly place and implement this functionality.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Note that when I start up a vivid vm without cgmanager installed, cpuset is still mounted and login sessions get a cpuset cgroup:

ubuntu@v1:~$ dpkg -l | grep cgmanager
ubuntu@v1:~$ cat /proc/self/cgroup
10:perf_event:/user.slice/user-1000.slice/session-1.scope
9:freezer:/user.slice/user-1000.slice/session-1.scope
8:memory:/user.slice/user-1000.slice/session-1.scope
7:hugetlb:/user.slice/user-1000.slice/session-1.scope
6:net_cls,net_prio:/user.slice/user-1000.slice/session-1.scope
5:cpu,cpuacct:/user.slice/user-1000.slice/session-1.scope
4:blkio:/user.slice/user-1000.slice/session-1.scope
3:devices:/user.slice/user-1000.slice/session-1.scope
2:cpuset:/user.slice/user-1000.slice/session-1.scope
1:name=systemd:/user.slice/user-1000.slice/session-1.scope

And I do think that's the right thing to do. We simply need the daemon.

@Preeti,

will you be available during the next set of UOS (http://summit.ubuntu.com/uos-1505/)
to discuss a good design? Issues include:

. Where to ship the code
. How to configure defaults and exceptions
. What all needs to be handled (cpusets, memory, hugetlb?)

Revision history for this message
Martin Pitt (pitti) wrote :

> Note that when I start up a vivid vm without cgmanager installed, cpuset is still mounted and login sessions get a cpuset cgroup:
> 2:cpuset:/user.slice/user-1000.slice/session-1.scope

Note that this is by request of Stéphane; it's an Ubuntu-specific patch to make user LXC containers work under systemd. I didn't follow the discussion here in depth and I don't know much about the cgroup internals -- I just wanted to say: let me know if the above is unintended and systemd should stop configuring the cpuset controller for user sessions (then user LXC would need some adjustments for that too, though).

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-04-10 06:05 EDT-------
(In reply to comment #43)
> Note that when I start up a vivid vm without cgmanager installed, cpuset is
> still mounted and login sessions get a cpuset cgroup:
>
> ubuntu@v1:~$ dpkg -l | grep cgmanager
> ubuntu@v1:~$ cat /proc/self/cgroup
> 10:perf_event:/user.slice/user-1000.slice/session-1.scope
> 9:freezer:/user.slice/user-1000.slice/session-1.scope
> 8:memory:/user.slice/user-1000.slice/session-1.scope
> 7:hugetlb:/user.slice/user-1000.slice/session-1.scope
> 6:net_cls,net_prio:/user.slice/user-1000.slice/session-1.scope
> 5:cpu,cpuacct:/user.slice/user-1000.slice/session-1.scope
> 4:blkio:/user.slice/user-1000.slice/session-1.scope
> 3:devices:/user.slice/user-1000.slice/session-1.scope
> 2:cpuset:/user.slice/user-1000.slice/session-1.scope
> 1:name=systemd:/user.slice/user-1000.slice/session-1.scope
>
> And I do think that's the right thing to do. We simply need the daemon.
>
> @Preeti,
>
> will you be available during the next set of UOS
> (http://summit.ubuntu.com/uos-1505/)
> to discuss a good design? Issues include:
>
> . Where to ship the code
> . How to configure defaults and exceptions
> . What all needs to be handled (cpusets, memory, hugetlb?)

From what I can make out, the sessions are in May? In that case
I will be happy to discuss.

Regards
Preeti U Murthy

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

Quoting Martin Pitt (<email address hidden>):
> > Note that when i start up a vivid vm without cgmanager installed, cpuset is still mounted and login sessions get a cpuset cgroup:
> > 2:cpuset:/user.slice/user-1000.slice/session-1.scope
>
> Note that this is by request of Stèphane, it's an ubuntu specific patch
> to make user LXC containers work under systemd. I didn't follow the
> discussion here in depth and I don't know much about the cgroup
> internals -- I just wanted to say let me know if the above is unintended
> and systemd should stop configuring the cpuset controller for user
> sessions (then user LXC would need to get some adjustments for that too,
> though)

Cpusets are not *required* for lxc. Perhaps we should in fact default
to only providing name=systemd, devices and freezer cgroups for users?
We'd want to very widely advertise how to enable other cgroups.

Currently lxc would fail this way, but we could teach it to ignore
inability to create cgroups which aren't required. (This isn't as
simple as it seems, since using the keyword "all" for controllers
would no longer work, but it's doable)

Revision history for this message
Martin Pitt (pitti) wrote :

Serge Hallyn [2015-04-17 17:49 -0000]:
> Cpusets are not *required* for lxc. Perhaps we should in fact default
> to only providing name=systemd, devices and freezer cgroups for users?
> We'd want to very widely advertise how to enable other cgroups.

Right, I mostly understood it as meaning that we need to create all those
controllers in the host so that the container workload can *potentially*
use all these cgroups as well, not that they are inherently required.

> Currently lxc would fail this way, but we could teach it to ignore
> inability to create cgroups which aren't required. (This isn't as
> simple as it seems, since using the keyword "all" for controllers
> would no longer work, but it's doable)

"all" could still try to join all controllers, but ignore the ones it
doesn't have permissions for?

Maybe also (1) a new weak version of "all" which implements that
behaviour, or (2) a new strong version which will fail if it cannot
join any controller. TBH I don't know which way around would break
backwards compat less: (1) requires changing all existing container
configs on upgrade once we stop putting the user session into all
controllers, and (2) might break existing container workloads which
actually expect the dropped controllers.

My gut feeling is that (2) is the better option.

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Revision history for this message
Martin Pitt (pitti) wrote :

Setting systemd task to incomplete for now. Please let me know how we want the cgroups set up for user sessions, and I'll change our patch accordingly. Thanks!

Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-04-20 03:20 EDT-------
Hi,

We want cgroups to be mounted *without* the cpuset controller.

From your conversation I could make out the following:

1. LXC does not have a hard requirement on cpusets. But the challenge in not mounting
cpusets would be to teach LXC to identify that all controllers may not be mounted when it
requests cgroups.

2. If LXC can identify this, when any container workload asks for cpusets, LXC must fail
and ask the user to mount cpusets by himself.

3. But the concern is about workloads that expect cpusets to be mounted implicitly.
If this is the case, then this is clearly not the way forward.

Is it possible to survey the existing workloads to verify this? Because if there are no
such workloads, mounting cgroups without cpusets is the simplest way to address
the problem.

Another approach, the right one, is to have a cgroup hotplug daemon
which listens on udev events for cpu hotplug operations and updates the allowed
cpus and mems masks. Such a daemon must be implemented by the service
which mounts cgroups, which is systemd in this case? This will take longer to implement?

Regards
Preeti U Murthy

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

Quoting Martin Pitt (<email address hidden>):
> Serge Hallyn [2015-04-17 17:49 -0000]:
> > Cpusets are not *required* for lxc. Perhaps we should in fact default
> > to only providing name=systemd, devices and freezer cgroups for users?
> > We'd want to very widely advertise how to enable other cgroups.
>
> Right, I mostly understood it so that we need to create all those
> controllers in the host that the container workload can *potentially*
> use all these cgroups as well, not that they are inherently required.
>
> > Currently lxc would fail this way, but we could teach it to ignore
> > inability to create cgroups which aren't required. (This isn't as
> > simple as it seems, since using the keyword "all" for controllers
> > would no longer work, but it's doable)
>
> "all" could still try to join all controllers, but ignore the ones it
> doesn't have permissions for?
>
> Maybe also (1) a new weak version of "all" which implements that
> behaviour, or (2) a new strong version which will fail if it cannot
> join any controller. TBH I don't know which way around would break
> backwards compat less: (1) requires changing all existing container
> configs on upgrade once we stop putting the user session into all
> controllers, and (2) might break existing container workloads which
> actually expect the dropped controllers.
>
> My gut feeling is that (2) is the better option.

I agree. (the same will be needed for create)

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Quoting Martin Pitt (<email address hidden>):
> Setting systemd task to incomplete for now. Please let me know how we
> want the cgroups set up for user sessions, and I'll change our patch
> accordingly. Thanks!

I'll discuss this with Stéphane next week.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Comment bridged from LTC Bugzilla

Quoting bugproxy (<email address hidden>):
> ------- Comment From <email address hidden> 2015-04-20 03:20 EDT-------
> Hi,
>
> We want cgroups to be mounted *without* the cpuset controller.
>
> From your conversation I could make out the following:
>
> 1. LXC does not have a hard requirement on cpusets. But the challenge in not mounting
> cpusets would be to teach LXC to identify that all controllers may not be mounted when it
> requests cgroups.
>
> 2. If LXC can identify this, when any container workload asks for cpusets, LXC must fail
> and ask the user to mount cpusets by himself.
>
> 3. But the concern is about workloads that expect cpusets to be mounted implicitly.
> If this is the case, then this is clearly not the way forward.
>
> Is it possible to survey the existing workloads to verify this?

Implement the change and look for breakages :)

I'm still not convinced that we don't want to make the change only for
powerpc systems - x86 systems AFAIK don't hotplug like drunken sailors.

> Because if there are no
> such workloads, mounting cgroups without cpusets is the simplest way to address
> the problem.
>
> Another approach, the right one, is to have a cgroup hotplug daemon
> which listens on udev events for cpu hotplug operations and updates the allowed
> cpus and mems masks. Such a daemon must be implemented by the service
> which mounts cgroups, which is systemd in this case? This will take longer to implement?

It'll require lots of discussion. If it turns out that upstream is
happy with the feature, it could actually happen very quickly.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-06-16 04:50 EDT-------
Hi,

An update on this:

We are looking at solving this issue in either of the following two ways:

1. Have a config option where user specifies the controllers to mount.
2. Have the patch that mounts cgroups for containers in systemd-shim,
rather than systemd.

Regards
Preeti U Murthy

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-07-12 06:07 EDT-------
*** Bug 127595 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy)
tags: added: targetmilestone-inin1510
removed: targetmilestone-inin---
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

cgmanager (in git, not yet packaged) can now build a pam module which could be used in place of our systemd patch to move tasks into cgroups upon login. This would allow simple configuration of the cgroup controllers to be used. I'm waiting on feedback in private email about whether or not we want to go that route.
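
If that route is taken, enabling it would presumably come down to a single PAM session line; the shape below is a guess, and the controller-list argument syntax is an assumption rather than the module's verified interface:

# /etc/pam.d/common-session (hypothetical pam_cgm entry; argument syntax assumed)
session optional pam_cgm.so freezer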

Revision history for this message
Benjamin Drung (bdrung) wrote :

> I'm still not convinced that we don't want to make the change only for
> powerpc systems - x86 systems AFAIK don't hotplug like drunken sailors.

We do on amd64. We run Ubuntu as virtual machine guests and do allow hot-plugging CPUs. We do not have cgmanager installed.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1392176] Re: mounts cgroups unconditionally which causes undesired effects with cpu hotplug

Quoting Benjamin Drung (<email address hidden>):
> > I'm still not convinced that we don't want to make the change only for
> > powerpc systems - x86 systems AFAIK don't hotplug like drunken sailors.
>
> We do on amd64. We run Ubuntu as virtual machine guests and do allow
> hot-plugging CPUs. We do not have cgmanager installed.

The default pam_cgm config line only enables freezer, not cpusets,
so I think this is now moot. (Well, it will be in the next cycle, when
we switch from the systemd patch to using libpam-cgm.)

Revision history for this message
Ali (asaidi) wrote :

Serge,

Does the issue being moot apply to wily or 16.04?

Thanks,
Ali

Revision history for this message
Matt Dirba (5qxm) wrote :

FYI: My use case for hot plugging my x86 system like a drunken sailor is to evaluate the amount of CPUs required to complete a given task before I schedule it to run on other potentially CPU bound machines.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

No - this being moot does not apply to wily.

Actually the xenial work has been delayed so it does not *yet* apply there either.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@5qxm - thanks for that input.

For what it's worth you should be able to use ppa:serge-hallyn/systemd in xenial to get cpusets not created by default. Unfortunately I need to make some more changes (in particular to use the systemd-created cgroups when they exist) before pushing this to the archive.

Changed in cgmanager (Ubuntu):
status: Confirmed → Fix Released
Changed in systemd (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-06-17 06:01 EDT-------
(In reply to comment #63)
>
> @5qxm - thanks for that input.
>
> For what it's worth you should be able to use ppa:serge-hallyn/systemd in
> xenial to get cpusets not created by default. Unfortunately I need to make
> some more changes (in particular to use the systemd-created cgroups when
> they exist) before pushing this to the archive.

Serge,
Have these fixes covered LXC cases, like docker and KVM?

If I understand correctly, you mentioned 2 fixes:
- one for cgmanager with libpam-cgm
- another for systemd.

Thanks,
- Simon

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

"LXC cases, like docker and KVM" - did you mean non-lxc cases?

xenial by default should now be using libpam-cgfs, should not be using cgmanager, and should not be creating cpusets.
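
Those three defaults can be spot-checked on a xenial system along these lines (a sketch; the exact pam_cgfs arguments shipped by libpam-cgfs may differ):

dpkg -l cgmanager 2>/dev/null | grep '^ii' || echo "cgmanager not installed"
grep -r pam_cgfs /etc/pam.d/     # lists the controllers the PAM module manages
grep cpuset /proc/self/cgroup    # cpuset path should be "/", not a session scope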

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-06-18 09:49 EDT-------
(In reply to comment #77)
> "LXC cases, like docker and KVM" - did you mean non-lxc cases?
>
> xenial by default should now be using libpam-cgfs, should not be using
> cgmanager, and should not be creating cpusets.

Thanks for the info. However, what I care about is the docker/KVM case. For example, when I create a docker container on xenial 16.04, the container process is still added into /sys/fs/cgroup/cpuset/docker; when creating a KVM guest, the KVM process is added into a sub cpuset cgroup as well. As a result, the issue can still impact those tasks.

For the above, is it already covered (on a later xenial version) or does it need more consideration?

Regards,
- Simon

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I recommend opening new bugs against libvirt and docker. Libvirt moves VMs into a cpuset by default. I assume docker does the same. (My xenial laptop runs upstart, so this is not systemd's doing.)

Revision history for this message
bugproxy (bugproxy) wrote :

sudo cat /proc//mountinfo

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-07-18 15:08 EDT-------
.

bugproxy (bugproxy)
tags: added: targetmilestone-inin1604
removed: targetmilestone-inin1510