Ubuntu 22.04 net_cls cgroup mounted over cgroup2 prevents LXD containers from starting

Bug #1971571 reported by brian mullan
This bug affects 4 people

Affects: lxd (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

I use LXD a lot.

I had 4 Ubuntu 20.04 systems I wanted to upgrade to 22.04.

Upgrading the 1st two systems (both laptops) was successful BUT... after the upgrade, LXD could create a new Ubuntu 22.04 or Ubuntu 20.04 container, but the containers could not "start".

I spent a lot of time trying to figure out why, then gave up and just decided to wipe those 2 laptops and do a "clean" install of 22.04.

The installation of 22.04 on both was successful AND... LXD worked (i.e. I could create & run Ubuntu 20.04 and 22.04 containers).

The 2nd two systems were larger 20.04 Desktop systems (12-core AMD, 3-4 TB SSD, 64 GB RAM).

I "upgraded" 1 of those 2 Desktop systems successfully to 22.04.

However, after the upgrade, LXD could again create but NOT start Ubuntu 22.04 or 20.04 containers.

$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
net_cls /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0

From the above you can see that, for some reason, the "net_cls" cgroup is being mounted over cgroup2.

I found a "workaround" was to disable cgroup2 which I did by:

add the following string to the GRUB_CMDLINE_LINUX line in /etc/default/grub
and then run sudo update-grub.

"systemd.unified_cgroup_hierarchy=0"

Once I'd done the above and rebooted... LXD worked correctly again.

I just wanted to report this since I had multiple instances of it occurring.

Several others have encountered this as well.

Brian

Tags: cgroup cgroup2
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1971571/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → lxd (Ubuntu)
Revision history for this message
brian mullan (bmullan) wrote (last edit ):

I do not think this is an LXD bug!
I think there is some problem with the "upgrade" process:
do-release-upgrade -d

As I stated, with a "clean install" of 22.04 everything works.

It is on an "upgrade" that cgroup2 gets screwed up.

One piece of info I forgot to mention...

Before doing do-release-upgrade -d on one system, I removed the LXD snap.

Then I did a successful upgrade to 22.04.

Then I reinstalled the LXD snap, and it could launch but not run an Ubuntu container until I disabled cgroup2 on the host.

To me that says the Ubuntu "upgrade" process is the cause, because a clean install works.

Just my .02
Brian

Revision history for this message
brian mullan (bmullan) wrote :

sorry for typos
I'm on my phone w autocorrect

Revision history for this message
brian mullan (bmullan) wrote :

chroup2 = cgroup2

Revision history for this message
Stéphane Graber (stgraber) wrote :

Can you show:
 - cat /proc/self/cgroup
 - cat /proc/self/mounts

on a broken system?

Changed in lxd (Ubuntu):
status: New → Incomplete
Revision history for this message
Sherlock (shift2-freemail) wrote :

Hi,

Different user, same problem.

Here is the output of the commands you asked for:

Please advise next steps.

Cheers,
David

david@notebook:~$ cat /proc/self/cgroup
13:devices:/user.slice
12:misc:/
11:pids:/user.slice/user-1000.slice/user@1000.service
10:freezer:/
9:hugetlb:/
8:cpu,cpuacct:/user.slice
7:blkio:/user.slice
6:memory:/user.slice/user-1000.slice/user@1000.service
5:rdma:/
4:net_cls,net_prio:/
3:perf_event:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-f8d676d1-e23f-4e5e-be35-d086ed8dc1e5.scope
0::/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-f8d676d1-e23f-4e5e-be35-d086ed8dc1e5.scope

david@notebook:~$ cat /proc/self/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=3976416k,nr_inodes=994104,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=802224k,mode=755,inode64 0 0
/dev/sda5 / ext4 rw,noatime,errors=remount-ro 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset,clone_children 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/misc cgroup rw,nosuid,nodev,noexec,relatime,misc 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18856 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sy...


Revision history for this message
Stefan Fleischmann (sfleischmann) wrote :

I'm seeing the same problem on a newly installed system that was deployed from MAAS, i.e. using an Ubuntu 22.04 cloud image. This is just a test system, so I'm happy to try all kinds of hacks ;-) I already know that I can get things working again by booting with systemd.unified_cgroup_hierarchy=false, but I thought it could be worth trying to figure out the root cause here.

Revision history for this message
Stefan Fleischmann (sfleischmann) wrote :

One error message from the syslog that I thought could be related:

 lxd.daemon[2577]: time="2022-08-16T21:19:00+02:00" level=error msg="Error reading host's cpuset.cpus"
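
That error presumably means LXD could not read the host's cpuset information where it expected it. A quick hedged sanity check on a pure cgroup2 host (assuming the unified hierarchy is mounted at /sys/fs/cgroup) is to confirm the cpuset controller is exposed at the root; typical 22.04 output shown:

 $ cat /sys/fs/cgroup/cgroup.controllers
 cpuset cpu io memory hugetlb pids rdma misc

If a stray v1 mount is sitting under /sys/fs/cgroup, LXD can misdetect the cgroup layout even though the controllers are all listed there.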

Revision history for this message
Stefan Fleischmann (sfleischmann) wrote :

In my case this turned out to be cgroup (v1) mounted on top of cgroup2. Our workload manager seems to mount the freezer cgroup:

$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0

If I unmount /sys/fs/cgroup/freezer I can start my containers again.
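
For anyone else debugging this, a hedged one-liner to list any v1 cgroup mounts sitting under the unified hierarchy (it assumes cgroup2 is mounted at /sys/fs/cgroup, as above):

 $ awk '$3 == "cgroup" && $2 ~ "^/sys/fs/cgroup/" {print $2}' /proc/mounts
 /sys/fs/cgroup/freezer

Each path it prints can then be unmounted with sudo umount, as above.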

Revision history for this message
brian mullan (bmullan) wrote (last edit ):

Stefan's discovery/comment:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1971571/comments/10

is correct in one sense.

When I hit this, "net_cls" was my problem:

$ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
net_cls /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0

After

$ sudo umount /sys/fs/cgroup/net_cls

I can create & start my containers again.

So is this a problem with whatever "orders" the mounts?

Brian

Revision history for this message
brian mullan (bmullan) wrote :

Bump.

I hope some developers see this, as it's at least affecting LXC/LXD users, but possibly others if cgroup2 isn't available.

Revision history for this message
brian mullan (bmullan) wrote :

Just completed a NEW install of 22.04.1, and this problem still exists!

brian mullan (bmullan)
summary: - ubuntu 22.04 cgroup2 works for clean install but upgrade to 22.04 causes
- cgroup2 problems
+ ubuntu 22.04 net_cls cgroup mounted over cgroup2 prevents LXD containers
+ from starting
brian mullan (bmullan)
description: updated
brian mullan (bmullan)
tags: added: cgroup
removed: bot-comment
description: updated
description: updated
Revision history for this message
Stéphane Graber (stgraber) wrote :

We've seen some reports recently hinting at some third-party VPN software causing this particular mount.

Revision history for this message
brian mullan (bmullan) wrote (last edit ):

I use Mullvad VPN, and I just found this bug filed with Mullvad:

net_cls interfering with lxd #3651

https://github.com/mullvad/mullvadvpn-app/issues/3651

So this bug (Bug #1971571) is resolved: it was caused by the Mullvad VPN installation, and apparently remounting net_cls somewhere else on the system should resolve it.

So this bug report can be closed.
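
For anyone who wants to try that, a rough sketch of the remount (the target directory /run/net_cls is an arbitrary choice for illustration; the point is just to keep the v1 hierarchy out from under /sys/fs/cgroup):

 $ sudo umount /sys/fs/cgroup/net_cls
 $ sudo mkdir -p /run/net_cls
 $ sudo mount -t cgroup -o net_cls net_cls /run/net_cls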

Revision history for this message
William Hunter (metaquery) wrote :

The Private Internet Access (PIA) client can also cause this problem. Somebody wrote a blog post about tracking this down: https://ryan.himmelwright.net/post/pia-client-podman-issues/

The smoking gun is file ownership:
$ ls -l /sys/fs/cgroup/net_cls
shows "piavpn" as the group owner.

As stated above, a temporary workaround is:
$ sudo umount /sys/fs/cgroup/net_cls

Or, for a permanent fix, uninstall the PIA client and reboot.
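
A compact hedged check along the same lines (stat prints only the owning group; "piavpn" is specific to the PIA client, and other VPN software may use a different group):

 $ stat -c '%G' /sys/fs/cgroup/net_cls
 piavpn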

Now that there is a FOSS version of the PIA client, a bug report was filed (still open):
https://github.com/pia-foss/desktop/issues/50

I'm posting here because search engines seem to point to this page, and seeing this may prevent others from blaming LXD incorrectly.
