Comment 7 for bug 1886668

Thadeu Lima de Souza Cascardo (cascardo) wrote :

https://launchpad.net/~cascardo/+archive/ubuntu/ppa/+sourcepub/11419106/+listing-archive-extra

So, this package on my PPA is built for bionic, but should work on other series too.

It has a service that calls a wrapper, which starts the reproducer and then reboots. The reboot is needed because once we add a task to the net_prio cgroup, cgroup bpf is disabled and we can't run the reproducer again. And although the reproducer can cause the refcount to go below 0 every time, it won't always cause the exact crash from this bug.

When you want to disable the reproducer, add the parameter "systemd.mask=cgroup-bpf-net-prio-crash.service" to the kernel cmdline. Then remove the package to get your system back.
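
Before removing the package, it may help to confirm that the mask parameter really made it onto the kernel cmdline. This is only a rough sketch: the helper name is made up for illustration, and it assumes the cmdline can be read from /proc/cmdline and that the parameter appears as a single whitespace-separated token.

```python
# Unit name taken from the comment above.
UNIT = "cgroup-bpf-net-prio-crash.service"

def unit_is_masked(cmdline: str, unit: str = UNIT) -> bool:
    """Return True if 'systemd.mask=<unit>' is one of the cmdline tokens.

    Hypothetical helper: it only does a plain token comparison and does
    not handle quoting or repeated/overridden parameters.
    """
    wanted = f"systemd.mask={unit}"
    return wanted in cmdline.split()

def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's cmdline (assumes a Linux /proc)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Calling unit_is_masked(read_cmdline()) after booting with the extra parameter should return True if the parameter was applied.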

You may be running some service that adds a task to a net_prio or net_cls cgroup, which prevents the reproducer from running at all (but does not stop it from rebooting your system over and over again). lxd comes to mind here.

You can check whether that is the case (before installing the reproducer) by searching dmesg for:
cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
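
The dmesg check can be scripted, roughly as below. This is just a sketch: the function names are invented for illustration, and it assumes dmesg is in PATH and readable by the current user (it may need root on some systems).

```python
import subprocess

# Message text quoted from the dmesg line above (without the "cgroup:" prefixes).
MARKER = "disabling cgroup2 socket matching due to net_prio or net_cls activation"

def cgroup_bpf_disabled(log_text: str) -> bool:
    """Return True if any line of the kernel log text contains the marker."""
    return any(MARKER in line for line in log_text.splitlines())

def read_dmesg() -> str:
    """Return the kernel log as text by running dmesg."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

If cgroup_bpf_disabled(read_dmesg()) returns True, something on the system has already activated net_prio or net_cls, and the reproducer is not expected to trigger anything.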

The following WARN demonstrates that the refcount underflow has happened (though not the crash):
[ 12.581125] ------------[ cut here ]------------
[ 12.585021] percpu ref (cgroup_bpf_release_fn) <= 0 (-357) after switching to atomic
[ 12.585092] WARNING: CPU: 2 PID: 665 at lib/percpu-refcount.c:160 percpu_ref_switch_to_atomic_rcu+0x12e/0x140
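
To look for this WARN across several boots, something like the following could be run over saved kernel logs. It is a sketch only: the regular expression is modelled on the single message shown above, so it assumes the wording is the same on other kernel versions, and the function name is hypothetical.

```python
import re

# Modelled on the "percpu ref (...) <= 0 (...) after switching to atomic" line above.
UNDERFLOW_RE = re.compile(
    r"percpu ref \((?P<release_fn>\w+)\) <= 0 \((?P<count>-?\d+)\) "
    r"after switching to atomic"
)

def find_percpu_ref_underflows(log_text: str) -> list[tuple[str, int]]:
    """Return (release function, reported count) for each matching message."""
    return [
        (m.group("release_fn"), int(m.group("count")))
        for m in UNDERFLOW_RE.finditer(log_text)
    ]
```

An entry whose release function is cgroup_bpf_release_fn corresponds to the underflow discussed here; an empty list means the WARN did not show up in that log.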

The crash will cause a panic and likely prevent the system from rebooting, showing you have reproduced the issue.

If you never see the WARN, the bug has been mitigated, though it can still happen if we modify the reproducer slightly to also change net_cls.classid.

Cascardo.