isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity

Bug #2076957 reported by Matthew Ruffell
This bug affects 2 people
Affects          Status        Importance  Assigned to      Milestone
linux (Ubuntu)   Fix Released  Undecided   Unassigned
  Jammy          Fix Released  Medium      Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/2076957

[Impact]

In latency-sensitive environments, it is very common to use isolcpus to reserve a set of CPUs on which no other processes are placed, and to run only DPDK in poll mode on them.

There is a bug in the Jammy kernel where, if cgroups V2 is enabled, after several minutes the kernel places other processes onto these reserved isolated CPUs at random. This disturbs DPDK and introduces latency.

The issue does not occur with cgroups V1, so a workaround is to use cgroups V1 instead of V2 for the moment.
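Whether the workaround is in effect depends on which cgroup layout the machine actually booted into. As a quick check (a sketch, assuming systemd mounts the hierarchy at /sys/fs/cgroup as Ubuntu does by default; cgroup_mode is a hypothetical helper name), the filesystem type of that mount tells the two apart: cgroup2fs for the unified V2 hierarchy, tmpfs for the V1 legacy or hybrid layouts. Booting with systemd.unified_cgroup_hierarchy=0 instead of =1 selects the V1 layout on Jammy.

```shell
#!/bin/sh
# Hypothetical helper: report which cgroup layout the current boot uses.
# Assumes the hierarchy is mounted at /sys/fs/cgroup (the systemd default).
cgroup_mode() {
    mnt=${1:-/sys/fs/cgroup}
    # GNU stat: -f reports on the filesystem, %T prints its type by name
    case $(stat -fc %T "$mnt" 2>/dev/null) in
        cgroup2fs) echo v2 ;;      # unified hierarchy (cgroups V2)
        tmpfs)     echo v1 ;;      # legacy or hybrid layout (V1 controllers)
        *)         echo unknown ;;
    esac
}

cgroup_mode
```

With the testcase's systemd.unified_cgroup_hierarchy=1 this should print v2; after switching the parameter to 0, running update-grub and rebooting, it should print v1.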

[Fix]

I identified the following commit with a full git bisect; it fixes the issue and landed in 6.2-rc1:

commit 7fd4da9c1584be97ffbc40e600a19cb469fd4e78
Author: Waiman Long <email address hidden>
Date: Sat Nov 12 17:19:39 2022 -0500
Subject: cgroup/cpuset: Optimize cpuset_attach() on v2
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fd4da9c1584be97ffbc40e600a19cb469fd4e78

Only the 5.15 Jammy kernel needs this fix. Focal works correctly as is.

The commit makes cpuset_attach() skip updating the attached tasks on cgroups V2 when the cgroup's effective CPUs and memory nodes have not changed, and it seems to fix the issue.
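To see which CPUs the cpuset controller currently grants a given cgroup on V2, the cgroup's cpuset.cpus.effective file can be read. The sketch below uses a hypothetical effective_cpus helper that walks up from a cgroup path to the nearest ancestor exposing that file, because it only exists in cgroups where the cpuset controller is enabled (the root cgroup should normally have it). Comparing its output with a task's Cpus_allowed_list from /proc/PID/status is one way to see whether the task's affinity still matches what the isolcpus setup intended; it only inspects state and is not a reimplementation of the commit.

```shell
#!/bin/sh
# Hypothetical helper: print the effective CPUs that cpuset grants a cgroup
# on the unified (V2) hierarchy. cpuset.cpus.effective only exists where the
# cpuset controller is enabled, so walk up to the nearest ancestor that has
# it (normally at least the root cgroup does).
# $1: cgroup path as listed in /proc/PID/cgroup, e.g. "/user.slice"
# $2: V2 mount point, defaults to /sys/fs/cgroup
effective_cpus() {
    cgpath=$1
    cgroot=${2:-/sys/fs/cgroup}
    while :; do
        f="$cgroot${cgpath%/}/cpuset.cpus.effective"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${cgpath:-/}" "$(cat "$f")"
            return 0
        fi
        # Reached the root without finding the file
        if [ -z "$cgpath" ] || [ "$cgpath" = / ]; then
            return 1
        fi
        cgpath=${cgpath%/*}   # drop the last path component
    done
}

# Example (V2 only): compare the current shell's cgroup with its own affinity
# effective_cpus "$(sed -n 's/^0:://p' /proc/$$/cgroup)"
# grep Cpus_allowed_list /proc/$$/status
```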

[Testcase]

Deploy a bare metal server, ideally one with many cores; 56 should be plenty.
Use Jammy, with the 5.15 GA kernel.

1) Edit /etc/default/grub, add the following to GRUB_CMDLINE_LINUX_DEFAULT, then run sudo update-grub:
"isolcpus=4-7,32-35 rcu_nocb_poll rcu_nocbs=4-7,32-35 systemd.unified_cgroup_hierarchy=1"
2) sudo reboot
3) sudo cat /sys/devices/system/cpu/isolated
4-7,32-35
4) sudo apt install s-tui stress
5) sudo s-tui
6) htop
7) Run this loop, which every 5 seconds prints the ps header followed by any stress thread whose PSR column (field 9 of ps -eLF, the processor the thread is currently assigned to) is one of the isolated CPUs:
$ while true; do
      sudo ps -eLF | head -n 1
      sudo ps -eLF | grep stress | awk '$9 ~ /^(4|5|6|7|32|33|34|35)$/'
      sleep 5
  done
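The ps loop above only reports threads once they are actually running on an isolated CPU. Since the bug title describes processes ending up with the wrong affinity, a complementary check is to look for user-space processes whose allowed-CPU list (Cpus_allowed_list in /proc/PID/status) already includes isolated CPUs. This is a sketch with hypothetical helper names; it assumes cpulists are comma-separated N or N-M entries, skips kernel threads (children of kthreadd, PID 2) because per-CPU kernel threads legitimately sit on isolated CPUs, and only looks at each process's main thread. Anything deliberately pinned to the isolated CPUs, such as a DPDK application, will also be listed.

```shell
#!/bin/sh
# Hypothetical helpers: find user-space processes whose allowed-CPU list
# includes an isolated CPU. Assumes cpulists are comma-separated "N" or
# "N-M" entries (no stride syntax) and Linux procfs/sysfs.

# cpulist_overlap LIST_A LIST_B: print the CPUs present in both lists,
# comma-separated in ascending order (empty line if none).
cpulist_overlap() {
    awk -v a="$1" -v b="$2" '
        function expand(list, set,    parts, n, i, r, m, c, hi, max) {
            max = -1
            n = split(list, parts, ",")
            for (i = 1; i <= n; i++) {
                if (parts[i] == "") continue
                m = split(parts[i], r, "-")
                hi = (m > 1 ? r[2] : r[1]) + 0
                for (c = r[1] + 0; c <= hi; c++) set[c] = 1
                if (hi > max) max = hi
            }
            return max
        }
        BEGIN {
            ma = expand(a, sa)
            expand(b, sb)
            out = ""
            for (c = 0; c <= ma; c++)
                if ((c in sa) && (c in sb)) out = out (out == "" ? "" : ",") c
            print out
        }'
}

# report_isolated_affinity: print PID, name, allowed list and the isolated
# CPUs it overlaps for every user-space process that may run on them.
report_isolated_affinity() {
    iso=$(cat /sys/devices/system/cpu/isolated 2>/dev/null)
    if [ -z "$iso" ]; then
        echo "no isolated CPUs configured"
        return 0
    fi
    for status in /proc/[0-9]*/status; do
        pid=${status#/proc/}
        pid=${pid%/status}
        # One line per process: parent PID, Cpus_allowed_list, first word of Name
        info=$(awk '/^PPid:/ { pp = $2 } /^Cpus_allowed_list:/ { cl = $2 }
                    /^Name:/ { nm = $2 } END { print pp, cl, nm }' "$status" 2>/dev/null) || continue
        set -- $info
        [ "$#" -eq 3 ] || continue   # process vanished or unexpected format
        # Skip kthreadd (PID 2) and its children: per-CPU kernel threads are
        # expected on isolated CPUs.
        if [ "$pid" = 2 ] || [ "$1" = 2 ]; then continue; fi
        overlap=$(cpulist_overlap "$2" "$iso")
        if [ -n "$overlap" ]; then
            printf '%s\t%s\tallowed=%s\tisolated=%s\n' "$pid" "$3" "$2" "$overlap"
        fi
    done
    return 0
}

report_isolated_affinity
```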

The isolcpus setting isolates CPUs 4-7 and 32-35, so that each NUMA node has a set of isolated CPUs.
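Which CPU numbers belong to which NUMA node differs between machines, so it can be worth double-checking that the chosen isolated CPUs really span every node. A minimal sketch (hypothetical helper name) prints each node's CPU list from sysfs next to the isolated list; the optional argument only exists so that the sysfs root can be swapped out for testing.

```shell
#!/bin/sh
# Hypothetical helper: print each NUMA node's CPU list followed by the
# isolated CPU list, to eyeball that every node got some isolated CPUs.
# $1: sysfs root, defaults to /sys (overridable so it can be tested).
show_numa_isolation() {
    sysroot=${1:-/sys}
    for node in "$sysroot"/devices/system/node/node[0-9]*; do
        # Skip the literal pattern when no node directories exist
        [ -r "$node/cpulist" ] || continue
        printf '%s: %s\n' "${node##*/}" "$(cat "$node/cpulist")"
    done
    printf 'isolated: %s\n' "$(cat "$sysroot/devices/system/cpu/isolated" 2>/dev/null)"
}

show_numa_isolation
```

Each nodeN line can then be compared by eye against the final isolated line.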

s-tui is a great frontend for stress, and it starts the stress processes. All stress processes should initially be on non-isolated CPUs; confirm this in htop, where CPUs 4-7 and 32-35 should be at 0% while every other CPU is at 100%.
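htop is the easiest way to watch this interactively. For a scriptable check, the cumulative per-CPU counters in /proc/stat can be sampled twice and compared. The sketch below (hypothetical helper names; an approximation that counts idle plus iowait as not busy) prints a CPU's busy percentage over an interval, so the isolated CPUs should stay near 0 while stress runs on the others.

```shell
#!/bin/sh
# Hypothetical helpers: approximate busy percentage of one CPU over an
# interval, from the cumulative per-CPU counters in /proc/stat.

# cpu_jiffies N: print "total idle" for cpuN. After the "cpuN" label the
# fields are user nice system idle iowait irq softirq steal guest guest_nice;
# guest and guest_nice are already included in user and nice, so only
# fields 2..9 are summed. Idle time is idle + iowait.
cpu_jiffies() {
    awk -v c="cpu$1" '$1 == c { t = 0; for (i = 2; i <= 9; i++) t += $i; print t, $5 + $6 }' /proc/stat
}

# cpu_busy_pct N [SECONDS]: sample twice, print the busy share as an integer percent
cpu_busy_pct() {
    before=$(cpu_jiffies "$1")
    sleep "${2:-5}"
    after=$(cpu_jiffies "$1")
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "no such CPU: $1" >&2
        return 1
    fi
    echo "$before $after" | awk '{
        dt = $3 - $1; di = $4 - $2
        p = (dt > 0 ? 100 * (dt - di) / dt : 0)
        if (p < 0) p = 0; if (p > 100) p = 100   # guard against counter quirks
        printf "%d\n", p
    }'
}

# Example: check the isolated CPUs from the testcase while stress is running
# for c in 4 5 6 7 32 33 34 35; do printf 'cpu%s: %s%%\n' "$c" "$(cpu_busy_pct "$c" 5)"; done
```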

After about 3 minutes (sometimes up to 10), a stress process or the s-tui process will be incorrectly placed onto an isolated CPU, and that CPU's usage will rise in htop. The while loop checking ps for the CPU placement of stress threads will also likely print the misplaced process.

A test kernel is available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf391137-test

With the test kernel installed, processes are not placed onto the isolated CPUs.

[Where problems could occur]

The patch changes how cpuset_attach() decides whether it needs to update the tasks being attached. cpuset_attach() is currently called very frequently in the 5.15 Jammy kernel, but most of these calls should be no-ops, since nothing has changed in the cpusets or memory nodes of the cgroup the process is attached to. With the patch, cpuset_attach() skips these updates when there are no changes, which should offer a small performance increase as well as fixing this isolcpus bug.

If a regression were to occur, it would affect cgroups V2 only, and it could cause resource limits to be applied incorrectly in the worst case.

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Jammy):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: jammy sts
Matthew Ruffell (mruffell) wrote :
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-122.132 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux
Matthew Ruffell (mruffell) wrote :

Performing verification for Jammy.

I started an n2-highcpu-32 instance on GCP, as bare metal systems were unavailable due to the certification lab move.

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200 isolcpus=4-7,16-20 rcu_nocb_poll rcu_nocbs=4-7,16-20 systemd.unified_cgroup_hierarchy=1"

I ran sudo update-grub and rebooted.

Since 5.15.0-121-generic was still in -proposed (the 2024.08.05 cycle released slightly later than expected), I enabled -proposed and installed 5.15.0-121-generic to get a baseline.

I rebooted again.

I then set up htop, s-tui and the while loop to check for processes on 4-7,16-20.

I started s-tui, and processes were placed on the isolated cores within 3 minutes. By 10 minutes, all cores had stress running on them, and isolation was completely ignored.

I then enabled -proposed2 and installed 5.15.0-122-generic:

$ uname -rv
5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024

I re-ran s-tui and started stress.

After 1 hour and 20 minutes, the isolated CPUs were still completely isolated, with no processes running on them. Stress was confined to the regular CPUs.

The kernel in -proposed fixes the issue. Happy to mark verified for jammy.

tags: added: verification-done-jammy-linux
removed: verification-needed-jammy-linux
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-122.132

---------------
linux (5.15.0-122.132) jammy; urgency=medium

  * jammy/linux: 5.15.0-122.132 -proposed tracker (LP: #2078154)

  * isolcpus are ignored when using cgroups V2, causing processes to have wrong
    affinity (LP: #2076957)
    - cgroup/cpuset: Optimize cpuset_attach() on v2

  * Jammy update: v5.15.164 upstream stable release (LP: #2076100) //
    CVE-2024-41009
    - bpf: Fix overrunning reservations in ringbuf

  * CVE-2024-39494
    - ima: Fix use-after-free on a dentry's dname.name

  * CVE-2024-39496
    - btrfs: zoned: fix use-after-free due to race with dev replace

  * CVE-2024-42160
    - f2fs: check validation of fault attrs in f2fs_build_fault_attr()
    - f2fs: Add inline to f2fs_build_fault_attr() stub

  * CVE-2024-38570
    - gfs2: Rename sd_{ glock => kill }_wait
    - gfs2: Fix potential glock use-after-free on unmount

  * CVE-2024-42228
    - drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

  * CVE-2024-27012
    - netfilter: nf_tables: restore set elements when delete set fails

  * CVE-2024-26677
    - rxrpc: Fix delayed ACKs to not set the reference serial number

 -- Manuel Diewald <email address hidden> Thu, 29 Aug 2024 14:23:02 +0200

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
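The changelog above shows the fix shipping in linux 5.15.0-122.132. A rough way to tell whether a running Jammy kernel from the main linux package already has it is to compare the ABI number in uname -r against 122. This is a sketch with a hypothetical helper name, assuming later ABIs keep the fix and that only the main package's flavours (release strings containing -generic) are of interest; derivative kernels such as linux-nvidia-tegra or linux-mtk use their own version numbers (for example 5.15.0-1029.29) and are reported as not applicable.

```shell
#!/bin/sh
# Hypothetical helper: does a Jammy kernel release string from the main
# linux package include the isolcpus fix (shipped in 5.15.0-122.132)?
# Exit status: 0 = fixed, 1 = older ABI without the fix, 2 = not applicable
# (not a 5.15.0 -generic* flavour, e.g. a derivative kernel or a newer series).
kernel_has_isolcpus_fix() {
    krel=${1:-$(uname -r)}
    case $krel in
        5.15.0-*-generic*) ;;
        *) return 2 ;;
    esac
    abi=${krel#5.15.0-}   # e.g. "122-generic"
    abi=${abi%%-*}        # e.g. "122"
    case $abi in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Assumes every later ABI also carries the fix
    [ "$abi" -ge 122 ]
}

rc=0
kernel_has_isolcpus_fix || rc=$?
case $rc in
    0) echo "$(uname -r): includes the isolcpus fix" ;;
    1) echo "$(uname -r): predates 5.15.0-122, fix missing" ;;
    *) echo "$(uname -r): not a Jammy 5.15 generic kernel, check not applicable" ;;
esac
```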
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/5.15.0-1029.29 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-done-jammy-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra' to 'verification-failed-jammy-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-v2 verification-needed-jammy-linux-nvidia-tegra
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-mtk/5.15.0-1035.41 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-done-jammy-linux-mtk'. If the problem still exists, change the tag 'verification-needed-jammy-linux-mtk' to 'verification-failed-jammy-linux-mtk'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-mtk-v2 verification-needed-jammy-linux-mtk
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-igx/5.15.0-1019.19 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-done-jammy-linux-nvidia-tegra-igx'. If the problem still exists, change the tag 'verification-needed-jammy-linux-nvidia-tegra-igx' to 'verification-failed-jammy-linux-nvidia-tegra-igx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 verification-needed-jammy-linux-nvidia-tegra-igx
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra-5.15/5.15.0-1030.30~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-done-focal-linux-nvidia-tegra-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-nvidia-tegra-5.15' to 'verification-failed-focal-linux-nvidia-tegra-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-nvidia-tegra-5.15-v2 verification-needed-focal-linux-nvidia-tegra-5.15
Juerg Haefliger (juergh)
tags: added: kernel-daily-bug