Enable CONFIG_NO_HZ_FULL on supported architectures

Bug #1919154 reported by Marcelo Cerri
This bug affects 1 person
Affects                      Status         Importance   Assigned to
linux (Ubuntu)               Fix Released   Undecided    gerald.yang
  Focal                      Won't Fix      Undecided    Unassigned
  Groovy                     Won't Fix      Undecided    Unassigned
  Hirsute                    Won't Fix      Undecided    Marcelo Cerri
  Jammy                      Won't Fix      Undecided    gerald.yang
  Lunar                      Won't Fix      Undecided    gerald.yang
  Mantic                     Won't Fix      Undecided    gerald.yang
linux-lowlatency (Ubuntu)    Fix Released   Undecided    Unassigned
  Focal                      Won't Fix      Undecided    Unassigned
  Groovy                     Won't Fix      Undecided    Unassigned
  Hirsute                    Won't Fix      Undecided    Unassigned
  Jammy                      Won't Fix      Undecided    Unassigned
  Lunar                      Won't Fix      Undecided    Unassigned
  Mantic                     Fix Released   Undecided    Unassigned

Bug Description

[Impact]

The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
sending scheduling-clock interrupts to CPUs with a single runnable task,
and such CPUs are said to be "adaptive-ticks CPUs". This is important
for applications with aggressive real-time response constraints because
it allows them to improve their worst-case response times by the maximum
duration of a scheduling-clock interrupt. It is also important for
computationally intensive short-iteration workloads: If any CPU is
delayed during a given iteration, all the other CPUs will be forced to
wait idle while the delayed CPU finishes. Thus, the delay is multiplied
by one less than the number of CPUs. In these situations, there is
again strong motivation to avoid sending scheduling-clock interrupts.
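
For illustration, here is a minimal sketch, not part of the original report, of how one can check at runtime which CPUs are currently adaptive-tick CPUs. It assumes the standard sysfs file exposed by kernels built with CONFIG_NO_HZ_FULL:

#include <stdio.h>

int main(void)
{
        char buf[256];
        FILE *f = fopen("/sys/devices/system/cpu/nohz_full", "r");

        if (!f) {
                /* The file is only present when CONFIG_NO_HZ_FULL is built in. */
                perror("nohz_full");
                return 1;
        }
        if (fgets(buf, sizeof(buf), f))
                /* Prints the nohz_full CPU list, e.g. "2-19,22-39";
                 * it may be empty if nohz_full= was not passed on the cmdline. */
                printf("adaptive-tick CPUs: %s", buf);
        fclose(f);
        return 0;
}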

[Test Plan]

In order to verify that the change does not cause performance regressions in context switching, we should compare the results of:

./stress-ng --seq 0 --metrics-brief -t 15

The tests run on a dedicated machine with the following services disabled: smartd.service, iscsid.service, apport.service, cron.service, anacron.timer, apt-daily.timer, apt-daily-upgrade.timer, fstrim.timer, logrotate.timer, motd-news.timer, man-db.timer.

The results didn't show any performance regression:

https://kernel.ubuntu.com/~mhcerri/lp1919154/

[Where problems could occur]

Performance degradation might happen for workloads with intensive context switching.

Revision history for this message
Marcelo Cerri (mhcerri) wrote :
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Marcelo Cerri (mhcerri)
description: updated
Revision history for this message
Marcelo Cerri (mhcerri) wrote :
Revision history for this message
Brian Murray (brian-murray) wrote :

The Groovy Gorilla has reached end of life, so this bug will not be fixed for that release

Changed in linux (Ubuntu Groovy):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Changed in linux (Ubuntu Lunar):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in linux (Ubuntu Lunar):
assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in linux (Ubuntu Mantic):
assignee: Marcelo Cerri (mhcerri) → gerald.yang (gerald-yang-tw)
Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Since we have some customers who need NO_HZ_FULL, I'd like to provide some updates on the progress:

I've borrowed some machines, including Intel, AMD EPYC, and arm64 servers,
and I am now running tests on a test kernel with
1. CONFIG_NO_HZ_FULL=y
2. nohz_full not enabled on the kernel cmdline
to evaluate whether there is any performance impact.

Older kernels seem to have some issues with NO_HZ_FULL built in but not enabled,
so I will focus on kernels >= 5.15.

For 5.15 test PPA (built with CONFIG_NO_HZ_FULL=y):
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/no-hz-full

For the other versions, 5.19 and 6.2, I will create more test PPAs
and keep updating the test status here.

Thanks,
Gerald

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :
Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Test program from Jay

Revision history for this message
gerald.yang (gerald-yang-tw) wrote (last edit ):

The attached test code is borrowed from Jay. It measures the average time of running getpid() 100000000 times, to see whether NO_HZ_FULL causes any performance degradation for context switches when it is built into the kernel (a minimal sketch of this kind of timing loop is shown after the test scopes below).

There are 4 test scopes:
1. Without NO_HZ_FULL built in
(the default in the Ubuntu kernel is NO_HZ_IDLE)
2. With NO_HZ_FULL built in, but not activated on the kernel cmdline
3. With NO_HZ_FULL built in, and nohz_full activated on some CPUs (along with isolcpus and rcu_nocbs) on the kernel cmdline
4. With NO_HZ_FULL built in and nohz_full activated on some CPUs, but with the test program run on a CPU that is not in the nohz_full set

Each of the first 3 test scopes also contains two test cases:
1. The test program is not pinned
2. The test program is pinned to a specific CPU; in my tests it is pinned to CPU 4 as below
taskset --cpu-list 4 ./test

For the fourth test scope, the test program is only run pinned to a specific CPU which is not in the nohz_full set.
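
For reference, a minimal sketch of this kind of getpid() timing loop (the actual test program from Jay is attached to the bug and may differ; the loop count and output format below simply mirror the logs in the following comments):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define LOOPS 100000000LL

int main(void)
{
        struct timespec start, stop;
        long long i, total;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < LOOPS; i++)
                syscall(SYS_getpid);   /* use syscall() directly so every iteration enters the kernel */
        clock_gettime(CLOCK_MONOTONIC, &stop);

        total = (stop.tv_sec - start.tv_sec) * 1000000000LL
                + (stop.tv_nsec - start.tv_nsec);
        printf("total %lld nsec\n", total);
        printf("avg %lld nsec\n", total / LOOPS);
        return 0;
}

Build with "gcc -O2 -o test test.c" and pin it with taskset as described above.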

Revision history for this message
gerald.yang (gerald-yang-tw) wrote (last edit ):

Results on Intel machine

Hardware configs:
Dell PowerEdge R730xd
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
40 CPUs
188G RAM, numa nodes: 2

Software configs:
OS: ubuntu 20.04
Official kernel: 5.15 hwe (5.15.0-86.96~20.04.1)
Test kernel: 5.15 hwe (5.15.0-86.96~20.04.1+test20231013b0)
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/focal-no-hz-full

Test case 1, without NO_HZ_FULL built-in (default ubuntu kernel config):
Run test program 4 times without taskset
tail -n 2 log/notaskset.*
==> log/notaskset.1 <==
total 49116169085 nsec
avg 491 nsec

==> log/notaskset.2 <==
total 47852147979 nsec
avg 478 nsec

==> log/notaskset.3 <==
total 49077846508 nsec
avg 490 nsec

==> log/notaskset.4 <==
total 49037126328 nsec
avg 490 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 log/taskset.*
==> log/taskset.1 <==
total 48534105655 nsec
avg 485 nsec

==> log/taskset.2 <==
total 48220818730 nsec
avg 482 nsec

==> log/taskset.3 <==
total 48496349690 nsec
avg 484 nsec

==> log/taskset.4 <==
total 48224935123 nsec
avg 482 nsec

Test case 2, with NO_HZ_FULL built-in but not activated on the kernel cmdline:
Run test program 4 times without taskset
tail -n 2 nohz-log/notaskset.*
==> nohz-log/notaskset.1 <==
total 48533643569 nsec
avg 485 nsec

==> nohz-log/notaskset.2 <==
total 47933581377 nsec
avg 479 nsec

==> nohz-log/notaskset.3 <==
total 49396311930 nsec
avg 493 nsec

==> nohz-log/notaskset.4 <==
total 48812288206 nsec
avg 488 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 nohz-log/taskset.*
==> nohz-log/taskset.1 <==
total 48929140711 nsec
avg 489 nsec

==> nohz-log/taskset.2 <==
total 48231661796 nsec
avg 482 nsec

==> nohz-log/taskset.3 <==
total 48482539803 nsec
avg 484 nsec

==> nohz-log/taskset.4 <==
total 48272541984 nsec
avg 482 nsec

Test case 3, with NO_HZ_FULL built-in and nohz_full activated on the kernel cmdline:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.15.0-86-generic root=UUID=69036292-bdc0-4904-8724-974723f1095a ro isolcpus=2-19,22-39 nohz_full=2-19,22-39 rcu_nocbs=2-19,22-39

Run test program 4 times without taskset
tail -n 2 nohz-activate-log/notaskset.*
==> nohz-activate-log/notaskset.1 <==
total 52088354594 nsec
avg 520 nsec

==> nohz-activate-log/notaskset.2 <==
total 49226221648 nsec
avg 492 nsec

==> nohz-activate-log/notaskset.3 <==
total 51462517639 nsec
avg 514 nsec

==> nohz-activate-log/notaskset.4 <==
total 51516303613 nsec
avg 515 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 nohz-activate-log/taskset.*
==> nohz-activate-log/taskset.1 <==
total 56753345940 nsec
avg 567 nsec

==> nohz-activate-log/taskset.2 <==
total 55720022538 nsec
avg 557 nsec

==> nohz-activate-log/taskset.3 <==
total 55701214354 nsec
avg 557 nsec

==> nohz-activate-log/taskset.4 <==
total 55740784595 nsec
avg 557 nsec

Test case 4, with NO_HZ_FULL built-in and nohz_full activated on the kernel cmdline, but run on a CPU outside the nohz_full set:
Run test program on CPU 20, which is not in the nohz_full set
tail -n 2 nohz-activate-off-log/*
==> nohz-activate-off-log/taskset.1 <==
total 49686932587 nsec
avg 496 nsec

==> nohz-activate-off-log/taskset.2 <==
total 49141560622 nsec
avg 491 nsec

==> nohz-activate-off-log/taskset.3 <==
total 490720...


Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

On arm64 machine

Hardware configs:
Aarch64
128 CPUs
502G RAM, numa nodes: 4

Software configs:
OS: ubuntu 20.04
Official kernel: 5.15 hwe (5.15.0-86.96~20.04.1)
Test kernel: 5.15 hwe (5.15.0-86.96~20.04.1+test20231013b0)
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/focal-no-hz-full

Test case 1, without NO_HZ_FULL built-in (default ubuntu kernel config):
Run test program 4 times without taskset
tail -n 2 log/notaskset.*
==> log/notaskset.1 <==
total 29370767905 nsec
avg 293 nsec

==> log/notaskset.2 <==
total 29359558119 nsec
avg 293 nsec

==> log/notaskset.3 <==
total 29370043654 nsec
avg 293 nsec

==> log/notaskset.4 <==
total 29362365433 nsec
avg 293 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 log/taskset.*
==> log/taskset.1 <==
total 29372156600 nsec
avg 293 nsec

==> log/taskset.2 <==
total 29367538079 nsec
avg 293 nsec

==> log/taskset.3 <==
total 29366224367 nsec
avg 293 nsec

==> log/taskset.4 <==
total 29367978392 nsec
avg 293 nsec

Test case 2, with NO_HZ_FULL built-in but not activated on the kernel cmdline:
Run test program 4 times without taskset
tail -n 2 nohz-log/notaskset.*
==> nohz-log/notaskset.1 <==
total 27591230003 nsec
avg 275 nsec

==> nohz-log/notaskset.2 <==
total 27582359987 nsec
avg 275 nsec

==> nohz-log/notaskset.3 <==
total 27585635138 nsec
avg 275 nsec

==> nohz-log/notaskset.4 <==
total 27587532170 nsec
avg 275 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 nohz-log/taskset.*
==> nohz-log/taskset.1 <==
total 27587206878 nsec
avg 275 nsec

==> nohz-log/taskset.2 <==
total 27579854104 nsec
avg 275 nsec

==> nohz-log/taskset.3 <==
total 27588163798 nsec
avg 275 nsec

==> nohz-log/taskset.4 <==
total 27589441746 nsec
avg 275 nsec

Test case 3, with NO_HZ_FULL built-in and nohz_full activated on the kernel cmdline:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.15.0-86-generic root=UUID=7c25ee2a-4c18-462a-90db-94273e5de74b ro isolcpus=2-63,66-127 nohz_full=2-63,66-127 rcu_nocbs=2-63,66-127 sysrq_always_enabled

Run test program 4 times without taskset
tail -n 2 nohz-activate-log/notaskset.*
==> nohz-activate-log/notaskset.1 <==
total 29986516050 nsec
avg 299 nsec

==> nohz-activate-log/notaskset.2 <==
total 29982386090 nsec
avg 299 nsec

==> nohz-activate-log/notaskset.3 <==
total 29976017400 nsec
avg 299 nsec

==> nohz-activate-log/notaskset.4 <==
total 29977079348 nsec
avg 299 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 nohz-activate-log/taskset.*
==> nohz-activate-log/taskset.1 <==
total 40561421305 nsec
avg 405 nsec

==> nohz-activate-log/taskset.2 <==
total 40556501183 nsec
avg 405 nsec

==> nohz-activate-log/taskset.3 <==
total 40554876491 nsec
avg 405 nsec

==> nohz-activate-log/taskset.4 <==
total 40554776851 nsec
avg 405 nsec

Test case 4, with NO_HZ_FULL built-in and nohz_full activated on the kernel cmdline, but run on a CPU outside the nohz_full set:
Run test program on CPU 64, which is not in the nohz_full set
tail -n 2 nohz-activate-off-log/*
==> nohz-activate-off-log/taskset.1 <==
total 29980106645 nsec
avg 299 nsec

==> nohz-activate-off-log/taskset.2 <==
total 29982445376 nsec
avg 299 nsec

==> nohz-activate-off-log/taskset.3 <==
total 29973087899 nsec
avg 299 nsec

==> nohz-activat...


Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

On AMD EPYC 7252

Hardware configs:
AMD64
32 CPUs
128G RAM, numa nodes: 2

Software configs:
OS: ubuntu 20.04
Official kernel: 5.15 hwe (5.15.0-86.96~20.04.1)
Test kernel: 5.15 hwe (5.15.0-86.96~20.04.1+test20231013b0)
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/focal-no-hz-full

Test case 1, without NO_HZ_FULL built-in (default ubuntu kernel config):
Run test program 4 times without taskset
tail -n 2 log/notaskset.*
==> log/notaskset.1 <==
total 14800791827 nsec
avg 148 nsec

==> log/notaskset.2 <==
total 14800224701 nsec
avg 148 nsec

==> log/notaskset.3 <==
total 14995047523 nsec
avg 149 nsec

==> log/notaskset.4 <==
total 15056157604 nsec
avg 150 nsec

Run test program 4 times with taskset to CPU 4
...


Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

It took some time to get access to an AMD EPYC machine.

For the 5.15 kernel:

If we build NO_HZ_FULL into the kernel and compare with the default kernel (NO_HZ_IDLE):
- on the Intel machine, there is not much difference
- on the arm64 machine, interestingly, the context switch is a bit faster with NO_HZ_FULL built-in
- on the AMD EPYC machine, with NO_HZ_FULL built-in, the context switch is 3.3% worse

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

I ran some tests with the Jammy HWE kernel (6.2.0) on the AMD EPYC 7252.

Test case 1, without NO_HZ_FULL built-in (default ubuntu kernel config):
Run test program 4 times without taskset
tail -n 2 log/notaskset.*
==> log/notaskset.1 <==
total 23703299350 nsec
avg 237 nsec

==> log/notaskset.2 <==
total 23738030187 nsec
avg 237 nsec

==> log/notaskset.3 <==
total 23777052540 nsec
avg 237 nsec

==> log/notaskset.4 <==
total 23773975186 nsec
avg 237 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 log/taskset.*
==> log/taskset.1 <==
total 23817956038 nsec
avg 238 nsec

==> log/taskset.2 <==
total 23734814153 nsec
avg 237 nsec

==> log/taskset.3 <==
total 23708314067 nsec
avg 237 nsec

==> log/taskset.4 <==
total 23776322738 nsec
avg 237 nsec

Test case 2, with NO_HZ_FULL built-in but not activated on the kernel cmdline:
Run test program 4 times without taskset
tail -n 2 nohz-log/notaskset.*
==> nohz-log/notaskset.1 <==
total 24664321060 nsec
avg 246 nsec

==> nohz-log/notaskset.2 <==
total 24644369258 nsec
avg 246 nsec

==> nohz-log/notaskset.3 <==
total 24717800210 nsec
avg 247 nsec

==> nohz-log/notaskset.4 <==
total 24843361108 nsec
avg 248 nsec

Run test program 4 times with taskset to CPU 4
tail -n 2 nohz-log/taskset.*
==> nohz-log/taskset.1 <==
total 24644004125 nsec
avg 246 nsec

==> nohz-log/taskset.2 <==
total 24864693785 nsec
avg 248 nsec

==> nohz-log/taskset.3 <==
total 24745717217 nsec
avg 247 nsec

==> nohz-log/taskset.4 <==
total 24778959889 nsec
avg 247 nsec

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

On the 6.2 kernel:

With NO_HZ_FULL built-in, context switch performance is ~4% worse than with the default config.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

The above tests are based on the getpid() system call, which does little work beyond the user/kernel transition itself, so they evaluate the additional overhead caused by building NO_HZ_FULL into the kernel on the AMD EPYC machine.

I also used LTP to run scheduler-related tests; I will attach the test data later.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Attached are the LTP test results with the default Ubuntu kernel.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Attached are the LTP test results with NO_HZ_FULL built-in but not activated.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote (last edit ):

Attached are the LTP test results with NO_HZ_FULL built-in and activated on the kernel cmdline, e.g.
isolcpus=2-15,18-31 nohz_full=2-15,18-31 rcu_nocbs=2-15,18-31
IRQ affinity was also set so that interrupts do not hit the isolated CPUs.

Tests were run on CPU 4.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Attached are the LTP test results with NO_HZ_FULL built-in and activated on the kernel cmdline, e.g.
isolcpus=2-15,18-31 nohz_full=2-15,18-31 rcu_nocbs=2-15,18-31

but with the tests run on CPU 16, which is not in the nohz_full set.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote (last edit ):

Another note: NO_HZ_FULL is already built in on the 6.5 "lowlatency" kernel:
https://bugs.launchpad.net/ubuntu/+source/linux-lowlatency/+bug/2023007

But currently that is only available on Mantic. I think we should also consider whether this option is better suited to the lowlatency kernel rather than generic, especially for highly responsive use cases or ones that only need to run a single task on a CPU.

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Some observations from comparing the test results for NO_HZ_FULL built-in but not enabled against the default kernel.
The tests are the LTP scheduling-related tests under the "realtime" category,
and there is "no" taskset when running the tests.

- gettimeofday latency (ns basis)
With no_hz_full built-in:
the average is almost the same (the diff is 0.x ns),
but the stddev is much higher.

- pthread kill latency (us basis)
With no_hz_full built-in:
the average is a bit higher (0.x - 2 us),
and the stddev is a bit higher too.

- Scheduling jitter (ns basis)
With no_hz_full built-in:
the realtime process delta is higher; the delta is the time taken to do a fixed amount of work.
Is the scheduler overhead higher?

code snippet:
clock_gettime(CLOCK_MONOTONIC, &start);
do_work(NUMLOOPS);
clock_gettime(CLOCK_MONOTONIC, &stop);

 /* calc delta, min and max */
delta = ts_sub(stop, start);

- Scheduling latency (us basis)
With no_hz_full built-in:
the average is a little bit higher,
and the stddev is higher.

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Gerald,

Using gettimeofday to test the effects of NO_HZ_FULL on context switch duration may not be measuring anything that changes with regard to NO_HZ_FULL. gettimeofday is implemented via the vDSO, and is not an actual system call that requires a context switch.
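
To illustrate this point, here is a minimal sketch, not from the bug report, that times gettimeofday() through its normal vDSO path against the same call forced through the real syscall path with syscall(SYS_gettimeofday, ...):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/syscall.h>

#define LOOPS 1000000LL

static long long elapsed_ns(struct timespec a, struct timespec b)
{
        return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
        struct timespec t0, t1;
        struct timeval tv;
        long long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < LOOPS; i++)
                gettimeofday(&tv, NULL);              /* vDSO fast path, no kernel entry */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("vDSO gettimeofday: avg %lld nsec\n", elapsed_ns(t0, t1) / LOOPS);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < LOOPS; i++)
                syscall(SYS_gettimeofday, &tv, NULL); /* forced real system call */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("real syscall path: avg %lld nsec\n", elapsed_ns(t0, t1) / LOOPS);
        return 0;
}

Only the second loop exercises the user/kernel transition that NO_HZ_FULL could plausibly affect.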

Revision history for this message
gerald.yang (gerald-yang-tw) wrote :

Thanks, Jay, for pointing this out!

I just read the vdso man page; it says gettimeofday is not a real system call and just reads shared memory exported by the kernel.
It shouldn't be used to measure the user/kernel context switch overhead caused by NO_HZ_FULL.

From what the tests do, I think scheduling jitter should be more suitable for measuring the overhead.
It measures the time taken to do a fixed amount of work multiple times; if there is no additional context-switch overhead, the results should be similar under the same workloads when NO_HZ_FULL is built in.
I did not generate any additional workload on the test machine.

Changed in linux (Ubuntu Focal):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Hirsute):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Jammy):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Lunar):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Mantic):
status: In Progress → Won't Fix
Changed in linux-lowlatency (Ubuntu):
status: New → Fix Released
Changed in linux-lowlatency (Ubuntu Mantic):
status: New → Fix Released
Changed in linux-lowlatency (Ubuntu Lunar):
status: New → Won't Fix
Changed in linux-lowlatency (Ubuntu Jammy):
status: New → Won't Fix
Juerg Haefliger (juergh)
Changed in linux-lowlatency (Ubuntu Hirsute):
status: New → Won't Fix
Changed in linux-lowlatency (Ubuntu Groovy):
status: New → Won't Fix
Changed in linux-lowlatency (Ubuntu Focal):
status: New → Won't Fix
Changed in linux (Ubuntu):
status: In Progress → Fix Released