ubuntu_ltp_controllers:cpuset_sched_domains: tests 3,9,11,17,19,25 report incorrect sched domain for cpu#32
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kunpeng920 | Fix Released | Low | dann frazier |
Ubuntu-18.04 | Fix Released | Undecided | dann frazier |
Ubuntu-18.04-hwe | Fix Released | Undecided | dann frazier |
Ubuntu-20.04 | Fix Released | Undecided | dann frazier |
Upstream-kernel | Fix Released | Undecided | Unassigned |
ubuntu-kernel-tests | Invalid | Undecided | Unassigned |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | dann frazier |
Focal | Fix Released | Undecided | dann frazier |
Hirsute | Won't Fix | Undecided | Unassigned |
Bug Description
[Impact]
The LTP cpuset_
server that has 4 NUMA nodes:
https:/
This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
I assume this means the scheduler is making suboptimal decisions about
where to place/move processes.
[Test Case]
On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the domain3 scheduling domain (the fourth level, counting from domain0):
ubuntu@d06-4:~$ grep domain2 /proc/schedstat | wc -l
128
ubuntu@d06-4:~$ grep domain3 /proc/schedstat | wc -l
64
ubuntu@d06-4:~$
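The two grep counts show how many CPUs have each level, but not which CPUs are affected. A minimal Python sketch that groups CPUs by the number of domainN lines listed under them follows. The function names and the abbreviated sample text are illustrative assumptions, not part of LTP or the kernel; the sample only mimics the cpuN/domainN layout of /proc/schedstat.

```python
import re
from collections import defaultdict


def domain_levels_per_cpu(schedstat_text):
    """Map each CPU number to the count of domainN lines that follow its cpuN line."""
    levels = {}
    current = None
    for line in schedstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        cpu_match = re.fullmatch(r"cpu(\d+)", fields[0])
        if cpu_match:
            current = int(cpu_match.group(1))
            levels[current] = 0
        elif current is not None and re.fullmatch(r"domain\d+", fields[0]):
            levels[current] += 1
    return levels


def group_by_level_count(levels):
    """Invert the mapping: number of domain levels -> sorted list of CPUs."""
    groups = defaultdict(list)
    for cpu, count in sorted(levels.items()):
        groups[count].append(cpu)
    return dict(groups)


# Hypothetical, abbreviated sample in the /proc/schedstat layout
# (per-domain statistics after the mask are trimmed).
SAMPLE = """\
version 15
cpu31 0 0 0 0 0 0 21656590030 600326550 26576
domain0 00000000,00000000,00000000,ffffffff 0 0 0
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0
cpu32 0 0 0 0 0 0 5351733990 838031470 39918
domain0 00000000,00000000,ffffffff,00000000 0 0 0
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0
"""

print(group_by_level_count(domain_levels_per_cpu(SAMPLE)))
```

On a live system the same functions could be fed the contents of /proc/schedstat instead of the sample string; on the affected machine one would expect two groups of 64 CPUs each.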
[What Could Go Wrong]
This changes the code used for populating sched domains, so it could potentially break on other systems, leading to poor scheduling characteristics (higher latencies, lower overall throughput, etc.).
Changed in kunpeng920:
importance: Undecided → Low
assignee: nobody → dann frazier (dannf)
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
description: updated
I can reproduce on scobee w/ latest LTP from git. This issue is also reproducible on the previous hirsute kernel (5.11.0-40.44), so does not appear to be a regression. scobee has 4 x 32 core NUMA nodes, totaling 128 cpus. Interestingly I could not reproduce on a system w/ the same SoC but only 3 NUMA nodes (96 cpus).
It isn't clear to me what exactly the test thinks is wrong, but I did find something interesting. I captured /proc/schedstat at the time of the failure and I noticed that half of the cpus (0-31, 64-95) have 4 sched domains (domain0-domain3), while the other half (32-63, 96-127) only include the first 3 sched domains. I'll attach the full file, but here's a snippet contrasting the entries for cpu31 and cpu32:
----------------
cpu31 0 0 0 0 0 0 21656590030 600326550 26576
domain0 00000000,00000000,00000000,ffffffff 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 [...]
cpu32 0 0 0 0 0 0 5351733990 838031470 39918
domain0 00000000,00000000,ffffffff,00000000 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
----------------
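Each domain mask in the snippet is a hexadecimal CPU bitmap, printed as comma-separated 32-bit words with the most significant word first. A short Python sketch for turning such a mask into a CPU range makes the spans easier to compare; the helper names are hypothetical and exist only for illustration.

```python
def cpumask_to_cpus(mask):
    """Decode a comma-separated hex cpumask (most significant word first) to CPU numbers."""
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def as_ranges(cpus):
    """Collapse a sorted list of CPU numbers into a string such as '0-31,64-95'."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Masks taken from the cpu32 domain0 entry and the cpu31 domain3 entry.
print(as_ranges(cpumask_to_cpus("00000000,00000000,ffffffff,00000000")))  # 32-63
print(as_ranges(cpumask_to_cpus("ffffffff,ffffffff,ffffffff,ffffffff")))  # 0-127
```

Decoded this way, the widest domain for cpu31 covers CPUs 0-127, while the widest domain listed for cpu32 stops at 0-95.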
Note that domain3 is the domain that comprises all CPUs. If that is what the test is looking for, then it would make sense that the test would begin to fail at CPU32. I turned on sched-domain debugging (/sys/kernel/debug/sched_debug), and verified that it seems to match what I see in /proc/schedstat. Specifically, CPU32 is not assigned a domain-3:
[18683.213478] CPU31 attaching sched-domain(s):
[18683.213480] domain-0: span=0-31 level=MC
[18683.213485] groups: 31:{ span=31 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 12:{ span=12 }, 13:{ span=13 }, 14:{ span=14 }, 15:{ span=15 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }, 23:{ span=23 }, 24:{ span=24 }, 25:{ span=25 }, 26:{ span=26 }, 27:{ span=27 }, 28:{ span=28 }, 29:{ span=29 }, 30:{ span=30 }
[18683.213582] domain-1: span=0-63 level=NUMA
[18683.213586] groups: 0:{ span=0-31 cap=32768 }, 32:{ span=32-63 cap=32768 }
[18683.213599] domain-2: span=0-95 level=NUMA
[18683.213604] groups: 0:{ span=0-63 cap=65536 }, 64:{ span=64-95 cap=32768 }
[18683.213615] domain-3: span=0-127 level=NUMA
[18683.213622] groups: 0:{ span=0-95 mask=0-31 cap=98304 }, 96:{ span=64-127 mask=96-127 cap=65536 }
[18683.213655] CPU32 attaching sched-domain(s):
[18683.213658] domain-0: span=32-63 level=MC
[18683.213662] groups: 32:{ span=32 }, 33:{ span=33 }, 34:{ span=34 }, 35:{ span=35 }, 36:{ span=36 }, 37:{ span=37 }, 38:{ span=38 }, 39:{ span=39 }, 40:{ span=40 }, 41:{ span=41 }, 42:{ span=42 }, 43:{ span=43 }, 44:{ span=44 }, 45:{ span=45 }, 46:{ span=46 }, 47:{ span=47 }, 48:{ span=48 }, 49:{ span=49 }, 50:{ span=50 }, 51:{ span=51 }, 52:{ span=52 }, 53:{ span=53 }, 54:{ span=54 }, 55:{ span=55 }, 56:{ span=56 }, 57:{ span=57 }, 58:{ span=58 }, 59:{ span=59 },...