2022-05-04 00:06:51 |
dann frazier |
description |
On scobee-kernel(arm64) with hirsute:linux(5.11.0-41.45) for sru-20211108, several cpuset_sched_domains subtests report that cpu32's sched domain covers only CPUs 0-95 instead of the full range 0-127. The same does not happen on kuzzle. But 32 is a bit of a suspicious number.
Running tests.......
cpuset_sched_domains 1 TINFO: CPUs are numbered continuously starting at 0 (0-127)
cpuset_sched_domains 1 TINFO: Nodes are numbered continuously starting at 0 (0-3)
cpuset_sched_domains 1 TINFO: root group load balance test
cpuset_sched_domains 1 TINFO: sched load balance: 0
cpuset_sched_domains 1 TINFO: CPU hotplug:
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 1 TPASS: partition sched domains succeeded.
cpuset_sched_domains 3 TINFO: root group load balance test
cpuset_sched_domains 3 TINFO: sched load balance: 1
cpuset_sched_domains 3 TINFO: CPU hotplug:
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 3 TFAIL: partition sched domains failed.
cpuset_sched_domains 5 TINFO: root group load balance test
cpuset_sched_domains 5 TINFO: sched load balance: 0
cpuset_sched_domains 5 TINFO: CPU hotplug:
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 5 TPASS: partition sched domains succeeded.
cpuset_sched_domains 7 TINFO: root group load balance test
cpuset_sched_domains 7 TINFO: sched load balance: 0
cpuset_sched_domains 7 TINFO: CPU hotplug:
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 7 TPASS: partition sched domains succeeded.
cpuset_sched_domains 9 TINFO: root group load balance test
cpuset_sched_domains 9 TINFO: sched load balance: 1
cpuset_sched_domains 9 TINFO: CPU hotplug:
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 9 TFAIL: partition sched domains failed.
cpuset_sched_domains 11 TINFO: root group load balance test
cpuset_sched_domains 11 TINFO: sched load balance: 1
cpuset_sched_domains 11 TINFO: CPU hotplug:
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 11 TFAIL: partition sched domains failed.
cpuset_sched_domains 13 TINFO: general group load balance test
cpuset_sched_domains 13 TINFO: root group info:
cpuset_sched_domains 13 TINFO: sched load balance: 0
cpuset_sched_domains 13 TINFO: general group info:
cpuset_sched_domains 13 TINFO: cpus: -
cpuset_sched_domains 13 TINFO: sched load balance: 1
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 13 TPASS: partition sched domains succeeded.
cpuset_sched_domains 15 TINFO: general group load balance test
cpuset_sched_domains 15 TINFO: root group info:
cpuset_sched_domains 15 TINFO: sched load balance: 0
cpuset_sched_domains 15 TINFO: general group info:
cpuset_sched_domains 15 TINFO: cpus: 1
cpuset_sched_domains 15 TINFO: sched load balance: 0
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 15 TPASS: partition sched domains succeeded.
cpuset_sched_domains 17 TINFO: general group load balance test
cpuset_sched_domains 17 TINFO: root group info:
cpuset_sched_domains 17 TINFO: sched load balance: 1
cpuset_sched_domains 17 TINFO: general group info:
cpuset_sched_domains 17 TINFO: cpus: -
cpuset_sched_domains 17 TINFO: sched load balance: 1
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 17 TFAIL: partition sched domains failed.
cpuset_sched_domains 19 TINFO: general group load balance test
cpuset_sched_domains 19 TINFO: root group info:
cpuset_sched_domains 19 TINFO: sched load balance: 1
cpuset_sched_domains 19 TINFO: general group info:
cpuset_sched_domains 19 TINFO: cpus: 1
cpuset_sched_domains 19 TINFO: sched load balance: 1
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 19 TFAIL: partition sched domains failed.
cpuset_sched_domains 21 TINFO: general group load balance test
cpuset_sched_domains 21 TINFO: root group info:
cpuset_sched_domains 21 TINFO: sched load balance: 0
cpuset_sched_domains 21 TINFO: general group info:
cpuset_sched_domains 21 TINFO: cpus: 1,2
cpuset_sched_domains 21 TINFO: sched load balance: 0
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 21 TPASS: partition sched domains succeeded.
cpuset_sched_domains 23 TINFO: general group load balance test
cpuset_sched_domains 23 TINFO: root group info:
cpuset_sched_domains 23 TINFO: sched load balance: 0
cpuset_sched_domains 23 TINFO: general group info:
cpuset_sched_domains 23 TINFO: cpus: 1,2
cpuset_sched_domains 23 TINFO: sched load balance: 1
cpuset_check_domains 1 TPASS : check_sched_domains passed
cpuset_sched_domains 23 TPASS: partition sched domains succeeded.
cpuset_sched_domains 25 TINFO: general group load balance test
cpuset_sched_domains 25 TINFO: root group info:
cpuset_sched_domains 25 TINFO: sched load balance: 0
cpuset_sched_domains 25 TINFO: general group info:
cpuset_sched_domains 25 TINFO: cpus: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
cpuset_sched_domains 25 TINFO: sched load balance: 1
cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
cpuset_sched_domains 25 TFAIL: partition sched domains failed.
cpuset_sched_domains 27 TINFO: general group load balance test
cpuset_sched_domains 27 TINFO: root group info:
cpuset_sched_domains 27 TINFO: sched load balance: 0
cpuset_sched_domains 27 TINFO: general group1 info:
cpuset_sched_domains 27 TINFO: cpus: 1
cpuset_sched_domains 27 TINFO: sched load balance: 1
cpuset_sched_domains 27 TINFO: general group2 info:
cpuset_sched_domains 27 TINFO: cpus: 0
cpuset_sched_domains 27 TINFO: sched load balance: 1
cpuset_sched_domains 27 TINFO: CPU hotplug: none
cpuset_sched_domains 27 TPASS: partition sched domains succeeded.
cpuset_sched_domains 29 TINFO: general group load balance test
cpuset_sched_domains 29 TINFO: root group info:
cpuset_sched_domains 29 TINFO: sched load balance: 0
cpuset_sched_domains 29 TINFO: general group1 info:
cpuset_sched_domains 29 TINFO: cpus: 1,2
cpuset_sched_domains 29 TINFO: sched load balance: 1
cpuset_sched_domains 29 TINFO: general group2 info:
cpuset_sched_domains 29 TINFO: cpus: 0-3
cpuset_sched_domains 29 TINFO: sched load balance: 0
cpuset_sched_domains 29 TINFO: CPU hotplug: none
cpuset_sched_domains 29 TPASS: partition sched domains succeeded.
cpuset_sched_domains 31 TINFO: general group load balance test
cpuset_sched_domains 31 TINFO: root group info:
cpuset_sched_domains 31 TINFO: sched load balance: 0
cpuset_sched_domains 31 TINFO: general group1 info:
cpuset_sched_domains 31 TINFO: cpus: 1,2
cpuset_sched_domains 31 TINFO: sched load balance: 1
cpuset_sched_domains 31 TINFO: general group2 info:
cpuset_sched_domains 31 TINFO: cpus: 0,3
cpuset_sched_domains 31 TINFO: sched load balance: 1
cpuset_sched_domains 31 TINFO: CPU hotplug: none
cpuset_sched_domains 31 TPASS: partition sched domains succeeded.
cpuset_sched_domains 33 TINFO: general group load balance test
cpuset_sched_domains 33 TINFO: root group info:
cpuset_sched_domains 33 TINFO: sched load balance: 0
cpuset_sched_domains 33 TINFO: general group1 info:
cpuset_sched_domains 33 TINFO: cpus: 1,2
cpuset_sched_domains 33 TINFO: sched load balance: 1
cpuset_sched_domains 33 TINFO: general group2 info:
cpuset_sched_domains 33 TINFO: cpus: 1,3
cpuset_sched_domains 33 TINFO: sched load balance: 1
cpuset_sched_domains 33 TINFO: CPU hotplug: none
cpuset_sched_domains 33 TPASS: partition sched domains succeeded.
cpuset_sched_domains 35 TINFO: general group load balance test
cpuset_sched_domains 35 TINFO: root group info:
cpuset_sched_domains 35 TINFO: sched load balance: 0
cpuset_sched_domains 35 TINFO: general group1 info:
cpuset_sched_domains 35 TINFO: cpus: 1,2
cpuset_sched_domains 35 TINFO: sched load balance: 1
cpuset_sched_domains 35 TINFO: general group2 info:
cpuset_sched_domains 35 TINFO: cpus: 1,3
cpuset_sched_domains 35 TINFO: sched load balance: 1
cpuset_sched_domains 35 TINFO: CPU hotplug: offline
cpuset_sched_domains 35 TPASS: partition sched domains succeeded.
cpuset_sched_domains 37 TINFO: general group load balance test
cpuset_sched_domains 37 TINFO: root group info:
cpuset_sched_domains 37 TINFO: sched load balance: 0
cpuset_sched_domains 37 TINFO: general group1 info:
cpuset_sched_domains 37 TINFO: cpus: 1,2
cpuset_sched_domains 37 TINFO: sched load balance: 1
cpuset_sched_domains 37 TINFO: general group2 info:
cpuset_sched_domains 37 TINFO: cpus: 1,3
cpuset_sched_domains 37 TINFO: sched load balance: 1
cpuset_sched_domains 37 TINFO: CPU hotplug: online
cpuset_sched_domains 37 TPASS: partition sched domains succeeded.
INFO: ltp-pan reported some tests FAIL
LTP Version: 20210927
INFO: Test end time: Sat Nov 6 19:28:17 UTC 2021 |
[Impact]
The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920
server that has 4 NUMA nodes:
https://launchpad.net/bugs/1951289
This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
I assume this means the scheduler is making suboptimal decisions about
where to place/move processes.
[Test Case]
On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the domain3 scheduling domain (the fourth level, since numbering starts at domain0):
ubuntu@d06-4:~$ grep domain2 /proc/schedstat | wc -l
128
ubuntu@d06-4:~$ grep domain3 /proc/schedstat | wc -l
64
ubuntu@d06-4:~$
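The grep counts above show how many CPUs have a domain3, but not which ones. A minimal sketch that breaks this down per CPU follows. It assumes the usual /proc/schedstat layout, in which each "cpuN" line is followed by that CPU's "domainM" lines. The function names domain_levels and cpus_missing_level are illustrative only and are not part of LTP or the kernel.

```shell
# Print "<cpu> <number of domain levels>" for every CPU in a schedstat file.
# Assumption: each "cpuN ..." line is followed by that CPU's "domainM ..." lines.
# Reads /proc/schedstat unless another file is given as the first argument.
domain_levels() {
    awk '
        /^cpu[0-9]+ /    { cpu = $1; order[++n] = cpu; levels[cpu] = 0 }
        /^domain[0-9]+ / { levels[cpu]++ }
        END { for (i = 1; i <= n; i++) print order[i], levels[order[i]] }
    ' "${1:-/proc/schedstat}"
}

# Print the CPUs that have fewer than <want> domain levels.
# Usage: cpus_missing_level <want> [schedstat-file]
cpus_missing_level() {
    domain_levels "$2" | awk -v want="$1" '$2 < want { print $1 }'
}
```

On an affected system, running `cpus_missing_level 4` should list the CPUs that have no domain3 line, which makes it easier to see whether they map onto whole NUMA nodes.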
[What Could Go Wrong]
This changes the code used for populating sched domains, so it could break on other systems, potentially leading to poor scheduling characteristics (higher latencies, lower overall throughput, etc.). |
|