2017-05-16 14:37:42 | dann frazier | description
[Impact]
CONFIG_NUMA_BALANCING and CONFIG_NUMA_BALANCING_DEFAULT_ENABLED were both set to =y in hwe-x/hwe-y. This changed to =n in hwe-z, unintentionally as far as I can tell. This can lead to performance degradation on NUMA-based arm64 systems when threads migrate between nodes, as their memory accesses then suffer additional cross-node latency.
[Test Case]
At a functional level:
$ test -f /proc/sys/kernel/numa_balancing
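To make that check self-reporting, the functional test can be wrapped in a small script (a sketch; `/proc/sys/kernel/numa_balancing` only exists when the kernel was built with CONFIG_NUMA_BALANCING=y, and its 0/1 value reflects the runtime setting):

```shell
#!/bin/sh
# Report whether automatic NUMA balancing is compiled in, and if so,
# whether it is currently enabled (1) or disabled (0).
check_numa_balancing() {
    f=/proc/sys/kernel/numa_balancing
    if [ -f "$f" ]; then
        echo "numa_balancing present, value: $(cat "$f")"
    else
        echo "numa_balancing absent (kernel built with CONFIG_NUMA_BALANCING=n?)"
    fi
}

check_numa_balancing
```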
Performance:
$ perf bench numa -a
I didn't see any significant changes in the RAM-bw tests (expected).
For the convergence tests, I observed the following results, which all appear to be within reasonable variance.
Test      | Balancing=n | Balancing=y
-------------------------------------
1x3       | No-Converge | No-Converge
1x4       | No-Converge | 0.576s
1x6       | No-Converge | No-Converge
2x3       | No-Converge | No-Converge
3x3       | No-Converge | No-Converge
4x4       | No-Converge | No-Converge
4x4-NOTHP | No-Converge | No-Converge
4x6       | No-Converge | No-Converge
4x8       | No-Converge | No-Converge
8x4       | No-Converge | No-Converge
8x4-NOTHP | No-Converge | No-Converge
3x1       | 0.848s      | 1.212s
4x1       | 0.832s      | 0.712s
8x1       | 0.792s      | 0.649s
16x1      | 1.511s      | 1.485s
32x1      | 0.750s      | 0.899s
Finally, for the bw tests, I see significant improvements across the board:
Test      | BW Improvement
-------------------------
======= Process =========
2x1       |   2.2%
3x1       |  61.4%
4x1       |  25.0%
8x1       | 104.6%
8x1-NOTHP | 107.6%
16x1      | 200.9%
======= Thread ==========
4x1       |  10.9%
8x1       | 107.4%
16x1      | 230.7%
32x1      | 239.7%
2x3       |  13.5%
4x4       |  69.2%
4x6       |  84.4%
4x8       |  79.7%
4x8-NOTHP | 152.5%
3x3       |  96.1%
5x5       | 150.2%
2x16      | 122.6%
1x32      |  40.5%
[Regression Risk]
This changes a config only on arm64, so the regression risk is limited to those platforms. The code we will be enabling on arm64 is already enabled on other architectures (all but s390x), so it has already been tested within Ubuntu zesty. It was also previously enabled on arm64 in hwe-x/hwe-y, which gives us some additional confidence.
There is certainly a possibility that this negatively impacts performance for certain workloads on NUMA arm64 systems. If that occurs, the kernel.numa_balancing sysctl can be used to disable this feature at runtime.
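For reference, disabling the feature would look like the following (a sketch of the standard sysctl workflow; the sysctl.d file name is illustrative):

```shell
# Turn off automatic NUMA balancing at runtime, without a reboot (as root):
#
#   sysctl -w kernel.numa_balancing=0
#
# Persist the setting across reboots with a sysctl.d fragment:
#
#   echo 'kernel.numa_balancing = 0' | sudo tee /etc/sysctl.d/99-numa-balancing.conf
```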