stress-ng-cpu-long times out in bionic
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Fix Released | Medium | Adam Collard | | |
Bug Description
I'm running Maas 2.3.5-6511-
If I set the commissioning series to bionic, the stress-ng-cpu-long test times out, but only on NUMA machines (that is, machines that have more than one CPU).
If the commissioning series is set to xenial, the tests run fine, as expected.
This is the command MAAS runs for the stress-ng-cpu-long test:
stress-ng --aggressive -a 0 --class cpu,cpu-cache --ignite-cpu --log-brief --metrics-brief --times --tz --verify --timeout 12h
I have tried running this test with the timeout of one hour, or even just 10 minutes, and I've discovered that, on bionic/
For instance, on a dual Intel Xeon E5-2698 v3 machine (with a total of 32 cores / 64 threads), when I run a 1h stress-ng test, these are the completion times on different series:
xenial: 1.003 hours (60m 13s)
bionic: 1.120 hours (67m 12s)
cosmic: 1.190 hours (71m 35s)
disco: 1.470 hours (88m 20s)
When I run those tests on non-NUMA (single CPU) machines, the tests are done within 60 minutes, same as on xenial on NUMA machines.
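A comparison like this can be reproduced by timing a shortened run on each series. The following is a minimal sketch, not what MAAS itself does: `timed_run` is a hypothetical helper name, and it assumes `date +%s` is available (it is on GNU and BSD systems). The commented example reuses the flags from the MAAS command above, with only the timeout shortened to 10 minutes.

```shell
#!/bin/sh
# timed_run (hypothetical helper): run the given command, then print the
# wall-clock time it took, in whole seconds, and pass on its exit status.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed_seconds=$((end - start))"
    return "$status"
}

# Example: same flags as the MAAS command, only the timeout is shortened.
# timed_run stress-ng --aggressive -a 0 --class cpu,cpu-cache --ignite-cpu \
#     --log-brief --metrics-brief --times --tz --verify --timeout 10m
```

If the printed elapsed time is noticeably larger than the requested timeout (600 seconds for `10m`), the machine shows the same overrun as described here.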
I am able to make tests complete without an error if I change the metadata timeout value to 14 hours, making sure that the timeout is way greater than the expected test run.
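As a rough check of how much headroom a 12h run needs, the 1h completion times reported above can be scaled up. This sketch assumes the overrun grows linearly with the requested timeout, which the measurements do not confirm; `project_runtime` is a hypothetical helper name.

```shell
#!/bin/sh
# project_runtime (hypothetical helper): scale a nominal timeout in hours
# by the completion time observed for a 1-hour run, also in hours.
# Assumption: the overrun is proportional to the requested timeout.
project_runtime() {
    awk -v nominal="$1" -v ratio="$2" 'BEGIN { printf "%.2f\n", nominal * ratio }'
}

project_runtime 12 1.003   # xenial
project_runtime 12 1.120   # bionic
project_runtime 12 1.190   # cosmic
project_runtime 12 1.470   # disco
```

Under that assumption, a 12h bionic run would take about 13.44 hours, which fits inside the 14-hour metadata timeout, while the later series would need a larger margin.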
I have opened a bug against stress-ng too, as I'm not sure if this is, maybe, normal stress-ng behavior on newer Ubuntu series.
The bug number is LP: #1826791
Related branches
- MAAS Lander: Approve
- Jack Lloyd-Walters: Approve
- Diff: 26 lines (+2/-2), 2 files modified
  - src/metadataserver/builtin_scripts/testing_scripts/stress-ng-cpu-long.sh (+1/-1)
  - src/metadataserver/builtin_scripts/testing_scripts/stress-ng-memory-long.sh (+1/-1)
description: updated
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
milestone: 3.3.0 → 3.3.0-beta1
Changed in maas:
status: Fix Committed → Fix Released
Based on the input from LP: #1826791, this is the way stress-ng behaves on newer series.
So, this should be fixed in MAAS, probably by running stress-ng without --aggressive and/or without -a 0.