Hardware test stress-ng-cpu-long bind failed
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Invalid | High | Lee Trager | |
Stress-ng | Fix Released | Medium | Colin Ian King | |
stress-ng (Ubuntu) | Fix Released | Undecided | Unassigned | |
Bionic | Fix Released | Undecided | Unassigned | |
Bug Description
SRU Request, Bionic
[Justification]
The af-alg stressor in stress-ng reports bind failures when resources run low. This is not an error that should be reported; it should be handled silently rather than causing the stressor to bail out and finish prematurely.
[Fix]
Upstream fix:
a5c2cb02e8ed check for EBUSY bind failures
And prerequisites:
13c4c58d0150 expand error message to capture more information
39184c74f1e0 forgot to add in \n
aed180cb7b2f make ENOKEY a non-critical failure
7f1a617adcd6 skip over ciphers that may not exist
88cbe87a3cc1 fix errno = ENOENT assignment, should be == comparison
3ec28f2f5438 return EXIT_NOT_
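The commits above are listed by subject only. As a rough illustration of the idea they describe (treating certain bind() errors on an AF_ALG socket as non-critical and skipping the algorithm instead of aborting), here is a minimal sketch in Python rather than stress-ng's C. The function names are hypothetical, and the exact set of errno values, in particular the inclusion of ETIMEDOUT (errno 110 from the test case), is an assumption for illustration, not the upstream implementation.

```python
import errno
import socket

# Errors handled as "skip this algorithm" instead of "fail the stressor".
# EBUSY, ENOKEY and ENOENT are taken from the commit subjects; ETIMEDOUT
# (errno 110) is the error seen in the test case. The set is an assumption.
_NAMES = ("EBUSY", "ENOKEY", "ENOENT", "ETIMEDOUT")
NON_CRITICAL_BIND_ERRNOS = frozenset(
    getattr(errno, name) for name in _NAMES if hasattr(errno, name)
)


def is_non_critical_bind_error(err):
    """Return True if a bind() errno should be handled silently."""
    return err in NON_CRITICAL_BIND_ERRNOS


def try_bind_alg(alg_type, alg_name):
    """Bind an AF_ALG socket; return 'ok', 'skipped' or 'failed'."""
    # AF_ALG is Linux-only; treat its absence as a skip.
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return "skipped"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # The kernel may not support AF_ALG sockets at all.
        return "skipped"
    try:
        sock.bind((alg_type, alg_name))
        return "ok"
    except OSError as exc:
        # Non-critical errors are skipped quietly; anything else is a failure.
        if is_non_critical_bind_error(exc.errno):
            return "skipped"
        return "failed"
    finally:
        sock.close()
```

For example, `try_bind_alg("hash", "sha256")` would return "ok" on a kernel that provides that algorithm through AF_ALG, and "skipped" when the bind times out or the algorithm is missing.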
[Testcase]
Without the fix, testing on systems with a large number of CPUs shows:
stress-ng --af-alg 0 -t 60
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
With the fix, the "Connection timed out" error is not reported and the test works.
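To check the test case mechanically, one could scan captured stress-ng output for the failure line. This is a minimal sketch, assuming the output has been saved as text; the helper name is hypothetical and the pattern is modelled only on the log lines quoted in this report.

```python
import re

# Matches lines such as:
#   stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
BIND_FAILED = re.compile(r"stress-ng-af-alg: bind failed, errno=(\d+)")


def count_bind_failures(output):
    """Map each errno seen in 'bind failed' lines to its number of occurrences."""
    counts = {}
    for match in BIND_FAILED.finditer(output):
        code = int(match.group(1))
        counts[code] = counts.get(code, 0) + 1
    return counts


sample = """\
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
"""
print(count_bind_failures(sample))  # {110: 3}
```

An empty result for a run with the fixed package would match the expected behaviour described here.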
[Regression Potential]
This only affects the af-alg stressor. The patch set consists of upstream stress-ng commits that are already in cosmic and disco and have been exercised on large systems. It reduces false-positive errors from the af-alg stressor. If there are issues, the worst case is that error reports from failed AF-ALG kernel algorithms are ignored, rather than false positives being reported.
-----
Running the hardware test 'stress-
Here's a snippet from the logs:
Hardware: HP Dl360 Gen10 stress-ng-cpu-long
OS: Ubuntu 18.04
MAAS:
# SNIPPET OF KERN.LOG
...
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module crypto-xor-all, throttling...
request_module: modprobe crypto-xor-all cannot be processed, kmod busy with 50 threads for more than 5 seconds now
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module crypto-ofb(aes), throttling...
request_module: modprobe crypto-ofb(aes) cannot be processed, kmod busy with 50 threads for more than 5 seconds now
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module crypto-
...
# TEST OUTPUT
...
disabled 'cpu-online' as it may hang the machine (enable it with the --pathological option)
dispatching hogs: 72 af-alg, 72 atomic, 72 branch, 72 bsearch, 72 cache, 72 context, 72 cpu, 72 crypt, 72 fp-error, 72 funccall, 72 getrandom, 72 heapsort, 72 hsearch, 72 icache, 72 ioport, 72 lockbus, 72 longjmp, 72 lsearch, 72 malloc, 72 matrix, 72 membarrier, 72 memcpy, 72 mergesort, 72 nop, 72 numa, 72 opcode, 72 qsort, 72 radixsort, 72 rdrand, 72 str, 72 stream, 72 tree, 72 tsc, 72 tsearch, 72 vecmath, 72 wcs, 72 zlib
stress-ng-numa: system has 2 of a maximum 1024 memory NUMA nodes
stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng-stream: Using CPU cache size of 25344K
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
...
Changed in maas: | |
importance: | Undecided → High |
Changed in maas: | |
assignee: | nobody → Lee Trager (ltrager) |
description: | updated |
Changed in stress-ng: | |
status: | Triaged → In Progress |
importance: | Low → Medium |
Changed in stress-ng: | |
status: | In Progress → Fix Released |
Changed in maas: | |
status: | Incomplete → Invalid |
How many CPUs/cores does your system have? How much RAM does it have? Does the stress-ng-cpu-short test pass?