lots of "fail" and "error" messages in mmap test, yet test exits with a 0 code
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Stress-ng | Fix Released | Medium | Colin Ian King |
stress-ng (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Colin Ian King |
Bug Description
== SRU Justification BIONIC ==
Stress-ng is reporting error messages when it should be silent and not complain about a non-error condition. This has already been fixed in later releases of Ubuntu, so backport this trivial fix to Bionic.
== Fix ==
Upstream commit:
From c0ce27a5870fc87
From: Colin Ian King <email address hidden>
Date: Fri, 1 Feb 2019 19:32:54 +0000
Subject: [PATCH] stress-mmap: be less noisy on mmap failures and fix directory cleanup (LP: #1807732)
== Test ==
Run certification tests (see below). Without the fix, one sees lots of bogus error messages from the mmap test even though it successfully completes. With the fix, the errors won't appear.
== Regression Potential ==
Small, this touches one stress-ng test (mmap test) and the backport is just a small wiggle of the upstream fix that addressed this original bug (but never got backported to Bionic).
-------
This is probably just a labeling issue, but we need to either confirm that and hopefully correct the labeling, or figure out why this is failing. While running the stress-ng based memory tests for certification on a system (tested at OEM, so I do not have direct access to the hardware), I noticed the following in the test output:
Running stress-ng mmap stressor for 3760 seconds....
stress-ng: info: [174923] dispatching hogs: 416 mmap
stress-ng: error: [175726] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175523] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175742] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175598] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175743] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: error: [175748] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175601] stress-ng-mmap: rmdir './tmp-
stress-ng: fail: [175603] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175623] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175590] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175721] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175505] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175705] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175417] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175669] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175060] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175633] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175620] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175690] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175282] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175643] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175033] stress-ng-mmap: rmdir './tmp-
There are a LOT of messages marked "error" and "fail"; however, at the end of that particular test there is this:
stress-ng: error: [175640] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175028] stress-ng-mmap: rmdir './tmp-
stress-ng: error: [175683] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175219] stress-ng-mmap: rmdir './tmp-
stress-ng: info: [174923] successful run completed in 3760.09s (1 hour, 2 mins, 40.09 secs)
return_code is 0
So despite a LOT (416 of them) of these fail messages, the test exits cleanly with a 0 for passing. If the test truly IS passing and this is expected, then these should be warning messages, NOT fail messages. Fail implies something that would fail the test and block passing of the test. Warning is something unexpected but not blocking. In my opinion...
Can we get some clarity on this?
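For anyone triaging similar output, the mismatch described above can be checked mechanically. The following is only a rough sketch, not part of stress-ng or the certification suite: the helper names `tally_levels` and `is_inconsistent` are hypothetical, and the regular expression assumes the `stress-ng: <level>: [<pid>]` line layout seen in the log above.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log in this report, e.g.:
#   stress-ng: error: [175726] stress-ng-mmap: gave up trying to mmap, no available memory
LINE_RE = re.compile(r"^stress-ng:\s+(\w+):\s+\[(\d+)\]")


def tally_levels(output):
    """Count stress-ng log lines per level (info, error, fail, ...)."""
    counts = Counter()
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def is_inconsistent(counts, return_code):
    """True when the run exited 0 but still logged error or fail lines."""
    return return_code == 0 and (counts["error"] + counts["fail"]) > 0


# A few lines copied from the report (the rmdir path is truncated in the original).
sample = """\
stress-ng: info: [174923] dispatching hogs: 416 mmap
stress-ng: error: [175726] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175523] stress-ng-mmap: rmdir './tmp-
stress-ng: info: [174923] successful run completed in 3760.09s (1 hour, 2 mins, 40.09 secs)
"""

counts = tally_levels(sample)
print(dict(counts), is_inconsistent(counts, 0))
```

A wrapper script could use a check like this to flag runs where the log severity and the return code disagree, which is the situation this bug describes.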
Changed in stress-ng: | |
assignee: | nobody → Colin Ian King (colin-king) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in stress-ng: | |
status: | Fix Committed → Fix Released |
Changed in stress-ng: | |
status: | Confirmed → Fix Released |
Changed in stress-ng (Ubuntu Bionic): | |
assignee: | nobody → Colin Ian King (colin-king) |
Valid points.
I'm going to make "gave up trying to mmap, no available memory" an info message from now on, as it indicates why the stress child process gave up trying to allocate memory. I'm also going to bump the number of retries up to 65536 with a 100000ms delay between each retry.
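The retry-then-give-up behaviour described here might look roughly like the sketch below. It is an illustration in Python, not the actual stress-ng C code: the function name `mmap_with_retries` and its parameters are hypothetical, and the retry count and delay are left to the caller rather than hard-coded.

```python
import mmap
import time


def mmap_with_retries(size, max_retries, delay_s,
                      do_mmap=None, sleep=time.sleep, log=print):
    """Try to map `size` bytes up to max_retries times; return None on giving up.

    Hypothetical helper: do_mmap, sleep and log are injectable so the
    behaviour can be exercised without actually exhausting memory.
    """
    if do_mmap is None:
        # Default: an anonymous mapping (fileno -1).
        do_mmap = lambda n: mmap.mmap(-1, n)
    for attempt in range(max_retries):
        try:
            return do_mmap(size)
        except OSError:
            # Mapping failed (for example, no memory); wait before retrying.
            if attempt + 1 < max_retries:
                sleep(delay_s)
    # Giving up is reported at info level, since it is not a test failure.
    log("info: gave up trying to mmap, no available memory")
    return None


# Usage: simulate an allocator that always fails, without sleeping for real.
messages = []
result = mmap_with_retries(4096, max_retries=3, delay_s=0.1,
                           do_mmap=lambda n: (_ for _ in ()).throw(OSError("no memory")),
                           sleep=lambda s: None, log=messages.append)
print(result, messages)
```

The point of the design is that running out of memory during a memory stress test is an expected outcome, so the child reports it as information and the overall run can still exit with 0.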