lots of "fail" and "error" messages in mmap test, yet test exits with a 0 code

Bug #1807732 reported by Jeff Lane 
This bug affects 1 person
Affects              Status         Importance   Assigned to       Milestone
Stress-ng            Fix Released   Medium       Colin Ian King
stress-ng (Ubuntu)   Fix Released   Undecided    Unassigned
  Bionic             Fix Released   Undecided    Colin Ian King

Bug Description

== SRU Justification BIONIC ==

Stress-ng is reporting error messages when it should be silent and not complaining about a non-error condition. This has already been fixed in later releases of Ubuntu, so backport this trivial fix to Bionic.

== Fix ==

Upstream commit:

From c0ce27a5870fc879967555d7c975dcbe032bb10e Mon Sep 17 00:00:00 2001
From: Colin Ian King <email address hidden>
Date: Fri, 1 Feb 2019 19:32:54 +0000
Subject: [PATCH] stress-mmap: be less noisy on mmap failures and fix directory cleanup (LP: #1807732)

== Test ==

Run certification tests (see below). Without the fix, one sees lots of bogus error messages from the mmap test even though it successfully completes. With the fix, the errors won't appear.

== Regression Potential ==

Small, this touches one stress-ng test (mmap test) and the backport is just a small wiggle of the upstream fix that addressed this original bug (but never got backported to Bionic).

------------------------------------

This is probably just a labeling issue, but we need to either confirm that and hopefully correct the labeling, or we need to figure out why this is failing. The stress-ng based memory tests for certification were run on a system (tested at the OEM, so I do not have direct access to the hardware), and in the test output I noticed the following:

Running stress-ng mmap stressor for 3760 seconds....
stress-ng: info: [174923] dispatching hogs: 416 mmap
stress-ng: error: [175726] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175523] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175523-352' failed, errno=2 (No such file or directory)
stress-ng: error: [175742] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175598] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175598-396' failed, errno=2 (No such file or directory)
stress-ng: error: [175743] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: error: [175748] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175601] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175601-399' failed, errno=2 (No such file or directory)
stress-ng: fail: [175603] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175603-401' failed, errno=2 (No such file or directory)
stress-ng: error: [175623] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175590] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175590-391' failed, errno=2 (No such file or directory)
stress-ng: error: [175721] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175505] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175505-343' failed, errno=2 (No such file or directory)
stress-ng: error: [175705] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175417] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175417-290' failed, errno=2 (No such file or directory)
stress-ng: error: [175669] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175060] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175060-108' failed, errno=2 (No such file or directory)
stress-ng: error: [175633] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175620] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175620-413' failed, errno=2 (No such file or directory)
stress-ng: error: [175690] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175282] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175282-218' failed, errno=2 (No such file or directory)
stress-ng: error: [175643] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175033] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175033-81' failed, errno=2 (No such file or directory)

There are a LOT of messages marked "error" and "fail"; however, at the end of that particular test there is this:

stress-ng: error: [175640] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175028] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175028-76' failed, errno=2 (No such file or directory)
stress-ng: error: [175683] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175219] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175219-184' failed, errno=2 (No such file or directory)
stress-ng: info: [174923] successful run completed in 3760.09s (1 hour, 2 mins, 40.09 secs)
return_code is 0

So despite a LOT (416 of them) of these fail messages, the test exits cleanly with a 0 for passing. If the test truly IS passing and this is expected, then these should be warning messages, NOT fail messages. Fail implies something that would fail the test and block passing of the test. Warning is something unexpected but not blocking. In my opinion...

Can we get some clarity on this?
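
One way to make the mismatch between the log and the exit code concrete is to tally messages by level and compare the totals with the return code. The following is only a rough sketch: the regular expression is inferred from the log lines quoted above, and `tally_levels` and `is_suspicious` are hypothetical helper names, not part of stress-ng or the certification suite.

```python
import re
from collections import Counter

# Matches lines such as:
#   stress-ng: fail: [175523] stress-ng-mmap: rmdir '...' failed, errno=2 (...)
# The pattern is inferred from the quoted output, not from stress-ng itself.
LINE_RE = re.compile(r"^stress-ng: (?P<level>\w+):\s+\[(?P<pid>\d+)\]\s+(?P<msg>.*)$")


def tally_levels(log_text):
    """Count stress-ng messages per level (info, error, fail, ...)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


def is_suspicious(counts, return_code):
    """True when the run reports success but logged error or fail messages."""
    return return_code == 0 and (counts["error"] + counts["fail"]) > 0


sample = """\
stress-ng: info: [174923] dispatching hogs: 416 mmap
stress-ng: error: [175726] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [175523] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-175523-352' failed, errno=2 (No such file or directory)
stress-ng: info: [174923] successful run completed in 3760.09s (1 hour, 2 mins, 40.09 secs)
"""

counts = tally_levels(sample)
print(dict(counts), is_suspicious(counts, return_code=0))
```

A check like this flags exactly the situation described here: a return code of 0 alongside messages labeled "error" or "fail".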

Changed in stress-ng:
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → Medium
status: New → In Progress
Colin Ian King (colin-king) wrote :

Valid points.

I'm going to make "gave up trying to mmap, no available memory" an info message from now on, as it indicates why the stress child process gave up trying to allocate memory. I'm also going to bump the number of retries up to 65536 with a 100000ms delay between each retry.
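
As an illustration of the behaviour described in this comment (bounded retries, then an info-level message rather than an error), here is a minimal Python sketch. It is not the stress-ng implementation, which is written in C. The function name `mmap_with_retries`, the `allocate` and `log` hooks, and the default delay are assumptions made for the example; the delay value in particular is a placeholder and not taken from the fix.

```python
import mmap
import time


def mmap_with_retries(size, max_retries=65536, delay_s=0.1, allocate=None, log=print):
    """Try to map `size` bytes up to `max_retries` times.

    Returns the mapping, or None after giving up. Giving up is reported
    at info level because running out of memory is an expected outcome
    for a memory stressor, not a test failure.
    """
    if allocate is None:
        # Anonymous mapping; fileno -1 means "not backed by a file".
        allocate = lambda n: mmap.mmap(-1, n)
    for _ in range(max_retries):
        try:
            return allocate(size)
        except (OSError, MemoryError):
            time.sleep(delay_s)  # back off before the next attempt
    log("info: gave up trying to mmap, no available memory")
    return None


# Small allocation that should normally succeed on the first attempt.
region = mmap_with_retries(4096, max_retries=3, delay_s=0.0)
if region is not None:
    region.close()
```

Passing the allocator in as a parameter keeps the retry policy separate from the mapping call, which also makes the give-up path easy to exercise without exhausting real memory.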

Colin Ian King (colin-king) wrote :

Which version of stress-ng was this? I'm trying to figure out why there is a failure to delete temp directories on completion.

Colin Ian King (colin-king) wrote :

Fix committed: https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=c0ce27a5870fc879967555d7c975dcbe032bb10e

I'll get a new release out next week once I have a few more buglets fixed up and pushed.

Changed in stress-ng:
status: In Progress → Fix Committed
Changed in stress-ng:
status: Fix Committed → Fix Released
Jeff Lane  (bladernr) wrote :

https://certification.canonical.com/hardware/201908-27259/submission/148999/test/62536/result/11200424/

Reopened this as it seems those messages in MMAP still appear as Errors not Info:

Running stress-ng mmap stressor for 20460 seconds....
stress-ng: info: [5769] dispatching hogs: 96 mmap
stress-ng: error: [5826] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [5818] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-5818-26' failed, errno=2 (No such file or directory)
stress-ng: error: [5803] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [5795] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-5795-14' failed, errno=2 (No such file or directory)
stress-ng: error: [5864] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [5842] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-5842-39' failed, errno=2 (No such file or directory)
stress-ng: error: [5955] stress-ng-mmap: gave up trying to mmap, no available memory
stress-ng: fail: [5897] stress-ng-mmap: rmdir './tmp-stress-ng-mmap-5897-69' failed, errno=2 (No such file or directory)

Just for fun, also, this was run against 2TB of RAM w/ 3TB of swap space. Is it possible that those tmpdir issues are related to filling up the root filesystem?
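
For reference when reading those rmdir messages: errno 2 is ENOENT ("No such file or directory"), whereas a full filesystem is normally reported as ENOSPC, which is 28 on Linux. That does not by itself rule out a full disk as an indirect cause, for example if the directory could never be created. The sketch below shows one way a cleanup step could treat an already-missing directory as a non-failure. `remove_temp_dir` is a hypothetical helper and is not claimed to be what the upstream fix does.

```python
import errno
import os
import tempfile


def remove_temp_dir(path, log=print):
    """Remove a temp directory; an already-missing directory is not a failure."""
    try:
        os.rmdir(path)
        return "removed"
    except FileNotFoundError:
        # errno 2 (ENOENT): there is nothing left to clean up.
        log(f"info: '{path}' already gone")
        return "missing"
    except OSError as exc:
        # Anything else (e.g. ENOTEMPTY, EACCES) is a real cleanup problem.
        log(f"fail: rmdir '{path}' failed, errno={exc.errno} ({os.strerror(exc.errno)})")
        return "failed"


# Look up the symbolic names for the two errno values discussed above.
print(errno.errorcode[2], errno.errorcode[28])

demo_dir = tempfile.mkdtemp(prefix="tmp-demo-")
print(remove_temp_dir(demo_dir))  # first call removes the directory
print(remove_temp_dir(demo_dir))  # second call finds it already gone
```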

Changed in stress-ng:
status: Fix Released → Confirmed
Colin Ian King (colin-king) wrote :

Just to clarify, which version of stress-ng is this? stress-ng -V

Colin Ian King (colin-king) wrote :

This was fixed in the commit below, but I suspect you are using an older version of stress-ng.

commit c0ce27a5870fc879967555d7c975dcbe032bb10e
Author: Colin Ian King <email address hidden>
Date: Fri Feb 1 19:32:54 2019 +0000

    stress-mmap: be less noisy on mmap failures and fix directory cleanup (LP: #1807732)

Can you tell me which version of stress-ng you are using, and on which release, so I can double check this?

Jeff Lane  (bladernr) wrote :

stress-ng 0.09.25-1ubuntu2 the bionic-updates version.

So looks like that's landed in 0.09.52 (Disco) and later, but not in Bionic:
bladernr@galactica:~/development/stress-ng$ git tag --contains c0ce27a5870fc879967555d7c975dcbe032bb10e
V0.09.52
V0.09.53
V0.09.54
V0.09.55
V0.09.56
V0.09.57
V0.09.58
V0.09.59
V0.09.59.1
V0.09.60
V0.10.00
V0.10.01
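
The comparison behind that conclusion can be sketched in a few lines of Python. The helper names `upstream_version` and `base_contains_fix` are hypothetical, and the check only looks at the upstream base version: a distro package with backported patches can carry the fix even when its base version is older.

```python
import re


def upstream_version(text):
    """Extract the upstream version from a tag or Debian package version.

    "V0.09.52" -> (0, 9, 52); "0.09.25-1ubuntu2" -> (0, 9, 25)
    """
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def base_contains_fix(package_version, tags_with_fix):
    """True if the package's upstream base is at or after the first fixed tag."""
    first_fixed = min(upstream_version(tag) for tag in tags_with_fix)
    return upstream_version(package_version) >= first_fixed


# Tags reported by `git tag --contains` above (abbreviated).
tags = ["V0.09.52", "V0.09.53", "V0.09.59.1", "V0.10.00", "V0.10.01"]
print(base_contains_fix("0.09.25-1ubuntu2", tags))
```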

Colin Ian King (colin-king) wrote :

I've backport and SRU this tomorrow. Thanks for the update Jeff.

Colin Ian King (colin-king) wrote :

* "I will backport.."

Colin Ian King (colin-king) wrote :

Uploaded the fixed package for SRU testing. Should be in -proposed at some point.

description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jeff, or anyone else affected,

Accepted stress-ng into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/stress-ng/0.09.25-1ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in stress-ng (Ubuntu):
status: New → Fix Released
Changed in stress-ng (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-bionic
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (stress-ng/0.09.25-1ubuntu3)

All autopkgtests for the newly accepted stress-ng (0.09.25-1ubuntu3) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

stress-ng/0.09.25-1ubuntu3 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#stress-ng

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Colin Ian King (colin-king) wrote :

The regression found during testing is due to a small race between creating a file, expanding its size using ftruncate, and the memory mapping of the data. In some cases, with slow backing store on slow devices and lack of sync'd data, the mapping will trip a SIGBUS because there is no file data to back the mapping. This issue has been in stress-ng for a while and has been found in the regression testing. This has been fixed, see https://bugs.launchpad.net/ubuntu/+source/stress-ng/+bug/1845011
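
The general idea of giving the file real, synced data before mapping it can be sketched as follows. This is an illustration in Python rather than the C code in stress-ng, and it is not claimed to match the actual fix. `map_backed_file` is a hypothetical helper.

```python
import mmap
import os
import tempfile


def map_backed_file(path, size):
    """Create a file of `size` bytes with zero data on disk, then map it."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.ftruncate(fd, size)        # sets the length; may leave a sparse hole
        os.lseek(fd, 0, os.SEEK_SET)
        zeros = bytes(65536)
        remaining = size
        while remaining > 0:
            # Write explicit zero data so the file has real backing blocks.
            remaining -= os.write(fd, zeros[:remaining])
        os.fsync(fd)                  # flush the data before mapping
        return mmap.mmap(fd, size)    # the mapping keeps its own descriptor
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    region = map_backed_file(os.path.join(tmp, "mmap-demo"), 2 * mmap.PAGESIZE)
    region[0] = 0xAA                  # touch the mapping
    region.close()
```

Writing and syncing the zero data costs extra I/O up front, but it means the pages touched through the mapping already have file data behind them.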

Changed in stress-ng:
status: Confirmed → Fix Released
Colin Ian King (colin-king) wrote :

Verified, I ran 1024 instances of this stressor with really slow direct sync'd I/O backing store on a system with 64 CPUs and 1GB memory for 10 minutes to force this issue. Passes testing.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Changed in stress-ng (Ubuntu Bionic):
assignee: nobody → Colin Ian King (colin-king)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package stress-ng - 0.09.25-1ubuntu4

---------------
stress-ng (0.09.25-1ubuntu4) bionic; urgency=medium

  * stress-rmap: don't hard fail on fallocate failures (LP: #1845005)
    - backport of upstream commit 38fd9c6ff96c
  * stress-mcontend: sync mmap file with zero data (LP: #1845011)
    - backport of upstream commit c3678dadee23

 -- Colin King <email address hidden> Tue, 24 Sep 2019 08:30:11 +0100

Changed in stress-ng (Ubuntu Bionic):
status: Fix Committed → Fix Released