stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors while stressing memory

Bug #2059167 reported by dann frazier
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
Stress-ng  Fix Released  Undecided   Unassigned

Bug Description

Thanks as always for this fantastic tool. It continues to be invaluable for us. (You can add bug 2058557 to your list of issues found btw!)

Recently, all of our arm64 certification regression tests have begun to fail in the memory vm test; I've attached a log below. The failure is independent of Ubuntu release, kernel variant, and SoC, so I suspect a test regression. Nothing interesting appears in dmesg during this test on any system.
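
One quick sanity check on the reported number itself: the sketch below simply prints the error count from the bug title in hexadecimal and as little-endian bytes. The value decodes to a sequential byte ramp rather than anything resembling a plausible count, which would be consistent with a test-side problem. That reading is an inference from the number alone, not something the stress-ng output states.

```python
# The count reported by the failing vm stressor instances.
count = 1694364648734976

# View the value as a 64-bit word in hex, and as bytes in little-endian order.
as_hex = f"{count:#018x}"
as_bytes = list(count.to_bytes(8, "little"))

# For scale: how much memory would hold this many bits, in whole TiB.
tib = count // 8 // 2**40

print(as_hex)    # 0x0006050403020100
print(as_bytes)  # [0, 1, 2, 3, 4, 5, 6, 0]
print(tib)       # 192
```

A genuine hardware fault would be unlikely to produce an error count whose bytes read 00 01 02 03 04 05 06 00, and 192 TiB worth of flipped bits is far more than a single vm stressor instance could plausibly touch.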

The failures seem to be correlated with the certification PPA updating stress-ng to version 0.17.06-0~202403132232~ubuntuXX.YY.Z and persist with 0.17.06-0~202403251447~ubuntuXX.YY.Z.

The last success before this was with 0.17.06-0~202403080935~ubuntuXX.YY.Z.

I have no idea how to determine which commit hashes these releases correspond to, other than downloading the source package and comparing.

This looks like a promising potential fix:

commit 1d444cb8c76159e2b50116d20764dd7a2ac66230
Author: Colin Ian King <email address hidden>
Date: Mon Mar 25 19:21:59 2024 +0000

    stress_vm_walking_flush_data: fix buffer overflow on last 7 bytes

I verified that 0.17.06-0~202403251447~ubuntuXX.YY.Z *does not* have this patch.
I verified that 0.17.06-0~202403261450~ubuntuXX.YY.Z *does* have this patch.
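
As a rough cross-check that avoids downloading each source package, the version strings can be compared against the commit date. The sketch below assumes the 12 digits after "0~" are a UTC build timestamp in YYYYMMDDHHMM form; that is a guess from the look of the version strings, not documented PPA behaviour, and the `build_time` helper is hypothetical. It is only a heuristic and does not replace inspecting the source.

```python
import re
from datetime import datetime, timezone

# Assumption: the 12 digits after "0~" are a UTC build timestamp (YYYYMMDDHHMM).
STAMP = re.compile(r"-0~(\d{12})~")

def build_time(version: str) -> datetime:
    """Extract the assumed build timestamp from a PPA version string."""
    match = STAMP.search(version)
    if match is None:
        raise ValueError(f"no build timestamp in {version!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M")
    return stamp.replace(tzinfo=timezone.utc)

# Commit date of 1d444cb8c761..., taken from the commit header above.
fix_commit = datetime(2024, 3, 25, 19, 21, 59, tzinfo=timezone.utc)

versions = [
    "0.17.06-0~202403251447~ubuntuXX.YY.Z",
    "0.17.06-0~202403261450~ubuntuXX.YY.Z",
]
results = {v: build_time(v) > fix_commit for v in versions}
for version, after_fix in results.items():
    print(version, "built after fix commit:", after_fix)
```

Under that assumption, the 202403251447 build predates the commit and the 202403261450 build follows it, which agrees with what the source packages showed.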

Our next set of tests should pick up a version with this fix, so I'll let you know if it appears to resolve this issue.

15 Mar 07:10: Running stress-ng vm stressor for 5315 seconds...
** stress-ng exited with code 2
stress-ng: info: [1379604] setting to a 1 hour, 28 mins, 35 secs run per stressor
stress-ng: info: [1379604] dispatching hogs: 256 vm
stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379665] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379699] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379748] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379769] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379762] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379843] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379849] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379695] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379729] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379746] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379631] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379619] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379837] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379705] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379766] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379675] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379687] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379821] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379622] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379691] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379749] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379706] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379783] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379611] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379648] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379605] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379681] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: error: [1379604] vm: [1379605] terminated with an error, exit status=2 (stressor failed)
stress-ng: fail: [1379753] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379664] vm: detected 1694364648734976 bit errors while stressing memory
[...]
stress-ng: fail: [1379609] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: error: [1379604] vm: [1379609] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379610] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379611] terminated with an error, exit status=2 (stressor failed)
[...]
stress-ng: error: [1379604] vm: [1379859] terminated with an error, exit status=2 (stressor failed)
stress-ng: error: [1379604] vm: [1379860] terminated with an error, exit status=2 (stressor failed)
stress-ng: info: [1379604] skipped: 0
stress-ng: info: [1379604] passed: 0
stress-ng: info: [1379604] failed: 255: vm (255)
stress-ng: info: [1379604] metrics untrustworthy: 0
stress-ng: info: [1379604] unsuccessful run completed in 1 hour, 28 mins, 36.82 secs
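
For long logs like the one above, a small script can confirm that every failing instance reports the same count. This is a minimal sketch: the regular expression is written against the "fail:" lines as they appear in this log, not against any documented stress-ng output format, and `summarise` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as:
# stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors ...
FAIL = re.compile(r"stress-ng: fail: \[(\d+)\] ([\w-]+): detected (\d+) bit errors")

def summarise(log_text: str) -> dict:
    """Count failing PIDs and collect the distinct error counts they report."""
    pids = set()
    counts = Counter()
    for line in log_text.splitlines():
        m = FAIL.search(line)
        if m:
            pids.add(int(m.group(1)))
            counts[int(m.group(3))] += 1
    return {"failing_pids": len(pids), "distinct_counts": sorted(counts)}

sample = """\
stress-ng: fail: [1379606] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: fail: [1379665] vm: detected 1694364648734976 bit errors while stressing memory
stress-ng: error: [1379604] vm: [1379605] terminated with an error, exit status=2 (stressor failed)
"""
summary = summarise(sample)
print(summary)
```

A single entry in `distinct_counts` across many PIDs points to a deterministic cause rather than random memory corruption.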

dann frazier (dannf) wrote :

The last failure we saw was 3 days ago, running 0.17.06-0~202403251447~ubuntuXX.YY.Z.

We've seen 6 successes since then, the first of which used 0.17.06-0~202403261450~ubuntuXX.YY.Z.

I'll mark this fix released!

Changed in stress-ng:
status: New → Fix Released