Failures of stack stressor in stress-ng 0.10.07 (in Eoan)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Stress-ng |
Fix Released
|
High
|
Colin Ian King | ||
stress-ng (Ubuntu) |
Fix Released
|
Undecided
|
Unassigned | ||
Eoan |
Fix Released
|
Undecided
|
Unassigned | ||
Focal |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
== SRU Justification Eoan ==
Doing regression testing of Ubuntu 19.10, we've been seeing frequent failures of stress-ng's stack stressor. These systems have stress-ng 0.10.07-1. For instance (running manually, not in the Checkbox test script):
ubuntu@kzanol:~$ stress-ng -k --aggressive --verify --timeout 30 --stack 0
stress-ng: info: [5051] dispatching hogs: 4 stack
stress-ng: info: [5051] unsuccessful run completed in 32.01s
ubuntu@kzanol:~$ stress-ng -k --aggressive --verify --timeout 30 --stack 0
stress-ng: info: [5064] dispatching hogs: 4 stack
stress-ng: info: [5064] successful run completed in 32.96s
ubuntu@kzanol:~$ stress-ng -k --aggressive --verify --timeout 30 --stack 0
stress-ng: info: [5077] dispatching hogs: 4 stack
stress-ng: info: [5077] unsuccessful run completed in 33.09s
This problem seems to affect some systems more than others; in doing my testing, some computers fail a majority of 30-second or greater test runs (our default test run in 300s in length), but others haven't failed once over several such test runs. I haven't yet identified what's causing some systems to fail but not others.
In testing this, I installed Ubuntu 19.04, which comes with stress-ng 0.09.57-0ubuntu3, on one affected system, and encountered no problems. Upon upgrading to stress-ng 0.10.07-1 from Ubuntu 19.04, the problems returned. Thus, this appears to either be a problem with stress-ng 0.10.07-1 or this new version of stress-ng is detecting previously-
== Test case ==
Run stress-ng -k --aggressive --verify --timeout 30 --stack 0 multiple times and interrupt it with control-C (SIGINT). This can trigger a segfault. With the fix, the segfault cannot be triggered.
== Fix ==
Upstream stress-ng commits:
- 10ffe40579c5 stress-stack: return error code in child using
- ef18c524df48 stress-stack: don't throw a fatal error when
- 6245e5f62eae stress-stack: check for ENOMEM fork failure and retry
- 8fb67daea592 stress-stack: setup alternative stack in child only
The first 3 fixes are prerequisite fixes, the final fix addresses the main issue.
== Regression Potential ==
This affects just the stress-ng stack stressor. The fixes are already tested upstream fixes found in stress-ng in Ubuntu Focal. The fixes have been regression tested on arm64, amd64, i386, s390x and ppc64el architectures so the test coverage is good on these fixes. The fixes change the stack of the signal handler and also the exit of a child stress process, so the affects of the changes are small in the context of the stress test.
Changed in stress-ng: | |
importance: | Undecided → High |
assignee: | nobody → Colin Ian King (colin-king) |
status: | New → In Progress |
description: | updated |
Changed in stress-ng: | |
status: | Fix Committed → Fix Released |
Changed in stress-ng (Ubuntu Eoan): | |
status: | New → In Progress |
Changed in stress-ng (Ubuntu Focal): | |
status: | New → In Progress |
description: | updated |
Can you run the same command with the verbose option on -v to see if that shows any useful debug messages?
stress-ng -k --aggressive --verify --timeout 30 --stack 0 -v