Return code reports success when malloc test fails

Bug #1544575 reported by Rod Smith
This bug affects 1 person
Affects: Stress-ng
Status: Fix Released
Importance: High
Assigned to: Colin Ian King

Bug Description

I ran stress-ng 0.05.12 on a 32-bit computer and got the following output:

rodsmith@tunesmith:~$ stress-ng --aggressive --verify --timeout 60 --malloc 0
stress-ng: info: [5500] dispatching hogs: 2 malloc
*** Error in `stress-ng': free(): invalid next size (fast): 0x07af3fe8 ***
stress-ng: info: [5500] successful run completed in 0.17s
rodsmith@tunesmith:~$ echo $?
0

Clearly, the test failed -- it both reported an error and completed in a (reported) 0.17 seconds rather than the requested 60 seconds. The return code, though, was 0, indicating success.

The system on which I encountered this problem reports errors in this test >50% of the time, so I can easily test any code changes.

Changed in stress-ng:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

==30870== Syscall param mincore(vec) points to unaddressable byte(s)
==30870== at 0x41EB9E9: mincore (syscall-template.S:81)
==30870== by 0x808CE10: mincore_touch_pages (mincore.c:61)
==30870== by 0x806C457: stress_malloc (stress-malloc.c:218)
==30870== by 0x80FC404: stress_run (in /home/king/repos/stress-ng/stress-ng)
==30870== by 0x804F279: main (stress-ng.c:3115)
==30870== Address 0x45fd29b is 0 bytes after a block of size 11 alloc'd
==30870== at 0x402E0D8: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==30870== by 0x808CDF9: mincore_touch_pages (mincore.c:57)
==30870== by 0x806C457: stress_malloc (stress-malloc.c:218)
==30870== by 0x80FC404: stress_run (in /home/king/repos/stress-ng/stress-ng)
==30870== by 0x804F279: main (stress-ng.c:3115)
==30870==

Jeff Lane (bladernr) wrote :

I believe that's by design for now. I think I asked Colin the same question (about the numa test) when I first started looking at it. IIRC, he offered to change that behaviour, but at the time I didn't worry about it too much, thinking we could just interpret the output and generate the right exit code from that (similar to what fwts_test does to ensure it only passes a failing exit code on high or critical fwts errors).

Colin Ian King (colin-king) wrote :

I believe it's a rounding error in

addr[i] = calloc(len / n, len * n);

...
...

(void)mincore_touch_pages(addr[i], len);
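To make the suspected rounding concrete: integer division truncates, so (len / n) * n can be smaller than len, and a block sized from the quotient can end up shorter than the length later passed to mincore_touch_pages(). A minimal sketch in C, assuming the request is split into n elements of len / n bytes each; both helper names are hypothetical and not taken from stress-ng:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes actually obtained when a request of len
 * bytes is split into n elements of (len / n) bytes each.
 * Integer division truncates, so the result can be less than len. */
size_t bytes_allocated(size_t len, size_t n)
{
    assert(n > 0);
    return (len / n) * n;
}

/* Hypothetical helper: how far a later pass over len bytes would run
 * past the end of a block of bytes_allocated(len, n) bytes. */
size_t overrun_bytes(size_t len, size_t n)
{
    return len - bytes_allocated(len, n);
}
```

For example, overrun_bytes(100, 7) is 2, because 100 / 7 truncates to 14 and 14 * 7 is 98. Whenever n divides len exactly, the overrun is 0.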

Colin Ian King (colin-king) wrote :

The bug is that the address being passed to mincore() should be page aligned.
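As an illustration of the alignment requirement: mincore() expects its address argument to be a multiple of the page size, and its result vector needs one byte per page covered. The sketch below is not the stress-ng fix; the function names are hypothetical, and it assumes the page size is a power of two (as returned by sysconf(_SC_PAGESIZE) on Linux).

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address down to the start of the page that contains it.
 * Assumes page_size is a power of two. */
uintptr_t page_align_down(uintptr_t addr, size_t page_size)
{
    return addr & ~((uintptr_t)page_size - 1);
}

/* Number of pages covered by [addr, addr + len) once the start has
 * been aligned down. This is the minimum number of bytes the vec
 * argument of mincore() would need for that range. */
size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t start = page_align_down(addr, page_size);
    uintptr_t end = addr + len;

    return (size_t)((end - start + page_size - 1) / page_size);
}
```

With 4096-byte pages, page_align_down(0x07af3fe8, 4096) gives 0x07af3000. A 100-byte buffer starting at 0x07af3fe8 crosses a page boundary, so it spans 2 pages even though its length alone would suggest 1; sizing the vector from the unaligned length is one way such a vector could come up a byte short.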

Colin Ian King (colin-king) wrote :

Hi Rod, do you mind testing stress-ng on your 32-bit platform with the latest fixes from my git repo:

git clone git://kernel.ubuntu.com/cking/stress-ng
cd stress-ng
make
./stress-ng --aggressive --verify --timeout 60 --malloc 0

I think the fix should do the trick.

Rod Smith (rodsmith) wrote :

I've run it about a dozen times with no errors.

For clarification, should the return code be non-0 if an error like this occurs? The man page seems to say it should:

EXIT STATUS
       Status  Description
         0     Success.
         1     Error; incorrect user options or a fatal resource issue
               (for example, out of memory).
         2     One or more stressors failed.

Colin Ian King (colin-king) wrote :

It should be a 1; for some reason a SIGSEGV is not being picked up by the parent. I'll see if I can fix that too.

Colin Ian King (colin-king) wrote :

I've tweaked the code now to pick up crashes from the child stressors and report them as errors. For example, I deliberately made the io stressor segfault, and this is what we get:

~/repos/stress-ng$ ./stress-ng --io 1 -v
stress-ng: debug: [13494] 4 processors online, 4 processors configured
stress-ng: info: [13494] defaulting to a 86400 second run per stressor
stress-ng: info: [13494] dispatching hogs: 1 iosync
stress-ng: info: [13494] cache allocate: default cache size: 3072K
stress-ng: debug: [13494] starting stressors
stress-ng: debug: [13494] 1 stressor spawned
stress-ng: debug: [13495] stress-ng-iosync: started [13495] (instance 0)
stress-ng: debug: [13494] process 13495 (stress-ng-iosync) terminated on signal: 11 (Segmentation fault)
stress-ng: debug: [13494] process [13495] terminated
stress-ng: info: [13494] unsuccessful run completed in 0.00s
echo $?
2

(2 because the stressor failed).
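For readers unfamiliar with how a parent can notice a crashed child, here is a minimal sketch using fork() and waitpid(). It is not the stress-ng code: the function names and the STATUS_* constants are hypothetical, with the numeric values taken from the EXIT STATUS table quoted earlier in this thread.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names; values follow the man page's EXIT STATUS table. */
enum {
    STATUS_SUCCESS = 0,
    STATUS_ERROR = 1,
    STATUS_STRESSOR_FAILED = 2
};

/* Map a wait status to an exit code: a child killed by a signal
 * (e.g. SIGSEGV) or exiting non-zero counts as a failed stressor. */
int classify_child_status(int status)
{
    if (WIFSIGNALED(status))
        return STATUS_STRESSOR_FAILED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return STATUS_SUCCESS;
    return STATUS_STRESSOR_FAILED;
}

/* Run child_fn in a forked child and classify how it ended. */
int run_and_classify(void (*child_fn)(void))
{
    int status;
    pid_t ret;
    pid_t pid = fork();

    if (pid < 0)
        return STATUS_ERROR;      /* could not create the child */
    if (pid == 0) {
        child_fn();
        _exit(0);                 /* child returned normally */
    }
    do {
        ret = waitpid(pid, &status, 0);
    } while (ret < 0 && errno == EINTR);
    if (ret < 0)
        return STATUS_ERROR;
    return classify_child_status(status);
}

/* Example children: one that finishes cleanly, one that crashes. */
void child_ok(void) { }
void child_segv(void) { raise(SIGSEGV); }
```

Calling run_and_classify(child_segv) yields STATUS_STRESSOR_FAILED, while run_and_classify(child_ok) yields STATUS_SUCCESS. The key point is that the parent has to inspect the wait status rather than assume a child that has gone away finished cleanly.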

Colin Ian King (colin-king) wrote :
Changed in stress-ng:
status: In Progress → Fix Committed
Changed in stress-ng:
status: Fix Committed → Fix Released