daemon test in ubuntu_stress_smoke did not pass on D-s390x KVM

Bug #1849595 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status         Importance   Assigned to      Milestone
Stress-ng            Fix Released   High         Colin Ian King
ubuntu-kernel-tests  Fix Released   Undecided    Unassigned

Bug Description

Issue found on Disco s390x KVM

Reproduce rate: 2 out of 3 runs

The test was cut off at the daemon test and marked as failed.

Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=94bb85af-3b98-4354-ac16-413d118ed811

Machine Configuration
Physical Pages: 512613
Pages available: 4484
Page Size: 4096
Zswap enabled: Y

Free memory:
              total        used        free      shared  buff/cache   available
Mem:        2050452      197700       17624         548     1835128     1800876
Swap:       3145720         264     3145456

Number of CPUs: 2
Number of CPUs Online: 2

access STARTING
access RETURNED 0
access PASSED
af-alg STARTING
af-alg RETURNED 0
af-alg PASSED
affinity STARTING
affinity RETURNED 0
affinity PASSED
aio STARTING
aio RETURNED 0
aio PASSED
aiol STARTING
aiol RETURNED 0
aiol PASSED
bad-altstack STARTING
bad-altstack RETURNED 0
bad-altstack PASSED
bigheap STARTING
bigheap RETURNED 0
bigheap PASSED
branch STARTING
branch RETURNED 0
branch PASSED
brk STARTING
brk RETURNED 0
brk PASSED
cache STARTING
cache RETURNED 0
cache PASSED
cap STARTING
cap RETURNED 0
cap PASSED
chdir STARTING
chdir RETURNED 0
chdir PASSED
chmod STARTING
chmod RETURNED 0
chmod PASSED
chown STARTING
chown RETURNED 0
chown PASSED
chroot STARTING
chroot RETURNED 0
chroot PASSED
clock STARTING
clock RETURNED 0
clock PASSED
clone STARTING
clone RETURNED 0
clone PASSED
close STARTING
close RETURNED 0
close PASSED
context STARTING
context RETURNED 0
context PASSED
cpu STARTING
cpu RETURNED 0
cpu PASSED
crypt STARTING
crypt RETURNED 0
crypt PASSED
cyclic STARTING
cyclic RETURNED 0
cyclic PASSED
daemon STARTING
daemon RETURNED 0

Po-Hsu Lin (cypressyew)
summary: - daemon test in ubuntu_stress_smoke did pass on D-s390x KVM
+ daemon test in ubuntu_stress_smoke did not pass on D-s390x KVM
tags: added: 5.0 disco s390x sru-20191021 ubuntu-stress-smoke-test
Changed in stress-ng:
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

11:45:43 DEBUG| stderr:
11:45:43 DEBUG| dd: failed to open '/home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img': Text file busy
11:45:43 DEBUG| mkswap: error: /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img is mounted; will not make swapspace
11:45:43 DEBUG| swapon: /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img: swapon failed: Device or resource busy
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: line 151: [: too many arguments
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 DEBUG| /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: Resource temporarily unavailable
11:45:43 INFO | ERROR ubuntu_stress_smoke_test.stress-smoke-test ubuntu_stress_smoke_test.stress-smoke-test timestamp=1571917543 localtime=Oct 24 11:45:43 Command </home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh> failed, rc=254, Command returned non-zero exit status

Colin Ian King (colin-king) wrote :

11:45:08 DEBUG| [stdout] daemon STARTING
11:45:13 DEBUG| [stdout] daemon RETURNED 0
11:45:13 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:14 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:16 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:20 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:28 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: Resource temporarily unavailable
11:45:28 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: line 151: [: too many arguments
11:45:28 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:29 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:31 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:35 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: retry: Resource temporarily unavailable
11:45:43 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: fork: Resource temporarily unavailable

Colin Ian King (colin-king) wrote :

It seems that the forking of a daemon can be too aggressive in the retry loop when fork fails: init is still playing catch-up reaping zombies and can't free up process slots quickly enough. To give init a chance to catch up, I've added a minor backoff delay before attempting to fork again after a fork failure.

Fix committed:
https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=a8f0f4d887033eb74631b0f53317924e15242c4c
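
For readers who want to see the idea in isolation, here is a minimal sketch of retrying fork with a backoff delay. It is not the code from the commit above: stress-ng is written in C, and the function name, parameters, attempt limit, and doubling delay below are all illustrative assumptions. The only behaviour taken from this report is "wait a little before forking again when fork fails with EAGAIN (Resource temporarily unavailable)".

```python
import errno
import os
import time


def fork_with_backoff(fork=os.fork, sleep=time.sleep,
                      max_attempts=5, initial_delay=0.1):
    """Call fork(), pausing between attempts when it fails with EAGAIN.

    Returns whatever fork() returns (0 in the child, the child's PID in
    the parent). Errors other than EAGAIN are re-raised immediately, and
    the last EAGAIN error is re-raised once max_attempts is used up.

    The fork and sleep callables are injectable (hypothetical design
    choice) so the retry logic can be exercised without real forking.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fork()
        except OSError as exc:
            # EAGAIN means no process slot is free right now; anything
            # else is not something a short wait is expected to fix.
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            # Give init time to reap zombies and free process slots.
            sleep(delay)
            delay *= 2  # assumed policy: double the wait on each failure
```

Without the delay, a tight retry loop competes with init for CPU time while init is the only process that can free the slots the loop is waiting for, which matches the repeated "fork: retry" messages in the logs above.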

Changed in stress-ng:
status: New → Fix Committed
Colin Ian King (colin-king) wrote :

I've run the tests now 10+ times with this fix and not seen any more issues.

Po-Hsu Lin (cypressyew) wrote :

Retesting this on s2lp6g001 and s2lp6g003.

Passed on s2lp6g001
Failed on s2lp6g003

The test started 11 hrs ago; I will give them another try.

Colin Ian King (colin-king) wrote :

I've found a corner case where fork() failures on start up can result in test failure. I've pushed another fix to address this issue:

https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=61e4532d35a7b5947d226117f62b560453c0535f

Colin Ian King (colin-king) wrote :

Hi Sam, do you mind re-testing as I've pushed some more minor changes that may help resolve this issue?

Po-Hsu Lin (cypressyew) wrote :

Two more re-tests show positive results; I think this can be closed now.
Thank you!

Changed in stress-ng:
status: Fix Committed → Fix Released
Changed in ubuntu-kernel-tests:
status: New → Fix Released
Colin Ian King (colin-king) wrote :

Great news!
