sigio in ubuntu_stress_smoke_tests failed on X-lowlatency with node spitfire

Bug #1918095 reported by Po-Hsu Lin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Stress-ng | Fix Released | Critical | Colin Ian King |
ubuntu-kernel-tests | Invalid | Medium | Colin Ian King |

Bug Description

Issue found with 4.4.0-204.236 lowlatency on node spitfire in the Intel cloud.

Reproduce rate: 2/2

 sigio FAILED
 stress-ng: debug: [80678] 128 processors online, 128 processors configured
 stress-ng: info: [80678] dispatching hogs: 4 sigio
 stress-ng: debug: [80678] cache allocate: default cache size: 49152K
 stress-ng: debug: [80678] starting stressors
 stress-ng: debug: [80678] 4 stressors started
 stress-ng: debug: [80682] stress-ng-sigio: started [80682] (instance 3)
 stress-ng: debug: [80679] stress-ng-sigio: started [80679] (instance 0)
 stress-ng: debug: [80680] stress-ng-sigio: started [80680] (instance 1)
 stress-ng: debug: [80681] stress-ng-sigio: started [80681] (instance 2)
 stress-ng: debug: [80679] stress-ng-sigio: exited [80679] (instance 0)
 stress-ng: debug: [80680] stress-ng-sigio: exited [80680] (instance 1)
 stress-ng: debug: [80678] process [80679] terminated
 stress-ng: debug: [80678] process [80680] terminated
 stress-ng: debug: [80678] process [80681] (stress-ng-sigio) terminated on signal: 11 (Segmentation fault)
 stress-ng: debug: [80678] process [80681] terminated
 stress-ng: debug: [80678] process [80682] (stress-ng-sigio) terminated on signal: 11 (Segmentation fault)
 stress-ng: debug: [80678] process [80682] terminated
 stress-ng: info: [80678] unsuccessful run completed in 5.92s
 stress-ng: fail: [80678] sigio instance 2 corrupted bogo-ops counter, 2093457 vs 0
 stress-ng: fail: [80678] sigio instance 2 hash error in bogo-ops counter and run flag, 2817875306 vs 0
 stress-ng: fail: [80678] sigio instance 3 corrupted bogo-ops counter, 1810644 vs 0
 stress-ng: fail: [80678] sigio instance 3 hash error in bogo-ops counter and run flag, 20429112 vs 0
 stress-ng: fail: [80678] metrics-check: stressor metrics corrupted, data is compromised
 info: 5 failures reached, aborting stress process

 [ 1355.641466] stress-ng[80682]: segfault at 7fbe9f57dc80 ip 00007fbe9f372f1f sp 00007fbe9f57dc80 error 6 in ld-2.23.so[7fbe9f35b000+26000]
 [ 1355.641699] stress-ng[80681]: segfault at 7fbe9f57dc80 ip 00007fbe9f372f1f sp 00007fbe9f57dc80 error 6 in ld-2.23.so[7fbe9f35b000+26000]
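
For anyone reading the two dmesg lines above, here is a minimal sketch that decodes them. It assumes the standard x86 page-fault error code layout (bit 0: page present, bit 1: write access, bit 2: user mode) and that the kernel prints the error code in hex. The helper name `decode_segfault` and the dictionary keys are made up for illustration.

```python
import re

# Matches the "segfault at ADDR ip IP sp SP error CODE" part of a dmesg line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Decode the fault address, stack pointer and error bits (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr = int(m["addr"], 16)
    sp = int(m["sp"], 16)
    err = int(m["err"], 16)  # assumed to be printed in hex by the kernel
    return {
        "page_present": bool(err & 0x1),  # bit 0: 0 means the page was not mapped
        "write_access": bool(err & 0x2),  # bit 1: 1 means the access was a write
        "user_mode": bool(err & 0x4),     # bit 2: 1 means the fault came from user space
        "fault_at_sp": addr == sp,        # faulting address equals the stack pointer
    }


LINE = (
    "stress-ng[80682]: segfault at 7fbe9f57dc80 ip 00007fbe9f372f1f "
    "sp 00007fbe9f57dc80 error 6 in ld-2.23.so[7fbe9f35b000+26000]"
)
info = decode_segfault(LINE)
```

Under those assumptions, error 6 reads as a user-mode write to a page that is not mapped, and the faulting address is the stack pointer itself in both processes. That pattern would be consistent with a stack being exhausted, though the log alone does not prove it.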

Po-Hsu Lin (cypressyew)
tags: added: 4.4 amd64 sru-20210222 xenial
tags: added: ubuntu-stress-smoke-test
Po-Hsu Lin (cypressyew)
tags: added: kqa-blocker
tags: added: sru-20210412
Colin Ian King (colin-king) wrote :

I'm having problems reproducing this on my H/W. Can an instance on spitfire be set up for me so I can debug this further?

Kleber Sacilotto de Souza (kleber-souza) wrote :

Issue also found with xenial/linux-hwe 4.15.0-143.147~16.04.3.

tags: added: 4.15 hwe
Colin Ian King (colin-king) wrote :

@Kleber, do you have the logs for the failure in #2?

Colin Ian King (colin-king) wrote :

...and what machine/arch is it failing on for the failure in comment #2? More information would be really helpful.

Colin Ian King (colin-king) wrote :

Urgh, this was a subtle race condition on a pending SIGIO. Once the stressor had completed, it was still reporting an issue via the logging mechanism, which triggered another SIGIO, and these stacked up a bit.

Fix committed:
https://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=364a8cc7c18a75d97f5c4096cce2a2ddfa20ed04
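
The kind of window described above can be closed by turning SIGIO off before any end-of-run reporting happens. The sketch below illustrates that ordering in Python on a pipe. It is only an assumption about the general technique, not a transcription of the linked commit, which may do it differently. The names `enable_sigio`, `disable_sigio` and `run_demo` are made up for illustration.

```python
import fcntl
import os
import signal

hits = 0  # number of times the SIGIO handler has run


def on_sigio(signum, frame):
    global hits
    hits += 1


def enable_sigio(fd):
    """Ask the kernel to send SIGIO to this process on I/O activity on fd."""
    signal.signal(signal.SIGIO, on_sigio)
    fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)


def disable_sigio(fd):
    """Stop SIGIO before teardown and reporting (hypothetical helper)."""
    # Ignore the signal first: a SIGIO that is already pending is then
    # discarded instead of being delivered to the handler.
    signal.signal(signal.SIGIO, signal.SIG_IGN)
    # Then stop the kernel from generating new SIGIOs for this descriptor.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_ASYNC)


def run_demo():
    """Return how many times the handler ran after SIGIO was disabled."""
    rd, wr = os.pipe()
    enable_sigio(rd)
    os.write(wr, b"x")  # activity while SIGIO is enabled
    os.read(rd, 1)
    disable_sigio(rd)
    hits_at_teardown = hits
    os.write(wr, b"y")  # activity after teardown, e.g. late reporting
    os.read(rd, 1)
    late = hits - hits_at_teardown
    os.close(rd)
    os.close(wr)
    return late
```

Calling `run_demo()` should return 0, because nothing written after `disable_sigio()` can reach the handler any more. The point of the ordering is that the stressor stops accepting SIGIO before it does anything, such as logging, that could generate I/O of its own.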

Changed in stress-ng:
status: New → Fix Committed
importance: Undecided → Critical
assignee: nobody → Colin Ian King (colin-king)
Changed in ubuntu-kernel-tests:
status: New → Invalid
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Kleber Sacilotto de Souza (kleber-souza) wrote :

The sigio test completed successfully with xenial/linux-hwe 4.15.0-143.147~16.04.3, both generic and lowlatency.

Thanks, Colin!

Changed in stress-ng:
status: Fix Committed → Fix Released