eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 / P9
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | Fix Released | Undecided | Po-Hsu Lin |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | Unassigned |
Groovy | Fix Released | Undecided | Unassigned |
Hirsute | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
The breakable devices test is hardware-dependent. In our test pool
it takes about:
* 30 seconds to run on a Power8 system with 5 breakable devices,
* 60 seconds to run on a Power9 system with 4 breakable devices.
The default 45-second kselftest framework timeout is not long enough
for this test to finish on some nodes, so the test fails with a
TIMEOUT error.
[Fix]
* f5eca0b279117f ("selftests/
setting for eeh-basic")
This test case has been present since Focal, and this patch can be
cherry-picked into all affected releases.
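
As a rough illustration of how a per-test timeout setting interacts with the 45-second default: to my understanding, the kselftest runner looks for an optional `settings` file in the test directory and takes a `timeout=` line from it. The Python sketch below only models that lookup; the real runner is a shell script. The value 120 and the "0 means no limit" rule are assumptions for illustration, not taken from the patch.

```python
DEFAULT_TIMEOUT = 45  # seconds; the framework default quoted in this report


def effective_timeout(settings_text, default=DEFAULT_TIMEOUT):
    """Return the timeout implied by the text of a kselftest-style settings file."""
    timeout = default
    for raw in settings_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "timeout":
            timeout = int(value.strip())
    return timeout


def timed_out(runtime_s, timeout_s):
    """Assumption of this sketch: a timeout of 0 means 'no limit'."""
    return timeout_s > 0 and runtime_s > timeout_s


# The ~60 second Power9 run from the Impact section, with and without a setting.
print(timed_out(60, effective_timeout("")))               # default of 45 s applies
print(timed_out(60, effective_timeout("timeout=120\n")))  # hypothetical larger limit
```

With no settings file the 60-second run exceeds the limit; with any limit above 60 seconds (or with the limit disabled) it does not.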
[Test case]
Run this test on P9 node baltar, on which this timeout issue can be
100% reproduced. With this patch applied, the test can finish without
being terminated by the default timeout.
[Where problems could occur]
This will make the test take longer to finish, but it is still
bounded by the timeout mechanisms in both the test case and the
kselftest framework. It is unlikely to make the test hang forever.
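
The timeout inside the test case shows up in the logs as "waited N/60", which suggests each device is polled about once per second for at most 60 seconds. A minimal Python sketch of such a bounded wait is shown below. It is not the code of eeh-basic.sh (a shell script); the function names and the `is_recovered` callback are hypothetical.

```python
import time

MAX_WAIT = 60  # per-device limit, matching the "waited N/60" lines in the logs


def wait_for_recovery(is_recovered, max_wait=MAX_WAIT, sleep=time.sleep):
    """Poll once per second; return seconds waited, or None if the limit is hit."""
    waited = 0
    while waited < max_wait:
        if is_recovered():
            return waited
        sleep(1)
        waited += 1
    return None


def worst_case_runtime(devices, max_wait=MAX_WAIT):
    """Upper bound on total waiting time when devices are broken one at a time."""
    return devices * max_wait


# Simulated device that reports recovery on the fourth poll (no real sleeping).
states = iter([False, False, False, True])
print(wait_for_recovery(lambda: next(states), sleep=lambda s: None))
print(worst_case_runtime(4))
```

Under these assumptions the waiting time is bounded by the number of breakable devices times 60 seconds, so the run cannot hang indefinitely even with a longer framework timeout.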
== Original Bug Report ==
Issue found on 5.4.0-34.38, Focal P9 "baltar"
# selftests: powerpc/eeh: eeh-basic.sh
# 0000:00:00.0, Skipped: bridge
# 0001:00:00.0, Skipped: bridge
# 0002:00:00.0, Skipped: bridge
# 0002:01:00.0, Added
# 0003:00:00.0, Skipped: bridge
# 0003:01:00.0, Added
# 0004:00:00.0, Skipped: bridge
# 0004:01:00.0, Skipped: bridge
# 0004:02:00.0, Added
# 0005:00:00.0, Skipped: bridge
# 0005:01:00.0, Added
# 0030:00:00.0, Skipped: bridge
# 0031:00:00.0, Skipped: bridge
# 0032:00:00.0, Skipped: bridge
# 0033:00:00.0, Skipped: bridge
# Found 4 breakable devices...
# Breaking 0002:01:00.0...
# 0002:01:00.0, waited 0/60
# 0002:01:00.0, waited 1/60
# 0002:01:00.0, waited 2/60
# 0002:01:00.0, waited 3/60
# 0002:01:00.0, waited 4/60
# 0002:01:00.0, waited 5/60
# 0002:01:00.0, waited 6/60
# 0002:01:00.0, Recovered after 7 seconds
# Breaking 0003:01:00.0...
# 0003:01:00.0, waited 0/60
# 0003:01:00.0, waited 1/60
# 0003:01:00.0, waited 2/60
# 0003:01:00.0, waited 3/60
# 0003:01:00.0, waited 4/60
# 0003:01:00.0, waited 5/60
# 0003:01:00.0, waited 6/60
# 0003:01:00.0, waited 7/60
# 0003:01:00.0, waited 8/60
# 0003:01:00.0, waited 9/60
# 0003:01:00.0, waited 10/60
# 0003:01:00.0, waited 11/60
# 0003:01:00.0, waited 12/60
# 0003:01:00.0, waited 13/60
# 0003:01:00.0, waited 14/60
# 0003:01:00.0, waited 15/60
# 0003:01:00.0, waited 16/60
# 0003:01:00.0, waited 17/60
# 0003:01:00.0, waited 18/60
# 0003:01:00.0, waited 19/60
# 0003:01:00.0, waited 20/60
# 0003:01:00.0, waited 21/60
# 0003:01:00.0, waited 22/60
# 0003:01:00.0, waited 23/60
# 0003:01:00.0, waited 24/60
# 0003:01:00.0, waited 25/60
# 0003:01:00.0, waited 26/60
# 0003:01:00.0, waited 27/60
# 0003:01:00.0, waited 28/60
# 0003:01:00.0, waited 29/60
# 0003:01:00.0, waited 30/60
# 0003:01:00.0, waited 31/60
# 0003:01:00.0, waited 32/60
# 0003:01:00.0, waited 33/60
# 0003:01:00.0, waited 34/60
# 0003:01:00.0, waited 35/60
# 0003:01:00.0, Recovered after 36 seconds
# Breaking 0004:02:00.0...
# 0004:02:00.0, Recovered after 0 seconds
# Breaking 0005:01:00.0...
# 0005:01:00.0, waited 0/60
# 0005:01:00.0, waited 1/60
#
not ok 1 selftests: powerpc/eeh: eeh-basic.sh # TIMEOUT
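
The log above can be tallied to see where the time went. The following Python sketch sums the "Recovered after N seconds" values and notes any device still waiting when the run was cut off. The sample text is an abbreviated excerpt of the log above; the helper name is hypothetical.

```python
import re

RECOVERED = re.compile(r"^# (\S+), Recovered after (\d+) seconds")
WAITED = re.compile(r"^# (\S+), waited (\d+)/\d+")

# Abbreviated excerpt of the log above: final state of each broken device.
SAMPLE = """\
# 0002:01:00.0, Recovered after 7 seconds
# 0003:01:00.0, Recovered after 36 seconds
# 0004:02:00.0, Recovered after 0 seconds
# 0005:01:00.0, waited 0/60
# 0005:01:00.0, waited 1/60
"""


def summarize(log_text):
    """Return (recovered, pending): seconds per recovered device, and the
    last 'waited' count for devices that never reported recovery."""
    recovered, pending = {}, {}
    for line in log_text.splitlines():
        m = RECOVERED.match(line)
        if m:
            recovered[m.group(1)] = int(m.group(2))
            pending.pop(m.group(1), None)
            continue
        m = WAITED.match(line)
        if m:
            pending[m.group(1)] = int(m.group(2))
    return recovered, pending


recovered, pending = summarize(SAMPLE)
print(sum(recovered.values()))  # 43 seconds spent on recovered devices
print(pending)                  # device still waiting when the log ends
```

The three recovered devices account for 43 seconds, and the fourth had been waiting for about another second or two when the log ends, which is consistent with the 45-second framework timeout.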
tags: | added: 5.4 focal ppc64el sru-20200518 ubuntu-kernel-selftests |
tags: | added: kqa-blocker |
tags: | added: sru-20200608 |
tags: | added: sru-20200810 |
Changed in linux (Ubuntu Focal): | |
status: | New → In Progress |
Changed in linux (Ubuntu Hirsute): | |
status: | New → In Progress |
Changed in linux (Ubuntu Groovy): | |
status: | New → In Progress |
description: | updated |
description: | updated |
Changed in linux (Ubuntu Focal): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Groovy): | |
status: | In Progress → Fix Committed |
tags: | added: verification-done-focal verification-done-groovy | removed: verification-needed-focal verification-needed-groovy |
Changed in ubuntu-kernel-tests: | |
status: | In Progress → Fix Released |
This issue can also be found on Bionic 5.4.0-42.46~18.04.1 P8, with an additional "No such file" error in the log:
# selftests: powerpc/eeh: eeh-basic.sh
pci/devices/0001:09:00.0/eeh_pe_state: No such file
pci/devices/0005:03:00.0/eeh_pe_state: No such file
# 0000:00:00.0, Skipped: bridge
# 0001:00:00.0, Skipped: bridge
# 0001:01:00.0, Skipped: bridge
# 0001:02:01.0, Skipped: bridge
# 0001:02:08.0, Skipped: bridge
# 0001:02:09.0, Skipped: bridge
# 0001:08:00.0, Added
# 0001:09:00.0, Added
# 0004:00:00.0, Skipped: bridge
# 0005:00:00.0, Skipped: bridge
# 0005:01:00.0, Skipped: bridge
# 0005:02:01.0, Skipped: bridge
# 0005:02:08.0, Skipped: bridge
# 0005:02:09.0, Skipped: bridge
# 0005:02:10.0, Skipped: bridge
# 0005:02:11.0, Skipped: bridge
# 0005:03:00.0, Added
# 0005:04:00.0, Added
# 0005:05:00.0, Added
# 0040:00:00.0, Skipped: bridge
# 0044:00:00.0, Skipped: bridge
# 0045:00:00.0, Skipped: bridge
# Found 5 breakable devices...
# Breaking 0001:08:00.0...
# 0001:08:00.0, waited 0/60
# 0001:08:00.0, waited 1/60
# 0001:08:00.0, waited 2/60
# 0001:08:00.0, Recovered after 3 seconds
# Breaking 0001:09:00.0...
# cut: -: No such device
# ./eeh-basic.sh: 13: ./eeh-basic.sh: cannot open /sys/bus/
# 0001:09:00.0, waited 0/60
# 0001:09:00.0, waited 1/60
# 0001:09:00.0, waited 2/60
# 0001:09:00.0, waited 3/60
# 0001:09:00.0, waited 4/60
# 0001:09:00.0, waited 5/60
# 0001:09:00.0, waited 6/60
# 0001:09:00.0, waited 7/60
# 0001:09:00.0, Recovered after 8 seconds
# Breaking 0005:03:00.0...
# cut: -: No such device
# ./eeh-basic.sh: 13: ./eeh-basic.sh: cannot open /sys/bus/
# 0005:03:00.0, waited 0/60
# 0005:03:00.0, waited 1/60
# 0005:03:00.0, waited 2/60
# 0005:03:00.0, waited 3/60
# 0005:03:00.0, waited 4/60
# 0005:03:00.0, waited 5/60
# 0005:03:00.0, waited 6/60
# 0005:03:00.0, waited 7/60
# 0005:03:00.0, Recovered after 8 seconds
# Breaking 0005:04:00.0...
# 0005:04:00.0, waited 0/60
# 0005:04:00.0, waited 1/60
# 0005:04:00.0, waited 2/60
# 0005:04:00.0, Recovered after 3 seconds
# Breaking 0005:05:00.0...
# 0005:05:00.0, waited 0/60
# 0005:05:00.0, waited 1/60
# 0005:05:00.0, waited 2/60
# 0005:05:00.0, Recovered after 3 seconds
# 0 devices failed to recover (5 tested)
#
not ok 1 selftests: powerpc/eeh: eeh-basic.sh # TIMEOUT