dscr_sysfs_test / futex_bench / tm-unavailable in powerpc from ubuntu_kernel_selftests timeout on PowerPC nodes with B-5.3

Bug #1864642 reported by Po-Hsu Lin on 2020-02-25
This bug affects 1 person
Affects               Status  Importance  Assigned to  Milestone
ubuntu-kernel-tests           Undecided   Po-Hsu Lin
linux (Ubuntu)                Undecided   Po-Hsu Lin
  Eoan                        Undecided   Po-Hsu Lin

Bug Description

== SRU Justification ==
Some powerpc tests can take longer to run than the default 45-second
timeout added in commit 852c8cbf34d3 ("selftests/kselftest/runner.sh:
Add 45 second timeout per test"). The following run times were
collected across 2 Power8 nodes and 1 Power9 node in our pool:
  powerpc/benchmarks/futex_bench - 52s
  powerpc/dscr/dscr_sysfs_test - 116s
  powerpc/signal/signal_fuzzer - 88s
  powerpc/tm/tm_unavailable_test - 168s
  powerpc/tm/tm-poison - 240s

Thus they fail with a TIMEOUT error.
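
To make the margin concrete, here is a small sketch that compares the run times listed above against the 45-second default. The helper name over_limit and the name:seconds encoding are made up for illustration; only the numbers come from this report.

```shell
#!/bin/sh
# Default per-test limit introduced by commit 852c8cbf34d3.
limit=45

# Run times in seconds, copied from the SRU justification.
timings="futex_bench:52 dscr_sysfs_test:116 signal_fuzzer:88 tm_unavailable_test:168 tm-poison:240"

# over_limit (hypothetical helper): list every test whose run time
# exceeds the limit passed as $1, with the overshoot in seconds.
over_limit() {
    for entry in $timings; do
        name=${entry%%:*}
        secs=${entry##*:}
        if [ "$secs" -gt "$1" ]; then
            echo "$name exceeds ${1}s by $((secs - $1))s"
        fi
    done
}

over_limit "$limit"
```

All five tests exceed the limit, futex_bench only narrowly (7 seconds), which fits its faster 46-second run reported later in this bug.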

== Fix ==
* 850507f3 ("selftests/powerpc: Turn off timeout setting for benchmarks, dscr, signal, tm")

Only Eoan and newer kernels need this fix.
For Eoan, this patch can be applied with some context adjustment.
For Focal, there is a SAUCE patch that turned off the timeout setting for
benchmarks and tm; it needs to be reverted first, after which this patch
can also be applied with some context adjustment.
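
As a rough illustration of the mechanism, the sketch below assumes the fix works through a per-directory "settings" file containing timeout=0, and that a value of 0 disables the limit (as it does for GNU timeout). The read_timeout helper and the scratch directory are hypothetical; this is a simplified stand-in, not the actual runner.sh code.

```shell
#!/bin/sh
# Scratch directory standing in for a test directory such as
# tools/testing/selftests/powerpc/dscr (assumed location).
dir=$(mktemp -d)

# Per-directory settings file; timeout=0 is meant to disable the limit.
printf 'timeout=0\n' > "$dir/settings"

# read_timeout (illustrative helper, not the runner's own function):
# print the timeout from a settings file, or the 45 s default if the
# file is missing or does not set one.
read_timeout() {
    value=45
    if [ -r "$1" ]; then
        while IFS='=' read -r field val; do
            case $field in
                '#'*) continue ;;          # skip comment lines
                timeout) value=$val ;;
            esac
        done < "$1"
    fi
    echo "$value"
}

read_timeout "$dir/settings"       # settings file present
read_timeout "$dir/no-such-file"   # falls back to the default
rm -r "$dir"
```

A directory without a settings file keeps the default, so only the listed powerpc subdirectories would be affected under this assumption.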

== Test ==
Patch tested on PowerPC and it works as expected.

== Regression Potential ==
Low, changes limited to testing tools for PowerPC.

== Original Bug Report ==
Issue found on Power9 node baltar with B-5.3 (5.3.0-41.33~18.04.1-generic)

 # selftests: benchmarks: futex_bench
 # test: futex_bench
 # tags: git_version:unknown
 # time = 52.042224
 #
 not ok 5 selftests: benchmarks: futex_bench # TIMEOUT

 # selftests: dscr: dscr_sysfs_test
 # test: dscr_sysfs_test
 # tags: git_version:unknown
 #
 not ok 6 selftests: dscr: dscr_sysfs_test # TIMEOUT

Need to check if this has something to do with the timeout setting like in bug 1864626
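
For context on the "# TIMEOUT" marker above: GNU timeout exits with status 124 when it has to kill a command, so a runner can tell a timeout apart from an ordinary failure. The sketch below is an assumption-laden illustration; the classify helper is hypothetical and the 1-second limit is only there to keep the demo short.

```shell
#!/bin/sh
# classify (hypothetical helper): map an exit status to a TAP-like verdict.
classify() {
    case $1 in
        0)   echo "ok" ;;
        124) echo "not ok # TIMEOUT" ;;
        *)   echo "not ok" ;;
    esac
}

# A command that outlives its limit is killed; timeout returns 124.
rc=0
timeout 1 sleep 3 || rc=$?
classify "$rc"

# A command that finishes in time keeps its own exit status.
rc=0
timeout 1 true || rc=$?
classify "$rc"
```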

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-5.3.0-40-generic 5.3.0-40.32~18.04.1
ProcVersionSignature: Ubuntu 5.3.0-40.32~18.04.1-generic 5.3.18
Uname: Linux 5.3.0-40-generic ppc64le
ApportVersion: 2.20.9-0ubuntu7.11
Architecture: ppc64el
Date: Tue Feb 25 11:23:58 2020
ProcLoadAvg: 1.28 22.46 46.59 3/1347 62287
ProcLocks:
 1: POSIX ADVISORY WRITE 3837 00:18:562 0 EOF
 2: POSIX ADVISORY WRITE 3864 00:18:588 0 EOF
 3: FLOCK ADVISORY WRITE 4515 00:18:463 0 EOF
 4: FLOCK ADVISORY WRITE 3844 00:18:579 0 EOF
 5: POSIX ADVISORY WRITE 1820 00:18:343 0 EOF
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -2
ProcVersion: Linux version 5.3.0-40-generic (buildd@bos02-ppc64el-007) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:15 UTC 2020
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
VarLogDump_list: total 0
cpu_cores: Number of cores present = 40
cpu_coreson: Number of cores online = 39
cpu_dscr: DSCR is 9
cpu_freq:
 min: 2.862 GHz (cpu 79)
 max: 2.945 GHz (cpu 81)
 avg: 2.903 GHz
cpu_runmode:
 Could not retrieve current diagnostics mode,
 No kernel interface to firmware
cpu_smt: SMT=4

Po-Hsu Lin (cypressyew) wrote :

On the same power9 node:

$ time sudo ./dscr_sysfs_test
test: dscr_sysfs_test
tags: git_version:4e2ba00-dirty
success: dscr_sysfs_test

real 1m56.025s
user 0m0.333s
sys 1m55.403s

$ time sudo ./futex_bench
test: futex_bench
tags: git_version:4e2ba00-dirty
time = 52.114582
success: futex_bench

real 0m52.126s
user 0m9.259s
sys 0m42.868s

tags: added: sru-20200217 ubuntu-kernel-selftests
tags: added: 5.3

On a P8 node modoc with B-5.3:
$ time sudo ./futex_bench
test: futex_bench
tags: git_version:4e2ba00-dirty
time = 69.705913
success: futex_bench

real 1m9.720s
user 0m28.526s
sys 0m41.191s

$ time sudo ./tm-unavailable
test: tm_unavailable_test
tags: git_version:4e2ba00-dirty
Checking if FP/VEC registers are sane after a FP unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VEC unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VSX unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
result: success
success: tm_unavailable_test

real 2m49.238s
user 2m4.087s
sys 0m45.131s

summary: dscr_sysfs_test / futex_bench in powerpc from ubuntu_kernel_selftests
- failed on B-5.3
+ timeout on B-5.3
summary: - dscr_sysfs_test / futex_bench in powerpc from ubuntu_kernel_selftests
- timeout on B-5.3
+ dscr_sysfs_test / futex_bench / tm-unavailable in powerpc from
+ ubuntu_kernel_selftests timeout on PowerPC nodes with B-5.3
Po-Hsu Lin (cypressyew) wrote :

On P8 node witchita, with Eoan 5.3 kernel + git branch from upstream:
$ time sudo ./futex_bench
test: futex_bench
tags: git_version:v5.6-rc4-0-g98d54f81e
time = 46.302199
success: futex_bench

real 0m46.313s
user 0m9.172s
sys 0m37.138s

$ time sudo ./tm-unavailable
test: tm_unavailable_test
tags: git_version:v5.6-rc4-0-g98d54f81e
Checking if FP/VEC registers are sane after a FP unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VEC unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
Checking if FP/VEC registers are sane after a VSX unavailable exception...
If MSR.FP=0 MSR.VEC=0: FP ok VEC ok
If MSR.FP=1 MSR.VEC=0: FP ok VEC ok
If MSR.FP=0 MSR.VEC=1: FP ok VEC ok
If MSR.FP=1 MSR.VEC=1: FP ok VEC ok
result: success
success: tm_unavailable_test

real 2m48.540s
user 1m55.433s
sys 0m53.083s

$ time sudo ./tm-poison
test: tm_poison_test
tags: git_version:v5.6-rc4-0-g98d54f81e
Good, no poison or leaked value into FP registers
Good, no poison or leaked value into VEC registers
success: tm_poison_test

real 4m0.025s
user 2m0.001s
sys 0m0.008s

$ time sudo ./sigfuz
test: signal_fuzzer
tags: git_version:v5.6-rc4-0-g98d54f81e
success: signal_fuzzer

real 1m28.684s
user 0m6.332s
sys 0m5.206s

Po-Hsu Lin (cypressyew) on 2020-03-03
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Po-Hsu Lin (cypressyew) on 2020-03-04
tags: added: kqa-blocker
affects: linux-signed-hwe (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Sean Feole (sfeole) wrote :

Given that this is a simple test timeout and not a critical failure, I'm removing the kqa-blocker tag. Please reference bug 1864626; this may be similar, given that a patch was submitted to ubuntu-kernel-selftests that allows the user to define timeouts via a settings file.

https://<email address hidden>/

tags: removed: kqa-blocker
Po-Hsu Lin (cypressyew) on 2020-04-06
Changed in linux (Ubuntu Eoan):
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew) on 2020-04-06
description: updated
Po-Hsu Lin (cypressyew) on 2020-04-06
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-24.28

---------------
linux (5.4.0-24.28) focal; urgency=medium

  * focal/linux: 5.4.0-24.28 -proposed tracker (LP: #1871939)

  * getitimer returns it_value=0 erroneously (LP: #1349028)
    - [Config] CONTEXT_TRACKING_FORCE policy should be unset

  * 12d1:1038 Dual-Role OTG device on non-HNP port - unable to enumerate USB
    device on port 1 (LP: #1047527)
    - [Config] USB_OTG_FSM policy not needed

  * Add DCPD backlight support for HP CML system (LP: #1871589)
    - SAUCE: drm/i915: Force DPCD backlight mode for HP CML 2020 system

  * Backlight brightness cannot be adjusted using keys (LP: #1860303)
    - SAUCE drm/i915: Force DPCD backlight mode for HP Spectre x360 Convertible
      13t-aw100

  * CVE-2020-11494
    - slcan: Don't transmit uninitialized stack data in padding

  * Ubuntu Kernel Support for OpenPOWER NV Secure & Trusted Boot (LP: #1866909)
    - powerpc: Detect the secure boot mode of the system
    - powerpc/ima: Add support to initialize ima policy rules
    - powerpc: Detect the trusted boot state of the system
    - powerpc/ima: Define trusted boot policy
    - ima: Make process_buffer_measurement() generic
    - certs: Add wrapper function to check blacklisted binary hash
    - ima: Check against blacklisted hashes for files with modsig
    - powerpc/ima: Update ima arch policy to check for blacklist
    - powerpc/ima: Indicate kernel modules appended signatures are enforced
    - powerpc/powernv: Add OPAL API interface to access secure variable
    - powerpc: expose secure variables to userspace via sysfs
    - x86/efi: move common keyring handler functions to new file
    - powerpc: Load firmware trusted keys/hashes into kernel keyring
    - x86/efi: remove unused variables

  * [roce-0227]sync mainline kernel 5.6rc3 roce patchset into ubuntu HWE kernel
    branch (LP: #1864950)
    - RDMA/hns: Cleanups of magic numbers
    - RDMA/hns: Optimize eqe buffer allocation flow
    - RDMA/hns: Add the workqueue framework for flush cqe handler
    - RDMA/hns: Delayed flush cqe process with workqueue
    - RDMA/hns: fix spelling mistake: "attatch" -> "attach"
    - RDMA/hns: Initialize all fields of doorbells to zero
    - RDMA/hns: Treat revision HIP08_A as a special case
    - RDMA/hns: Use flush framework for the case in aeq
    - RDMA/hns: Stop doorbell update while qp state error
    - RDMA/hns: Optimize qp destroy flow
    - RDMA/hns: Optimize qp context create and destroy flow
    - RDMA/hns: Optimize qp number assign flow
    - RDMA/hns: Optimize qp buffer allocation flow
    - RDMA/hns: Optimize qp param setup flow
    - RDMA/hns: Optimize kernel qp wrid allocation flow
    - RDMA/hns: Optimize qp doorbell allocation flow
    - RDMA/hns: Check if depth of qp is 0 before configure

  * [hns3-0316]sync mainline kernel 5.6rc4 hns3 patchset into ubuntu HWE kernel
    branch (LP: #1867586)
    - net: hns3: modify an unsuitable print when setting unknown duplex to fibre
    - net: hns3: add enabled TC numbers and DWRR weight info in debugfs
    - net: hns3: add support for dump MAC ID and loopback status in debugfs
    - net: hns3: add missing help info for QS shaper...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Po-Hsu Lin (cypressyew) wrote :

These tests no longer fail with a timeout on 5.3.0-52.46 on node baltar.

Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
tags: added: verification-done-eoan
removed: verification-needed-eoan
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-53.47

---------------
linux (5.3.0-53.47) eoan; urgency=medium

  * eoan/linux: 5.3.0-53.47 -proposed tracker (LP: #1877257)

  * Intermittent display blackouts on event (LP: #1875254)
    - drm/i915: Limit audio CDCLK>=2*BCLK constraint back to GLK only

  * Unable to handle kernel pointer dereference in virtual kernel address space
    on Eoan (LP: #1876645)
    - SAUCE: overlayfs: fix shitfs special-casing

linux (5.3.0-52.46) eoan; urgency=medium

  * eoan/linux: 5.3.0-52.46 -proposed tracker (LP: #1874752)

  * alsa: make the dmic detection align to the mainline kernel-5.6
    (LP: #1871284)
    - ALSA: hda: add Intel DSP configuration / probe code
    - ALSA: hda: fix intel DSP config
    - ALSA: hda: Allow non-Intel device probe gracefully
    - ALSA: hda: More constifications
    - ALSA: hda: Rename back to dmic_detect option
    - [Config] SND_INTEL_DSP_CONFIG=m
    - [packaging] Remove snd-intel-nhlt from modules

  * built-using constraints preventing uploads (LP: #1875601)
    - temporarily drop Built-Using data

  * ubuntu/focal64 fails to mount Vagrant shared folders (LP: #1873506)
    - [Packaging] Move virtualbox modules to linux-modules
    - [Packaging] Remove vbox and zfs modules from generic.inclusion-list

  * linux-image-5.0.0-35-generic breaks checkpointing of container
    (LP: #1857257)
    - SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay

  * shiftfs: broken shiftfs nesting (LP: #1872094)
    - SAUCE: shiftfs: record correct creator credentials

  * Add debian/rules targets to compile/run kernel selftests (LP: #1874286)
    - [Packaging] add support to compile/run selftests

  * shiftfs: O_TMPFILE reports ESTALE (LP: #1872757)
    - SAUCE: shiftfs: fix dentry revalidation

  * getitimer returns it_value=0 erroneously (LP: #1349028)
    - [Config] CONTEXT_TRACKING_FORCE policy should be unset

  * 5.3.0-46-generic - i915 - frequent GPU hangs / resets rcs0 (LP: #1872001)
    - drm/i915/execlists: Preempt-to-busy
    - drm/i915/gt: Detect if we miss WaIdleLiteRestore
    - drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

  * alsa/sof: external mic can't be deteced on Lenovo and HP laptops
    (LP: #1872569)
    - SAUCE: ASoC: intel/skl/hda - set autosuspend timeout for hda codecs

  * Eoan update: upstream stable patchset 2020-04-22 (LP: #1874325)
    - ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
    - bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
    - net: vxge: fix wrong __VA_ARGS__ usage
    - hinic: fix a bug of waitting for IO stopped
    - hinic: fix wrong para of wait_for_completion_timeout
    - cxgb4/ptp: pass the sign of offset delta in FW CMD
    - qlcnic: Fix bad kzalloc null test
    - i2c: st: fix missing struct parameter description
    - cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
    - media: venus: hfi_parser: Ignore HEVC encoding for V1
    - firmware: arm_sdei: fix double-lock on hibernate with shared events
    - null_blk: Fix the null_add_dev() error path
    - null_blk: Handle null_add_dev() failures properly
    - null_blk: fix spuri...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released