powerpc/tm:tm-resched-dscr in ubuntu_kernel_selftests flaky on P8 node gulpin

Bug #2007908 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status      Importance  Assigned to  Milestone
ubuntu-kernel-tests  New         Undecided   Unassigned
linux (Ubuntu)       Incomplete  Undecided   Unassigned

Bug Description

Issue found with Focal hwe-5.15.0-66.73~20.04.1 on Power8 node "gulpin".

Test failed with:
  test: tm_resched_dscr
  tags: git_version:bfd31f0-dirty
  Binding to cpu 8
  main test running as pid 2542
  Check DSCR TM context switch:
  !! killing tm_resched_dscr
  !! child died by signal 15
  failure: tm_resched_dscr

Nothing interesting in dmesg / syslog:
$ dmesg | tail
[ 20.556968] IPv6: ADDRCONF(NETDEV_CHANGE): enP34p1s0f0: link becomes ready
[ 20.572182] tg3 0022:01:00.2 enP34p1s0f2: Link is up at 1000 Mbps, full duplex
[ 20.572187] tg3 0022:01:00.2 enP34p1s0f2: Flow control is off for TX and off for RX
[ 20.572191] tg3 0022:01:00.2 enP34p1s0f2: EEE is disabled
[ 20.572201] IPv6: ADDRCONF(NETDEV_CHANGE): enP34p1s0f2: link becomes ready
[ 24.598633] kauditd_printk_skb: 19 callbacks suppressed
[ 24.598640] audit: type=1400 audit(1676957767.768:31): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/snap/bin/" pid=2171 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 27.682671] loop3: detected capacity change from 0 to 8
[ 28.618195] fbcon: Taking over console
[ 28.747706] Console: switching to colour frame buffer device 128x48

This issue does not exist with 5.15.0-60.66~20.04.1
  test: tm_resched_dscr
  tags: git_version:bfd31f0-dirty
  Binding to cpu 8
  main test running as pid 28806
  Check DSCR TM context switch: OK
  success: tm_resched_dscr

We didn't catch this issue in Jammy 5.15 because Power8 is not supported there, and the test was skipped on the Power9 node used for Jammy testing (apparently because that node lacks PPC_FEATURE2_HTM).

Po-Hsu Lin (cypressyew)
summary: [Possible Regression] powerpc/tm:tm-resched-dscr in
- ubuntu_kernel_selftests failed
+ ubuntu_kernel_selftests failed on P8
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2007908

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Stefan Bader (smb)
description: updated
Po-Hsu Lin (cypressyew)
description: updated
Po-Hsu Lin (cypressyew) wrote : Re: [Possible Regression] powerpc/tm:tm-resched-dscr in ubuntu_kernel_selftests failed on P8

Stress testing on node gulpin with -66, using the following loop, shows that this test is a bit flaky:

failcnt=0
for i in $(seq 1 100); do
   time sudo ./tm-resched-dscr
   # "time" is a bash keyword, so $? is still the exit status of the test
   if [ $? -ne 0 ]; then
        failcnt=$((failcnt + 1))
   fi
done
echo "$failcnt"

There are 27 failures in 100 attempts.

The next step is to stress test with -60.

Po-Hsu Lin (cypressyew) wrote :

Here is the test output with -66.

Po-Hsu Lin (cypressyew) wrote :

Stress tested with 5.15.0-60-generic: 20 failures in 100 attempts.

So this looks like a flaky test rather than a regression. I just happened to hit multiple failures in a row with -66.
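
As a rough sanity check on that conclusion, the two failure counts (27/100 with -66, 20/100 with -60) can be compared with a two-proportion z-test. The sketch below is only an illustration: the helper name two_prop_z is hypothetical and not part of the selftests, and the test assumes the 100 runs on each kernel are independent of each other.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel selftests):
# two-proportion z-test on failure counts from two stress runs.
# Usage: two_prop_z FAILS_A RUNS_A FAILS_B RUNS_B
two_prop_z() {
    awk -v f1="$1" -v n1="$2" -v f2="$3" -v n2="$4" 'BEGIN {
        p1 = f1 / n1                      # failure rate, kernel A
        p2 = f2 / n2                      # failure rate, kernel B
        p  = (f1 + f2) / (n1 + n2)        # pooled failure rate
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        printf "%.2f\n", (p1 - p2) / se   # z statistic
    }'
}

# Failure counts from this report: -66 kernel vs -60 kernel.
two_prop_z 27 100 20 100
```

With these numbers the statistic comes out at about 1.17, below the usual 1.96 threshold for a 5% two-sided test, so the difference between the two kernels is within what random flakiness alone could produce.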

summary: - [Possible Regression] powerpc/tm:tm-resched-dscr in
- ubuntu_kernel_selftests failed on P8
+ powerpc/tm:tm-resched-dscr in ubuntu_kernel_selftests flaky on P8 node
+ gulpin