func_stack_tracer.tc (Max stack tracer) kernel ftrace selftest fails on riscv64 unmatched

Bug #1940972 reported by Dimitri John Ledkov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linux
Confirmed
Medium
ubuntu-kernel-tests
New
Medium
Unassigned
linux-riscv (Ubuntu)
New
Medium
Unassigned

Bug Description

$ sudo make -C tools/testing/selftests TARGETS=ftrace run_tests
...
# [32] ftrace - Max stack tracer [FAIL]
...

Testing using 5.13.0-1002.2

the full ftrace suite takes just over 2 hours on unmatched.

description: updated
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

[10052.121189] rcu: INFO: rcu_sched self-detected stall on CPU
[10052.126125] rcu: 1-....: (14995 ticks this GP) idle=a3a/1/0x4000000000000002 softirq=116019/116019 fqs=7426
[10080.057250] watchdog: BUG: soft lockup - CPU#1 stuck for 82s! [ftracetest:30662]
[10124.057368] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ftracetest:30662]
[10148.057431] watchdog: BUG: soft lockup - CPU#2 stuck for 45s! [ftracetest:30662]
[10160.653472] rcu: INFO: rcu_sched self-detected stall on CPU
[10160.658405] rcu: 2-....: (14995 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=118075/118075 fqs=7359
[10188.057537] watchdog: BUG: soft lockup - CPU#2 stuck for 82s! [ftracetest:30662]
[10236.057668] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ftracetest:30732]
[10260.057727] watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [ftracetest:30732]
[10272.113779] rcu: INFO: rcu_sched self-detected stall on CPU
[10272.118710] rcu: 0-....: (14995 ticks this GP) idle=232/1/0x4000000000000002 softirq=120182/120182 fqs=7473
[10300.057832] watchdog: BUG: soft lockup - CPU#0 stuck for 82s! [ftracetest:30732]
[10344.057950] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ftracetest:30732]
[10368.058013] watchdog: BUG: soft lockup - CPU#1 stuck for 45s! [ftracetest:30732]
[10380.714055] rcu: INFO: rcu_sched self-detected stall on CPU
[10380.718987] rcu: 1-....: (14994 ticks this GP) idle=172/1/0x4000000000000002 softirq=116702/116702 fqs=7389
[10408.058133] watchdog: BUG: soft lockup - CPU#1 stuck for 82s! [ftracetest:30732]
[10456.062284] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ftracetest:30794]
[10480.062359] watchdog: BUG: soft lockup - CPU#3 stuck for 45s! [ftracetest:30794]

lots of soft lockups during the test

Revision history for this message
Colin Ian King (colin-king) wrote :

The test script adds a tracing filter:

echo '*lock*' > stack_trace_filter

..that takes a few minutes to apply and during that time we see the RCU and/or softlockup messages. This is kind of expected since we're blocked in kernel space updating ftrace points across the kernel which is slow and expensive on the RISC-V unmatched/unleashed H/W.

One remedy is to do the following:

echo 0 > /proc/sys/kernel/soft_watchdog
echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress

.. run test

echo 0 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress
echo 1 > /proc/sys/kernel/soft_watchdog

The long term fix is to make the ftrace stack_trace_filter interface more yieldy

Revision history for this message
Colin Ian King (colin-king) wrote :

FYI, to debug the test script use:

impish/tools/testing/selftests/ftrace$ sudo ./ftracetest test.d/ftrace/func_stack_tracer.tc -v -d

Revision history for this message
Colin Ian King (colin-king) wrote :

... I've looked at this test and the following occurs:

0. go to the tracking directory:

cd /sys/kernel/debug/tracing

1. tracer is disabled:

echo 0 > /proc/sys/kernel/stack_tracer_enabled

2. tracer is set to track all kernel functions with *lock* in the name:

echo '*lock*' > stack_trace_filter

(this takes a few minutes to configure, it's quite slow).

one can see the functions filtered on by using cat stack_trace_filter

3. emabling the tracer:

echo 0 > stack_max_size
echo 1 > /proc/sys/kernel/stack_tracer_enabled

4. check if the tracer has found any functions that match:

cat stack_trace | grep lock

..none appear in position 0 in the stack_trace file. This works on other arches such as arm64, amd64, s390x etc..

Revision history for this message
Colin Ian King (colin-king) wrote :

the RCU timeout was a red-herring; it occurs during the filter set-up. The actual test to find the stack tracer traced stack sizes is running w/o causing RCU soft lockup messages but it is failing to find the kernel function names that match the *lock* regex, so that's a bug.

Revision history for this message
In , colin.king (colin.king-linux-kernel-bugs) wrote :

While running some kernel selftests I discovered that the kernel ftrace
func_stack_tracer.tc fails.

How to reproduce:

in the 5.13 kernel source tree:

$ cd linux/tools/testing/selftests/ftrace
$ sudo ./ftracetest test.d/ftrace/func_stack_tracer.tc

[1] ftrace - Max stack tracer [FAIL]

# of passed: 0
# of failed: 1
# of unresolved: 0
# of untested: 0
# of unsupported: 0
# of xfailed: 0
# of undefined(test bug): 0

... I've looked at this test and the following occurs:

0. go to the tracking directory:

cd /sys/kernel/debug/tracing

1. tracer is disabled:

echo 0 > /proc/sys/kernel/stack_tracer_enabled

2. tracer is set to track all kernel functions with *lock* in the name:

echo '*lock*' > stack_trace_filter

(this takes a few minutes to configure, it's quite slow).

one can see the functions filtered on by using cat stack_trace_filter

3. emabling the tracer:

echo 0 > stack_max_size
echo 1 > /proc/sys/kernel/stack_tracer_enabled

4. check if the tracer has found any functions that match:

cat stack_trace | grep lock

..none appear in position 0 in the stack_trace file. This works on other
arches such as arm64, amd64, s390x etc.. so I'm not sure why it's not
working on RISC-V.

Changed in ubuntu-kernel-tests:
importance: Undecided → Medium
Changed in linux-riscv (Ubuntu):
importance: Undecided → Medium
Revision history for this message
In , colin.king (colin.king-linux-kernel-bugs) wrote :

Tested in RISC-V SiFive Unmatched dev board and QEMU.

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
Po-Hsu Lin (cypressyew) wrote : Re: v5.13 kernel ftrace selftest fails on riscv64 unmatched

5.13 EOL

However this issue is still valid for jammy 5.15 riscv.
We have this test disabled in ubuntu_kernel_selftests:
https://git.launchpad.net/~canonical-kernel-team/+git/autotest-client-tests/tree/ubuntu_kernel_selftests/ubuntu_kernel_selftests.py#n163

Thus I will left this bug open.

summary: - v5.13 kernel ftrace selftest fails on riscv64 unmatched
+ unc_stack_tracer.tc (Max stack tracer) kernel ftrace selftest fails on
+ riscv64 unmatched
tags: added: 5.15 jammy ubuntu-kernel-selftests
summary: - unc_stack_tracer.tc (Max stack tracer) kernel ftrace selftest fails on
+ func_stack_tracer.tc (Max stack tracer) kernel ftrace selftest fails on
riscv64 unmatched
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.