[P9, PowerNV][WSP][DD2.1][Ubuntu 1804][Perf fuzzer]: Call trace is seen while running perf fuzzer (perf:)

Bug #1752002 reported by bugproxy on 2018-02-27
This bug affects 1 person
Affects                             Importance   Assigned to
The Ubuntu-power-systems project    High         Canonical Kernel Team
linux (Ubuntu)                      High         Canonical Kernel Team
Bionic                              High         Canonical Kernel Team

Bug Description

== Comment: #0 - Shriya R. Kulkarni <email address hidden> - 2018-02-02 01:21:36 ==
Problem Description :
=============

A WARN_ON message is seen while running the perf fuzzer tests.

Machine details :
==========
Hardware : Witherspoon (wsp12) + DD2.1
OS : Ubuntu 1804
uname -a : 4.13.0-32-generic #35~lp1746225 ( Kernel from the bug : https://bugzilla.linux.ibm.com/show_bug.cgi?id=164107#c7 )

Steps to reproduce :
============
Build Kernel :
--------------------
To avoid the kernel crash caused by the perf fuzzer, use the kernel mentioned at this link: https://bugzilla.linux.ibm.com/show_bug.cgi?id=164107#c7

#! /bin/bash
set -x
git clone https://github.com/deater/perf_event_tests.git
cd perf_event_tests/include
mkdir asm
cd asm
wget http://9.114.13.132/repo/shriya/perf_regs.h
cd ../../lib
make
sleep 10
cd ../fuzzer
make
sleep 10

echo 0 > /proc/sys/kernel/nmi_watchdog
echo 2 > /proc/sys/kernel/perf_event_paranoid
echo 100000 > /proc/sys/kernel/perf_event_max_sample_rate
./perf_fuzzer -r 1492143527
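
The script above overwrites three sysctl knobs before starting the fuzzer. A minimal sketch, assuming one wants to record the current values first so they can be written back afterwards; the helper names `snapshot_sysctls` and `restore_commands` are hypothetical and not part of perf_event_tests:

```python
from pathlib import Path

# The three knobs written by the reproduction script.
PERF_SYSCTLS = ("nmi_watchdog", "perf_event_paranoid", "perf_event_max_sample_rate")


def snapshot_sysctls(proc_sys_kernel="/proc/sys/kernel"):
    """Return {name: current value} for each knob that exists under the directory."""
    base = Path(proc_sys_kernel)
    snapshot = {}
    for name in PERF_SYSCTLS:
        path = base / name
        if path.exists():  # a knob may be absent on some kernels
            snapshot[name] = path.read_text().strip()
    return snapshot


def restore_commands(snapshot, proc_sys_kernel="/proc/sys/kernel"):
    """Build the shell lines that would write the saved values back (run as root)."""
    return [f"echo {value} > {proc_sys_kernel}/{name}" for name, value in snapshot.items()]
```

Calling `snapshot_sysctls()` before the `echo` lines and printing `restore_commands(...)` afterwards gives a copy-pasteable way to undo the changes once the fuzzer run is finished.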

Call trace :
=======
[ 329.228031] ------------[ cut here ]------------
[ 329.228039] WARNING: CPU: 43 PID: 9088 at /home/jsalisbury/bugs/lp1746225/ubuntu-artful/kernel/events/core.c:3038 perf_pmu_sched_task+0x170/0x180
[ 329.228040] Modules linked in: ofpart at24 uio_pdrv_genirq uio cmdlinepart powernv_flash mtd ipmi_powernv vmx_crypto ipmi_devintf ipmi_msghandler ibmpowernv opal_prd crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 crc32c_vpmsum lpfc ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core fb_sys_fops ttm tg3 nvmet_fc drm ahci nvmet nvme_fc libahci nvme_fabrics mlxfw nvme_core devlink scsi_transport_fc
[ 329.228068] CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225
[ 329.228070] task: c000003f776ac900 task.stack: c000003f77728000
[ 329.228071] NIP: c000000000299b70 LR: c0000000002a4534 CTR: c00000000029bb80
[ 329.228073] REGS: c000003f7772b760 TRAP: 0700 Not tainted (4.13.0-32-generic)
[ 329.228073] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 329.228079] CR: 24008822 XER: 00000000
[ 329.228080] CFAR: c000000000299a70 SOFTE: 0
               GPR00: c0000000002a4534 c000003f7772b9e0 c000000001606200 c000003fef858908
               GPR04: c000003f776ac900 0000000000000001 ffffffffffffffff 0000003fee730000
               GPR08: 0000000000000000 0000000000000000 c0000000011220d8 0000000000000002
               GPR12: c00000000029bb80 c000000007a3d900 0000000000000000 0000000000000000
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
               GPR20: 0000000000000000 0000000000000000 c000003f776ad090 c000000000c71354
               GPR24: c000003fef716780 0000003fee730000 c000003fe69d4200 c000003f776ad330
               GPR28: c0000000011220d8 0000000000000001 c0000000014c6108 c000003fef858900
[ 329.228098] NIP [c000000000299b70] perf_pmu_sched_task+0x170/0x180
[ 329.228100] LR [c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230
[ 329.228101] Call Trace:
[ 329.228102] [c000003f7772b9e0] [c0000000002a0678] perf_iterate_sb+0x158/0x2a0 (unreliable)
[ 329.228105] [c000003f7772ba30] [c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230
[ 329.228107] [c000003f7772bab0] [c0000000001396dc] finish_task_switch+0x21c/0x310
[ 329.228109] [c000003f7772bb60] [c000000000c71354] __schedule+0x304/0xb80
[ 329.228111] [c000003f7772bc40] [c000000000c71c10] schedule+0x40/0xc0
[ 329.228113] [c000003f7772bc60] [c0000000001033f4] do_wait+0x254/0x2e0
[ 329.228115] [c000003f7772bcd0] [c000000000104ac0] kernel_wait4+0xa0/0x1a0
[ 329.228117] [c000003f7772bd70] [c000000000104c24] SyS_wait4+0x64/0xc0
[ 329.228121] [c000003f7772be30] [c00000000000b184] system_call+0x58/0x6c
[ 329.228121] Instruction dump:
[ 329.228123] 3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050
[ 329.228127] eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe00000> 4bffffbc 60000000 60420000
[ 329.228131] ---[ end trace 8c46856d314c1811 ]---
[ 375.755943] hrtimer: interrupt took 31601 ns
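
When comparing traces across kernels it can help to pull the key fields out of the WARNING line. A minimal sketch, assuming the line has the same layout as the one in the trace above; `parse_warning` is a hypothetical helper, not an existing tool:

```python
import re

# Matches lines such as:
#   WARNING: CPU: 43 PID: 9088 at <path>:<line> <symbol>+0x170/0x180
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at (?P<source>\S+):(?P<line>\d+) "
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)

SAMPLE = (
    "[ 329.228039] WARNING: CPU: 43 PID: 9088 at "
    "/home/jsalisbury/bugs/lp1746225/ubuntu-artful/kernel/events/core.c:3038 "
    "perf_pmu_sched_task+0x170/0x180"
)


def parse_warning(dmesg_line):
    """Return the fields of a kernel WARNING line as a dict, or None if it does not match."""
    match = WARNING_RE.search(dmesg_line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cpu": int(fields["cpu"]),
        "pid": int(fields["pid"]),
        "source": fields["source"],
        "line": int(fields["line"]),
        "symbol": fields["symbol"],
        "offset": int(fields["offset"], 16),  # offset into the function
        "size": int(fields["size"], 16),      # total size of the function
    }
```

For the line above, `parse_warning(SAMPLE)` reports `perf_pmu_sched_task` at `core.c:3038` on CPU 43.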

== Comment: #4 - SEETEENA THOUFEEK <email address hidden> - 2018-02-05 06:34:09 ==

== Comment: #5 - SEETEENA THOUFEEK <email address hidden> - 2018-02-05 06:36:12 ==
We have a similar issue reported on a different distro, for which Anju provided the patch. The patch is attached above.
Will check with her whether that patch was accepted upstream.

== Comment: #14 - SEETEENA THOUFEEK <email address hidden> - 2018-02-23 01:52:50 ==

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-164198 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
Frank Heimes (frank-heimes) wrote :

Please let us know whether the needed kernel patch was finally accepted upstream, and leave a public reference to it here (attach the patch, provide a public link/URL, or give a commit ID).
Thx

Changed in ubuntu-power-systems:
status: New → Incomplete
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
tags: added: kernel-da-key
Manoj Iyer (manjo) on 2018-03-05
Changed in linux (Ubuntu Bionic):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)

------- Comment From <email address hidden> 2018-03-19 10:34 EDT-------
What's the latest on this bug?

Joseph Salisbury (jsalisbury) wrote :

Has the patch been accepted upstream? If so, do you happen to have the SHA1 or patch subject?

------- Comment (attachment only) From <email address hidden> 2018-04-28 12:49 EDT-------

------- Comment From <email address hidden> 2018-04-30 04:24 EDT-------
Verified with test kernel : Issue is resolved.
===============

But hitting another trace :

[ 294.764782] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 315.685952] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 317.385747] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 335.030061] hrtimer: interrupt took 1494725987 ns
[ 386.576484] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 403.964295] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 414.884012] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 431.700329] perf: Dynamic interrupt throttling disabled, can hang your system!
[ 471.108095] INFO: rcu_sched self-detected stall on CPU
[ 471.108214] 116-....: (5250 ticks this GP) idle=c9a/140000000000002/0 softirq=6343/6344 fqs=2625
[ 471.108351] (t=5251 jiffies g=8835 c=8834 q=1160)
[ 471.108508] Task dump for CPU 116:
[ 471.108518] perf_fuzzer R running task 0 5428 5267 0x0004a006
[ 471.108549] Call Trace:
[ 471.108582] [c0002038e74231c0] [c000000000149e98] sched_show_task.part.16+0xd8/0x110 (unreliable)
[ 471.108627] [c0002038e7423230] [c0000000001a9e5c] rcu_dump_cpu_stacks+0xd4/0x138
[ 471.108664] [c0002038e7423280] [c0000000001a8f28] rcu_check_callbacks+0x8e8/0xb40
[ 471.108698] [c0002038e74233b0] [c0000000001b71c8] update_process_times+0x48/0x90
[ 471.108731] [c0002038e74233e0] [c0000000001cef14] tick_sched_handle.isra.5+0x34/0xd0
[ 471.108760] [c0002038e7423410] [c0000000001cf010] tick_sched_timer+0x60/0xe0
[ 471.108795] [c0002038e7423450] [c0000000001b7d74] __hrtimer_run_queues+0x144/0x370
[ 471.108830] [c0002038e74234d0] [c0000000001b8ccc] hrtimer_interrupt+0xfc/0x350
[ 471.108867] [c0002038e74235a0] [c0000000000248f0] __timer_interrupt+0x90/0x260
[ 471.108903] [c0002038e74235f0] [c000000000024d08] timer_interrupt+0x98/0xe0
[ 471.108943] [c0002038e7423620] [c00000000000b998] fast_exception_return+0x148/0x16c
[ 471.108990] --- interrupt: 901 at arch_local_irq_restore+0x84/0x90
LR = __do_softirq+0xd8/0x3e4
[ 471.109017] [c0002038e7423910] [c0000000001b8d60] hrtimer_interrupt+0x190/0x350 (unreliable)
[ 471.109054] [c0002038e7423930] [c000000000cffbc8] __do_softirq+0xd8/0x3e4
[ 471.109089] [c0002038e7423a10] [c000000000115928] irq_exit+0xe8/0x120
[ 471.109124] [c0002038e7423a30] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[ 471.109164] [c0002038e7423a60] [c00000000000b998] fast_exception_return+0x148/0x16c
[ 471.109211] --- interrupt: 901 at mutex_unlock+0x18/0x50
LR = perf_event_for_each_child+0xb0/0xf0
[ 471.109236] [c0002038e7423d50] [c0000000002b9e70] perf_event_for_each_child+0x60/0xf0 (unreliable)
[ 471.109279] [c0002038e7423d90] [c0000000002c4da8] perf_event_task_enable+0x78/0xe0
[ 471.109309] [c0002038e7423dd0] [c00000000012d4e4] SyS_prctl+0x364/0x6a0
[ 471.109345] [c0002038e7423e30] [c00000000000b184] system_call+0x58/0x6c
[ 477.935937] watchdog: BUG: soft lockup - CPU#116 stuck for 23s! [perf_fuzzer:5428]
[ 477.936042] Modules linked in: xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUER...
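
The second trace differs from the first mainly in how long the timer interrupt took and in the stall and lockup reports that follow. A minimal sketch, assuming the dmesg output is available as a list of lines, that converts the `hrtimer` durations to seconds and picks out the stall and lockup lines; both helper names are hypothetical:

```python
import re

HRTIMER_RE = re.compile(r"hrtimer: interrupt took (\d+) ns")
# Substrings taken from the messages quoted in this report.
STALL_MARKERS = ("rcu_sched self-detected stall", "soft lockup", "Hard LOCKUP")


def hrtimer_seconds(lines):
    """Return the reported hrtimer interrupt durations, converted from ns to seconds."""
    durations = []
    for line in lines:
        match = HRTIMER_RE.search(line)
        if match:
            durations.append(int(match.group(1)) / 1e9)
    return durations


def stall_lines(lines):
    """Return only the lines that report an RCU stall or a soft/hard lockup."""
    return [line for line in lines if any(marker in line for marker in STALL_MARKERS)]
```

Applied to the two traces in this report, the first `hrtimer` message (31601 ns) is about 32 microseconds, while the second (1494725987 ns) is roughly 1.49 seconds.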

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-02 05:19 EDT-------
(In reply to comment #39)
> Verified with test kernel : Issue is resolved.
> ===============
>
>
>
> But hitting another trace :
>
> [ 294.764782] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 315.685952] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 317.385747] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 335.030061] hrtimer: interrupt took 1494725987 ns
> [ 386.576484] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 403.964295] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 414.884012] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 431.700329] perf: Dynamic interrupt throttling disabled, can hang your
> system!
> [ 471.108095] INFO: rcu_sched self-detected stall on CPU
> [ 471.108214] 116-....: (5250 ticks this GP) idle=c9a/140000000000002/0
> softirq=6343/6344 fqs=2625
> [ 471.108351] (t=5251 jiffies g=8835 c=8834 q=1160)
> [ 471.108508] Task dump for CPU 116:
> [ 471.108518] perf_fuzzer R running task 0 5428 5267
> 0x0004a006
> [ 471.108549] Call Trace:
> [ 471.108582] [c0002038e74231c0] [c000000000149e98]
> sched_show_task.part.16+0xd8/0x110 (unreliable)
> [ 471.108627] [c0002038e7423230] [c0000000001a9e5c]
> rcu_dump_cpu_stacks+0xd4/0x138
> [ 471.108664] [c0002038e7423280] [c0000000001a8f28]
> rcu_check_callbacks+0x8e8/0xb40
> [ 471.108698] [c0002038e74233b0] [c0000000001b71c8]
> update_process_times+0x48/0x90
> [ 471.108731] [c0002038e74233e0] [c0000000001cef14]
> tick_sched_handle.isra.5+0x34/0xd0
> [ 471.108760] [c0002038e7423410] [c0000000001cf010]
> tick_sched_timer+0x60/0xe0
> [ 471.108795] [c0002038e7423450] [c0000000001b7d74]
> __hrtimer_run_queues+0x144/0x370
> [ 471.108830] [c0002038e74234d0] [c0000000001b8ccc]
> hrtimer_interrupt+0xfc/0x350
> [ 471.108867] [c0002038e74235a0] [c0000000000248f0]
> __timer_interrupt+0x90/0x260
> [ 471.108903] [c0002038e74235f0] [c000000000024d08]
> timer_interrupt+0x98/0xe0
> [ 471.108943] [c0002038e7423620] [c00000000000b998]
> fast_exception_return+0x148/0x16c
> [ 471.108990] --- interrupt: 901 at arch_local_irq_restore+0x84/0x90
> LR = __do_softirq+0xd8/0x3e4
> [ 471.109017] [c0002038e7423910] [c0000000001b8d60]
> hrtimer_interrupt+0x190/0x350 (unreliable)
> [ 471.109054] [c0002038e7423930] [c000000000cffbc8] __do_softirq+0xd8/0x3e4
> [ 471.109089] [c0002038e7423a10] [c000000000115928] irq_exit+0xe8/0x120
> [ 471.109124] [c0002038e7423a30] [c000000000024d0c]
> timer_interrupt+0x9c/0xe0
> [ 471.109164] [c0002038e7423a60] [c00000000000b998]
> fast_exception_return+0x148/0x16c
> [ 471.109211] --- interrupt: 901 at mutex_unlock+0x18/0x50
> LR = perf_event_for_each_child+0xb0/0xf0
> [ 471.109236] [c0002038e7423d50] [c0000000002b9e70]
> perf_event_for_each_child+0x60/0xf0 (unreliable)
> [ 471.109279] [c0002038e7423d90] [c0000000002c4da8]
> perf_event_task_enable+0x78/0xe0
> [ 471.109309] [c0002038e7423dd0] [c00000000012d4e4] SyS_prctl+0x364/0x6a0
> [ 471.109345] [c0002038e7423e30...

Frank Heimes (frank-heimes) wrote :

I 'think' that this bug can be closed, since it looks to me that the issue was solved with kernel 4.15.0-10.11 and higher and we shipped 4.15.0.20.23 with bionic GA.
But we still would like to see the upstream commit id SHA1 or patch subject.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-17 03:04 EDT-------
Hi Anju ,

The above test kernel had issues during setup, so I rebuilt the upstream kernel with the patches mentioned above.

=> Perf fuzzer works fine and issue is not seen.

=> Thread IMC also works fine.

root@ltc-wspoon4:/usr/lib/linux-tools-4.15.0-20# ./perf stat -e '{emulation-faults, thread_imc/CPM_CCYC/, thread_imc/CPM_CS_32MHZ_CYC/, thread_imc/CPM_CS_BRU_CMPL_KERNEL/}' yes > /dev/null

^Cyes: Interrupt

Performance counter stats for 'yes':

0 emulation-faults
1,509,903,360 thread_imc/CPM_CCYC/
1,493,126,656 thread_imc/CPM_CS_32MHZ_CYC/
0 thread_imc/CPM_CS_BRU_CMPL_KERNEL/

35.697528468 seconds time elapsed

Machine : Witherspoon + DD2.2

uname -r : 4.17.0-rc5
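
To compare thread IMC counts between runs of different length, the counters can be normalised by the elapsed time. A minimal sketch, assuming the `perf stat` output has the plain "count event-name" layout shown above; `parse_perf_stat` is a hypothetical helper, and the per-second figure is only wall-clock normalisation, not a statement about what the events measure:

```python
import re

COUNTER_RE = re.compile(r"^\s*([\d,]+)\s+(\S+)\s*$")
ELAPSED_RE = re.compile(r"([\d.]+) seconds time elapsed")


def parse_perf_stat(lines):
    """Return ({event: count}, elapsed_seconds or None) from perf stat text output."""
    counters = {}
    elapsed = None
    for line in lines:
        match = COUNTER_RE.match(line)
        if match:
            # Counts are printed with thousands separators, e.g. 1,509,903,360.
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
            continue
        match = ELAPSED_RE.search(line)
        if match:
            elapsed = float(match.group(1))
    return counters, elapsed


def per_second(counters, elapsed):
    """Divide every count by the elapsed wall-clock time."""
    return {event: count / elapsed for event, count in counters.items()}
```

For the run above this gives about 42.3 million `thread_imc/CPM_CCYC/` counts per second of wall-clock time.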

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Bionic):
status: Triaged → Fix Released
Changed in ubuntu-power-systems:
status: Incomplete → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-09-27 07:16 EDT-------
> Anju: What is the status on upstream review on your fixed code?

Hi,

Patch is now upstream,
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=7ccc4fe5ff9e3a134e863beed0dba18a5e511659

Thanks,
Anju.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-07 02:33 EDT-------
Anju , please check if both patches are needed to fix this issue.

------- Comment From <email address hidden> 2019-05-07 04:59 EDT-------
(In reply to comment #58)
> (In reply to comment #57)
> > Anju , please check if both patches are needed to fix this issue.
>
> yes, both may be needed, as the grouping of events with perf-fuzzer can
> happen in any way.

Based on the developer response both the patches are needed to fix this issue.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5aa04b3eb6fca63d2e9827be656dcadc26d54e11

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=7ccc4fe5ff9e3a134e863beed0dba18a5e511659

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-04 06:40 EDT-------
Canonical, any update?

Andrew Cloke (andrew-cloke) wrote :

This bug, LP link https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1752002, has been marked as "Fix Released". If you have further questions or issues, please raise a new LP bug.
Thanks.

Terry Rudd (terrykrudd) on 2019-06-05
Changed in linux (Ubuntu):
status: Fix Released → In Progress
Changed in linux (Ubuntu Bionic):
status: Fix Released → In Progress
Changed in ubuntu-power-systems:
status: Fix Released → In Progress
Andrea Righi (arighi) wrote :

Considering that only one of the two fixes mentioned in comment #11 is applied to the bionic kernel, I've uploaded a new test kernel here https://kernel.ubuntu.com/~arighi/LP-1752002/ with both fixes applied.

It would be great if someone could test this kernel to confirm that the problem is actually solved. Thanks.

Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" while waiting for IBM to test the PPA referenced in comment #14.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-10 17:45 EDT-------
Anju - I tried this with both the 4.15.0-50 kernel (that has the first patch) and the test kernel in the above comment and I see the same behavior. No call trace, but on both the test ends:
==================================================
Starting fuzzing at 2019-06-10 16:35:58
==================================================
Watchdog triggered; failed to progress for 60 seconds; killing
Trying to shut ourselves down: 4986, last child 0

This is on a DD2.3 Witherspoon. Any ideas?

Changed in ubuntu-power-systems:
status: Incomplete → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-24 04:52 EDT-------
Anju,

I'm trying this and seeing:
==================================================
Starting fuzzing at 2019-06-24 03:26:26
==================================================
Cannot open /sys/kernel/tracing/kprobe_events
SIGIO due to RT queue overflow
Signal from invalid fd 10 Bad file descriptor
Iteration 10000, 139369 syscalls in 25.77 s (5.407 k syscalls/s)
Open attempts: 131443 Successful: 918 Currently open: 5
ENOENT : 370
E2BIG : 11571
EBADF : 7775
EINVAL : 110627
ENOSPC : 31
EOVERFLOW : 1
EOPNOTSUPP : 150
Trinity Type (Normal 142/32892)(Sampling 11/32823)(Global 693/32727)(Random 72/33001)
Type (Hardware 268/18287)(software 318/17277)(tracepoint 60/16952)(Cache 55/16464)(cpu 151/16831)(breakpoint 17/16793)(nest_alink0_imc 1/387)(nest_alink1_imc 2/353)(nest_alink2_imc 2/455)(nest_alink3_imc 0/380)(nest_capp0_imc 0/352)(nest_capp1_imc 0/385)(nest_centaur0_imc 0/381)(nest_centaur1_imc 0/384)(nest_centaur2_imc 2/339)(nest_centaur3_imc 1/380)(nest_centaur4_imc 0/482)(nest_centaur5_imc 2/378)(nest_centaur6_imc 0/344)(nest_centaur7_imc 39/23839)
Close: 913/913 Successful
Read: 821/892 Successful
Write: 0/837 Successful
Ioctl: 276/883 Successful: (ENABLE 60/60)(DISABLE 60/60)(REFRESH 4/57)(RESET 66/66)(PERIOD 9/58)(SET_OUTPUT 5/53)(SET_FILTER 0/68)(ID 63/63)(SET_BPF 0/57)(PAUSE_OUTPUT 9/63)(QUERY_BPF 0/71)(MOD_ATTR 0/71)(#12 0/0)(#13 0/0)(#14 0/0)(>14 0/136)
Mmap: 661/1075 Successful: (MMAP 661/1075)(TRASH 119/144)(READ 127/134)(UNMAP 661/1050)(AUX 0/155)(AUX_READ 0/0)
Prctl: 900/900 Successful
Fork: 466/466 Successful
Poll: 831/893 Successful
Access: 329/918 Successful
Overflows: 1972959 Recursive: 0
SIGIOs due to RT signal queue full: 1
Iteration 20000, 140897 syscalls in 7.04 s (20.024 k syscalls/s)
Open attempts: 133151 Successful: 901 Currently open: 24
ENOENT : 402
E2BIG : 11642
EBADF : 8002
EINVAL : 112005
ENOSPC : 33
EOVERFLOW : 1
EOPNOTSUPP : 165
Trinity Type (Normal 110/33454)(Sampling 15/33233)(Global 700/33200)(Random 76/33264)
Type (Hardware 276/18590)(software 303/17115)(tracepoint 54/17217)(Cache 56/16759)(cpu 138/17185)(breakpoint 15/17011)(nest_alink0_imc 2/352)(nest_alink1_imc 3/370)(nest_alink2_imc 0/498)(nest_alink3_imc 3/382)(nest_capp0_imc 1/378)(nest_capp1_imc 1/359)(nest_centaur0_imc 1/364)(nest_centaur1_imc 0/363)(nest_centaur2_imc 1/376)(nest_centaur3_imc 2/356)(nest_centaur4_imc 1/483)(nest_centaur5_imc 1/369)(nest_centaur6_imc 0/385)(nest_centaur7_imc 43/24239)
Close: 882/882 Successful
Read: 790/870 Successful
Write: 0/857 Successful
Ioctl: 266/896 Successful: (ENABLE 63/63)(DISABLE 59/59)(REFRESH 6/74)(RESET 69/69)(PERIOD 5/68)(SET_OUTPUT 5/76)(SET_FILTER 0/68)(ID 49/49)(SET_BPF 0/67)(PAUSE_OUTPUT 10/60)(QUERY_BPF 0/64)(MOD_ATTR 0/59)(#12 0/0)(#13 0/0)(#14 0/0)(>14 0/120)
Mmap: 623/1028 Successful: (MMAP 623/1028)(TRASH 121/147)(READ 118/124)(UNMAP 618/1004)(AUX 0/135)(AUX_READ 0/0)
Prctl: 868/868 Successful
Fork: 452/452 Successful
Poll: 791/865 Successful
Access: 332/926 Successful
Overflows: 0 Recursive: 0
SIGIOs due to RT signal queue full: 0
Throttling event 1 fd 6, last_refresh=0, period=8816262, type=1 throttles 0
Throttling event ...
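
The per-iteration statistics above are easier to compare once reduced to a success rate and an errno breakdown. A minimal sketch, assuming the fuzzer prints one "Open attempts" line followed by "ERRNO : count" lines as in this output; `summarize_opens` is a hypothetical helper, not part of perf_fuzzer:

```python
import re

OPEN_RE = re.compile(r"Open attempts:\s*(\d+)\s+Successful:\s*(\d+)")
ERRNO_RE = re.compile(r"^\s*(E[A-Z0-9]+)\s*:\s*(\d+)\s*$")


def summarize_opens(lines):
    """Summarise one iteration block: attempts, successes, and failures per errno."""
    attempts = successes = None
    errnos = {}
    for line in lines:
        match = OPEN_RE.search(line)
        if match:
            attempts, successes = int(match.group(1)), int(match.group(2))
            continue
        match = ERRNO_RE.match(line)
        if match:
            errnos[match.group(1)] = int(match.group(2))
    return {
        "attempts": attempts,
        "successes": successes,
        "success_rate": successes / attempts if attempts else None,
        "errnos": errnos,
    }
```

For the first iteration block the errno counts add up to exactly the number of failed opens (131443 attempts minus 918 successes), and EINVAL accounts for most of them.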

Andrew Cloke (andrew-cloke) wrote :

Moving back to incomplete until IBM have finished their testing.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
Changed in ubuntu-power-systems:
status: Incomplete → In Progress
status: In Progress → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-07-15 09:59 EDT-------
I added:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c9108b083706330cd5484d121fbb0ad67e8f647

in addition to:
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=7ccc4fe5ff9e3a134e863beed0dba18a5e511659

It ran a lot longer - hour instead of minutes, but then ended up with this:
[18734.191331] perf: interrupt took too long (1054 > 1051), lowering kernel.perf_event_max_sample_rate to 7500
[18736.617855] perf: Dynamic interrupt throttling disabled, can hang your system!
[18751.191062] perf: interrupt took too long (2317 > 1333), lowering kernel.perf_event_max_sample_rate to 3250
[18753.006339] perf: interrupt took too long (2218 > 1), lowering kernel.perf_event_max_sample_rate to 3500
[18754.156398] perf: Dynamic interrupt throttling disabled, can hang your system!
[18775.067223] perf: interrupt took too long (2227 > 1), lowering kernel.perf_event_max_sample_rate to 3500
[18779.532549] perf: Dynamic interrupt throttling disabled, can hang your system!
[18834.315583] perf: Dynamic interrupt throttling disabled, can hang your system!
[18851.090933] Watchdog CPU:102 Hard LOCKUP
[18851.090936] Modules linked in: kvm_hv kvm vmx_crypto crct10dif_vpmsum ast drm_kms_helper ttm ofpart cmdlinepart drm fb_sys_fops ipmi_powernv at24 syscopyarea ipmi_devintf powernv_flash sysfillrect ipmi_msghandler opal_prd mtd ibmpowernv sysimgblt i2c_algo_bit uio_pdrv_genirq uio sch_fq_codel ip_tables x_tables autofs4 mlx5_core ahci mlxfw crc32c_vpmsum tg3 libahci devlink
[18851.090995] CPU: 102 PID: 0 Comm: swapper/102 Tainted: G L 4.15.0-54-generic #58
[18851.090997] NIP: c000000000100740 LR: c00000000010058c CTR: c0000000000fe770
[18851.091000] REGS: c000000007ad3d80 TRAP: 0900 Tainted: G L (4.15.0-54-generic)
[18851.091001] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002882 XER: 00000000
[18851.091014] CFAR: c00000000000deb8 SOFTE: 0
GPR00: c000000000100584 c00020397b423850 c00000000170b800 c000003f80925000
GPR04: 000000007fffffff c00020397b4238b0 c000203994486958 c0002039944868b8
GPR08: 0000000000000004 0000000000000000 0000000000000001 0000000000000000
GPR12: c0002039944868f8 c000000007a56200 c00020397b423f90 0000000000000000
GPR16: 0000000000000000 c00000000004ad60 c00000000004ad30 c0000000011d5380
GPR20: 0000000000000800 c000000001742494 0000000000000066 0000000000000001
GPR24: 0000000000000198 0000000080000000 0000000000000000 0000000000000006
GPR28: 0000000000000000 c0000000018d1808 0000000006004010 c0002039944868a0
[18851.091058] NIP [c000000000100740] power_pmu_enable+0x4f0/0x600
[18851.091060] LR [c00000000010058c] power_pmu_enable+0x33c/0x600
[18851.091061] Call Trace:
[18851.091065] [c00020397b423850] [c000000000100584] power_pmu_enable+0x334/0x600 (unreliable)
[18851.091071] [c00020397b423930] [c0000000002c9dbc] ctx_resched+0xec/0x150
[18851.091075] [c00020397b423970] [c0000000002ca014] __perf_install_in_context+0x1f4/0x280
[18851.091079] [c00020397b4239c0] [c0000000002bf7d0] remote_function+0x40/0x90
[18851.091083] [c00020397b4239f0] [c0000000001db9dc] flush_smp_call_function_...
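
Before the hard lockup, the kernel repeatedly lowers `kernel.perf_event_max_sample_rate`. A minimal sketch, assuming the message format shown in the log above, that extracts those adjustments so the sequence leading up to the lockup can be inspected; `sample_rate_changes` is a hypothetical helper:

```python
import re

# Matches e.g.:
# [18734.191331] perf: interrupt took too long (1054 > 1051), lowering
#   kernel.perf_event_max_sample_rate to 7500
RATE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)


def sample_rate_changes(lines):
    """Return (timestamp, took, limit, new_rate) for each rate-lowering message."""
    changes = []
    for line in lines:
        match = RATE_RE.search(line)
        if match:
            timestamp, took, limit, new_rate = match.groups()
            changes.append((float(timestamp), int(took), int(limit), int(new_rate)))
    return changes
```

In the log above, the reported limit drops from 1051 to 1 between the first and third message, and each such message is followed within a few seconds by a "Dynamic interrupt throttling disabled" line.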

Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc
Andrew Cloke (andrew-cloke) wrote :

Awaiting confirmation from IBM (Anju), as per Michael's last comment.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-07-16 01:49 EDT-------
>
> Anju - we missing something else?

Hi Michael,

This seems to be a different issue. Not related to thread-imc. Similar perf fuzzer issues are reported in
https://bugzilla.linux.ibm.com/show_bug.cgi?id=161854
https://bugzilla.linux.ibm.com/show_bug.cgi?id=162507

See this comment: https://bugzilla.linux.ibm.com/show_bug.cgi?id=162507#c37

One workaround can be: try with these patches as well:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3202e35ec1c8fc19cea24253ff83edf702a60a02

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=913a90bc5a3a06b1f04c337320e9aeee2328dd77

The above patches fix two issues, but there are still cases where a lockup can happen.

Thanks,
Anju

Andrew Cloke (andrew-cloke) wrote :

@IBM, do the patches that were included in the PPA in comment #14 completely address a discrete issue? Or do we need to wait for further patches to be identified and upstreamed?

If the patches included in the PPA from comment #14 do address a concrete issue, then I would suggest re-focusing this bug on that issue, and raising a new bug for the other issues...
