[LTCTest][Kernel][OP810]Ubuntu14.04.3: kernel call trace while continuously switching between smt on/off and changing subcores
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Fix Released
|
Medium
|
Canonical Kernel Team | ||
Vivid |
Fix Released
|
Undecided
|
Tim Gardner | ||
Wily |
Fix Released
|
Undecided
|
Unassigned | ||
Xenial |
Fix Released
|
Medium
|
Canonical Kernel Team |
Bug Description
== Comment: #0 - Satheesh Rajendran <email address hidden> - 2015-07-14 02:32:22 ==
While running the below script, and nothing else was running in the system, host got the kernel call traces after around 15~20 mins
#!/bin/bash -x
while true;
do for i in 1 2 4;
do ppc64_cpu --subcores-
ppc64_cpu --info
ppc64_cpu --smt=off;
ppc64_cpu --info
sleep 1;
lscpu
ppc64_cpu --smt=on;
ppc64_cpu --info
sleep 1;
lscpu
done;
sleep 1;
done
Environment:
------------------
$ cat /etc/issue
Ubuntu 14.04.2 LTS \n \l
$ uname -a
Linux tul8fp 3.19.0-22-generic #22~14.04.1-Ubuntu SMP Wed Jun 17 10:03:39 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0,4,8,12,
Off-line CPU(s) list: 1-3,5-7,
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Model: 8335-GTA
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0,4,88,
NUMA node8 CPU(s): 8,12,16,
$ tail -10 /proc/cpuinfo
processor : 159
cpu : POWER8 (raw), altivec supported
clock : 2693.000000MHz
revision : 2.0 (pvr 004d 0200)
timebase : 512000000
platform : PowerNV
model : 8335-GTA
machine : PowerNV 8335-GTA
firmware : OPAL v3
Logs:-
---------
# dmesg
[79081.074088] INFO: task irqbalance:1897 blocked for more than 120 seconds.
[79081.074333] Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
[79081.074388] "echo 0 > /proc/sys/
[79081.074449] irqbalance D 00003fff9ac39d3c 0 1897 1 0x00040000
[79081.074454] Call Trace:
[79081.074470] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
[79081.074477] [c0000007e8d037c0] [c000000000015934] __switch_
[79081.074482] [c0000007e8d03820] [c000000000a11508] __schedule+
[79081.074486] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
[79081.074486] LR = 0x10026a6c1d0
[79081.074490] [c0000007e8d03a40] [c000000000a120a0] schedule_
[79081.074494] [c0000007e8d03a60] [c000000000a1420c] __mutex_
[79081.074497] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+
[79081.074501] [c0000007e8d03b10] [c00000000064488c] online_
[79081.074503] [c0000007e8d03b90] [c000000000644fec] dev_attr_
[79081.074508] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_
[79081.074511] [c0000007e8d03c20] [c0000000003591e4] kernfs_
[79081.074515] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
[79081.074519] [c0000007e8d03cf0] [c00000000035a474] kernfs_
[79081.074523] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+
[79081.074527] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
[79081.074530] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
[79081.074534] [c0000007e8d03e30] [c000000000009258] system_
[79081.074546] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.
[79081.074597] Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
[79081.074802] "echo 0 > /proc/sys/
[79081.074865] ppc64_cpu D 00003fff82892d08 0 111047 109995 0x00040000
[79081.074868] Call Trace:
[79081.074871] [c000000f1c67f680] [c000000f1c67f710] 0xc000000f1c67f710 (unreliable)
[79081.074875] [c000000f1c67f850] [c000000000015934] __switch_
[79081.074878] [c000000f1c67f8b0] [c000000000a11508] __schedule+
[79081.074882] [c000000f1c67fad0] [c0000000000aff10] cpu_hotplug_
[79081.074887] [c000000f1c67fb10] [c00000000020b0a0] _cpu_down+
[79081.074890] [c000000f1c67fbd0] [c00000000020b3e8] cpu_down+0x58/0xa0
[79081.074894] [c000000f1c67fc00] [c00000000064fa0c] cpu_subsys_
[79081.074897] [c000000f1c67fc30] [c000000000647aa4] device_
[79081.074900] [c000000f1c67fc70] [c000000000647c6c] online_
[79081.074904] [c000000f1c67fcc0] [c000000000643ab8] dev_attr_
[79081.074907] [c000000f1c67fd00] [c00000000035b350] sysfs_kf_
[79081.074910] [c000000f1c67fd40] [c00000000035a28c] kernfs_
[79081.074914] [c000000f1c67fd90] [c0000000002b474c] vfs_write+
[79081.074917] [c000000f1c67fde0] [c0000000002b55fc] SyS_write+
[79081.074921] [c000000f1c67fe30] [c000000000009258] system_
[79201.074065] INFO: task irqbalance:1897 blocked for more than 120 seconds.
[79201.074312] Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
[79201.074361] "echo 0 > /proc/sys/
[79201.074421] irqbalance D 00003fff9ac39d3c 0 1897 1 0x00040000
[79201.074424] Call Trace:
[79201.074428] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
[79201.074432] [c0000007e8d037c0] [c000000000015934] __switch_
[79201.074436] [c0000007e8d03820] [c000000000a11508] __schedule+
[79201.074439] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
[79201.074439] LR = 0x10026a6c1d0
[79201.074443] [c0000007e8d03a40] [c000000000a120a0] schedule_
[79201.074447] [c0000007e8d03a60] [c000000000a1420c] __mutex_
[79201.074450] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+
[79201.074453] [c0000007e8d03b10] [c00000000064488c] online_
[79201.074456] [c0000007e8d03b90] [c000000000644fec] dev_attr_
[79201.074459] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_
[79201.074462] [c0000007e8d03c20] [c0000000003591e4] kernfs_
[79201.074465] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
[79201.074468] [c0000007e8d03cf0] [c00000000035a474] kernfs_
[79201.074472] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+
[79201.074475] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
[79201.074479] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
[79201.074482] [c0000007e8d03e30] [c000000000009258] system_
[79201.074499] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.
[79201.074638] Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
[79201.074726] "echo 0 > /proc/sys/
[79201.074786] ppc64_cpu D 00003fff82892d08 0 111047 109995 0x00040000
[79201.074789] Call Trace:
[79201.074792] [c000000f1c67f680] [c000000f1c67f710] 0xc000000f1c67f710 (unreliable)
[79201.074795] [c000000f1c67f850] [c000000000015934] __switch_
[79201.074798] [c000000f1c67f8b0] [c000000000a11508] __schedule+
[79201.074802] [c000000f1c67fad0] [c0000000000aff10] cpu_hotplug_
[79201.074805] [c000000f1c67fb10] [c00000000020b0a0] _cpu_down+
[79201.074808] [c000000f1c67fbd0] [c00000000020b3e8] cpu_down+0x58/0xa0
[79201.074811] [c000000f1c67fc00] [c00000000064fa0c] cpu_subsys_
[79201.074814] [c000000f1c67fc30] [c000000000647aa4] device_
[79201.074817] [c000000f1c67fc70] [c000000000647c6c] online_
[79201.074821] [c000000f1c67fcc0] [c000000000643ab8] dev_attr_
[79201.074824] [c000000f1c67fd00] [c00000000035b350] sysfs_kf_
[79201.074827] [c000000f1c67fd40] [c00000000035a28c] kernfs_
[79201.074831] [c000000f1c67fd90] [c0000000002b474c] vfs_write+
[79201.074834] [c000000f1c67fde0] [c0000000002b55fc] SyS_write+
[79201.074837] [c000000f1c67fe30] [c000000000009258] system_
[79321.074029] INFO: task irqbalance:1897 blocked for more than 120 seconds.
[79321.074269] Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
[79321.074319] "echo 0 > /proc/sys/
[79321.074378] irqbalance D 00003fff9ac39d3c 0 1897 1 0x00040000
[79321.074382] Call Trace:
[79321.074386] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
[79321.074390] [c0000007e8d037c0] [c000000000015934] __switch_
[79321.074393] [c0000007e8d03820] [c000000000a11508] __schedule+
[79321.074396] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
[79321.074396] LR = 0x10026a6c1d0
[79321.074400] [c0000007e8d03a40] [c000000000a120a0] schedule_
[79321.074404] [c0000007e8d03a60] [c000000000a1420c] __mutex_
[79321.074407] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+
[79321.074410] [c0000007e8d03b10] [c00000000064488c] online_
[79321.074412] [c0000007e8d03b90] [c000000000644fec] dev_attr_
[79321.074416] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_
[79321.074419] [c0000007e8d03c20] [c0000000003591e4] kernfs_
[79321.074422] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
[79321.074425] [c0000007e8d03cf0] [c00000000035a474] kernfs_
[79321.074429] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+
[79321.074432] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
[79321.074436] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
[79321.074439] [c0000007e8d03e30] [c000000000009258] system_
[79321.074453] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.
== Comment: #1 - Satheesh Rajendran <email address hidden> - 2015-07-14 02:34:55 ==
Tried to get sosreport but the same got stuck during info gathering at processor.
# sosreport
...
Please enter your first initial and last name [tul8fp]: satheesh
Please enter the case number that you are generating this report for: trace
Running plugins. Please wait ...
Running 51/64: processor...
== Comment: #3 - Ranjal G. Shenoy <email address hidden> - 2015-07-14 05:18:17 ==
Hi Sandhya,
From the call trace it looks like ppc64_task which is trying to offline a cpu is blocked inside cpu_hotplug_
That said, I don't think the patch that you are referring to has anything to do with it. The problem lies elsewhere. If you see that the ppc64_cpu task is not making any progress at all, then it might be prudent to enable lockdep which will at least list out all the locks that are held at any point.
== Comment: #4 - Ranjal G. Shenoy <email address hidden> - 2015-07-14 05:48:39 ==
(In reply to comment #3)
> Hi Sandhya,
>
> From the call trace it looks like ppc64_task which is trying to offline a
> cpu is blocked inside cpu_hotplug_
> trying to prevent a cpu-hotplug using the
> get_online_
> is that entity that's preventing this.
>
> That said, I don't think the patch that you are referring to has anything
> to do with it. The problem lies elsewhere. If you see that the ppc64_cpu
> task is not making any progress at all, then it might be prudent to enable
> lockdep which will at least list out all the locks that are held at any
> point.
To add to that, I recall seeing a similar issue where cpu-offline operation was blocking inside cpu_hotplug_begin. The fix for that is present in the mainline with commit id : 87af9e7ff9. You can try backporting that one.
== Comment: #13 - Satheesh Rajendran <email address hidden> - 2015-07-27 12:06:19 ==
Ran script mentioned in bug description continuously for 6 hours and not have seen the mentioned call traces.
Once this fix is available as part of ubuntu14.04.3 released kernel, this bug can be closed.
Thanks in advance.
Regards,
-Satheesh.
== Comment: #14 - Brahadambal Srinivasan <email address hidden> - 2015-08-04 09:31:21 ==
Cherry picked bug backported to the trusty kernel to resolve the issue reported in this bug
Related branches
affects: | ubuntu → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
importance: | Undecided → Medium |
status: | New → Triaged |
Changed in linux (Ubuntu Vivid): | |
status: | In Progress → Fix Committed |
tags: |
added: verification-failed-vivid removed: verification-needed-vivid |
Default Comment by Bridge