Panda: Upstream: rcu_sched detected stalls on CPU

Bug #1317401 reported by Naresh Kamboju on 2014-05-08
This bug affects 1 person
Affects                     Status  Importance  Assigned to  Milestone
Linaro Stable Kernel (LSK)  New     Undecided   Unassigned

Bug Description

Panda: Upstream: rcu_sched detected stalls on CPU
The bug has been present since as far back as 3.10 and still occurs with the latest 3.15.0-rc4.
Linux kernel version: 3.15.0-rc4.
Someone already reported this bug before: https://lkml.org/lkml/2012/9/20/519

[ 95.519653] INFO: rcu_sched self-detected stall on CPU
[ 95.519866] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 95.526489] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
[ 95.526519] Task dump for CPU 1:
[ 95.526519] swapper/1 R running 0 0 1 0x00000000
[ 95.559844] (t=4229 jiffies g=800 c=799 q=440)
[ 95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
[ 95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
[ 95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
[ 95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
[ 95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
[ 95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
[ 95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
[ 95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
[ 95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
[ 95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
[ 95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
[ 95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
[ 95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
[ 95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
[ 95.681762] bf00: 00000001 00000001 00000000 ee0d8c40
[ 95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
[ 95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
[ 95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
[ 95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
[ 95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
[ 95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
[ 95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
[ 95.748535] Modules linked in:
[ 95.751770] irq event stamp: 128730
[ 95.755462] hardirqs last enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
[ 95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
[ 95.772064] softirqs last enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
[ 95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
[ 95.787750]
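The bracketed summary "(detected by 0, t=4229 jiffies, g=800, c=799, q=440)" packs most of the stall state into one line. As we read the kernel's RCU stall-warning documentation (Documentation/RCU/stallwarn.txt), t is the number of jiffies since the current grace period started, g and c are the numbers of the current and last-completed grace periods (so g=800, c=799 means a grace period is still in progress), and q is the number of queued RCU callbacks. Converting t into seconds needs the board's HZ, which this report does not state. The Python sketch below is a hypothetical helper, not part of any kernel tool; it assumes the exact "detected by" format in this log and deliberately ignores the self-detected "(t=... g=... c=... q=...)" variant.

```python
import re
from typing import NamedTuple, Optional

# Summary line printed after "rcu_sched detected stalls on CPUs/tasks:",
# e.g. "(detected by 0, t=4229 jiffies, g=800, c=799, q=440)".
# Assumption: the format matches this 3.15-era log; other kernels may differ.
STALL_SUMMARY = re.compile(
    r"\(detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>-?\d+), c=(?P<c>-?\d+), q=(?P<q>\d+)\)"
)


class StallSummary(NamedTuple):
    detecting_cpu: int  # CPU that noticed the stall
    jiffies: int        # jiffies since the current grace period started (t=)
    gpnum: int          # number of the current grace period (g=)
    completed: int      # number of the last completed grace period (c=)
    queued_cbs: int     # RCU callbacks queued (q=)

    @property
    def gp_in_progress(self) -> bool:
        # g != c: a grace period was started but has not yet completed.
        return self.gpnum != self.completed


def parse_stall_summary(line: str) -> Optional[StallSummary]:
    """Return the parsed fields, or None if the line has no 'detected by' summary."""
    m = STALL_SUMMARY.search(line)
    if m is None:
        return None
    return StallSummary(
        detecting_cpu=int(m["cpu"]),
        jiffies=int(m["t"]),
        gpnum=int(m["g"]),
        completed=int(m["c"]),
        queued_cbs=int(m["q"]),
    )


if __name__ == "__main__":
    sample = "[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)"
    summary = parse_stall_summary(sample)
    print(summary, "in progress:", summary.gp_in_progress)
```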

The RCU- and idle-related kernel config options are as follows:

CONFIG_TREE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_TREE_RCU_TRACE=y
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_REPEATEDLY=y
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_CPU_STALL_INFO=y
CONFIG_RCU_TRACE=y

alexs@alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
CONFIG_NO_HZ_IDLE=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_GENERIC_IDLE_POLL_SETUP=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
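The two config listings above can also be gathered in one pass. The sketch below is a Python equivalent of the grep pipeline, assuming the kernel was built with CONFIG_IKCONFIG_PROC so that /proc/config.gz exists; the function name read_config_options is hypothetical. Unlike grep, it matches the pattern against option names only and skips "# CONFIG_... is not set" comment lines, so disabled options are not listed.

```python
import gzip
import re
from typing import Dict


def read_config_options(path: str = "/proc/config.gz",
                        pattern: str = r"RCU|IDLE") -> Dict[str, str]:
    """Return enabled CONFIG_* options whose names match `pattern`.

    Roughly `cat /proc/config.gz | gunzip | grep -E 'RCU|IDLE'`, except that
    '# CONFIG_FOO is not set' comment lines are skipped.
    """
    regex = re.compile(pattern)
    options: Dict[str, str] = {}
    # "rt" decompresses and decodes the file as text, line by line.
    with gzip.open(path, "rt") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            if regex.search(name):
                options[name] = value
    return options


if __name__ == "__main__":
    for name, value in sorted(read_config_options().items()):
        print(f"{name}={value}")
```

Passing a different pattern, such as r"^CONFIG_HZ", narrows the output to the timer-frequency options, which are needed to turn the t= jiffies value in the log into seconds.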

Naresh Kamboju (naresh-kamboju) wrote:

[Alex Shi wrote]
Is it a hardware issue or a real software problem?

Naresh Kamboju (naresh-kamboju) wrote:

[Paul McKenney wrote]

> Is it a hardware issue or a real software problem?

I cannot distinguish between hardware and software from the trace below,
but given that you are also seeing a soft lockup, either way you do
appear to have a real problem as opposed to an RCU CPU stall warning
false positive.

Thanx, Paul
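Paul's reasoning can be written down as a simple log-triage rule: an RCU stall warning on its own may need further analysis, but one accompanied by a soft-lockup report points to a real problem rather than a stall-warning false positive. The Python sketch below is a hypothetical helper that encodes only that co-occurrence check; like Paul, it cannot tell a hardware fault from a software bug, and its regular expressions assume the message formats in this 3.15 log.

```python
import re
from typing import Iterable, List

# Assumed message formats, taken from the 3.15 log in this report.
RCU_STALL = re.compile(r"rcu_\w+ (self-)?detected stalls? on CPU")
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!")


def triage(log_lines: Iterable[str]) -> str:
    """Hypothetical rule of thumb: a stall warning plus a soft lockup means a
    real problem rather than a stall-warning false positive."""
    has_stall = False
    lockup_secs: List[int] = []
    for line in log_lines:
        if RCU_STALL.search(line):
            has_stall = True
        m = SOFT_LOCKUP.search(line)
        if m:
            lockup_secs.append(int(m["secs"]))
    if not has_stall:
        return "no RCU stall warning"
    if lockup_secs:
        # Report the longest lockup seen alongside the stall.
        return (f"RCU stall + soft lockup ({max(lockup_secs)}s): "
                "real problem, not a false positive")
    return "RCU stall warning only: needs further analysis"


if __name__ == "__main__":
    log = [
        "[ 95.519653] INFO: rcu_sched self-detected stall on CPU",
        "[ 95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]",
    ]
    print(triage(log))
```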

Alex Shi (alex-shi) wrote:

Please retest this bug; it should be gone.

Alex Shi (alex-shi) wrote:

Naresh,

You mentioned in the subject that this is an 'upstream' issue, so when linking it to LSK, please provide an LSK kernel oops instead of nothing.

LSK is very different from the upstream kernel, and we now have LSK 3.10 and 3.14.

Naresh Kamboju (naresh-kamboju) wrote:

Mark this bug as invalid in this case.
If it is reproduced on LSK, I will file the bug again (on Bugzilla).
