Panda: Upstream: rcu_sched detected stalls on CPU

Bug #1317401 reported by Naresh Kamboju
This bug affects 1 person
Affects                       | Status | Importance | Assigned to | Milestone
Linaro Stable Kernel (LSK)    | New    | Undecided  | Unassigned  |

Bug Description

The bug has been present since as far back as 3.10 and still occurs on the latest 3.15.0-rc4.
Linux kernel version: 3.15.0-rc4.
Someone has already reported this bug: https://lkml.org/lkml/2012/9/20/519

[ 95.519653] INFO: rcu_sched self-detected stall on CPU
[ 95.519866] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 95.526489] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
[ 95.526519] Task dump for CPU 1:
[ 95.526519] swapper/1 R running 0 0 1 0x00000000
[ 95.559844] (t=4229 jiffies g=800 c=799 q=440)
[ 95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
[ 95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
[ 95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
[ 95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
[ 95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
[ 95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
[ 95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
[ 95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
[ 95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
[ 95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
[ 95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
[ 95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
[ 95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
[ 95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
[ 95.681762] bf00: 00000001 00000001 00000000 ee0d8c40
[ 95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
[ 95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
[ 95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
[ 95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
[ 95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
[ 95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
[ 95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
[ 95.748535] Modules linked in:
[ 95.751770] irq event stamp: 128730
[ 95.755462] hardirqs last enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
[ 95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
[ 95.772064] softirqs last enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
[ 95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
[ 95.787750]
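The "detected by" line above carries the key stall fields. As we understand the kernel's RCU stall-warning documentation (Documentation/RCU/stallwarn.txt), `t` is the number of jiffies since the current grace period started, `g` and `c` are the numbers of the most recently started and most recently completed grace periods, and `q` is an estimate of the RCU callbacks queued across all CPUs. The Python sketch below, built around a hypothetical helper `parse_detected_by`, pulls those fields out of a log line; treat the field interpretation as an assumption, not a definitive reading of the kernel source.

```python
import re

# Hypothetical helper: extracts the fields of an RCU "detected by" stall line.
# Field meanings are assumed from the kernel's stall-warning documentation.
DETECTED_BY = re.compile(
    r"\(detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)\)"
)


def parse_detected_by(line):
    """Return a dict of integer fields, or None if the line does not match."""
    m = DETECTED_BY.search(line)
    if m is None:
        return None
    fields = {name: int(value) for name, value in m.groupdict().items()}
    # g > c means a grace period has started but not yet completed,
    # i.e. the one the stalled CPU is presumably holding up.
    fields["gp_in_progress"] = fields["g"] > fields["c"]
    return fields


if __name__ == "__main__":
    line = "[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)"
    print(parse_detected_by(line))
```

For the log in this report, the sketch reports CPU 0 as the detector, 4229 jiffies elapsed, and g=800 against c=799, consistent with a single grace period still outstanding.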

The RCU- and idle-related kernel config options are as follows:

CONFIG_TREE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_TREE_RCU_TRACE=y
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_REPEATEDLY=y
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_CPU_STALL_INFO=y
CONFIG_RCU_TRACE=y

alexs@alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
CONFIG_NO_HZ_IDLE=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_GENERIC_IDLE_POLL_SETUP=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
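To collect both option groups in one pass, a Python equivalent of the `cat /proc/config.gz | gunzip | grep IDLE` pipeline might look like the sketch below. The helper name `grep_config` is hypothetical, and the sketch assumes the kernel exposes its config via /proc/config.gz (which requires CONFIG_IKCONFIG_PROC=y).

```python
import gzip
import os

# Sketch: a Python equivalent of `cat /proc/config.gz | gunzip | grep IDLE`,
# extended to match several keywords at once. /proc/config.gz is only present
# when the kernel was built with CONFIG_IKCONFIG_PROC=y.
DEFAULT_CONFIG = "/proc/config.gz"


def grep_config(path, keywords):
    """Return option lines from a gzipped kernel config that contain any of
    the given keywords. Comment lines ("# CONFIG_FOO is not set") are skipped,
    unlike plain grep."""
    matches = []
    with gzip.open(path, "rt") as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith("#"):
                continue
            if any(keyword in line for keyword in keywords):
                matches.append(line)
    return matches


if __name__ == "__main__":
    if os.path.exists(DEFAULT_CONFIG):
        for option in grep_config(DEFAULT_CONFIG, ["RCU", "IDLE"]):
            print(option)
    else:
        print(DEFAULT_CONFIG + " not found (CONFIG_IKCONFIG_PROC not enabled?)")
```

Skipping comment lines is a design choice: it keeps the output limited to options that are actually set, matching the shape of the listings above.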

Naresh Kamboju (naresh-kamboju) wrote :

[Alex Shi wrote:]
Is it a hardware issue or a real software problem?

Naresh Kamboju (naresh-kamboju) wrote :

[Paul McKenney wrote]

> Is it a hardware issue or a real software problem?

I cannot distinguish between hardware and software from the trace below,
but given that you are also seeing a soft lockup, either way you do
appear to have a real problem as opposed to an RCU CPU stall warning
false positive.

Thanx, Paul

description: updated
Alexey Vazhnov (vazhnov) wrote :
Alex Shi (alex-shi) wrote :

Please retest this bug; it should be gone.

Alex Shi (alex-shi) wrote :

Naresh,

You mentioned in the subject that this is an 'upstream' issue, so when linking it to LSK, please provide an Oops from an LSK kernel rather than none at all.

LSK is very different from the upstream kernel, and we now have LSK 3.10 and 3.14.

Naresh Kamboju (naresh-kamboju) wrote :

Marking this bug as invalid in this case.
If it is reproduced on LSK, I will file the bug again (on Bugzilla).
