Linaro Stable Kernel (LSK)

Panda: Upstream: rcu_sched detected stalls on CPU

Bug #1317401 reported by Naresh Kamboju on 2014-05-08

This bug affects 1 person

Affects                     Status  Importance  Assigned to  Milestone
Linaro Stable Kernel (LSK)  New     Undecided   Unassigned

Bug Description

Panda: Upstream: rcu_sched detected stalls on CPU
The bug has been present from early kernels (3.10) up to the latest 3.15.0-rc4.
Linux kernel version: 3.15.0-rc4.
Someone already reported this bug before: https://lkml.org/lkml/2012/9/20/519

[ 95.519653] INFO: rcu_sched self-detected stall on CPU
[ 95.519866] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 95.526489] 1: (1 GPs behind) idle=2e7/1/0 softirq=4404/4405
[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)
[ 95.526519] Task dump for CPU 1:
[ 95.526519] swapper/1 R running 0 0 1 0x00000000
[ 95.559844] (t=4229 jiffies g=800 c=799 q=440)
[ 95.564727] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc4 #93
[ 95.571502] [<c00133fd>] (unwind_backtrace) from [<c001076d>] (show_stack+0x11/0x14)
[ 95.579711] [<c001076d>] (show_stack) from [<c0570465>] (dump_stack+0x75/0x88)
[ 95.587371] [<c0570465>] (dump_stack) from [<c0084383>] (rcu_check_callbacks+0x353/0x79c)
[ 95.596038] [<c0084383>] (rcu_check_callbacks) from [<c003e99f>] (update_process_times+0x33/0x4c)
[ 95.605438] [<c003e99f>] (update_process_times) from [<c008e5a3>] (tick_sched_handle.isra.18+0x1f/0x48)
[ 95.615386] [<c008e5a3>] (tick_sched_handle.isra.18) from [<c008e609>] (tick_sched_timer+0x3d/0x5c)
[ 95.624969] [<c008e609>] (tick_sched_timer) from [<c0051a23>] (__run_hrtimer+0x67/0x310)
[ 95.633544] [<c0051a23>] (__run_hrtimer) from [<c00525fd>] (hrtimer_interrupt+0xe1/0x214)
[ 95.642211] [<c00525fd>] (hrtimer_interrupt) from [<c008cecb>] (tick_receive_broadcast+0x1f/0x30)
[ 95.651611] [<c008cecb>] (tick_receive_broadcast) from [<c0011e4f>] (handle_IPI+0xb3/0x120)
[ 95.660461] [<c0011e4f>] (handle_IPI) from [<c00085e5>] (gic_handle_irq+0x51/0x54)
[ 95.668487] [<c00085e5>] (gic_handle_irq) from [<c057603f>] (__irq_svc+0x3f/0x64)
[ 95.676391] Exception stack(0xee0dbf10 to 0xee0dbf58)
[ 95.681762] bf00: 00000001 00000001 00000000 ee0d8c40
[ 95.690429] bf20: 3c6bd296 00000016 3c6f8c43 00000016 eefab540 c08e0c84 00000000 c0fc7114
[ 95.699066] bf40: 00000010 ee0dbf58 c006ef4d c0443890 40000033 ffffffff
[ 95.706085] [<c057603f>] (__irq_svc) from [<c0443890>] (cpuidle_enter_state+0xc0/0xc4)
[ 95.714477] [<c0443890>] (cpuidle_enter_state) from [<c0444d11>] (cpuidle_enter_state_coupled+0xe1/0x290)
[ 95.724639] [<c0444d11>] (cpuidle_enter_state_coupled) from [<c0067cd1>] (cpu_startup_entry+0x1a5/0x494)
[ 95.734680] [<c0067cd1>] (cpu_startup_entry) from [<80008685>] (0x80008685)
[ 95.742095] BUG: soft lockup - CPU#1 stuck for 40s! [swapper/1:0]
[ 95.748535] Modules linked in:
[ 95.751770] irq event stamp: 128730
[ 95.755462] hardirqs last enabled at (128727): [<c044388f>] cpuidle_enter_state+0xbf/0xc4
[ 95.764221] hardirqs last disabled at (128728): [<c0576033>] __irq_svc+0x33/0x64
[ 95.772064] softirqs last enabled at (128730): [<c00386cd>] irq_enter+0x59/0x60
[ 95.779907] softirqs last disabled at (128729): [<c00386ba>] irq_enter+0x46/0x60
[ 95.787750]
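
When comparing several such logs, it can help to pull the numbers out of the "detected by" line mechanically. The following is only a minimal Python sketch and was not part of the original report: the helper name `parse_stall_summary` and the regular expression are illustrative assumptions, written against the exact line format shown in the log above (other kernel versions may print a different set of fields).

```python
import re

# Hypothetical helper, not from the original report. The pattern matches
# lines of the form seen in the log above, e.g.
# "(detected by 0, t=4229 jiffies, g=800, c=799, q=440)".
_STALL_RE = re.compile(
    r"\(detected by (?P<detector>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)\)"
)


def parse_stall_summary(line):
    """Return the numeric fields of a stall summary line as a dict, or None."""
    match = _STALL_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    sample = "[ 95.526489] (detected by 0, t=4229 jiffies, g=800, c=799, q=440)"
    print(parse_stall_summary(sample))
```

The leading timestamp is ignored because the pattern is applied with `search`, so the same helper works on raw dmesg lines and on lines with the timestamp stripped.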

RCU- and idle-related kernel config as below:

CONFIG_TREE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_TREE_RCU_TRACE=y
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_REPEATEDLY=y
CONFIG_SPARSE_RCU_POINTER=y
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_CPU_STALL_INFO=y
CONFIG_RCU_TRACE=y

alexs@alex-panda:~$ cat /proc/config.gz | gunzip | grep IDLE
CONFIG_NO_HZ_IDLE=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_GENERIC_IDLE_POLL_SETUP=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
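
The same filtering can be done without a shell pipeline. This is a rough Python sketch, assuming the kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the function name `grep_kernel_config` is a made-up name for illustration and does not come from the report.

```python
import gzip


def grep_kernel_config(needle, path="/proc/config.gz"):
    """Return the lines of a gzip-compressed kernel config that contain needle.

    Hypothetical helper: behaves roughly like `gunzip | grep needle`,
    using a plain substring match rather than a regular expression.
    """
    with gzip.open(path, "rt") as config:
        return [line.rstrip("\n") for line in config if needle in line]


if __name__ == "__main__":
    # Print the idle-related options, as in the command shown above.
    for option in grep_kernel_config("IDLE"):
        print(option)
```

Passing "RCU" instead of "IDLE" would list the RCU options in the same way.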

Naresh Kamboju (naresh-kamboju) wrote on 2014-05-08:

[Alex Shi wrote]
Is it the hardware issue or a real software problem?

Naresh Kamboju (naresh-kamboju) wrote on 2014-05-08:

[Paul McKenney wrote]

> Is it the hardware issue or a real software problem?

I cannot distinguish between hardware and software from the trace below,
but given that you are also seeing a soft lockup, either way you do
appear to have a real problem as opposed to an RCU CPU stall warning
false positive.

Thanx, Paul

Naresh Kamboju (naresh-kamboju) on 2014-05-08

description: updated

Alexey Vazhnov (vazhnov) wrote on 2014-05-25:

Similar problem: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1322957

Alex Shi (alex-shi) wrote on 2014-06-19:

Please retest this bug. It should be gone.

Alex Shi (alex-shi) wrote on 2014-06-19:

Naresh,

You mentioned in the subject that this is an 'upstream' issue. So when linking it to LSK, please provide an LSK kernel Oops rather than nothing.

LSK is very different from the upstream kernel, and we now have LSK 3.10 and 3.14.

Naresh Kamboju (naresh-kamboju) wrote on 2014-06-19:

Mark this bug as invalid in this case.
If it is reported on LSK, I will file the bug again (on Bugzilla).
