LSK: CPU hot-plug / un-plug crashed kernel on Vexpress-TC2

Bug #1301886 reported by Naresh Kamboju
This bug affects 2 people
Affects: Linaro Stable Kernel (LSK)
Status: Fix Released
Importance: High
Assigned to: Tixy (Jon Medhurst)

Bug Description

While running the IKS test suite on the LSK Android Vexpress TC2 build, a kernel oops occurred.

Test case:
- - - - - - - - -
 echo 0 > /sys/kernel/bL_switcher/active
 echo 1 > /sys/kernel/bL_switcher/active

Reproducible every time.

Kernel oops:
- - - - - - - - - - -
[ 493.749400] CPU2: Booted secondary processor
[ 493.765541] CPU3: Booted secondary processor
[ 493.810084] CPU4: Booted secondary processor
[ 493.819121] ------------[ cut here ]------------
[ 493.845294] kernel BUG at /mnt/jenkins/workspace/linaro-android_vexpress-lsk/build/kernel/linaro/vexpress-lsk/kernel/sched/fair.c:3738!
[ 493.881185] Internal error: Oops - BUG: 0 [#1] SMP THUMB2
[ 493.897086] Modules linked in: gator
[ 493.929932] CPU: 4 PID: 23 Comm: ksoftirqd/4 Not tainted 3.10.35-00166-g0f15ea2 #1
[ 493.929932] task: ef0c30c0 ti: ef10e000 task.ti: ef10e000
[ 493.945836] PC is at hmp_get_heaviest_task+0xb6/0xbc
[ 493.960457] LR is at hmp_idle_pull+0xbf/0x3b0
[ 493.973283] pc : [<c0042eea>] lr : [<c004782b>] psr: 400001f3
[ 493.973283] sp : ef10fdd8 ip : 200001d3 fp : 00000004
[ 494.007121] r10: c062fba4 r9 : c16eef00 r8 : c0624d78
[ 494.022509] r7 : ef003180 r6 : 00000073 r5 : ee990578 r4 : c16eef48
[ 494.041741] r3 : 00000004 r2 : 00000000 r1 : 00000004 r0 : ef003188
[ 494.060974] Flags: nZcv IRQs off FIQs off Mode SVC_32 ISA Thumb Segment kernel
[ 494.083536] Control: 50c5387d Table: 8000406a DAC: 00000015
[ 494.100460]
[ 494.100460] PC: 0xc0042e6a:

Linux kernel version:
- - - - - - - - - - - - - - - - -
Linux version 3.10.35-00166-g0f15ea2 (jenkins-build@ip-10-185-139-12) (gcc version 4.8.3 20140303 (prerelease) (Linaro GCC 4.8-2014.03) ) #1 SMP Thu Apr 3 04:11:02 UTC 2014

Suspected code snippet:
- - - - - - - - - - - - - - - - - - - -
kernel/sched/fair.c:3738!
static struct sched_entity *hmp_get_heaviest_task()
3735 hmp = hmp_faster_domain(cpu_of(se->cfs_rq->rq));
3736 hmp_target_mask = &hmp->cpus;
3737 if (target_cpu >= 0) {
3738 BUG_ON(!cpumask_test_cpu(target_cpu, hmp_target_mask));
3739 hmp_target_mask = cpumask_of(target_cpu);
3740 }

Suspected bad commit:
https://git.linaro.org/kernel/linux-linaro-stable.git/blobdiff/483a53b310dfefdd8c010ba0cc5b2addfaef123f..7dcba49cf6c1fd2d6ec1bead821fdd85e84bdc1f:/kernel/sched/fair.c

LSK Android Build commit:
https://git.linaro.org/kernel/linux-linaro-stable.git/commit/0f15ea2dcd21b295fa34c77dffd1164117824d2f

LSK Android Build:
https://android-build.linaro.org/builds/~linaro-android/vexpress-lsk/#build=221

Naresh Kamboju (naresh-kamboju) wrote :
Changed in linaro-stable-kernel:
assignee: nobody → Tixy (Jon Medhurst) (tixy)
importance: Undecided → High
status: New → In Progress
Tixy (Jon Medhurst) (tixy) wrote :

I can also reproduce this just by offlining and onlining CPUs with:

echo 0 > /sys/devices/system/cpu/cpu4/online
echo 1 > /sys/devices/system/cpu/cpu4/online

Using the attached debug patch, I can see that there is a lag between hmp_get_heaviest_task being used to find a task for a CPU and hmp_target_mask being updated to include/exclude that CPU.

EXAMPLE 1

root@vexpress:/ # echo 0 > /sys/devices/system/cpu/cpu4/online
[ 204.487272] CPU4: shutdown
root@vexpress:/ # echo 1 > /sys/devices/system/cpu/cpu4/online
[ 210.380007] CPU4: Booted secondary processor
[ 210.383334] hmp_get_heaviest_task fail for target_cpu=4 hmp_target_mask=0x8

EXAMPLE 2

root@vexpress:/ # echo 1 > /sys/kernel/bL_switcher/active
[ 76.828730] big.LITTLE switcher initializing
[ 76.949151] CPU0 paired with CPU4
[ 76.958994] CPU1 paired with CPU3
[ 76.968798] GIC ID for CPU 0 cluster 1 is 2
[ 76.981161] GIC ID for CPU 1 cluster 1 is 3
[ 76.993523] GIC ID for CPU 2 cluster 1 is 4
[ 77.010123] CPU2: shutdown
[ 77.019182] GIC ID for CPU 0 cluster 0 is 0
[ 77.034011] hmp_get_heaviest_task fail for target_cpu=3 hmp_target_mask=0x10
[ 77.055932] CPU3: shutdown
[ 77.064811] GIC ID for CPU 1 cluster 0 is 1
[ 77.084289] CPU4: shutdown
[ 77.094436] cpu cpu0: bL_cpufreq_init: CPU 0 initialized
[ 77.110651] cpu cpu1: bL_cpufreq_init: CPU 1 initialized
[ 77.126803] big.LITTLE switcher initialized

EXAMPLE 3

root@vexpress:/ # echo 0 > /sys/kernel/bL_switcher/
[ 97.850282] CPU2: Booted secondary processor
[ 97.866689] CPU3: Booted secondary processor
[ 97.867107] hmp_get_heaviest_task fail for target_cpu=3 hmp_target_mask=0x0
[ 97.900191] hmp_get_heaviest_task fail for target_cpu=3 hmp_target_mask=0x0
[ 97.920705] hmp_get_heaviest_task fail for target_cpu=3 hmp_target_mask=0x0
[ 97.945189] CPU4: Booted secondary processor
[ 97.945603] hmp_get_heaviest_task fail for target_cpu=4 hmp_target_mask=0x8
[ 97.980939] cpu cpu0: bL_cpufreq_init: CPU 0 initialized
[ 97.997408] cpu cpu3: bL_cpufreq_init: CPU 3 initialized
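
To make the mismatch in the examples above easier to see, here is a minimal userspace sketch of the check at fair.c:3737-3738 applied to the logged values. It assumes the debug patch prints hmp_target_mask as a plain bitmask in which bit n stands for CPU n; that is an assumption about the debug output, not something the log states. The names logged_case, cpu_in_mask, would_hit_bug and count_bug_cases are hypothetical and exist only for this illustration; this is not kernel code and not the fix that was applied.

```c
#include <stdbool.h>
#include <stddef.h>

/* One "hmp_get_heaviest_task fail" line from the debug-patch output. */
struct logged_case {
    int target_cpu;            /* target_cpu value printed in the log */
    unsigned long target_mask; /* hmp_target_mask value printed in the log */
};

/* Pairs copied from EXAMPLE 1-3; repeated log lines are listed once. */
static const struct logged_case logged_cases[] = {
    { 4, 0x8  }, /* EXAMPLE 1 and EXAMPLE 3 */
    { 3, 0x10 }, /* EXAMPLE 2 */
    { 3, 0x0  }, /* EXAMPLE 3 */
};

/*
 * Userspace stand-in for cpumask_test_cpu(): is bit `cpu` set in `mask`?
 * Assumes bit n of the printed mask represents CPU n.
 */
static bool cpu_in_mask(int cpu, unsigned long mask)
{
    if (cpu < 0 || cpu >= (int)(8 * sizeof(mask)))
        return false;
    return ((mask >> cpu) & 1UL) != 0;
}

/*
 * Hypothetical helper mirroring lines 3737-3738 of the snippet in the
 * bug description: the BUG_ON() fires when a target CPU was requested
 * (target_cpu >= 0) but it is not part of the target mask.
 */
static bool would_hit_bug(int target_cpu, unsigned long target_mask)
{
    return target_cpu >= 0 && !cpu_in_mask(target_cpu, target_mask);
}

/* Count how many of the logged pairs would trip the BUG_ON(). */
static size_t count_bug_cases(void)
{
    size_t i, hits = 0;

    for (i = 0; i < sizeof(logged_cases) / sizeof(logged_cases[0]); i++) {
        if (would_hit_bug(logged_cases[i].target_cpu,
                          logged_cases[i].target_mask))
            hits++;
    }
    return hits;
}
```

Under that bitmask assumption, would_hit_bug() is true for every logged pair: 0x8 has only bit 3 set, so target_cpu=4 is outside it; 0x10 has only bit 4 set, so target_cpu=3 is outside it; and 0x0 contains no CPU at all.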

Naresh Kamboju (naresh-kamboju) wrote : Re: LSK: Enabling big.LITTLE IKS crashed kernel

LSK-enabling-iks-kernel-crashed.log attached.

summary: - LSK: Kernel oops at "hmp_get_heaviest_task" on ve-tc2
+ LSK: Enabling big.LITTLE IKS crashed kernel
summary: - LSK: Enabling big.LITTLE IKS crashed kernel
+ LSK: CPU hot plug crashed kernel
summary: - LSK: CPU hot plug crashed kernel
+ LSK: CPU hot-plug / un-plug crashed kernel
Fathi Boudra (fboudra)
Changed in linaro-stable-kernel:
milestone: none → 14.04
Naresh Kamboju (naresh-kamboju) wrote : Re: LSK: CPU hot-plug / un-plug crashed kernel

Due to this bug, the CPU hot-plug / un-plug test cases from the pm-qa test suite failed.

Tixy (Jon Medhurst) (tixy) wrote :
Changed in linaro-stable-kernel:
status: In Progress → Fix Committed
summary: - LSK: CPU hot-plug / un-plug crashed kernel
+ LSK: CPU hot-plug / un-plug crashed kernel on Vexpress-TC2
Fathi Boudra (fboudra)
Changed in linaro-stable-kernel:
status: Fix Committed → Fix Released