While testing the unstable kernel PPA with linux-image-4.4.0-0-generic on ppc64el, autopkgtests often fail with this kernel crash on boot:

ubuntu@juju-prod-ues-proposed-migration-machine-12:~⟫ nova console-log adt-xenial-ppc64el-systemd-20151207-161711

[ 0.000000] Using pSeries machine description
[ 0.000000] Page sizes from device-tree:
[ 0.000000] base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] Using 1TB segments
[ 0.000000] Found initrd at 0xc000000003a00000:0xc000000005884a58
[ 0.000000] bootconsole [udbg0] enabled
[ 0.000000] CPU maps initialized for 1 thread per core
 -> smp_release_cpus()
spinning_secondaries = 0
 <- smp_release_cpus()
[ 0.000000] Starting Linux ppc64le #3-Ubuntu SMP Mon Dec 7 10:52:50 UTC 2015
[ 0.000000] -----------------------------------------------------
[ 0.000000] ppc64_pft_size = 0x18
[ 0.000000] phys_mem_size = 0x60000000
[ 0.000000] cpu_features = 0x17fc7a6c18500249
[ 0.000000] possible = 0x1fffffef18500649
[ 0.000000] always = 0x0000000018100040
[ 0.000000] cpu_user_features = 0xdc0065c2 0xef000000
[ 0.000000] mmu_features = 0x58000001
[ 0.000000] firmware_features = 0x000000014052440b
[ 0.000000] htab_hash_mask = 0x1ffff
[ 0.000000] -----------------------------------------------------
 <- setup_system()
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
[ 1.181166] NIP: c000000000183050 LR: c0000000001838c8 CTR: c000000000183880
[ 1.181216] REGS: c00000005e70b7d0 TRAP: 0300 Tainted: G W (4.4.0-0-generic)
[ 1.181274] MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24028404 XER: 20000000
[ 1.181394] CFAR: c000000000009958 DAR: 00000000000000c1 DSISR: 40000000 SOFTE: 1
GPR00: c0000000001838c8 c00000005e70ba50 c000000001573100 0000000000000001
GPR04: 000000000000000c 0000000000000000 0000000000000000 0000000000000000
GPR08: ffffffffffffffff 0000000000000001 ffffffffffffffff 000000000e57a24d
GPR12: c000000000183880 c00000000fb40000 c0000000000e4188 0000000004208040
GPR16: c000000058574a68 000000000000000a 0000000000000000 c00000005fe1f338
GPR20: 0000000000000000 c000000001481780 c000000000b03918 7fffffffffffffff
GPR24: 0000000000000001 c00000005e708000 c00000000147c180 c000000053079800
GPR28: c00000000152d090 c0000000015b0030 c000000053cf6950 00000000000000c1
[ 1.182066] NIP [c000000000183050] pids_cancel.constprop.4+0x30/0x90
[ 1.182109] LR [c0000000001838c8] pids_free+0x48/0x80
[ 1.182142] Call Trace:
[ 1.182160] [c00000005e70ba50] [c000000000b03918] cpu_online_mask+0x0/0x8 (unreliable)
[ 1.182220] [c00000005e70ba80] [c0000000001838c8] pids_free+0x48/0x80
[ 1.182271] [c00000005e70bab0] [c000000000181eb0] cgroup_free+0x90/0xe0
[ 1.182322] [c00000005e70bb00] [c0000000000b3998] __put_task_struct+0x68/0x170
[ 1.182381] [c00000005e70bb30] [c0000000000b8b6c] delayed_put_task_struct+0x6c/0xe0
[ 1.182440] [c00000005e70bb70] [c00000000013e3a0] rcu_process_callbacks+0x340/0x6e0
[ 1.182513] [c00000005e70bc10] [c0000000000bcff8] __do_softirq+0x188/0x3a0
[ 1.182563] [c00000005e70bd00] [c0000000000bd254] run_ksoftirqd+0x44/0xb0
[ 1.182614] [c00000005e70bd20] [c0000000000e9af0] smpboot_thread_fn+0x290/0x2a0
[ 1.182673] [c00000005e70bd80] [c0000000000e4290] kthread+0x110/0x130
[ 1.182724] [c00000005e70be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[ 1.182782] Instruction dump:
[ 1.182808] 3c4c013f 384200e0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000
[ 1.182895] 60000000 3bff00c0 3940ffff 7c2004ac <7d20f8a8> 7d2a4a14 7d20f9ad 40c2fff4
[ 1.182982] ---[ end trace 66241192affcee1c ]---
[ 1.184126]
[ 3.184191] Kernel panic - not syncing: Fatal exception in interrupt
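
For anyone triaging similar console logs, here is a minimal Python sketch (not part of the original report) that pulls the faulting symbol, the fault address (DAR) and the marked instruction word out of the oops above, then splits that word into its PowerPC X-form fields. The helper names `summarize_oops` and `decode_x_form` are made up for illustration, and the excerpt is copied from the log. The marked word decodes to primary opcode 31 with extended opcode 84, which should correspond to `ldarx` in the Power ISA, with RB = r31. GPR31 in the register dump holds 0xc1, the same value as DAR, so the crash appears to be an atomic load on a bogus pointer.

```python
import re

# Excerpt copied verbatim from the console log above.
OOPS = """\
[ 1.181394] CFAR: c000000000009958 DAR: 00000000000000c1 DSISR: 40000000 SOFTE: 1
[ 1.182066] NIP [c000000000183050] pids_cancel.constprop.4+0x30/0x90
[ 1.182895] 60000000 3bff00c0 3940ffff 7c2004ac <7d20f8a8> 7d2a4a14 7d20f9ad 40c2fff4
"""


def summarize_oops(text):
    """Return the faulting symbol, the DAR value and the marked instruction word."""
    symbol = re.search(r"NIP \[[0-9a-f]+\] (\S+)", text)
    dar = re.search(r"DAR: ([0-9a-f]+)", text)
    # The kernel wraps the instruction at NIP in angle brackets in the dump.
    insn = re.search(r"<([0-9a-f]{8})>", text)
    return {
        "symbol": symbol.group(1) if symbol else None,
        "dar": int(dar.group(1), 16) if dar else None,
        "insn": int(insn.group(1), 16) if insn else None,
    }


def decode_x_form(word):
    """Split a 32-bit PowerPC X-form instruction word into its bit fields."""
    return {
        "opcode": word >> 26,          # primary opcode (top 6 bits)
        "rt": (word >> 21) & 0x1F,     # target register
        "ra": (word >> 16) & 0x1F,     # base register (0 means literal zero here)
        "rb": (word >> 11) & 0x1F,     # index register
        "xo": (word >> 1) & 0x3FF,     # extended opcode
    }


if __name__ == "__main__":
    summary = summarize_oops(OOPS)
    print(summary["symbol"], hex(summary["dar"]))
    print(decode_x_form(summary["insn"]))
```

The field layout is the generic X-form one, so the same decoder would not be correct for D-form instructions such as the `addi` (`3bff00c0`) that precedes the faulting word.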

Changed in linux (Ubuntu):
status: New → Incomplete
Martin Pitt (pitti) on 2015-12-07
tags: added: bot-stop-nagging
Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Is this a regression between the 4.3 and 4.4 kernels? Do you have a way to install test kernels to test this? If so, I can perform a kernel bisect to identify the exact commit that caused this.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-key xenial
Martin Pitt (pitti) wrote :

Yes, it is a regression. Andy found the offending commit and the fix is already queued in Tejun's tree, so this should get fixed with the next upstream update.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Martin.

tags: removed: kernel-key
