Comment 0 for bug 1352056

bugproxy (bugproxy) wrote : kdump on Ubuntu 14.04 is not generating a dump.

---Problem Description---
kdump is not producing a dump on powerKVM LE P8 Ubuntu 14.04

---uname output---
3.13.0-30-generic

---Additional Hardware Info---
Power8 LE configuration.

---Patches Installed---
1324544 - kdump-config load fails with vmlinux kernel (vs. vmlinuz)

Machine Type = 8247-22L

---Steps to Reproduce---
Installed kdump-tools 1.5.5-2ubuntu1 and crash 7.0.3-3ubuntu3.
Updated /etc/default/kdump-tools; at first I changed only USE_KDUMP=1. Rebooted the node and saw:
root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash crashkernel=384M-:128M
root@c656f2n02:~# cat /proc/sys/kernel/sysrq
1
root@c656f2n02:~# kdump-config status
current state : ready to kdump
root@c656f2n02:~# kdump-config show
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
current state: ready to kdump

kexec command:
  /sbin/kexec -p --args-linux --command-line="root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.13.0-30-generic /boot/vmlinux-3.13.0-30-generic

root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_crash_loaded
1
root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_loaded
0

echo c > /proc/sysrq-trigger

root@c656f2n02:/var/log# echo c > /proc/sysrq-trigger
[ 1956.014243] SysRq : Trigger a crash
[ 1956.014328] Unable to handle kernel paging request for data at address 0x00000000
[ 1956.014404] Faulting instruction address: 0xc000000000586c2c
[ 1956.014468] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1956.014518] SMP NR_CPUS=2048 NUMA PowerNV
[ 1956.014570] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables autofs4 rdma_ucm(OF) ib_ucm(OF) rdma_cm(OF) iw_cm(OF) ib_ipoib(OF) ib_cm(OF) ib_uverbs(OF) ib_umad(OF) mlx5_ib(OF) mlx5_core(OF) mlx4_ib(OF) ib_sa(OF) ib_mad(OF) ib_core(OF) ib_addr(OF) mlx4_en(OF) mlx4_core(OF) compat(OF) nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache rtc_generic powernv_rng ses enclosure ipr
[ 1956.015306] CPU: 146 PID: 2522 Comm: bash Tainted: GF O 3.13.0-30-generic #54-Ubuntu
[ 1956.015394] task: c000003fcabda120 ti: c000003fcac58000 task.ti: c000003fcac58000
[ 1956.015469] NIP: c000000000586c2c LR: c000000000587b8c CTR: c000000000586c00
[ 1956.015543] REGS: c000003fcac5b820 TRAP: 0300 Tainted: GF O (3.13.0-30-generic)
[ 1956.015617] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42422822 XER: 20000000
[ 1956.015804] CFAR: c000000000009318 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
GPR00: c000000000587b8c c000003fcac5baa0 c00000000162e840 0000000000000063
GPR04: c000000002f45bd0 c000000002f564c8 0000000000015ad0 c000000001827480
GPR08: c000000000dfe840 0000000000000000 0000000000000001 0000000000015ad0
GPR12: 0000000042422822 c000000007e5ff00 000001002fe90648 000000001016e008
GPR16: 000000001013ad70 000001002fe94648 000000001016fed0 000000001016e008
GPR20: 00000000100c31e0 0000000000000000 0000000010171fc8 000000001016f840
GPR24: 0000000000000004 0000000000000000 0000000000000001 c0000000014b7dc8
GPR28: c000000001974c90 0000000000000063 c00000000148d9c0 c0000000014b8188
[ 1956.016794] NIP [c000000000586c2c] .sysrq_handle_crash+0x2c/0x40
[ 1956.016858] LR [c000000000587b8c] .__handle_sysrq+0xfc/0x260
[ 1956.016920] Call Trace:
[ 1956.016948] [c000003fcac5baa0] [0000000010172a34] 0x10172a34 (unreliable)
[ 1956.017025] [c000003fcac5bb10] [c000000000587b8c] .__handle_sysrq+0xfc/0x260
[ 1956.017101] [c000003fcac5bbd0] [c000000000588324] .write_sysrq_trigger+0x74/0x90
[ 1956.017190] [c000003fcac5bc50] [c0000000002dff1c] .proc_reg_write+0xac/0x110
[ 1956.017266] [c000003fcac5bcf0] [c000000000254c00] .vfs_write+0xe0/0x260
[ 1956.017342] [c000003fcac5bd90] [c0000000002558f4] .SyS_write+0x64/0xe0
[ 1956.017418] [c000003fcac5be30] [c00000000000a158] syscall_exit+0x0/0x98
[ 1956.017492] Instruction dump:
[ 1956.017530] 4bffffac 7c0802a6 f8010010 f821ff91 60000000 60000000 3d42001f 392a8ca8
[ 1956.017658] 39400001 91490000 7c0004ac 39200000 <99490000> 38210070 e8010010 7c0803a6
[ 1956.017894] ---[ end trace d163ff42366bde72 ]---
[ 1956.017986]
[ 1956.018042] Sending IPI to other CPUs
[ 1956.019188] IPI complete
 -> smp_release_cpus()
spinning_secondaries = 159
 <- smp_release_cpus()
 <- setup_system()
The console remains at this message until I power cycle the CEC. There is no /proc/vmcore on reboot.
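
For anyone re-running this reproduction, the pre-crash checks in the steps above can be bundled into one helper. This is only a sketch: the function name check_kdump_ready is made up for illustration, and it assumes the command line shown earlier is what /proc/cmdline reports. Both checks passed on this node and the dump kernel still hung, so a "ready" result only rules out a setup problem.

```shell
#!/bin/sh
# check_kdump_ready [CMDLINE_FILE] [CRASH_LOADED_FILE]
# Hypothetical helper: verifies the two preconditions checked by hand above.
# Both arguments default to the real kernel interfaces.
check_kdump_ready() {
    cmdline_file="${1:-/proc/cmdline}"
    loaded_file="${2:-/sys/kernel/kexec_crash_loaded}"

    # Memory for the crash kernel must be reserved at boot.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "crashkernel= missing from kernel command line"
        return 1
    fi

    # kexec -p must have loaded a crash kernel (file contains 1).
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded"
        return 1
    fi

    echo "ready to kdump"
}
```

Calling check_kdump_ready with no arguments on the node itself reads the live files; passing two file paths lets it be tried against saved copies.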

I recreated the hang on my victim node.
Some CPUs are hitting the 0x4400 interrupt vector. I think this is because commit 429d2e834295 "powerpc: Fix kdump hang issue on p8 with relocation on exception enabled." from Mahesh is missing, but I need to double check, since it may not be the only patch missing.

The patch I mentioned definitely fixes the hang.
Here are the commit details:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867

powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.

On p8 systems, with relocation on exception feature enabled we are seeing
kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this
feature enabled, exception are raised with MMU (IR=DR=1) ON with the
default offset of 0xc*4000. Since exception is raised in virtual mode it
requires the vector region to be executable without which it fails to
fetch and execute instruction at 0xc*4xxx. For default kernel since kernel
is loaded at real 0, the htab mappings sets the entire kernel text region
executable. But for relocatable kernel (e.g. kdump case) we only copy
interrupt vectors down to real 0 and never marked that region as
executable because in p7 and below we always get exception in real mode.

This patch fixes this issue by marking htab mapping range as executable
that overlaps with the interrupt vector region for relocatable kernel.

Thanks to Ben who helped me to debug this issue and find the root cause.

Signed-off-by: Mahesh Salgaonkar <email address hidden>
Signed-off-by: Benjamin Herrenschmidt <email address hidden>

I think this bug should be mirrored to Ubuntu so they can include this patch in the 14.04 kernel, and possibly in the 14.10 kernel as well.
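
To check whether a given kernel tree already carries the upstream commit, something like the following could be used. It is a sketch: has_commit is a hypothetical helper name, and the tree path in the usage line is a placeholder for a local git clone of the kernel source being checked. A backported (cherry-picked) patch gets a new commit ID, so a negative result does not prove the fix is absent.

```shell
#!/bin/sh
# has_commit REPO_DIR COMMIT
# Hypothetical helper: succeeds (exit status 0) if COMMIT is an ancestor
# of HEAD in the git tree at REPO_DIR, and fails otherwise, including
# when the commit is unknown to that tree.
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}
```

Usage, with a placeholder path:

    has_commit /path/to/kernel-tree 429d2e8342954d337abe370d957e78291032d867 && echo "fix present"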