ISST-LTE: KVM:UBUNTU1804: kdump is not working on UbuntuKVM guest

Bug #1745104 reported by bugproxy
Affects                            Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project   Fix Released  Critical    Canonical Kernel Team
linux (Ubuntu)                     Fix Released  Critical    Joseph Salisbury
  Bionic                           Fix Released  Critical    Joseph Salisbury

Bug Description

== Comment: #0 - Chanh H. Nguyen <email address hidden> - 2018-01-23 14:55:05 ==
root@boslcp4g5:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Bionic Beaver (development branch)"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@boslcp4g5:~# uname -r
4.13.0-25-generic
root@boslcp4g5:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.13.0-25-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.13.0-25-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.13.0-25-generic root=UUID=8a776bc5-d9e0-4a1d-9218-135f9c702e11 ro splash quiet nr_cpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@boslcp4g5:~# kdump-config status
current state : ready to kdump
root@boslcp4g5:~# echo c > /proc/sysrq-trigger
[ 176.911191] sysrq: SysRq : This sysrq operation is disabled.
root@boslcp4g5:~# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
root@boslcp4g5:~# echo c > /proc/sysrq-trigger
[ 240.304466] sysrq: SysRq : Trigger a crash
[ 240.304545] Unable to handle kernel paging request for data at address 0x00000000
[ 240.304656] Faulting instruction address: 0xc000000000792f88
[ 240.304771] Oops: Kernel access of bad area, sig: 11 [#1]
[ 240.304846] SMP NR_CPUS=2048
[ 240.304848] NUMA
[ 240.304903] pSeries
[ 240.305000] Modules linked in: sctp_diag sctp libcrc32c dccp_diag dccp tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache dm_service_time vmx_crypto crct10dif_vpmsum binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sunrpc sch_fq_codel ip_tables x_tables autofs4 btrfs xor raid6_pq crc32c_vpmsum virtio_net virtio_scsi
[ 240.305776] CPU: 12 PID: 1860 Comm: bash Not tainted 4.13.0-25-generic #29-Ubuntu
[ 240.305886] task: c0000000ff904500 task.stack: c0000001f2d7c000
[ 240.305979] NIP: c000000000792f88 LR: c000000000793eb8 CTR: c000000000792f60
[ 240.306087] REGS: c0000001f2d7f9f0 TRAP: 0300 Not tainted (4.13.0-25-generic)
[ 240.306195] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 240.306207] CR: 28422222 XER: 20040000
[ 240.306338] CFAR: c000000000793eb4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
[ 240.306338] GPR00: c000000000793eb8 c0000001f2d7fc70 c0000000015f6200 0000000000000063
[ 240.306338] GPR04: c0000001feeeade8 c0000001fef02068 6967676572206120 63726173680d0a72
[ 240.306338] GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000000
[ 240.306338] GPR12: c000000000792f60 c000000007ac7e00 0000000010180df8 0000000010189e30
[ 240.306338] GPR16: 0000000010189ea8 0000000010151210 000000001018bd58 000000001018de48
[ 240.306338] GPR20: 0000000028bc0268 0000000000000001 0000000010164590 0000000010163bb0
[ 240.306338] GPR24: 00007fffdcb37d34 00007fffdcb37d30 c0000000014fa770 0000000000000002
[ 240.306338] GPR28: 0000000000000063 0000000000000007 c0000000014824f4 c0000000014fab10
[ 240.307346] NIP [c000000000792f88] sysrq_handle_crash+0x28/0x30
[ 240.307474] LR [c000000000793eb8] __handle_sysrq+0xf8/0x2b0
[ 240.307553] Call Trace:
[ 240.307594] [c0000001f2d7fc70] [c000000000793e98] __handle_sysrq+0xd8/0x2b0 (unreliable)
[ 240.307715] [c0000001f2d7fd10] [c0000000007946b4] write_sysrq_trigger+0x64/0x90
[ 240.307850] [c0000001f2d7fd40] [c00000000044fb28] proc_reg_write+0x88/0xd0
[ 240.307951] [c0000001f2d7fd70] [c0000000003a160c] __vfs_write+0x3c/0x70
[ 240.308049] [c0000001f2d7fd90] [c0000000003a3248] vfs_write+0xd8/0x220
[ 240.308149] [c0000001f2d7fde0] [c0000000003a50c8] SyS_write+0x68/0x110
[ 240.308248] [c0000001f2d7fe30] [c00000000000b184] system_call+0x58/0x6c
[ 240.308340] Instruction dump:
[ 240.308400] 4bfff9f1 4bfffe50 3c4c00e6 384232a0 7c0802a6 60000000 39200001 3d42001d
[ 240.308522] 394adab0 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00e6 38423270
[ 240.308644] ---[ end trace 97aaa45518689ad0 ]---
[ 240.314197]
[ 240.314408] Sending IPI to other CPUs
[ 240.357424] IPI complete
[ 240.377038] kexec: Starting switchover sequence.
 <<<<<<<<<<<< it stops here.....
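
The hang signature in the log above (console output ending at the kexec switchover message) can be checked mechanically in a saved console log. A minimal sketch, assuming the console was captured to a file; the function name `hung_at_switchover` and the file name are hypothetical and not part of kdump-tools:

```shell
# Hypothetical helper, not part of kdump-tools: succeeds when the last
# non-blank line of a saved console log is the kexec switchover message,
# meaning nothing was printed after the switchover began.
hung_at_switchover() {
    last_line=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case $last_line in
        *"kexec: Starting switchover sequence."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (console.log is an assumed file name):
#   hung_at_switchover console.log && echo "stopped at kexec switchover"
```

This only matches the symptom text; it does not by itself prove which bug caused the hang.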

== Comment: #2 - MAMATHA INAMDAR <email address hidden> - 2018-01-24 01:07:12 ==
Hi Chanh,

We need the following patch, which fixes this issue. I don't think it has been integrated into the 18.04 kernel yet.

From 2621e945fbf1d6df5f3f0ba7be5bae3d2cf9b6a5 Mon Sep 17 00:00:00 2001
From: Michael Ellerman <email address hidden>
Date: Fri, 24 Nov 2017 14:51:02 +1100
Subject: [PATCH] powerpc/kexec: Fix kexec/kdump in P9 guest kernels

The code that cleans up the IAMR/AMOR before kexec'ing failed to
remember that when we're running as a guest AMOR is not writable, it's
hypervisor privileged.

The symptom is that the kexec stops before entering purgatory and
nothing else is seen on the console. If you examine the state of the
system all threads will be in the 0x700 program check handler.

Fix it by making the write to AMOR dependent on HV mode.

Fixes: 1e2a516e89fc ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR")
Cc: <email address hidden> # v4.10+
Reported-by: Yilin Zhang <email address hidden>
Debugged-by: David Gibson <email address hidden>
Signed-off-by: Michael Ellerman <email address hidden>
Acked-by: Balbir Singh <email address hidden>
Reviewed-by: David Gibson <email address hidden>
Tested-by: David Gibson <email address hidden>
Signed-off-by: Michael Ellerman <email address hidden>

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 8ac0bd2..3280953 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -623,7 +623,9 @@ BEGIN_FTR_SECTION
         * NOTE, we rely on r0 being 0 from above.
         */
        mtspr SPRN_IAMR,r0
+BEGIN_FTR_SECTION_NESTED(42)
        mtspr SPRN_AMOR,r0
+END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)

        /* save regs for local vars on new stack.
--
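
The patch only matters where both feature sections apply: an ISA 3.00 (POWER9) CPU and a kernel running without hypervisor privilege, as in a KVM guest. A rough sketch for checking whether a machine matches that case, assuming the ppc64 /proc/cpuinfo layout with `cpu` and `platform` lines and treating `pSeries` as the guest platform name; `is_p9_guest` is a hypothetical name:

```shell
# Hypothetical helper: given a cpuinfo-format file, succeed when it describes
# a POWER9 CPU on the pSeries (guest) platform, the case the patch addresses.
# Assumption: lines look like "cpu : POWER9 ..." and "platform : pSeries".
is_p9_guest() {
    grep -q '^cpu[[:space:]]*:[[:space:]]*POWER9' "$1" &&
        grep -q '^platform[[:space:]]*:[[:space:]]*pSeries' "$1"
}

# Usage:
#   is_p9_guest /proc/cpuinfo && echo "POWER9 guest: kexec/kdump relies on this fix"
```

A guest running in a POWER8 compatibility mode would not match, which is consistent with the CPU_FTR_ARCH_300 condition in the patch.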

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-163889 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Frank Heimes (fheimes)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
tags: added: kernel-da-key
removed: triage-g
tags: added: ppc64el-kdump
removed: kernel-da-key
tags: added: triage-g
tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 2621e945fbf1d6df5f3f0ba7be5bae3d2cf9b6a5. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745104

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-01-24 15:10 EDT-------
(In reply to comment #6)
> I built a test kernel with commit 2621e945fbf1d6df5f3f0ba7be5bae3d2cf9b6a5.
> The test kernel can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1745104
>
> Can you test this kernel and see if it resolves this bug?
>
> Note, to test this kernel, you need to install both the linux-image and
> linux-image-extra .deb packages.
>
> Thanks in advance!

Thanks. It works, but I now see another error: "makedumpfile Failed".

root@boslcp4g5:~# echo c > /proc/sysrq-trigger
[ 80.255377] sysrq: SysRq : Trigger a crash
[ 80.255490] Unable to handle kernel paging request for data at address 0x00000000
[ 80.255597] Faulting instruction address: 0xc00000000078f608
[ 80.255708] Oops: Kernel access of bad area, sig: 11 [#1]
[ 80.255781] SMP NR_CPUS=2048
[ 80.255782] NUMA
[ 80.255837] pSeries
[ 80.255930] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache sctp_diag sctp libcrc32c dccp_diag dccp tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc vmx_crypto crct10dif_vpmsum dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq crc32c_vpmsum virtio_net virtio_scsi
[ 80.256798] CPU: 12 PID: 1928 Comm: bash Not tainted 4.13.0-17-generic #20~lp1745104
[ 80.256907] task: c0000000053c8a00 task.stack: c0000001ec3dc000
[ 80.257010] NIP: c00000000078f608 LR: c000000000790538 CTR: c00000000078f5e0
[ 80.257116] REGS: c0000001ec3df9f0 TRAP: 0300 Not tainted (4.13.0-17-generic)
[ 80.257221] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 80.257232] CR: 28422222 XER: 20040000
[ 80.257355] CFAR: c000000000790534 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
[ 80.257355] GPR00: c000000000790538 c0000001ec3dfc70 c000000001606000 0000000000000063
[ 80.257355] GPR04: c0000001feeeade8 c0000001fef02068 6967676572206120 63726173680d0a72
[ 80.257355] GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000000
[ 80.257355] GPR12: c00000000078f5e0 c000000007ac7e00 0000000010180df8 0000000010189e30
[ 80.257355] GPR16: 0000000010189ea8 0000000010151210 000000001018bd58 000000001018de48
[ 80.257355] GPR20: 00000000121f0248 0000000000000001 0000000010164590 0000000010163bb0
[ 80.257355] GPR24: 00007fffd06a0724 00007fffd06a0720 c00000000150a570 0000000000000002
[ 80.257355] GPR28: 0000000000000063 0000000000000004 c0000000014922f4 c00000000150a910
[ 80.258340] NIP [c00000000078f608] sysrq_handle_crash+0x28/0x30
[ 80.258433] LR [c000000000790538] __handle_sysrq+0xf8/0x2b0
[ 80.258504] Call Trace:
[ 80.258546] [c0000001ec3dfc70] [c000000000790518] __handle_sysrq+0xd8/0x2b0 (unreliable)
[ 80.258657] [c0000001ec3dfd10] [c000000000790d34] write_sysrq_trigger+0x64/0x90
[ 80.258789] [c0000001ec3dfd40] [c00000000044c0c8] proc_reg_write+0x88/0xd0
[ 80.258883] [c0000001ec3dfd70] [c00000000039db8c] __vfs_write+0x3c/0x70
[ 80.258975] [c0000001ec3dfd90] [c00000000039f7c8] vfs_write+0xd8/0x220
[ 80.259067] [c0000001ec3dfde0] [c0000000003a1648] SyS_write+0x68/0x110
[ 80.259159]...


Joseph Salisbury (jsalisbury) wrote :

I see this in the trace in comment #2:
"The kernel version is not supported."

I wonder if makedumpfile is seeing the "lp1745104" tag I put in the test kernel name. I'll build another kernel without this tag to see if it helps.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-30 09:33 EDT-------
(In reply to comment #8)
> I see this in the trace in comment #2:
> "The kernel version is not supported."

That is an issue with the version of the makedumpfile tool in use.
makedumpfile v1.6.2, used in 18.04, officially supports kernels only up to
4.11.7, so we would see that warning for any newer kernel. As for the failure
to capture the dump, the makedumpfile tool must be missing some upstream
patches. Let us track that in a separate bug.
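
The version comparison described above can be sketched as follows. This is only an illustration of when the warning would be expected, not how makedumpfile itself checks versions; the function name `kernel_newer_than` is hypothetical, the 4.11.7 limit is taken from the comment above, and `sort -V` assumes GNU coreutils:

```shell
# Hypothetical helper: succeed when the kernel release ($1) is newer than the
# newest kernel version a makedumpfile build supports ($2).
kernel_newer_than() {
    release=${1%%-*}     # drop the Ubuntu ABI/flavour suffix: 4.13.0-25-generic -> 4.13.0
    max_supported=$2
    if [ "$release" = "$max_supported" ]; then
        return 1
    fi
    # sort -V (GNU coreutils) orders version strings; the last line is the newest.
    newest=$(printf '%s\n%s\n' "$release" "$max_supported" | sort -V | tail -n 1)
    [ "$newest" = "$release" ]
}

# Usage:
#   kernel_newer_than "$(uname -r)" 4.11.7 && echo "expect the unsupported-version warning"
```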

>
> I wonder if makedumpfile is seeing the "lp1745104" tag I put in the test
> kernel name. I'll build another kernel without this tag to see if it helps.

Not necessary. The makedumpfile issue is a different bug altogether.

Thanks
Hari

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Hari. I'll submit an SRU request for commit 2621e945fbf1d and we can track the new issue in a separate bug report.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-01-30 13:26 EDT-------
(In reply to comment #10)
> Thanks for the update, Hari. I'll submit an SRU request for commit
> 2621e945fbf1d and we can track the new issue in a separate bug report.

For the record, LP bug 1746299 has been opened to resolve the makedumpfile issue.

Thanks
Hari

Joseph Salisbury (jsalisbury) wrote :

Commit 2621e945fbf1d is now in Bionic as of kernel version Ubuntu-4.14.0-12.14.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy)
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---
bugproxy (bugproxy) wrote : console log

------- Comment (attachment only) From <email address hidden> 2018-03-08 12:15 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-03-08 12:14 EDT-------
(In reply to comment #14)
> Chanh & Indira,
>
> Can you please retest this on latest PNOR and ubuntu level? Need update by
> today. If you are blocked for a system, please mention the blocking bug

It works. I just verified on boslcp4 with pnor=227, bmc=1.15, kernel=4.15.0-10-generic.
root@boslcp4g2:~# uname -a
Linux boslcp4g2 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@boslcp4g2:/var/crash# ls -l
total 24
drwxr-xr-x 2 root root 4096 Mar 11 05:05 201803110504
-rw-r--r-- 1 root root 253 Mar 11 05:05 kexec_cmd
-rw-r----- 1 root root 12297 Mar 11 05:03 linux-image-4.15.0-10-generic-201803110502.crash

Brad Figg (brad-figg)
tags: added: cscc