[Ubuntu 1810] Kdump fails to dump vmcore and enters initramfs inside Power9 KVM guest

Bug #1808743 reported by bugproxy
This bug affects 1 person
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  High        Canonical Kernel Team
makedumpfile (Ubuntu)             Fix Released  High        Canonical Kernel Team

Bug Description

Kdump fails to dump a vmcore even with the suggested workaround applied.

This issue is submitted to track the failure on a Power9 guest that uses a qcow2 file-backed disk (virtio-scsi).

Boot Log: (Attached full console log)

[ 3.754031] 32regs : 19616.000 MB/sec
[ 3.794031] 32regs_prefetch: 17280.000 MB/sec
[ 3.834030] altivec : 22480.000 MB/sec
[ 3.834063] xor: using function: altivec (22480.000 MB/sec)
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found.
done.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
done.
Gave up waiting for root file system device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=5e1fe9e9-cf03-4c73-adce-0e57676f98e0 does not exist. Dropping to a shell!

BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu4) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

Contact Information = Balamuruhan S / <email address hidden>

---uname output---

Guest Kernel: 4.18.0-11-generic
Host Kernel: 4.18.0-11-generic

Machine Type = Boston

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Start with a healthy KVM guest running Ubuntu 18.10 with kernel 4.18.0-11-generic.
2. Install the kdump, kexec and crash tools in the guest:
# dpkg -l | grep crash
ii apport 2.20.10-0ubuntu13.1 all automatically generate crash reports for debugging
ii crash 7.2.3+real-1 ppc64el kernel debugging utility, allowing gdb like syntax
ii kdump-tools 1:1.6.4-2ubuntu1 ppc64el scripts and tools for automating kdump (Linux crash dumps)
ii linux-crashdump 4.18.0.11.12 ppc64el Linux kernel crashdump setup for the latest generic kernel
ii python3-apport 2.20.10-0ubuntu13.1 all Python 3 library for Apport crash report handling

3. Apply the workaround suggested in Bug 172389: uncomment `KDUMP_CMDLINE_APPEND`
and change nr_cpus to maxcpus in the /etc/default/kdump-tools config file:

# cat /etc/default/kdump-tools | grep -i cmdline
# KDUMP_CMDLINE - The default is to use the contents of /proc/cmdline.
# Set this variable to override /proc/cmdline.
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
#KDUMP_CMDLINE=""
KDUMP_CMDLINE_APPEND="1 maxcpus=1 systemd.unit=kdump-tools.service irqpoll noirqdistrib nousb reset_devices"

4. Restart the kdump-tools service:
# service kdump-tools restart
# service kdump-tools status
● kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor pres
   Active: active (exited) since Mon 2018-12-03 02:34:03 CST; 1 weeks 3 days ago
  Process: 1560 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/S
 Main PID: 1560 (code=exited, status=0/SUCCESS)

Dec 03 02:34:02 ubuntu1810 systemd[1]: Starting Kernel crash dump capture servic
Dec 03 02:34:02 ubuntu1810 kdump-tools[1560]: Starting kdump-tools: * Creating
Dec 03 02:34:02 ubuntu1810 kdump-tools[1560]: * Creating symlink /var/lib/kdump
Dec 03 02:34:03 ubuntu1810 kdump-tools[1560]: Modified cmdline:BOOT_IMAGE=/boot/
Dec 03 02:34:03 ubuntu1810 kdump-tools[1560]: * loaded kdump kernel
Dec 03 02:34:03 ubuntu1810 kdump-tools[1678]: /sbin/kexec -p --command-line="BOO
Dec 03 02:34:03 ubuntu1810 kdump-tools[1679]: loaded kdump kernel
Dec 03 02:34:03 ubuntu1810 systemd[1]: Started Kernel crash dump capture service

5. Check that kdump-config reports the state as ready to kdump:
# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.18.0-11-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.18.0-11-generic
current state: ready to kdump

6. Reboot the guest and check that the kernel command line includes crashkernel:
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.18.0-11-generic root=UUID=5e1fe9e9-cf03-4c73-adce-0e57676f98e0 ro net.ifnames=0 biosdevname=0 crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M

7. Enable sysrq and trigger the crash:
# echo 1 > /proc/sys/kernel/sysrq
# cat /proc/sys/kernel/sysrq
1

# echo c > /proc/sysrq-trigger

Kdump fails to generate a vmcore after the crash; instead, the guest reboots and enters initramfs.
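
The configuration change and pre-crash checks in steps 3, 5 and 6 can be sketched as a few shell helpers. This is only a sketch, not part of kdump-tools: the function names are hypothetical, and the filter assumes the stock config file ships a commented-out `#KDUMP_CMDLINE_APPEND=` line containing `nr_cpus=`, which the report implies but does not show.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not kdump-tools commands.

# Step 3: read a kdump-tools config on stdin, uncomment the
# KDUMP_CMDLINE_APPEND assignment, and replace nr_cpus= with maxcpus=
# on that line. The result goes to stdout; nothing is edited in place.
kdump_workaround_filter() {
    sed -e '/^#[[:space:]]*KDUMP_CMDLINE_APPEND=/s/^#[[:space:]]*//' \
        -e '/^KDUMP_CMDLINE_APPEND=/s/nr_cpus=/maxcpus=/g'
}

# Step 5: succeed if the text in $1 (output of `kdump-config show`)
# reports the "ready to kdump" state.
kdump_ready() {
    printf '%s\n' "$1" | grep -q '^current state:[[:space:]]*ready to kdump'
}

# Step 6: succeed if the kernel command line in $1 carries a
# crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the guest (review the filtered file before installing it):
#   kdump_workaround_filter < /etc/default/kdump-tools > /tmp/kdump-tools.new
#   kdump_ready "$(kdump-config show)" || echo "kdump is not armed"
#   has_crashkernel "$(cat /proc/cmdline)" || echo "crashkernel= is missing"
```

Writing to a separate file rather than editing in place keeps the original config available for comparison if the crash kernel still fails to boot.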

Attachment:
1. Guest console log
2. Guest sosreport
3. Guest xml

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-174078 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
affects: linux (Ubuntu) → makedumpfile (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in makedumpfile (Ubuntu):
importance: Undecided → High
status: New → Triaged
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
tags: added: kernel-da-key
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-01-18 00:59 EDT-------
Ubuntu, please provide an update.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Can you please provide the console output when this fails? It looks like the attachments were not synced to launchpad.

Thanks.
Cascardo.

bugproxy (bugproxy) wrote : console.log

------- Comment (attachment only) From <email address hidden> 2019-02-06 01:39 EDT-------

bugproxy (bugproxy) wrote : sosreport

------- Comment (attachment only) From <email address hidden> 2019-02-06 01:40 EDT-------

Thadeu Lima de Souza Cascardo (cascardo) wrote :

It is not finding the root filesystem, but virtio seems to find the SCSI HBA. Not really sure why it won't find the root filesystem.

Since initramfs drops to a shell, can you check whether /dev/vda* exists and what is under /dev/disk/by-uuid/? Also, is ext4 loaded (check with cat /proc/modules)?

Thanks.
Cascardo.
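
The checks requested above could be collected from the (initramfs) prompt with something like the following. It is a rough sketch with hypothetical function names. It also looks at /dev/sd*, on the assumption that a virtio-scsi disk would appear under that name rather than as /dev/vda, and at /proc/filesystems, because a filesystem built into the kernel does not show up in /proc/modules. It sticks to shell built-ins plus ls, since the tools available in the initramfs shell may be limited.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Succeed if module $1 is listed in a /proc/modules-style file
# ($2, defaulting to /proc/modules). Field 1 of each line is the name.
module_loaded() {
    while read -r name _rest; do
        [ "$name" = "$1" ] && return 0
    done < "${2:-/proc/modules}"
    return 1
}

# Succeed if filesystem $1 is listed in a /proc/filesystems-style file
# ($2, defaulting to /proc/filesystems). Lines are "<fs>" or "nodev <fs>".
fs_supported() {
    while read -r a b; do
        [ "${b:-$a}" = "$1" ] && return 0
    done < "${2:-/proc/filesystems}"
    return 1
}

# Print the information asked for in the comment above.
root_fs_report() {
    echo "--- candidate root disks ---"
    found=0
    for dev in /dev/vd* /dev/sd*; do
        [ -e "$dev" ] || continue   # skip patterns that matched nothing
        echo "$dev"
        found=1
    done
    [ "$found" -eq 1 ] || echo "none found"

    echo "--- /dev/disk/by-uuid ---"
    ls -l /dev/disk/by-uuid/ 2>/dev/null || echo "directory missing"

    echo "--- ext4 ---"
    if module_loaded ext4; then echo "ext4 module loaded"; fi
    if fs_supported ext4; then
        echo "ext4 registered with the kernel"
    else
        echo "ext4 not registered"
    fi
}

# Example use at the (initramfs) prompt, after defining the functions:
#   root_fs_report
```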

Will Dormann (wdormann) wrote :

In my testing, linux-crashdump fails to recognize the root FS if the VM is using a SCSI controller. When the root FS is connected via SATA, it works fine.
This is on an up-to-date 18.04 install in a VMware Workstation VM.
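
One way to narrow down a report like this is to check whether the kdump initrd contains the storage controller's driver at all. Nothing in this thread confirms a missing driver as the cause, so treat that as an assumption. The helper below is a hypothetical sketch that filters a file listing such as the one printed by lsinitramfs; the module name to look for depends on the controller and has to be supplied by the user.

```shell
#!/bin/sh
# Sketch only: initrd_lists_module is a hypothetical helper.

# Read an initramfs file listing on stdin (one path per line) and succeed
# if it contains a kernel module file named $1. Matches both plain and
# compressed module files (.ko, .ko.xz, ...).
# Note: some module files use hyphens where the loaded module name uses
# underscores, so try both spellings if the first one is not found.
initrd_lists_module() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Example use on the running system, with the initrd path shown by
# `kdump-config show` earlier in this report:
#   lsinitramfs /var/lib/kdump/initrd.img | initrd_lists_module virtio_scsi \
#       || echo "virtio_scsi is not in the kdump initrd"
```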

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, @wdormann.

Can you open a new bug and add details there, like sosreport, console logs, dmesg, etc?

Thank you very much.
Cascardo.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Andrew Cloke (andrew-cloke) wrote :

Marked as incomplete, awaiting response to Thadeu's question in comment #5.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-02-28 03:07 EDT-------
I have tested with the latest Ubuntu 18.10 kernel and kdump, and it is working as expected:

# uname -a
Linux ubuntu1810 4.18.0-15-generic #16-Ubuntu SMP Thu Feb 7 11:04:25 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

# dpkg -l | grep crash
ii apport 2.20.10-0ubuntu13.2 all automatically generate crash reports for debugging
ii crash 7.2.3+real-1 ppc64el kernel debugging utility, allowing gdb like syntax
ii kdump-tools 1:1.6.4-2ubuntu1 ppc64el scripts and tools for automating kdump (Linux crash dumps)
ii linux-crashdump 4.18.0.15.16 ppc64el Linux kernel crashdump setup for the latest generic kernel
ii python3-apport 2.20.10-0ubuntu13.2 all Python 3 library for Apport crash report handling

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
[ 437.932187] sysrq: SysRq : Trigger a crash
[ 437.932248] Unable to handle kernel paging request for data at address 0x00000000
[ 437.932321] Faulting instruction address: 0xc0000000008297f8
[ 437.932384] Oops: Kernel access of bad area, sig: 11 [#1]
[ 437.932432] LE SMP NR_CPUS=2048 NUMA pSeries
[ 437.932486] Modules linked in: iscsi_target_mod target_core_mod xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter kvm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel nfsd ib_iser auth_rpcgss nfs_acl rdma_cm lockd iw_cm grace ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear virtio_net crc32c_vpmsum virtio_scsi net_failover failover
[ 437.933248] CPU: 12 PID: 2849 Comm: bash Kdump: loaded Not tainted 4.18.0-15-generic #16-Ubuntu
[ 437.933335] NIP: c0000000008297f8 LR: c00000000082a684 CTR: c0000000008297d0
[ 437.933411] REGS: c000000005fcba00 TRAP: 0300 Not tainted (4.18.0-15-generic)
[ 437.933487] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48422222 XER: 20040000
[ 437.933571] CFAR: c00000000082a680 DAR: 0000000000000000 DSISR: 42000000 IRQMASK: 0
[ 437.933571] GPR00: c00000000082a684 c000000005fcbc80 c00000000178ca00 0000000000000063
[ 437.933571] GPR04: 0000000000000001 0000000000000184 6967676572206120 63726173680d0a72
[ 437.933571] GPR08: 0000000000000007 0000000000000001 0000000000000000 c000000005fcb8af
[ 437.933571] GPR12: c0000000008297d0 c000000007fdea00 00000fa17d1d9760 0000000000000000
[ 437.933571] GPR16: 00000fa18f5559a0 00000fa17d174a48 00000fa17d1d9760 00000fa17d0f8b00
[ 437.933571] GPR20: 0000000000000000 0000000000000001 00000fa17d18...


Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Manoj Iyer (manjo) wrote :

Closing this as 'Fix Released' based on the verification from IBM that this is now fixed with the latest kernel.

Changed in makedumpfile (Ubuntu):
status: Triaged → Fix Released
Changed in ubuntu-power-systems:
status: Triaged → Fix Released
Frank Heimes (fheimes) wrote :

Since comment #9 verified that it now works as expected with the package that became available in the meantime (thanks to <email address hidden>), I'm closing this ticket and setting the status to Fix Released.
