Kernel crash dump not getting generated

Bug #1995270 reported by SRINIVAS SADAGOPAN
This bug affects 1 person
Affects: ubuntu-realtime
Status: Fix Released
Importance: Medium
Assigned to: Joseph Salisbury

Bug Description

I need to enable the kernel crash dump feature to investigate certain system hang issues. My plan is to enable kernel crash dump (https://ubuntu.com/server/docs/kernel-crash-dump) together with the kernel.hung_task_panic and kernel.hung_task_timeout_secs parameters (in /etc/sysctl.conf), so that a kernel vmcore is generated when the kernel detects a task that has been hung for more than 5 minutes. I can then send the vmcore files to the kernel engineers at Canonical for investigation.
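The sysctl settings described above could look roughly like the following. This is a minimal sketch: the helper name hung_task_sysctl is hypothetical, and the timeout is simply the 5-minute threshold converted to seconds.

```shell
# Print a sysctl fragment that makes the kernel panic when a task stays
# hung for longer than the given number of minutes.
# (hung_task_sysctl is a hypothetical helper name used for illustration.)
hung_task_sysctl() {
    minutes=$1
    printf 'kernel.hung_task_panic = 1\n'
    printf 'kernel.hung_task_timeout_secs = %d\n' "$((minutes * 60))"
}

# Preview the fragment for a 5-minute threshold.
hung_task_sysctl 5
```

The printed lines can be appended to /etc/sysctl.conf and loaded as root with "sysctl -p".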

However, while trying to enable kernel crash dump, I am unable to get a vmcore generated using the test procedure described in the "Testing the Crash Dump Mechanism" section of: https://ubuntu.com/server/docs/kernel-crash-dump

When I run the command "echo c > /proc/sysrq-trigger", all I see is a backtrace (image attached). I do not see the message "Begin: Saving vmcore from kernel crash ..."

After the system reboots, there is no vmcore file under the /var/crash directory. Could you please help point out which steps I am missing?

Here is some system information:
=======================================

Last login: Mon Oct 31 11:35:21 2022
root@vran-server-1:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.0-1025-realtime root=/dev/mapper/ubuntu--vg-ubuntu--lv ro rhgb quiet skew_tick=1 nohz=on nohz_full=2-23,26-47,50-71,74-95 rcu_nocbs=2-23,26-47,50-71,74-95 intel_pstate=disable nosoftlockup intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 audit=0 cgroup_memory=1 cgroup_enable=memory mce=off idle=poll default_hugepagesz=1G skew_tick=1 idle=poll processor.max_cstate=1 intel_idle.max_cstate=0 rcu_nocb_poll kthread_cpus=0,1,72,73,48,49,24,25 irqaffinity=0,1,72,73,48,49,24,25 nosoftlockup tsc=nowatchdog isolcpus=managed_irq,domain,2-23,26-47,50-71,74-95 systemd.cpu_affinity=0,1,72,73,48,49,24,25 cgroup.memory=nokmem crashkernel=512M-:256M

root@vran-server-1:~# uname -a
Linux vran-server-1 5.15.0-1025-realtime #26 SMP PREEMPT_RT Thu Oct 20 18:14:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

root@vran-server-1:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x49000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.15.0-1025-realtime
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.15.0-1025-realtime
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-5.15.0-1025-realtime root=/dev/mapper/ubuntu--vg-ubuntu--lv ro rhgb quiet skew_tick=1 nohz=on nohz_full=2-23,26-47,50-71,74-95 rcu_nocbs=2-23,26-47,50-71,74-95 intel_pstate=disable nosoftlockup intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 audit=0 cgroup_memory=1 cgroup_enable=memory mce=off idle=poll default_hugepagesz=1G skew_tick=1 idle=poll processor.max_cstate=1 intel_idle.max_cstate=0 rcu_nocb_poll kthread_cpus=0,1,72,73,48,49,24,25 irqaffinity=0,1,72,73,48,49,24,25 nosoftlockup tsc=nowatchdog isolcpus=managed_irq,domain,2-23,26-47,50-71,74-95 systemd.cpu_affinity=0,1,72,73,48,49,24,25 cgroup.memory=nokmem reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

root@vran-server-1:~# cat /etc/default/kexec
# Defaults for kexec initscript
# sourced by /etc/init.d/kexec and /etc/init.d/kexec-load

# Load a kexec kernel (true/false)
LOAD_KEXEC=true

# Kernel and initrd image
KERNEL_IMAGE="/vmlinuz"
INITRD="/initrd.img"

# If empty, use current /proc/cmdline
APPEND=""

# Load the default kernel from grub config (true/false)
USE_GRUB_CONFIG=false

root@vran-server-1:~# cat /etc/default/kdump-tools
# kdump-tools configuration
# ---------------------------------------------------------------------------
# USE_KDUMP - controls kdump will be configured
# 0 - kdump kernel will not be loaded
# 1 - kdump kernel will be loaded and kdump is configured
#
USE_KDUMP=1

# ---------------------------------------------------------------------------
# Kdump Kernel:
# KDUMP_KERNEL - A full pathname to a kdump kernel.
# KDUMP_INITRD - A full pathname to the kdump initrd (if used).
# If these are not set, kdump-config will try to use the current kernel
# and initrd if it is relocatable. Otherwise, you will need to specify
# these manually.
KDUMP_KERNEL=/var/lib/kdump/vmlinuz
KDUMP_INITRD=/var/lib/kdump/initrd.img

# ---------------------------------------------------------------------------
# vmcore Handling:
# KDUMP_COREDIR - local path to save the vmcore to.
# KDUMP_FAIL_CMD - This variable can be used to cause a reboot or
# start a shell if saving the vmcore fails. If not set, "reboot -f"
# is the default.
# Example - start a shell if the vmcore copy fails:
# KDUMP_FAIL_CMD="echo 'makedumpfile FAILED.'; /bin/bash; reboot -f"
# KDUMP_DUMP_DMESG - This variable controls if the dmesg buffer is dumped.
# If unset or set to 1, the dmesg buffer is dumped. If set to 0, the dmesg
# buffer is not dumped.
# KDUMP_NUM_DUMPS - This variable controls how many dump files are kept on
# the machine to prevent running out of disk space. If set to 0 or unset,
# the variable is ignored and no dump files are automatically purged.
# KDUMP_COMPRESSION - Compress the dumpfile. No compression is used by default.
# Supported compressions: bzip2, gzip, lz4, xz
KDUMP_COREDIR="/var/crash"
#KDUMP_FAIL_CMD="reboot -f"
#KDUMP_DUMP_DMESG=
#KDUMP_NUM_DUMPS=
#KDUMP_COMPRESSION=

# ---------------------------------------------------------------------------
# Makedumpfile options:
# MAKEDUMP_ARGS - extra arguments passed to makedumpfile (8). The default,
# if unset, is to pass '-c -d 31' telling makedumpfile to use compression
# and reduce the corefile to in-use kernel pages only.
#MAKEDUMP_ARGS="-c -d 31"

# ---------------------------------------------------------------------------
# Kexec/Kdump args
# KDUMP_KEXEC_ARGS - Additional arguments to the kexec command used to load
# the kdump kernel
# Example - Use this option on x86 systems with PAE and more than
# 4 gig of memory:
# KDUMP_KEXEC_ARGS="--elf64-core-headers"
# KDUMP_CMDLINE - The default is to use the contents of /proc/cmdline.
# Set this variable to override /proc/cmdline.
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
# for the kdump kernel. If unset, it defaults to
# "reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb"
#KDUMP_KEXEC_ARGS=""
#KDUMP_CMDLINE=""
#KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb"

# ---------------------------------------------------------------------------
# Architecture specific Overrides:

# ---------------------------------------------------------------------------
# Remote dump facilities:
# HOSTTAG - Select if hostname of IP address will be used as a prefix to the
# timestamped directory when sending files to the remote server.
# 'ip' is the default.
#HOSTTAG="hostname|[ip]"

# NFS - Hostname and mount point of the NFS server configured to receive
# the crash dump. The syntax must be {HOSTNAME}:{MOUNTPOINT}
# (e.g. remote:/var/crash)
# NFS_TIMEO - Timeout before NFS retries a request. See man nfs(5) for details.
# NFS_RETRANS - Number of times NFS client retries a request. See man nfs(5) for details.
#NFS="<nfs mount>"
#NFS_TIMEO="600"
#NFS_RETRANS="3"

# FTP - Hostname and path of the FTP server configured to receive the crash dump.
# The syntax is {HOSTNAME}[:{PATH}] with PATH defaulting to /.
# FTP_USER - FTP username. A anonomous upload will be used if not set.
# FTP_PASSWORD - password for the FTP user
# FTP_PORT=21 - FTP port. Port 21 will be used by default.
#FTP="<server>:<path>"
#FTP_USER=""
#FTP_PASSWORD=""
#FTP_PORT=21

# SSH - username and hostname of the remote server that will receive the dump
# and dmesg files.
# SSH_KEY - Full path of the ssh private key to be used to login to the remote
# server. use kdump-config propagate to send the public key to the
# remote server
#SSH="<user at server>"
#SSH_KEY="<path>"

root@vran-server-1:~#
root@vran-server-1:~# cat /proc/sys/kernel/sysrq
176

root@vran-server-1:~# dmesg | grep -i crash
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.15.0-1025-realtime root=/dev/mapper/ubuntu--vg-ubuntu--lv ro rhgb quiet skew_tick=1 nohz=on nohz_full=2-23,26-47,50-71,74-95 rcu_nocbs=2-23,26-47,50-71,74-95 intel_pstate=disable nosoftlockup intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 audit=0 cgroup_memory=1 cgroup_enable=memory mce=off idle=poll default_hugepagesz=1G skew_tick=1 idle=poll processor.max_cstate=1 intel_idle.max_cstate=0 rcu_nocb_poll kthread_cpus=0,1,72,73,48,49,24,25 irqaffinity=0,1,72,73,48,49,24,25 nosoftlockup tsc=nowatchdog isolcpus=managed_irq,domain,2-23,26-47,50-71,74-95 systemd.cpu_affinity=0,1,72,73,48,49,24,25 cgroup.memory=nokmem crashkernel=512M-:256M
[ 0.012353] Reserving 256MB of memory at 1168MB for crashkernel (System RAM: 195278MB)
[ 0.551413] Kernel command line: BOOT_IMAGE=/vmlinuz-5.15.0-1025-realtime root=/dev/mapper/ubuntu--vg-ubuntu--lv ro rhgb quiet skew_tick=1 nohz=on nohz_full=2-23,26-47,50-71,74-95 rcu_nocbs=2-23,26-47,50-71,74-95 intel_pstate=disable nosoftlockup intel_iommu=on iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 audit=0 cgroup_memory=1 cgroup_enable=memory mce=off idle=poll default_hugepagesz=1G skew_tick=1 idle=poll processor.max_cstate=1 intel_idle.max_cstate=0 rcu_nocb_poll kthread_cpus=0,1,72,73,48,49,24,25 irqaffinity=0,1,72,73,48,49,24,25 nosoftlockup tsc=nowatchdog isolcpus=managed_irq,domain,2-23,26-47,50-71,74-95 systemd.cpu_affinity=0,1,72,73,48,49,24,25 cgroup.memory=nokmem crashkernel=512M-:256M
[ 3.536582] pstore: Using crash dump compression: deflate
[ 3.992799] megaraid_sas 0000:5e:00.0: firmware crash dump : yes

SRINIVAS SADAGOPAN (ssadagop) wrote :
Changed in ubuntu-realtime:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

I was able to generate a dump with the 1025 kernel by following the wiki instructions. One difference is that I used a VM:

jsalisbury@jammy-realtime-vm:/var/crash$ ls
202211081248 kexec_cmd
kdump_lock linux-image-5.15.0-1025-realtime-202211081248.crash

Would it be possible for you to try the same steps in a VM or on another machine? That will tell us whether the issue is specific to the machine/config or to the instructions. I notice you are also using LVM and I am not, so maybe that is related.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph, it works for me in a VM too. I need to get this enabled on the server where we plan to run stability tests. Without the kernel dump feature enabled on the server, we will not be able to investigate any system hang issues we might see.

Could you please let me know how we should proceed to get the kernel dump feature successfully working on my server?

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph, any suggestions on how we should proceed to investigate why kernel dump does not work on our server? Once we have the kernel dump feature enabled, we will be able to start our stability tests.

Joseph Salisbury (jsalisbury) wrote :

We would like to collect some additional information about your system. From a terminal, please run the following:

apport-collect 1995270
or to a file:
apport-bug --save /tmp/report.1995270 linux

If apport can't be run:
1) uname -a > uname-a.log
2) dmesg > dmesg.log
3) sudo lspci -vvnn > lspci-vvnn.log
4) cat /proc/version_signature > version.log

Joseph Salisbury (jsalisbury) wrote :

Also, could you try kdump with the generic (non-realtime) kernel? It would be good to know whether this bug is specific to the real-time kernel or affects all kernels on this hardware config.

SRINIVAS SADAGOPAN (ssadagop) wrote :

The apport-bug command does not work. Is this a known issue? I have attached the log files (1995270-system-information.tgz) with the output of the requested commands.

root@vran-server-1:~# apport-bug --save /tmp/report.1995270 linux

*** Collecting problem information

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
..

*** Problem in linux-image-5.15.0-1025-realtime

The problem cannot be reported:

This report is about a package that is not installed.

Press any key to continue...

SRINIVAS SADAGOPAN (ssadagop) wrote :
Changed in ubuntu-realtime:
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

It looks like you are allocating 256M for the crashkernel, which may be too little for the amount of physical RAM in this machine. Could you try changing this to a range-based setting, which will allocate 768M on your system?

That can be done by editing the following file:
/etc/default/grub.d/kdump-tools.cfg

In that file, you will see a line like the following:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:256M"

Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M"

Then you have to run:
sudo update-grub

Finally reboot.

You can confirm the new setting after a reboot by looking at /proc/cmdline. Your /proc/cmdline currently reports crashkernel=512M-:256M. After the reboot it should report crashkernel=0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M.
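A quick way to check the setting after the reboot is sketched below. The helper name crashkernel_param is hypothetical. /sys/kernel/kexec_crash_size reports the reserved size in bytes on kernels that expose it, and the sketch skips that file if it is absent.

```shell
# Print the value of the crashkernel= parameter from a kernel command line.
# (crashkernel_param is a hypothetical helper name used for illustration.)
crashkernel_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Show the parameter the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    crashkernel_param "$(cat /proc/cmdline)"
fi

# Show the memory actually reserved for the crash kernel, in bytes.
if [ -r /sys/kernel/kexec_crash_size ]; then
    cat /sys/kernel/kexec_crash_size
fi
```

For reference, 768M corresponds to 805306368 bytes. "kdump-config show" should also still report "current state: ready to kdump".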

What this change means:
On a system with 0 to 2G of RAM, 128M is reserved for the crashkernel.
On a system with 2G to 6G of RAM, 256M is reserved for the crashkernel.
On a system with 6G to 8G of RAM, 512M is reserved for the crashkernel.
On a system with more than 8G of RAM, 768M is reserved for the crashkernel.

This is documented in the kernel documentation here:
https://www.kernel.org/doc/Documentation/kdump/kdump.txt
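The range semantics above can be sketched as a small shell function that resolves a crashkernel= value for a given amount of RAM. This is an illustration only, not the kernel's parser: the function names are hypothetical, only the M and G suffixes are handled, any @offset part is ignored, and the upper bound of each range is treated as exclusive, which is an assumption about the kernel's behaviour.

```shell
# Convert a size such as "512M" or "2G" to MiB.
to_mib() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# Print the reservation that SPEC selects for a machine with RAM_MIB of RAM.
# Usage: crashkernel_size SPEC RAM_MIB
crashkernel_size() {
    spec=$1
    ram=$2
    old_ifs=$IFS
    IFS=,
    set -- $spec          # split the spec into "start-[end]:size" entries
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}") || return 1
        end=${range#*-}
        # Start is inclusive; an empty end means "no upper bound".
        if [ "$ram" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram" -lt "$(to_mib "$end")" ]; then
                echo "$size"
                return 0
            fi
        fi
    done
    return 1
}

# The reporter's machine has 195278MB of RAM according to dmesg.
crashkernel_size '512M-:256M' 195278                                   # prints 256M
crashkernel_size '0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M' 195278   # prints 768M
```

With the original setting, every machine with at least 512M of RAM gets a fixed 256M reservation, which matches the "Reserving 256MB of memory" line in the dmesg output above. With the range-based setting the same machine falls into the "8G-" range and gets 768M.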

SRINIVAS SADAGOPAN (ssadagop) wrote :

Thanks, Joseph. The kernel dump mechanism works after increasing the allocation for the crash kernel. I did try increasing it before, but not by the amount that makes it work now. Thanks, I will close this ticket.

SRINIVAS SADAGOPAN (ssadagop) wrote :

The issue described in this ticket was caused by a configuration error. This ticket can be closed.

Changed in ubuntu-realtime:
status: In Progress → Fix Released
no longer affects: kdump-tools (Ubuntu)