kdump cannot generate coredump file on bluefield with 5.4 and 5.15 kernel

Bug #2021930 reported by Tony Duan
This bug affects 1 person
Affects: linux-bluefield (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

kdump cannot generate coredump file on bluefield with 5.4 kernel

Bug description:

After following the instructions in https://ubuntu.com/server/docs/kernel-crash-dump, no coredump file is generated.

The BlueField is running the 5.4 kernel:
 bf2:~$ uname -a
 Linux sw-mtx-008-bf2 5.4.0-1060-bluefield #66-Ubuntu SMP PREEMPT Mon Mar 27 15:52:50 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

The crashkernel parameter is configured:
 bf2:~$ cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinuz-5.4.0-1060-bluefield root=UUID=52ddbe2c-ee4f-48d4-b7d4-ab76e264e438 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
 bf2:~$ dmesg | grep -i crash
 [ 0.000000] crashkernel reserved: 0x00000000cfe00000 - 0x00000000efe00000 (512 MB)
 [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1060-bluefield root=UUID=52ddbe2c-ee4f-48d4-b7d4-ab76e264e438 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
 [ 8.070921] pstore: Using crash dump compression: deflate
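
The crashkernel= value above uses the kernel's range syntax, <start>-[<end>]:<size>, where start is inclusive, end is exclusive, and the first range that contains the total system RAM decides the reservation. The Python sketch below is a simplified illustration of that selection, not the kernel's own parser, and it also cross-checks the size printed in the dmesg "crashkernel reserved" line. The helper names and the 16G RAM figure are hypothetical; the report does not state how much RAM this BlueField has.

```python
import re

# A subset of the binary suffixes the kernel accepts in sizes such as "512M".
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '320M' or '4G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text)
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def crashkernel_size(spec: str, system_ram: int) -> int:
    """Bytes reserved by a range-based crashkernel= spec (0 if no range matches)."""
    spec = spec.split("@", 1)[0]  # ignore an optional @offset suffix
    for entry in spec.split(","):
        rng, size = entry.split(":")
        start, end = rng.split("-")
        lo = parse_size(start)
        hi = parse_size(end) if end else None  # "128G-" has no upper bound
        # start is inclusive, end is exclusive; the first matching range wins
        if system_ram >= lo and (hi is None or system_ram < hi):
            return parse_size(size)
    return 0


def reserved_from_dmesg(line: str) -> int:
    """Size in bytes of the 'crashkernel reserved: 0x... - 0x...' range."""
    m = re.search(r"reserved: 0x([0-9a-f]+) - 0x([0-9a-f]+)", line)
    if m is None:
        raise ValueError("not a crashkernel reserved line")
    return int(m.group(2), 16) - int(m.group(1), 16)


SPEC_54 = "2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
LINE_54 = ("[ 0.000000] crashkernel reserved: "
           "0x00000000cfe00000 - 0x00000000efe00000 (512 MB)")

# Any RAM size with 4G <= RAM < 32G selects 512M from this spec,
# which matches the size of the range reported by dmesg.
print(crashkernel_size(SPEC_54, 16 << 30) >> 20)
print(reserved_from_dmesg(LINE_54) >> 20)
```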

The kdump-config output is as follows:
 bf2:~$ kdump-config show
 DUMP_MODE: kdump
 USE_KDUMP: 1
 KDUMP_SYSCTL: kernel.panic_on_oops=1
 KDUMP_COREDIR: /var/crash
 crashkernel addr: 0x
  /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.4.0-1060-bluefield
 kdump initrd:
  /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.4.0-1060-bluefield
 current state: ready to kdump

 kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.4.0-1060-bluefield root=UUID=52ddbe2c-ee4f-48d4-b7d4-ab76e264e438 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
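
Comparing /proc/cmdline with the kexec command above, the capture kernel's command line is the running one with crashkernel= removed and "reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" appended. The sketch below reproduces that observed transformation from the two strings in this report; it is an illustration, not kdump-tools' own code, and the function name is hypothetical.

```python
# Options appended for the capture kernel, copied from the kexec command above.
APPEND = "reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1"


def capture_cmdline(running: str, append: str = APPEND) -> str:
    """Drop crashkernel= from the running command line and add the kdump options."""
    kept = [arg for arg in running.split() if not arg.startswith("crashkernel=")]
    return " ".join(kept + append.split())


RUNNING = (
    "BOOT_IMAGE=/boot/vmlinuz-5.4.0-1060-bluefield "
    "root=UUID=52ddbe2c-ee4f-48d4-b7d4-ab76e264e438 ro console=hvc0 "
    "console=ttyAMA0 earlycon=pl011,0x01000000 fixrtc net.ifnames=0 "
    "biosdevname=0 iommu.passthrough=1 "
    "crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"
)

print(capture_cmdline(RUNNING))
```

The 5.15 command lines in the follow-up comment follow the same pattern.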

sysrq:
 bf2:/# cat /proc/sys/kernel/sysrq
 176
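
The value 176 is a bitmask: 176 = 16 + 32 + 128, which per the kernel's sysrq documentation enables sync, remount read-only, and reboot/poweroff. The same documentation states that this mask only restricts invocation from the keyboard, while writes to /proc/sysrq-trigger by root are always allowed, so "echo c > /proc/sysrq-trigger" is not blocked by it. A small decoder, with bit names paraphrased from that documentation (the function name is hypothetical):

```python
# Bit values from the kernel's sysrq documentation (admin-guide/sysrq.rst).
SYSRQ_BITS = {
    2: "console logging level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> list[str]:
    """List the sysrq functions a kernel.sysrq value enables from the keyboard."""
    if value == 0:
        return []  # sysrq disabled entirely
    if value == 1:
        return ["all functions"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


print(decode_sysrq(176))  # ['sync', 'remount read-only', 'reboot/poweroff']
```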

After triggering the crash manually with "echo c > /proc/sysrq-trigger", the kdump kernel could not finish booting because it ran out of memory (OOM). After increasing the crashkernel reservation to 1024M, it still hangs.
 With the default 512M, it hangs at "Killed process 674":
  [ 8.718188] systemd-journald[368]: File /var/log/journal/8244d38b2f804fc692f3f2dbf8206f57/system.journal corrupted or uncleanly shut down, renaming and re.
  [ 30.252513] Out of memory: Killed process 651 (systemd-resolve) total-vm:24380kB, anon-rss:3812kB, file-rss:1828kB, shmem-rss:0kB, UID:101 pgtables:80kB o0
  ...
  [ 34.651927] Out of memory: Killed process 674 (dbus-daemon) total-vm:7884kB, anon-rss:552kB, file-rss:1380kB, shmem-rss:0kB, UID:103 pgtables:52kB oom_sco0
 With 1024M, it hangs at the following:
  [ 8.733323] systemd-journald[369]: File /var/log/journal/8244d38b2f804fc692f3f2dbf8206f57/system.journal corrupted or uncleanly shut down, renaming and re.
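
To list which processes the OOM killer removed while the capture kernel was booting, the console log can be scanned for "Out of memory: Killed process" lines. A minimal sketch, assuming the log is available as text in the format shown above (the function name is hypothetical and the sample lines are shortened excerpts of the log above):

```python
import re

# Matches e.g. "Out of memory: Killed process 651 (systemd-resolve) total-vm:24380kB"
OOM_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def oom_victims(log: str) -> list[tuple[int, str]]:
    """Return (pid, command) for every OOM kill found in a console log."""
    return [(int(pid), comm) for pid, comm in OOM_RE.findall(log)]


OOM_LOG = """\
[ 30.252513] Out of memory: Killed process 651 (systemd-resolve) total-vm:24380kB, anon-rss:3812kB
[ 34.651927] Out of memory: Killed process 674 (dbus-daemon) total-vm:7884kB, anon-rss:552kB
"""

print(oom_victims(OOM_LOG))  # [(651, 'systemd-resolve'), (674, 'dbus-daemon')]
```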

After a soft reboot of the BlueField, no coredump file has been generated:
 bf2:~$ ls /var/crash/ -la
 total 52
 drwxrwxrwt 3 root root 4096 May 31 01:43 .
 drwxr-xr-x 14 root root 4096 Apr 30 11:26 ..
 drwxrwxr-x 2 ubuntu ubuntu 4096 May 31 01:43 202305310143
 -rw-r----- 1 root root 34307 May 31 01:18 _usr_share_netplan_netplan.script.0.crash
 -rw-r--r-- 1 root root 0 May 31 03:47 kdump_lock
 -rw-r--r-- 1 root root 358 May 31 03:48 kexec_cmd
 bf2:~$ ls /var/crash/202305310143/ -la
 total 8
 drwxrwxr-x 2 ubuntu ubuntu 4096 May 31 01:43 .
 drwxrwxrwt 3 root root 4096 May 31 01:43 ..
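
The timestamped directory 202305310143 exists but is empty. kdump-tools is expected to place the saved dump in such a directory, so an empty one means no dump was written there. A small Python check for that condition, assuming the YYYYMMDDHHMM naming seen above (the function name is hypothetical; on the device it would be called with the default /var/crash, while the demo uses a temporary directory):

```python
import re
import tempfile
from pathlib import Path

STAMP = re.compile(r"\d{12}")  # YYYYMMDDHHMM, e.g. 202305310143


def empty_dump_dirs(coredir: str = "/var/crash") -> list[str]:
    """Names of timestamped subdirectories of coredir that contain no entries."""
    empty = []
    for entry in sorted(Path(coredir).iterdir()):
        if entry.is_dir() and STAMP.fullmatch(entry.name) and not any(entry.iterdir()):
            empty.append(entry.name)
    return empty


# Demo on a temporary directory mimicking the listing above.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "202305310143").mkdir()  # empty, as in the report
    (Path(tmp) / "kdump_lock").touch()    # plain file, ignored
    demo = empty_dump_dirs(tmp)

print(demo)  # ['202305310143']
```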

This issue also happens on the 5.4.0-1049-bluefield kernel.

William Tu (wtu) wrote:

I also tested it on 5.15.0-1031-bluefield, and it fails there too.

Configurations:

root@bu-oob:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0xbd000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.15.0-1031-bluefield
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.15.0-1031-bluefield
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-bluefield root=UUID=8e8b38a6-7d3d-4a29-b7a0-99761624f941 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x13010000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 console=tty1 console=ttyS0 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@bu-lab60v3-oob:~#

###################################
root@bu-oob:~# dmesg |grep -i crash
[ 0.000000] crashkernel reserved: 0x00000000bd000000 - 0x00000000fd000000 (1024 MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-bluefield root=UUID=8e8b38a6-7d3d-4a29-b7a0-99761624f941 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x13010000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 console=tty1 console=ttyS0 crashkernel=2G-4G:320M,4G-32G:1024M,32G-64G:1536M,64G-128G:2048M,128G-:4096M
[ 5.230439] pstore: Using crash dump compression: deflate
root@bu-oob:~#

################
root@bu-oob:~# cat /etc/default/grub.d/kdump-tools.cfg
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:1024M,32G-64G:1536M,64G-128G:2048M,128G-:4096M"

root@bu-lab60v3-oob:~# grep -e "CRASH" -e "KEXEC" /boot/config-5.15.0-1031-bluefield
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_KEXEC_IMAGE_VERIFY_SIG=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HAVE_IMA_KEXEC=y
CONFIG_IMA_KEXEC=y
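
The grep above shows the kexec and crash-dump options are built in. A sketch that performs the same check against the text of a /boot/config-* file; the list of required options is an assumption limited to symbols grepped in this comment, not an authoritative kdump requirements list, and the function name is hypothetical:

```python
REQUIRED = ("CONFIG_KEXEC", "CONFIG_CRASH_DUMP", "CONFIG_CRASH_CORE", "CONFIG_KEXEC_CORE")


def missing_options(config_text: str, required=REQUIRED) -> list[str]:
    """Return required options that are not set to =y in a kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#' and are skipped
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[: -len("=y")])
    return [opt for opt in required if opt not in enabled]


SAMPLE = """\
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
"""

print(missing_options(SAMPLE))  # []
# On the device: missing_options(open("/boot/config-5.15.0-1031-bluefield").read())
```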

*** How to reproduce ***
When the crash is triggered manually with "echo c > /proc/sysrq-trigger",
the system just hangs without showing any messages or logs.

William Tu (wtu) wrote:

Adding "-d" to kexec shows debugging info. I did not see any errors:

[ 441.468322] kdump-tools[7548]: arch_process_options:178: command_line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1031-bluefield root=1
[ 441.468586] kdump-tools[7548]: arch_process_options:180: initrd: /var/lib/kdump/initrd.img
[ 441.468726] kdump-tools[7548]: arch_process_options:182: dtb: (null)
[ 441.468827] kdump-tools[7548]: arch_process_options:185: console: (null)
[ 441.468936] kdump-tools[7548]: Try gzip decompression.
[ 441.734545] kdump-tools[7548]: kernel: 0xffffa6946010 kernel_size: 0x2c9c198
[ 441.734767] kdump-tools[7548]: set_phys_offset: phys_offset : 0000000080000000 (method : vmcoreinfo pt_note)
[ 441.734967] kdump-tools[7548]: get_memory_ranges:+[0] 0000000082000000 - 0000000087ffffff
[ 441.735103] kdump-tools[7548]: get_memory_ranges:+[1] 0000000088002000 - 00000000fffdffff
[ 441.735209] kdump-tools[7548]: get_memory_ranges:- 00000000b9000000 - 00000000bcffffff
[ 441.735308] kdump-tools[7548]: get_memory_ranges:- 00000000fd000000 - 00000000feffffff
[ 441.735391] kdump-tools[7548]: get_memory_ranges:+[4] 0000000100000000 - 000000085a0bffff
[ 441.735467] kdump-tools[7548]: get_memory_ranges:- 00000001001ea000 - 00000001001eafff
[ 441.735568] kdump-tools[7548]: get_memory_ranges:- 0000000100220000 - 000000010022ffff
[ 441.735655] kdump-tools[7548]: get_memory_ranges:- 0000000100230000 - 000000010023ffff
[ 441.735737] kdump-tools[7548]: get_memory_ranges:- 0000000100240000 - 000000010024ffff
[ 441.735813] kdump-tools[7548]: get_memory_ranges:- 0000000100250000 - 000000010025ffff
[ 441.735885] kdump-tools[7548]: get_memory_ranges:- 0000000100260000 - 000000010026ffff
[ 441.735968] kdump-tools[7548]: get_memory_ranges:- 0000000100270000 - 000000010027ffff
[ 441.736117] kdump-tools[7548]: get_memory_ranges:- 0000000100280000 - 000000010028ffff
[ 441.736224] kdump-tools[7548]: get_memory_ranges:- 0000000100290000 - 000000010029ffff
[ 441.736306] kdump-tools[7548]: get_memory_ranges:- 00000001002a0000 - 00000001002affff
[ 441.736380] kdump-tools[7548]: get_memory_ranges:- 00000001002b0000 - 00000001002bffff
[ 441.736481] kdump-tools[7548]: get_memory_ranges:- 00000001002c0000 - 00000001002cffff
[ 441.736570] kdump-tools[7548]: get_memory_ranges:- 00000001002d0000 - 00000001002dffff
[ 441.736650] kdump-tools[7548]: get_memory_ranges:- 00000001002e0000 - 00000001002effff
[ 441.736737] kdump-tools[7548]: get_memory_ranges:- 00000001002f0000 - 00000001002fffff
[ 441.736815] kdump-tools[7548]: get_memory_ranges:- 0000000100300000 - 000000010030ffff
[ 441.736887] kdump-tools[7548]: get_memory_ranges:- 0000000100310000 - 000000010031ffff
[ 441.736957] kdump-tools[7548]: get_memory_ranges:- 0000000100320000 - 000000010032ffff
[ 441.737028] kdump-tools[7548]: get_memory_ranges:- 0000000276d50000 - 00000002776affff
[ 441.737096] kdump-tools[7548]: get_memory_ranges:- 000000082f400000 - 000000084ebfffff
[ 441.737240] kdump-tools[7548]: get_memory_ranges:- 000000084ed50000 - 000000084ed50fff
[ 441.737392] kdump-tools[7548]: get_...
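
The get_memory_ranges lines list physical ranges as inclusive start - end pairs. Assuming "+[n]" marks a range added to kexec's memory map and "-" a range excluded from it (an interpretation of the debug output, not something stated in this report), a short parser can report their sizes. The function name is hypothetical and the sample lines are copied from the output above:

```python
import re

# e.g. "get_memory_ranges:+[0] 0000000082000000 - 0000000087ffffff"
#      "get_memory_ranges:- 00000000b9000000 - 00000000bcffffff"
RANGE_RE = re.compile(
    r"get_memory_ranges:([+-])(?:\[\d+\])?\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)"
)


def memory_ranges(log: str) -> list[tuple[str, int, int]]:
    """Return (sign, start, size_in_bytes) for each get_memory_ranges line."""
    out = []
    for sign, start, end in RANGE_RE.findall(log):
        lo, hi = int(start, 16), int(end, 16)
        out.append((sign, lo, hi - lo + 1))  # end address is inclusive
    return out


RANGES_LOG = """\
kdump-tools[7548]: get_memory_ranges:+[0] 0000000082000000 - 0000000087ffffff
kdump-tools[7548]: get_memory_ranges:- 00000000b9000000 - 00000000bcffffff
kdump-tools[7548]: get_memory_ranges:- 00000000fd000000 - 00000000feffffff
"""

for sign, start, size in memory_ranges(RANGES_LOG):
    print(f"{sign} {start:#014x} {size >> 20} MiB")
```

With these three sample lines it reports 96 MiB, 64 MiB and 32 MiB.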

summary: - kdump cannot generate coredump file on bluefield with 5.4 kernel
+ kdump cannot generate coredump file on bluefield with 5.4 and 5.15 kernel

Taihsiang Ho (tai271828) wrote:

Hi William,

Would you mind providing the output of the following command for both BF-2/5.4 and BF-3/5.15?

cat /etc/mlnx-release

William Tu (wtu) wrote:

Hi Taihsiang,
root@bu-lab6-oob:~# cat /etc/mlnx-release
DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.20231116.dev
