[Ubuntu-24.04] FADump with recommended crash size is making the L1 hang

Bug #2060039 reported by bugproxy
This bug affects 2 people
Affects                            Status     Importance  Assigned to                             Milestone
The Ubuntu-power-systems project   Confirmed  High        Ubuntu on IBM Power Systems Bug Triage
linux (Ubuntu)                     Confirmed  Undecided   Unassigned

Bug Description

Problem description :
======================

Triggered FADump with the recommended crash kernel size. The L1 host hung.

According to the public document https://wiki.ubuntu.com/ppc64el/Recommendations, the recommended crash kernel size for this system is 1024M. With both 1024M and 2048M, the L1 hangs. With 4096M, the dump is generated and collected.

root@ubuntu2404:~# uname -ar
Linux ubuntu2404 6.8.0-11-generic #11-Ubuntu SMP Wed Feb 14 00:33:03 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux

root@ubuntu2404:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            48Gi       1.7Gi        46Gi        13Mi       687Mi        46Gi
Swap:          8.0Gi          0B       8.0Gi

root@ubuntu2404:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinux-6.8.0-11-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro fadump=on crashkernel=1024M
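
When comparing runs with different reservations, it can help to pull the dump-related parameters out of the command line programmatically. The following is a minimal sketch only; `parse_cmdline` is a hypothetical helper (not part of kdump-tools), and it assumes simple space-separated `key=value` tokens without quoting, as in the line above.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters.

    Flags without '=' (for example 'ro') map to None.
    Assumption: no quoted values containing spaces.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Command line copied from this report.
cmdline = (
    "BOOT_IMAGE=/vmlinux-6.8.0-11-generic "
    "root=/dev/mapper/ubuntu--vg-ubuntu--lv ro fadump=on crashkernel=1024M"
)
params = parse_cmdline(cmdline)
print(params["fadump"], params["crashkernel"])
```

On a live system the input string could be read from /proc/cmdline instead of being hard-coded.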

root@ubuntu2404:~# dmesg | grep -i reser
[ 0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 51200MB)
[ 0.000000] fadump: Initialized 0x40000000 bytes cma area at 1024MB from 0x40070000 bytes of memory reserved for firmware-assisted dump
[ 0.000000] Memory: 49316672K/52428800K available (23616K kernel code, 4096K rwdata, 25536K rodata, 8832K init, 2487K bss, 2063552K reserved, 1048576K cma-reserved)
[ 0.396408] ibmvscsi 30000066: Client reserve enabled
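
To check what was actually reserved against total RAM, the "fadump: Reserved" line can be parsed. This is a rough sketch assuming the message format shown in the dmesg excerpt above; other kernel versions may word it differently, and `parse_fadump_reservation` is a hypothetical helper name.

```python
import re

# Pattern for the "fadump: Reserved ..." line as it appears in this report.
# Assumption: the format matches the excerpt above; it may differ elsewhere.
_RESERVED_RE = re.compile(
    r"fadump: Reserved (\d+)MB of memory at (0x[0-9a-fA-F]+) "
    r"\(System RAM: (\d+)MB\)"
)


def parse_fadump_reservation(dmesg_text):
    """Return (reserved_mb, base_address, system_ram_mb), or None if absent."""
    match = _RESERVED_RE.search(dmesg_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16), int(match.group(3))


# Line copied from the dmesg output in this report.
sample = (
    "[ 0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 "
    "(System RAM: 51200MB)"
)
reserved_mb, base, ram_mb = parse_fadump_reservation(sample)
print(f"{reserved_mb} MB reserved at {base:#x}, "
      f"{100 * reserved_mb / ram_mb:.1f}% of RAM")
```

For the values in this report, the reservation is 1024 MB out of 51200 MB, or 2.0% of system RAM.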

root@ubuntu2404:~# kdump-config show
DUMP_MODE: fadump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
   /var/lib/kdump/vmlinuz
kdump initrd:
   /var/lib/kdump/initrd.img
current state: ready to fadump

IBM is looking to update the crash kernel reservations section of the wiki for Power.

bugproxy (bugproxy) wrote : Console logs

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-205921 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : dmesg log

Default Comment by Bridge

bugproxy (bugproxy) wrote : grub file

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-04-08 07:09 EDT-------
CONFIG_KFENCE is the config option that significantly increases the memory
requirement with the radix MMU.

With the radix MMU, memory is normally direct-mapped using 2MB mappings.
When KFENCE is enabled, the direct mapping is done at page-size granularity (64K).
This sharply increases the page table allocation size: on a 100GB system,
it grows from ~23MB to ~3223MB with CONFIG_KFENCE. We are experimenting to see
whether this memory requirement can be brought down. If not, we are considering
disabling KFENCE for the dump capture environment.
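
To illustrate why the mapping granularity matters, the sketch below counts how many leaf mappings are needed to direct-map 100GB at each granularity. It is plain arithmetic, not kernel code: it treats "100GB" as 100 GiB (an assumption) and does not attempt to reproduce the ~23MB and ~3223MB figures, which depend on how the kernel allocates page-table pages.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def leaf_mappings(memory_bytes, mapping_bytes):
    """Number of leaf entries needed to direct-map memory at one granularity."""
    return -(-memory_bytes // mapping_bytes)  # ceiling division


memory = 100 * GIB  # assumption: "100GB" read as 100 GiB
huge = leaf_mappings(memory, 2 * MIB)    # 2MB mappings (no KFENCE)
small = leaf_mappings(memory, 64 * KIB)  # 64K mappings (with KFENCE)
print(huge, small, small // huge)
```

Under these assumptions, 64K mappings need 32 times as many leaf entries as 2MB mappings, which is the direction of the increase described in the comment.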

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-04-24 12:23 EDT-------
Posted the patches to reduce memory consumption for KFENCE upstream:

https://<email address hidden>/
("[PATCH 1/2] radix/kfence: map __kfence_pool at page granularity")

https://<email address hidden>/
("[PATCH 2/2] radix/kfence: support late __kfence_pool allocation")

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-06-19 06:54 EDT-------
Posted v2 upstream:

https://<email address hidden>/
("[PATCH v2] radix/kfence: map __kfence_pool at page granularity")

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
importance: Undecided → High
status: New → Confirmed
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → nobody