kdump config remove intel eth drivers from ramdisk
Problem:
When the kernel crashes, for example when the watchdog timer
fires, kexec boots the crash recovery kernel to capture a vmcore
so that the issue can be debugged. This normally succeeds unless
the platform has ice network hardware. The crash recovery kernel
has only a small amount of memory set aside for it, and the ice
driver allocates enough to exhaust that memory. The crash
recovery kernel then fails to start, leaving the platform
completely hung. Breaking out of the hang requires a manual
hardware reset or power cycle.
Solution:
Change kdump.conf to leave the ice driver module out of the
initramfs that is used by the crash recovery kernel. In
fact, leave out all of the Intel Ethernet drivers, since they
are not needed and increase the risk of memory exhaustion.
After kdump.conf is changed, the kdump service is restarted to
regenerate the initramfs.
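
One way to picture the kdump.conf change is the sketch below. It is not the
code from this commit. It assumes the drivers are excluded through the
kdump.conf `dracut_args` directive combined with dracut's `--omit-drivers`
option, which the commit message does not confirm. The driver list is also an
assumption: only `ice` is named above, and the other entries are Intel
Ethernet drivers added for illustration. The helper names are hypothetical.
The function reports whether it changed anything, so a caller can restart the
kdump service only when the file really differs.

```python
# Hypothetical driver list: only "ice" is named in the commit message.
INTEL_ETH_DRIVERS = ["ice", "i40e", "ixgbe", "igb", "e1000e"]


def omit_drivers_option(drivers):
    """Build the dracut option that keeps the given drivers out of the image."""
    return '--omit-drivers "{}"'.format(" ".join(drivers))


def ensure_omit_drivers(conf_text, drivers=INTEL_ETH_DRIVERS):
    """Return (new_text, changed) for the contents of a kdump.conf file.

    Assumption: drivers are excluded via 'dracut_args --omit-drivers ...'.
    The edit is idempotent: a second call on the result changes nothing.
    """
    option = omit_drivers_option(drivers)
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        # Match only an active directive, not a commented-out one.
        if line.split()[:1] == ["dracut_args"]:
            if "--omit-drivers" in line:
                return conf_text, False  # already configured, leave as-is
            lines[i] = line.rstrip() + " " + option
            return "\n".join(lines) + "\n", True
    # No dracut_args directive yet: append a new one.
    lines.append("dracut_args " + option)
    return "\n".join(lines) + "\n", True
```

Returning the `changed` flag is what keeps a reboot from triggering another
edit of kdump.conf or another kdump service restart.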
Verification:
Install, then check the kdump.conf file and unpack the initramfs
file to make sure that those modules are gone. Check the
controller, worker, and storage node types. Reboot a node and make
sure things behave as expected, i.e. no extra kdump.conf mangling
and no unexpected kdump service restarts.
Also crash a node that has Intel Ethernet hardware and make
sure it comes back up with a vmcore left in /var/log/crash.
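
The "modules are gone" check can be scripted. The sketch below is only an
illustration of that verification step, not part of the commit. It assumes
the initramfs has already been unpacked into a directory, and that kernel
modules appear as `.ko` files, optionally compressed as `.ko.xz`, `.ko.gz`
or `.ko.zst`. The function names and the driver list passed in are
hypothetical.

```python
import os
import re

# Matches "ice.ko", "ice.ko.xz", "ice.ko.gz" and "ice.ko.zst".
_MODULE_RE = re.compile(r"^(?P<name>.+?)\.ko(?:\.(?:xz|gz|zst))?$")


def module_name(path):
    """Return the module name for a kernel module file path, else None."""
    match = _MODULE_RE.match(os.path.basename(path))
    return match.group("name") if match else None


def find_unwanted_modules(paths, drivers):
    """Return the sorted paths whose module name is in the drivers list."""
    unwanted = set(drivers)
    return sorted(p for p in paths if module_name(p) in unwanted)


def list_files(root):
    """Yield every file path under an unpacked initramfs directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

For example, `find_unwanted_modules(list_files(unpacked_dir), ["ice"])`
should return an empty list on a correctly configured node, where
`unpacked_dir` is wherever the initramfs was unpacked.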
Change-Id: I9112f722cee8e199d94393bca887d3bb9bb89b39
Closes-Bug: 1923879
Signed-off-by: Jim Somerville <email address hidden>
Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/786329
Committed: https://opendev.org/starlingx/stx-puppet/commit/f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
Submitter: "Zuul (22348)"
Branch: master
commit f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
Author: Jim Somerville <email address hidden>
Date: Wed Apr 14 17:13:59 2021 -0400