StarlingX

Bug #1923879
Comment #3

OpenStack Infra (hudson-openstack) wrote on 2021-04-15: Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/786329
Committed: https://opendev.org/starlingx/stx-puppet/commit/f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
Submitter: "Zuul (22348)"
Branch: master

commit f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
Author: Jim Somerville <email address hidden>
Date: Wed Apr 14 17:13:59 2021 -0400

kdump config remove intel eth drivers from ramdisk

    Problem:
    On a kernel crash, such as the watchdog timer firing, kexec
    tries booting the crash recovery kernel in order to capture
    a vmcore so that the issue can be debugged. This normally
    succeeds unless the platform has ice network hardware. Why?
    Because the crash recovery kernel has only a small amount of
    memory set aside for it, and the ice driver allocates enough
    memory to cause memory exhaustion. This causes the crash
    recovery kernel's startup to fail, leading to complete platform
    hang. In order to break out of the hang, one needs to manually
    do a hardware reset or power cycle.

    Solution:
    Change kdump.conf to leave the ice driver module out of the
    initramfs that is used by the crash recovery kernel. In
    fact, leave all of the Intel Ethernet drivers out since they
    are not needed and increase the risk of memory exhaustion.
    Upon changing kdump.conf, the kdump service is restarted to
    regenerate the initramfs.
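
The kdump.conf change described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `ensure_omit_drivers` and the driver list in the usage note are assumptions. It relies on the `dracut_args` option of kdump.conf and on dracut's `--omit-drivers` option, and it assumes kdump.conf has no other `dracut_args` line to merge with. The line is appended only when missing, so repeated runs leave the file untouched.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): make sure kdump.conf tells dracut
# to leave the given driver modules out of the crash-kernel initramfs.
ensure_omit_drivers() {
    conf="$1"      # path to kdump.conf (use a scratch copy when experimenting)
    drivers="$2"   # space-separated module names; the list is the caller's choice
    line="dracut_args --omit-drivers \"$drivers\""

    # Append only when the exact line is absent, so a second run is a no-op.
    if grep -qxF -- "$line" "$conf"; then
        echo "unchanged"
    else
        printf '%s\n' "$line" >> "$conf"
        echo "changed"   # caller should restart kdump to rebuild the initramfs
    fi
}
```

A caller might run `ensure_omit_drivers /etc/kdump.conf "ice i40e ixgbe e1000e igb"` (an assumed driver list) and restart the kdump service only when the helper prints `changed`, which avoids needless initramfs rebuilds on every boot.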

    Verification:
    Install, check the kdump.conf file and unpack the initramfs file
    making sure that those modules are gone. Check controller,
    worker, and storage node types. Reboot node, make sure things
    behave as expected, i.e. no extra kdump.conf mangling and no
    unexpected kdump service restarts.
    Also crash a node with Intel Ethernet hardware on it and make
    sure it comes back up with a vmcore left in /var/log/crash.
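
The initramfs check in the verification steps could be scripted roughly like this. It is a sketch under assumptions: the function name `find_modules_in_listing` is hypothetical, and it expects a file listing on standard input, such as the output of dracut's `lsinitrd` tool. It prints the names of any listed modules that are still present and prints an empty line when none are found.

```shell
#!/bin/sh
# Hypothetical check (not from the commit): read an initramfs file listing on
# stdin and print the named kernel modules that are still present in it.
find_modules_in_listing() {
    listing=$(cat)
    found=""
    for mod in "$@"; do
        # Match .../<mod>.ko at end of line, optionally compressed (.ko.xz, .ko.gz).
        if printf '%s\n' "$listing" | grep -Eq "/${mod}\.ko(\.[a-z0-9]+)?\$"; then
            found="$found $mod"
        fi
    done
    # Space-separated offenders, or an empty line when none remain.
    printf '%s\n' "${found# }"
}
```

For example, `lsinitrd /boot/initramfs-$(uname -r)kdump.img | find_modules_in_listing ice i40e ixgbe` should print nothing but a blank line once the drivers are omitted; the image path and the driver list here are assumptions and may differ per platform. For the crash test itself, writing `c` to `/proc/sysrq-trigger` is one common way to force a kernel panic on a disposable node.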

    Change-Id: I9112f722cee8e199d94393bca887d3bb9bb89b39
    Closes-Bug: 1923879
    Signed-off-by: Jim Somerville <email address hidden>
