crash kernel sometimes fails to boot with Mellanox NICs

Bug #1957938 reported by Jim Somerville
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jim Somerville

Bug Description

This is an extension of bug 1923879, where the issue was seen with Intel ice network cards and was solved by adding all of the Intel network drivers to the omit list. The Mellanox network drivers should have been added to that omit list at the same time.

Please see the template in https://bugs.launchpad.net/starlingx/+bug/1923879

Changed in starlingx:
assignee: nobody → Jim Somerville (jsomervi)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/824771

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/824771
Committed: https://opendev.org/starlingx/stx-puppet/commit/89d75c5954a2ea8af7c04fb7601a891afeb6007d
Submitter: "Zuul (22348)"
Branch: master

commit 89d75c5954a2ea8af7c04fb7601a891afeb6007d
Author: Jim Somerville <email address hidden>
Date: Fri Jan 14 13:02:21 2022 -0500

    kdump config remove Mellanox eth drivers from ramdisk

    This is just an extension of commit:

    f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
    kdump config remove intel eth drivers from ramdisk

    where we removed all of the Intel ethernet drivers
    from the kdump ramdisk. They were chewing up scarce
    memory and are not needed to dump a vmcore. We should
    have also removed the Mellanox network drivers at the
    same time. There are some reports of nodes containing
    Mellanox cards failing to dump the vmcore after
    a kernel crash.

    Verification:
    - ensure that /etc/kdump.conf now has the omit line
      containing the Mellanox drivers
    - on a platform containing Mellanox cards, force a
      crash dump via echo c >/proc/sysrq-trigger . On the
      serial console, watch the crash kernel boot up and
      observe that the Mellanox hardware is no longer
      initialized by a driver. Watch that the vmcore dump
      completes successfully.

    Change-Id: I721fef3e5e4769d821d3146c126b0ec908beed75
    Closes-Bug: 1957938
    Signed-off-by: Jim Somerville <email address hidden>
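
The first verification step in the commit message above (checking that /etc/kdump.conf carries the omit line) could be scripted roughly as follows. This is only a sketch under stated assumptions: the report does not show the omit line itself, so the sketch assumes the list is passed as dracut_args --omit-drivers "<list>", and the helper name check_omitted and the module names mlx4_core, mlx4_en and mlx5_core are illustrative rather than taken from the commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of stx-puppet): succeed only if every named
# driver appears in the omit list of a kdump.conf-style file.
# Assumption: the omit list is given on a line of the form
#   dracut_args --omit-drivers "drv1 drv2 ..."
check_omitted() {
    conf="$1"
    shift
    # Extract the quoted driver list(s) and join them onto one line.
    omitted=$(sed -n 's/^[[:space:]]*dracut_args.*--omit-drivers[[:space:]=]*"\([^"]*\)".*/\1/p' "$conf" | tr '\n' ' ')
    for drv in "$@"; do
        case " $omitted " in
            *" $drv "*) ;;                           # driver is in the omit list
            *) echo "missing: $drv"; return 1 ;;     # driver would still be loaded
        esac
    done
    echo "all omitted"
}
```

On a node this might be invoked as check_omitted /etc/kdump.conf mlx4_core mlx4_en mlx5_core, using the module names the commit actually adds. The second verification step, forcing a crash through /proc/sysrq-trigger, is deliberately left out of the sketch because it takes the node down.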

Changed in starlingx:
status: In Progress → Fix Released