vmcore file is not generated in /var/log/crash if file size is greater than 3G

Bug #1936976 reported by Eric MacDonald
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Eric MacDonald

Bug Description

The current crashDumpMgr service has several filesystem protection methods that can result in the auto deletion of a crashdump vmcore file. One of them is a hard cap of 3Gi on the vmcore file size.

This size is too small for some applications.

This bug report requests an increase to that size.

Severity: Major
---------------
Large vmcore crashdumps cannot be captured, which can hinder debugging of field issues.

Steps to Reproduce:
------------------
Trigger crashdump with vmcore size greater than 3Gi

Expected Behavior
------------------
A compressed vmcore file shows up in /var/log/crash

Actual Behavior
----------------
Oversized vmcore file is auto deleted

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any system with large apps

Branch/Pull Time/Commit
-----------------------
July, 2021

Last Pass
---------
N/A

Timestamp/Logs
--------------
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice max crash dump vmcore size is 3Gi (3221225472)
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice max_size=3221225472
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice managing /var/crash
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice saving summary: /var/crash/127.0.0.1-2021-07-13-15:42:57_vmcore-dmesg.txt
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice new vmcore detected (size:3796221106:3.6G) ; /var/log/crash avail:6991488000:6.6G
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice deleting oversize (3.6G) vmcore file 127.0.0.1-2021-07-13-15:42:57
2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice removing /var/crash/127.0.0.1-2021-07-13-15:42:57
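
The numbers in the log lines above can be checked with a short Python sketch. This is not the crashDumpMgr implementation. The regular expression and function name are hypothetical, written only to match the "new vmcore detected" line shown here, and the strict "greater than" comparison is an assumption taken from the bug title. It shows that the vmcore exceeds max_size while still being smaller than the space available in /var/log/crash, so the size cap is what led to the deletion.

```python
import re

# Values copied from the crashDumpMgr log lines in this report.
MAX_SIZE = 3221225472  # "max_size=3221225472", i.e. 3Gi

LOG_LINE = (
    "2021-07-13T15:46:37.000 controller-0 crashDumpMgr: notice new vmcore "
    "detected (size:3796221106:3.6G) ; /var/log/crash avail:6991488000:6.6G"
)

# Hypothetical pattern, written only for the line format shown above.
PATTERN = re.compile(r"size:(\d+):\S+\) ; (\S+) avail:(\d+):")


def parse_vmcore_line(line):
    """Return (vmcore_size_bytes, path, avail_bytes) from a 'new vmcore detected' line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'new vmcore detected' line")
    return int(match.group(1)), match.group(2), int(match.group(3))


size, path, avail = parse_vmcore_line(LOG_LINE)

# Assumption: the cap check is a strict "greater than" comparison.
oversize = size > MAX_SIZE
fits_on_disk = size < avail

print("vmcore size  :", size)
print("max_size     :", MAX_SIZE)
print("avail in", path, ":", avail)
print("oversize     :", oversize)
print("fits on disk :", fits_on_disk)
```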

Test Activity
-------------
N/A

Workaround
----------
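Modify /usr/lib/systemd/system/crashDumpMgr.service so that the ExecStart line passes a larger --max-size value:

ExecStart=/etc/init.d/crashDumpMgr --max-size <bigger size>

     Suggested <bigger size> values: 3.5Gi, 4Gi or 4.5Gi

Then reload the systemd configuration:

sudo systemctl daemon-reload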
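
To help pick a value, the Python sketch below converts candidate sizes to bytes and compares them with the 3796221106-byte vmcore from the logs in this report. The helper names are hypothetical, and this report does not say whether --max-size takes a unit suffix or a raw byte count, so the byte values are only for comparison with the logged max_size. The sketch assumes a vmcore is kept when its size does not exceed the cap.

```python
GI = 1024 ** 3  # one gibibyte in bytes

VMCORE_SIZE = 3796221106  # size of the deleted vmcore in the logs in this report

# 3Gi is the old cap, 3.5/4/4.5Gi are the workaround suggestions,
# 5Gi is the value chosen by the merged fix.
CANDIDATES_GI = [3, 3.5, 4, 4.5, 5]


def gi_to_bytes(gi):
    """Convert a size in Gi to bytes."""
    return int(gi * GI)


def caps_that_keep(vmcore_size, caps_gi):
    """Return the caps (in Gi) that would not flag the vmcore as oversize.

    Assumption: a vmcore is kept when its size is <= the cap.
    """
    return [cap for cap in caps_gi if vmcore_size <= gi_to_bytes(cap)]


for cap in CANDIDATES_GI:
    kept = VMCORE_SIZE <= gi_to_bytes(cap)
    print(f"{cap}Gi = {gi_to_bytes(cap)} bytes, keeps logged vmcore: {kept}")
```

Under these assumptions a 3.5Gi cap (3758096384 bytes) would still be smaller than the logged vmcore, so 4Gi or more would be needed for that particular dump.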

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/801545

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/801545
Committed: https://opendev.org/starlingx/metal/commit/74bfeba7d38dddf5820b2e341f5d3c5e49282a9e
Submitter: "Zuul (22348)"
Branch: master

commit 74bfeba7d38dddf5820b2e341f5d3c5e49282a9e
Author: Eric MacDonald <email address hidden>
Date: Tue Jul 20 18:54:14 2021 -0400

    Increase maximum preserved crash dump vmcore file size to 5Gi

    The current crashDumpMgr service has several filesystem
    protection methods that can result in the auto deletion
    of a crashdump vmcore file. One is a hard cap of 3Gi.

    This max vmcore size is too small for some applications.
    Crash dump vmcore files can get big with servers that have
    a lot of memory and big apps.

    This update modifies the crashDumpMgr service file
    max_size override to 5Gi.

    Test Plan:

    PASS: Verify change functions as expected
    PASS: Verify change is inserted after patch apply
    PASS: Verify crash dump under-size threshold handling
    PASS: Verify crash dump over-size threshold handling
    PASS: Verify change is reverted after patch removal

    Change-Id: I867600460ba9311818ace466986603f5bffe4cd7
    Closes-Bug: 1936976
    Signed-off-by: Eric MacDonald <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.metal
tags: added: stx.6.0