Containers: collectd core dump file generated during initial system setup

Bug #1819473 reported by Peng Peng
This bug affects 1 person
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Eric MacDonald

Bug Description

Brief Description
-----------------
A collectd core dump file was generated during initial system setup.

Severity
--------
Major

Steps to Reproduce
------------------
After config_controller completes, run a system command (for example, "system modify").

Expected Behaviour
------------------
The command completes without generating a core dump.

Actual Behaviour
----------------
A collectd core dump file is generated.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
One node system

Branch/Pull Time/Commit
-----------------------
master as of 20190305T060000Z

Timestamp/Logs
--------------
2019-03-07 05:58:20 [admin@admin]> RUNNING: system modify --name=yow-cgcs-supermicro-2 --description="yow-cgcs-supermicro-2:

'controller-0': ['-rw-r----- 1 root root 7015872 2019-07-03_05-58-56 core.collectd.0.27ebbafc566b4897bc41e33b73c75168.19967.1551938307000000.xz']}

The timestamps show that the "system modify" command ran immediately before the core dump file was generated.
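
The timing can be cross-checked from the core file name itself. The sketch below assumes the file follows the systemd-coredump naming layout (core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds>) and that the log prompt time is in UTC; neither is stated in this report. The helper name parse_core_name is hypothetical.

```python
from datetime import datetime, timezone

# File name copied from the log line in this report.
CORE_FILE = ("core.collectd.0.27ebbafc566b4897bc41e33b73c75168"
             ".19967.1551938307000000.xz")


def parse_core_name(name):
    """Split a core file name into its fields.

    Assumes the systemd-coredump layout:
    core.<comm>.<uid>.<boot-id>.<pid>.<usec-since-epoch>[.xz]
    """
    stem = name[:-3] if name.endswith(".xz") else name
    # Split from the right so a dot inside <comm> would not shift the fields.
    prefix, uid, boot_id, pid, usec = stem.rsplit(".", 4)
    comm = prefix.split(".", 1)[1]
    when = datetime.fromtimestamp(int(usec) // 1_000_000, tz=timezone.utc)
    return {"comm": comm, "uid": int(uid), "boot_id": boot_id,
            "pid": int(pid), "when": when}


info = parse_core_name(CORE_FILE)
print(info["comm"], info["pid"], info["when"].isoformat())
# prints: collectd 19967 2019-03-07T05:58:27+00:00
```

Under those assumptions the embedded timestamp decodes to 05:58:27 on 2019-03-07, seven seconds after the 05:58:20 "system modify" prompt.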

Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil) wrote :

Marking as release gating; requires further investigation. Issue appears to be intermittent.

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
status: New → Triaged
importance: Undecided → Medium
summary: - Containers: core dump file generated during initial system setup
+ Containers: collectd core dump file generated during initial system
+ setup
tags: added: stx.2019.05 stx.metal
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-fault (master)

Fix proposed to branch: master
Review: https://review.openstack.org/649713

Changed in starlingx:
status: Triaged → In Progress
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-fault (master)

Reviewed: https://review.openstack.org/649713
Committed: https://git.openstack.org/cgit/openstack/stx-fault/commit/?id=7ae75e24c8c425bb1b2741a55d203c5069072990
Submitter: Zuul
Branch: master

commit 7ae75e24c8c425bb1b2741a55d203c5069072990
Author: Eric MacDonald <email address hidden>
Date: Wed Apr 3 15:50:41 2019 -0400

    Protect FM API shared data with thread locking

    Collectd runs all its python plugins concurrently.

    With the expansion of collectd python plugins due
    to the obsolescence of rmon, collectd core dumps are
    being reported during collectd startup when the FM
    service on the controller is not running.

    Debug of the issue revealed that the core dumps are
    due to having no mutex around FM API's shared data.

    The required mutex is provided by this update by
    adding a while locked expression to the start of
    each API.

    Also fixed 3 pep8 errors.

    Closes-Bug: 1819473

    Test Plan:
    PASS: Test before and after cases to confirm that without
          the change we see core dumps but with the change the
          API and collectd plugin behavior is correct without
          the core dumps.
    PASS: System install with current collectd plugins and fm's
          python API enhanced with locking.
    PASS: Have sm stop managing the fmManager process, kill it
          and then restart collectd over and over.
          Should not see any collectd core dumps.
    PASS: Verify nfv alarming still works

    Change-Id: I3d5ef0bd9cb774299b4c0f3b9e33cddb7c0f776c
    Signed-off-by: Eric MacDonald <email address hidden>
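
The locking pattern described in the commit message can be sketched roughly as follows. This is not the code from the stx-fault change: the class, method, and variable names are hypothetical, and using a module-level threading.Lock with a "with" block is an assumption about what the commit message means by a locked expression at the start of each API. A pure-Python sketch cannot reproduce the core dump; it only shows concurrent callers being serialized around shared data.

```python
import threading

# Hypothetical module-level lock shared by every API entry point.
_api_lock = threading.Lock()

# Hypothetical stand-in for the API's shared data.
_shared = {"calls": 0}


class FaultApiSketch:
    """Illustrative API object; not the real FM API class."""

    def set_fault(self, alarm_id):
        # The lock is taken first, so only one thread at a time
        # touches the shared data.
        with _api_lock:
            current = _shared["calls"]
            _shared["calls"] = current + 1
            return alarm_id


def run_plugins(n_threads=8, calls_per_thread=1000):
    """Simulate several plugins calling the API concurrently."""
    _shared["calls"] = 0
    api = FaultApiSketch()

    def plugin():
        for i in range(calls_per_thread):
            api.set_fault(i)

    threads = [threading.Thread(target=plugin) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return _shared["calls"]


total = run_plugins()
print(total)
```

With the lock held around each read-modify-write, the final count equals n_threads * calls_per_thread regardless of how the threads interleave.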

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

We have not seen this issue recently.

tags: removed: stx.retestneeded