Clear bmc alarm over mtcAgent process restart for ALL system types

Bug #1931906 reported by Eric MacDonald
This bug affects 1 person
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Low         Eric MacDonald

Bug Description

Brief Description
-----------------
The BMC access alarm is not cleared over an mtcAgent process restart on AIO SX, even though the BMC can be provisioned for this system type.

Severity
--------
Minor; the alarm is still cleared by the alarm audit that runs after the add handler.

Steps to Reproduce
------------------
Provision the BMC on AIO SX, cause a BMC access failure alarm, then remove the failure condition across an mtcAgent process restart.

Expected Behavior
------------------
The alarm is cleared in the add handler, as it is for all other node types.

Actual Behavior
----------------
The BMC alarm is cleared only later, by the alarm audit that follows the add handler.

Reproducibility
---------------
Reproducible with the repro recipe

System Configuration
--------------------
AIO SX with BMC provisioned

Branch/Pull Time/Commit
-----------------------
stx 4.0

Last Pass
---------
Not tested; this is an unlikely double-failure case.

Timestamp/Logs
--------------

2021-06-14T20:10:45.394 [636230.00109] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (6469) add_handler : Info : controller-0 Host Add Completed (uptime:4694)
2021-06-14T20:10:45.729 [636230.00110] controller-0 mtcAgent alm mtcAlarm.cpp ( 479) mtcAlarm_audit : Info : controller-0 200.010 alarm mismatch ; warning -> clear
2021-06-14T20:10:45.729 [636230.00111] controller-0 mtcAgent alm mtcAlarm.cpp ( 558) mtcAlarm_clear : Info : controller-0 clearing 'Board Management Controller Access' alarm (200.010)

Expected instead:

2021-06-14T20:27:43.405 [758553.00108] controller-0 mtcAgent hbs nodeClass.cpp (4492) set_bm_prov : Info : controller-0 bmc provisioned
2021-06-14T20:27:43.416 [758553.00109] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (6519) bmc_handler : Info : controller-0 bmc handler is waiting on bmc password
2021-06-14T20:27:43.416 [758553.00110] controller-0 mtcAgent alm mtcAlarm.cpp ( 558) mtcAlarm_clear : Info : controller-0 clearing 'Board Management Controller Access' alarm (200.010)
2021-06-14T20:27:43.416 [758553.00111] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (6469) add_handler : Info : controller-0 Host Add Completed (uptime:5712)
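
The difference between the two excerpts above is ordering: in the expected logs the 200.010 clear is logged before "Host Add Completed", while in the failing case it only appears afterwards, from the audit. A minimal sketch of a check for that ordering follows. The function name clear_precedes_add_complete is hypothetical and not part of mtcAgent; it only matches substrings that appear in the log lines quoted in this report.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of mtcAgent): returns true when a
// 'mtcAlarm_clear' line for alarm 200.010 appears before the first
// "Host Add Completed" line in the given log excerpt.
bool clear_precedes_add_complete(const std::vector<std::string>& lines) {
    for (const auto& line : lines) {
        // Reaching the add-complete line first means the clear, if any,
        // came later (for example from the alarm audit).
        if (line.find("Host Add Completed") != std::string::npos) {
            return false;
        }
        if (line.find("mtcAlarm_clear") != std::string::npos &&
            line.find("(200.010)") != std::string::npos) {
            return true;
        }
    }
    // Neither ordering was observed in this excerpt.
    return false;
}
```

Fed the first excerpt, this returns false; fed the expected excerpt, it returns true.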

Test Activity
-------------
Developer Testing

Workaround
----------
None, and none is required.

Tags: stx.metal
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/796318

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/796318
Committed: https://opendev.org/starlingx/metal/commit/fd5dd4254a5d3806b34442b7bbced5082c0b7bd1
Submitter: "Zuul (22348)"
Branch: master

commit fd5dd4254a5d3806b34442b7bbced5082c0b7bd1
Author: Eric MacDonald <email address hidden>
Date: Mon Jun 14 16:46:41 2021 -0400

    Clear bmc alarm over mtcAgent process restart for ALL system types

    If a host's BMC is provisioned and the mtcAgent process
    is restarted then remove the gating condition that avoids
    clearing the BMC access alarm in AIO SX.

    Change-Id: I0734c2203a7acaee27c40c3c0d259b4cc5726b5d
    Closes-Bug: 1931906
    Signed-off-by: Eric MacDonald <email address hidden>
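
The change described in the commit message can be pictured with a small model. This is only a sketch under stated assumptions, not the actual code from the metal repository: the Host struct, the SystemType enum, and both function names are hypothetical, and the real add handler in mtcNodeHdlrs.cpp does considerably more. It assumes the old gate was a check on the system type, which is what the commit message suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model; none of these names come from the metal repo.
enum class SystemType { Standard, AioDuplex, AioSimplex };

struct Host {
    std::string hostname;
    bool bmc_provisioned = false;
    bool bmc_access_alarm = false;  // models alarm 200.010 being raised
};

// Assumed pre-fix behavior: the clear is skipped on AIO SX.
// Returns true when the add handler performed the clear.
bool add_handler_before_fix(Host& host, SystemType type) {
    if (host.bmc_provisioned && type != SystemType::AioSimplex) {
        host.bmc_access_alarm = false;
        return true;
    }
    return false;
}

// Assumed post-fix behavior: the system-type gate is gone, so the
// clear happens for every system type when the BMC is provisioned.
bool add_handler_after_fix(Host& host, SystemType /*type*/) {
    if (host.bmc_provisioned) {
        host.bmc_access_alarm = false;
        return true;
    }
    return false;
}

// Usage example: an AIO SX host restarts with a stale BMC access alarm.
void demo_aio_sx_restart() {
    Host host;
    host.hostname = "controller-0";
    host.bmc_provisioned = true;
    host.bmc_access_alarm = true;

    // Before the fix the alarm is left for the audit to clear.
    assert(!add_handler_before_fix(host, SystemType::AioSimplex));
    assert(host.bmc_access_alarm);

    // After the fix the add handler clears it directly.
    assert(add_handler_after_fix(host, SystemType::AioSimplex));
    assert(!host.bmc_access_alarm);
}
```

In this model the only difference between the two functions is the system-type condition, which mirrors the "gating condition" wording of the commit message.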

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
assignee: nobody → Eric MacDonald (rocksolidmtce)
tags: added: stx.metal