Host does not go degraded due to critical sensor event

Bug #1838020 reported by Eric MacDonald
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  High        Eric MacDonald

Bug Description

Brief Description
-----------------
Long hostname support introduced a bug that causes the mtcAgent
to reject hardware monitor degrade requests because the originating
service (daemon) is not recognized.

Severity
--------
Major: Alarm raised but no degrade

Steps to Reproduce
------------------
Create a persistent critical sensor event.

Expected Behavior
------------------
Sensor alarm is raised and the host goes degraded.

Actual Behavior
----------------
Sensor alarm but no degrade

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any host with BMC provisioned.

Branch/Pull Time/Commit
-----------------------
July 26

Last Pass
---------
Before this update.

https://review.opendev.org/#/c/665969/

Timestamp/Logs
--------------
node_degrade_control :Swerr : controller-1 service not specified
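
The log above suggests the degrade request reached the handler without a usable service field. A minimal sketch of how a parameter-parsing mismatch could produce that result, assuming the request is a simple key/value message; the function name, field names ("service", "sensor", "action") and message layout are hypothetical and not taken from the mtce source:

```python
# Hypothetical illustration only; this is not the actual mtcAgent code.
# Assumption: "hwmond" is the only service allowed to request a degrade here.
KNOWN_SERVICES = {"hwmond"}


def handle_degrade_request(request: dict) -> str:
    """Return a log-style line describing how a degrade request was handled."""
    hostname = request.get("hostname", "unknown")
    service = request.get("service")

    if not service:
        # The sender and receiver disagree on where the service name lives,
        # so the receiver sees no service at all and rejects the request.
        return f"{hostname} service not specified"

    if service not in KNOWN_SERVICES:
        return f"{hostname} degrade rejected: unknown service '{service}'"

    # With the service recognized, the sensor name can be logged as well.
    sensor = request.get("sensor", "unknown")
    action = request.get("action", "assert")
    return f"{hostname} degrade {action} from {service} (sensor: {sensor})"


# Request whose service name sits under a key the receiver does not read.
broken = {"hostname": "controller-1", "daemon": "hwmond"}

# Request carrying the fields the receiver expects, including the sensor name.
fixed = {"hostname": "controller-1", "service": "hwmond",
         "sensor": "Temp_CPU0", "action": "assert"}

print(handle_degrade_request(broken))
print(handle_degrade_request(fixed))
```

In this sketch the first request is rejected even though the alarm path is unaffected, which matches the reported symptom of an alarm without a degrade. The sensor name "Temp_CPU0" is a made-up example value.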

Test Activity
-------------
Developer testing

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672974

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 / high priority -- issue introduced by recent submission

Changed in starlingx:
importance: Undecided → High
tags: added: stx.2.0 stx.metal
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/672974
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=aefc81ec912583fb48f5eaefa4cbbacc438fce6f
Submitter: Zuul
Branch: master

commit aefc81ec912583fb48f5eaefa4cbbacc438fce6f
Author: Eric MacDonald <email address hidden>
Date: Fri Jul 26 09:02:08 2019 -0400

    Fix hardware monitor degrade event handling

    Long hostname support introduced a bug that causes the mtcAgent
    to reject hardware monitor degrade requests due to the originating
    service (daemon) not recognized.

    This update fixes the parsed parameters in mtcAgent and adds
    a sensor parm to the degrade API so that the sensor name
    accompanying the degrade event can be logged in mtcAgent.

    Test Plan: for hwmond degrade handling

    PASS: verify degrade assert and sensor name in mtcAgent degrade assert log
    PASS: Verify degrade clear handling and log

    Change-Id: I5c11cc5f679f21e6aadd4d5be25e6c08a241e80b
    Closes-Bug: 1838020
    Signed-off-by: Eric MacDonald <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released