Mtce fails to learn Redfish reset/power actions from some servers

Bug #1992286 reported by Eric MacDonald
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald

Bug Description

Brief Description
-----------------
StarlingX Maintenance supports host power and reset control through both IPMI and Redfish Platform Management protocols when the host's BMC (Board Management Controller) is provisioned.

The power and reset action commands for Redfish are learned through the HTTP payload annotations at the Systems level: "/redfish/v1/Systems".

The existing maintenance implementation only supports the "<email address hidden>" payload property annotation at the #ComputerSystem.Reset Actions property level.

However, the Redfish schema also supports an "ActionInfo" extension for the #ComputerSystem.Reset property at /redfish/v1/Systems/1/ResetActionInfo.

For more information, refer to section 6.3, ActionInfo 1.3.0, of the Redfish Data Model Specification: https://www.dmtf.org/sites/default/files/standards/documents/DSP0268_2022.2.pdf

StarlingX Maintenance is unable to learn the reset and power control commands from servers that publish them through the "@Redfish.ActionInfo" extension for Reset.

This bug report requests that StarlingX Maintenance's Redfish platform management for reset and power control be updated to support this alternate ActionInfo extension.
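
The two publication styles described above can be handled by a single lookup. The following is a minimal Python sketch under stated assumptions, not the mtce implementation (which is C++). The property names `ResetType@Redfish.AllowableValues`, `@Redfish.ActionInfo`, `Parameters` and `AllowableValues` come from the Redfish specification; the function name `learn_reset_actions`, the `fetch` callback and the sample payloads are hypothetical.

```python
from typing import Callable, Dict, List

RESET_ACTION = "#ComputerSystem.Reset"
INLINE_KEY = "ResetType@Redfish.AllowableValues"   # single level annotation
ACTION_INFO_KEY = "@Redfish.ActionInfo"            # link to second level resource


def learn_reset_actions(system: Dict, fetch: Callable[[str], Dict]) -> List[str]:
    """Return the reset/power values a system advertises, or [] if none.

    `fetch` is a hypothetical callback that returns the decoded JSON
    payload for a Redfish URI.
    """
    reset = system.get("Actions", {}).get(RESET_ACTION, {})

    # Single level query: values are annotated inline on the action.
    inline = reset.get(INLINE_KEY)
    if inline:
        return list(inline)

    # Second level query: follow the ActionInfo link and read Parameters.
    info_uri = reset.get(ACTION_INFO_KEY)
    if not info_uri:
        return []
    for param in fetch(info_uri).get("Parameters", []):
        if param.get("Name") == "ResetType":
            return list(param.get("AllowableValues", []))
    return []


# Hypothetical payloads for a server that only publishes ActionInfo.
system = {
    "Actions": {
        "#ComputerSystem.Reset": {
            "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
            "@Redfish.ActionInfo": "/redfish/v1/Systems/1/ResetActionInfo",
        }
    }
}
resources = {
    "/redfish/v1/Systems/1/ResetActionInfo": {
        "Parameters": [
            {"Name": "ResetType",
             "AllowableValues": ["On", "ForceOff", "ForceRestart"]}
        ]
    }
}

learned = learn_reset_actions(system, resources.__getitem__)
print(learned)
```

Trying the inline annotation first keeps the single level behaviour unchanged for servers that already work, and only costs an extra request on servers that publish ActionInfo.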

Severity
--------
Minor: IPMI can be used instead

Steps to Reproduce
------------------
Provision the BMC on a server that publishes its reset and power control actions through ActionInfo.

Expected Behavior
------------------
Maintenance is able to learn the server's reset and power control action commands through Redfish on all servers that are Redfish compliant.

Actual Behavior
----------------
Maintenance is unable to learn the reset and power control action commands from servers that publish them through ActionInfo.

Reproducibility
---------------
100% reproducible for servers that publish them through ActionInfo.

System Configuration
--------------------
Any system config with hosts that support Redfish

Branch/Pull Time/Commit
-----------------------
Any date until this bug report is resolved.

Last Pass
---------
Never

Timestamp/Logs
--------------
2021-11-02T14:24:55.192 [114801.00189] controller-0 mtcAgent — redfishUtil.cpp ( 549) redfishUtil_get_bmc_info:Error : controller-1 actions list get failed ;
[ { "target": "\/redfish\/v1\/Systems\/1\/Actions\/ComputerSystem.Reset", "@Redfish.ActionInfo": "\/redfish\/v1\/Systems\/1\/ResetActionInfo" } ]
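
The logged actions list above carries only a `target` and an `@Redfish.ActionInfo` link, with no inline allowable values. As a rough illustration, the Python sketch below parses that logged text and classifies how the actions are published. The `classify` helper and its return labels are hypothetical and are not part of mtce.

```python
import json

# The actions list exactly as it appears in the log line above.
logged = r'[ { "target": "\/redfish\/v1\/Systems\/1\/Actions\/ComputerSystem.Reset", "@Redfish.ActionInfo": "\/redfish\/v1\/Systems\/1\/ResetActionInfo" } ]'


def classify(entry):
    """Label how one action entry publishes its allowable values."""
    if any(key.endswith("@Redfish.AllowableValues") for key in entry):
        return "inline"        # learnable with a single level query
    if "@Redfish.ActionInfo" in entry:
        return "actioninfo"    # needs a second level query
    return "none"


entries = json.loads(logged)   # JSON decoding turns "\/" back into "/"
styles = [classify(entry) for entry in entries]
print(styles)
print(entries[0]["@Redfish.ActionInfo"])
```

For this log entry the sketch reports the `actioninfo` style, which is the case the existing implementation rejects.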

Test Activity
-------------
Issue Debug

Workaround
----------
Use IPMI

OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/861114

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/861114
Committed: https://opendev.org/starlingx/metal/commit/3f4c2cbb45ca652d1d6876c126f6699148befdbc
Submitter: "Zuul (22348)"
Branch: master

commit 3f4c2cbb45ca652d1d6876c126f6699148befdbc
Author: Eric MacDonald <email address hidden>
Date: Wed Oct 12 21:48:26 2022 +0000

    Mtce: Add ActionInfo extension support for reset operations.

    StarlingX Maintenance supports host power and reset control through
    both IPMI and Redfish Platform Management protocols when the host's
    BMC (Board Management Controller) is provisioned.

    The power and reset action commands for Redfish are learned through
    HTTP payload annotations at the Systems level; "/redfish/v1/Systems".

    The existing maintenance implementation only supports the
    "<email address hidden>" payload property annotation at
    the #ComputerSystem.Reset Actions property level.

    However, the Redfish schema also supports an 'ActionInfo' extension
    at /redfish/v1/Systems/1/ResetActionInfo.

    This update adds support for the 'ActionInfo' extension for Reset
    and power control command learning.

    For more information refer to the section 6.3 ActionInfo 1.3.0 of
    the Redfish Data Model Specification link in the launchpad report.

    Test Plan:

    PASS: Verify CentOS build and patch install.
    PASS: Verify Debian build and ISO install.
    PASS: Verify with Debian redfishtool 1.1.0 and 1.5.0
    PASS: Verify reset/power control cmd load from newly added second
          level query from ActionInfo service.

    Failure Handling: Significant failure path testing with this update

    PASS: Verify Redfish protocol is periodically retried from start
          when bm_type=redfish fails to connect.
    PASS: Verify BMC access protocol defaults to IPMI when
          bm_type=dynamic but failed connect using redfish.
          Connection failures in the above cases include
          - redfish bmc root query fails
          - redfish bmc info query fails
          - redfish bmc load power/reset control actions fails
          - missing second level Parameters label list
          - missing second level AllowableValues label list
    PASS: Verify sensor monitoring is relearned to ipmi from failed and
          retried with bm_type=redfish after switch to bm_type=dynamic
          or bm_type=ipmi by sysinv update command.

    Regression:

    PASS: Verify with CentOS redfishtool 1.1.0
    PASS: Verify switch back and forth between ipmi and redfish using
          update bm_type=ipmi and bm_type=redfish commands
    PASS: Verify switch from ipmi to redfish using bm_type=dynamic for
          hosts that support redfish
    PASS: Verify redfish protocol is preferred in bm_type=dynamic mode
    PASS: Verify IPMI sensor monitoring when bm_type=ipmi
    PASS: Verify IPMI sensor monitoring when bm_type=dynamic
          and redfish connect fails.
    PASS: Verify redfish sensor event assert/clear handling with
          alarm and degrade condition for both IPMI and redfish.
    PASS: Verify reset/power command learn by single level query.
...
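
The protocol selection behaviour exercised by the test plan above can be summarized in a few lines. This is a hedged Python sketch of the rules as the commit message states them, not code from mtce: the function name `select_protocol` and the single `redfish_connected` flag (standing in for the root query, info query and action learning all succeeding) are assumptions.

```python
from typing import Optional


def select_protocol(bm_type: str, redfish_connected: bool) -> Optional[str]:
    """Pick the BMC access protocol, or None to signal 'retry Redfish'."""
    if bm_type not in ("ipmi", "redfish", "dynamic"):
        raise ValueError(f"unknown bm_type: {bm_type}")
    if bm_type == "ipmi":
        return "ipmi"
    if redfish_connected:
        return "redfish"       # preferred in both redfish and dynamic modes
    if bm_type == "dynamic":
        return "ipmi"          # dynamic mode defaults to IPMI on failure
    return None                # bm_type=redfish: retry from the start


for bm_type in ("ipmi", "redfish", "dynamic"):
    for connected in (True, False):
        print(bm_type, connected, select_protocol(bm_type, connected))
```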


Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
importance: Undecided → Medium
tags: added: stx.8.0 stx.metal