Comment 5 for bug 1880578

OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/761760
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=11960566125e395e2556af1719778d737d4b86e5
Submitter: Zuul
Branch: master

commit 11960566125e395e2556af1719778d737d4b86e5
Author: Eric MacDonald <email address hidden>
Date: Fri Nov 6 09:21:22 2020 -0500

    Disable Redfish BMC audit and improve reinstall failure handling

    The Mtce Reinstall Handler can collide with the BMC Redfish
    audit, resulting in reinstall failure. The BMC handler's 2 minute
    connection audit can collide with other BMC commands.

    The reinstall handler, with 4 bmc command operations, is
    particularly susceptible.

    Two additional bmc communication improvements are implemented:

    1. Add 'retry' handling to all BMC requests in the Maintenance
       Reinstall Handler FSM to handle transient command failures.

       Note: Retries already exist for all but the power status
       query and the netboot requests in that handler, and for
       other administrative commands that involve bmc requests.

    2. Switch BMC power control command management from 'static' to
       'learned' lists. Some BMCs don't support both the graceful and
       the immediate form of a power command, for example Graceful
       Restart and Force Restart. To remove the possibility of using
       an unsupported BMC command, this update switches from static
       to learned power command lists, with a log produced if a
       server is missing command support.

       Power commands escalate from graceful to immediate in the
       presence of retries.
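
    The learned-list and escalation behaviour described above can be
    sketched roughly as follows. This is an illustrative Python sketch
    only, not the mtce implementation: the function names, the
    PREFERRED table and the retry count are hypothetical, and it
    assumes the BMC advertises its supported Redfish ResetType values
    as a simple list of strings.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical preference table: graceful variant first, immediate variant last.
# The strings are standard Redfish ResetType values.
PREFERRED: Dict[str, List[str]] = {
    "reset": ["GracefulRestart", "ForceRestart"],
    "poweroff": ["GracefulShutdown", "ForceOff"],
    "poweron": ["On", "ForceOn"],
}


def learn_power_commands(
    allowable: List[str], log: Callable[[str], None] = print
) -> Dict[str, List[str]]:
    """Build per-action command lists from what the BMC says it supports."""
    learned: Dict[str, List[str]] = {}
    for action, candidates in PREFERRED.items():
        learned[action] = [c for c in candidates if c in allowable]
        for c in candidates:
            if c not in allowable:
                # Log missing support instead of sending the command later.
                log(f"BMC does not support {c} ({action})")
    return learned


def select_command(
    learned: Dict[str, List[str]], action: str, retries: int
) -> Optional[str]:
    """Pick the graceful command first and the immediate one on any retry."""
    cmds = learned.get(action, [])
    if not cmds:
        return None
    return cmds[0] if retries == 0 else cmds[-1]


def run_with_retries(
    send: Callable[[str], bool],
    learned: Dict[str, List[str]],
    action: str,
    max_retries: int = 3,
) -> bool:
    """Send the action, escalating to the immediate command when retrying."""
    for attempt in range(max_retries + 1):
        cmd = select_command(learned, action, attempt)
        if cmd is None:
            return False  # nothing supported for this action
        if send(cmd):
            return True
    return False
```

    With a BMC that only lists ForceRestart for reset, the sketch never
    attempts GracefulRestart; with both power-off variants available,
    the first attempt is graceful and every retry uses ForceOff.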

    Test Cases:

    PASS: Verify bmc handler redfish audit is disabled
    PASS: Verify reinstall soak using redfish
    PASS: Verify reinstall netboot and power status retry handling
    PASS: Verify all power control commands using redfish
    PASS: Verify graceful operations are used if available
    PASS: Verify immediate operations are used for retries

    Regression:

    PASS: Verify bmc ping audit success and failure handling

    PASS: Verify Reset Handling soak (redfish and ipmi)
    PASS: Verify Power-Off/On Handling soak (redfish and ipmi)
    PASS: Verify Reinstall Handling soak (redfish and ipmi)
    PASS: Verify Standard System Install (redfish and ipmi)
    PASS: Verify AIO DX System Install (redfish and ipmi)

    PASS: Verify this update as a patch

    Change-Id: Idb484512ccb1b16e2d0ea9aff4ab7965347b1322
    Closes-Bug: 1880578
    Signed-off-by: Eric MacDonald <email address hidden>