Comment 3 for bug 1862065

OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/706895
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=da7b2e94f136fa9faf7ff691f175b2ca9b1605b1
Submitter: Zuul
Branch: master

commit da7b2e94f136fa9faf7ff691f175b2ca9b1605b1
Author: Eric MacDonald <email address hidden>
Date: Mon Feb 10 10:15:56 2020 -0500

    Modify Mtce Reinstall FSM to first power-off BMC provisioned hosts

    This update applies only to servers that support, and are provisioned
    for, a Baseboard Management Controller (BMC).

    The BMCs of some servers silently reject the 'set next boot device'
    command while the server is executing BIOS.

    When the BMC is provisioned, the current reinstall algorithm starts by
    detecting the power state of the target server. If the power is off,
    it first powers the server on, then sets the next boot device to PXE,
    and then resets the server. When the server starts out powered off,
    the timing of these operations is such that the server is still in
    BIOS when the 'set next boot device' command is issued, so the BMC
    silently rejects it.
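The failure mode described above can be reproduced with a toy model. This is a minimal sketch, not the maintenance (mtce) code. `Phase`, `SimulatedBmc`, and `old_reinstall` are hypothetical names. The model makes one simplifying assumption: a server that has just been powered on is always still in BIOS. While in that state, the BMC acknowledges 'set next boot device' but drops it.

```python
from enum import Enum


class Phase(Enum):
    OFF = "off"
    BIOS = "bios"  # powered on, firmware still running
    OS = "os"      # firmware finished, an OS is running


class SimulatedBmc:
    """Hypothetical toy BMC: 'set next boot device' is acknowledged
    but silently ignored while the server is in BIOS."""

    def __init__(self, phase=Phase.OFF):
        self.phase = phase
        self.next_boot_device = "disk"

    def power_on(self):
        # Assumption: power-on always lands the server in BIOS first.
        self.phase = Phase.BIOS

    def set_next_boot_device(self, device):
        if self.phase is not Phase.BIOS:
            self.next_boot_device = device
        return True  # the caller always sees "success"

    def reset(self):
        self.phase = Phase.BIOS

    def boot(self):
        # Firmware completes and boots from the selected device.
        self.phase = Phase.OS
        return self.next_boot_device


def old_reinstall(bmc):
    """Original order: power on if off, set next boot to PXE, reset."""
    if bmc.phase is Phase.OFF:
        bmc.power_on()                # the server is now in BIOS...
    bmc.set_next_boot_device("pxe")   # ...so this may be silently dropped
    bmc.reset()
    return bmc.boot()
```

For a host that is already running, `old_reinstall(SimulatedBmc(Phase.OS))` returns `"pxe"`. For a host that starts powered off, `old_reinstall(SimulatedBmc(Phase.OFF))` returns `"disk"`, so the reinstall never network boots.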

    This update modifies the host reinstall algorithm to first power off
    the server, then set the next boot device while the server is
    confirmed to be powered off, and only then power it on. This ensures
    the BMC receives and handles the 'set next boot device' command
    properly.
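The reordered sequence can be sketched with the same kind of toy BMC. This is not the mtce implementation. `SimulatedBmc`, `new_reinstall`, `ReinstallError`, and the `max_polls` parameter are hypothetical. The point the sketch shows is the ordering: the boot device is set only after a powered-off state has been confirmed, and the server is powered on afterwards.

```python
from enum import Enum


class Phase(Enum):
    OFF = "off"
    BIOS = "bios"
    OS = "os"


class ReinstallError(RuntimeError):
    """Hypothetical error raised when a reinstall step cannot complete."""


class SimulatedBmc:
    """Hypothetical toy BMC that ignores 'set next boot device' in BIOS."""

    def __init__(self, phase=Phase.OFF):
        self.phase = phase
        self.next_boot_device = "disk"

    def power_state(self):
        return "off" if self.phase is Phase.OFF else "on"

    def power_off(self):
        self.phase = Phase.OFF

    def power_on(self):
        self.phase = Phase.BIOS

    def set_next_boot_device(self, device):
        if self.phase is not Phase.BIOS:
            self.next_boot_device = device
        return True

    def boot(self):
        self.phase = Phase.OS
        return self.next_boot_device


def new_reinstall(bmc, max_polls=5):
    # Step 1: power off unconditionally, unless already off.
    if bmc.power_state() != "off":
        bmc.power_off()
    # Step 2: confirm the power-off before touching the boot device.
    # A real implementation would wait between polls; the toy is instant.
    for _ in range(max_polls):
        if bmc.power_state() == "off":
            break
    else:
        raise ReinstallError("power-off not confirmed")
    # Step 3: set the boot device while the server is known to be off.
    bmc.set_next_boot_device("pxe")
    # Step 4: power on; firmware then honours the PXE selection.
    bmc.power_on()
    return bmc.boot()
```

With this order, `new_reinstall` returns `"pxe"` whether the host starts powered on or powered off. This holds because the command is never issued while the toy BMC is in BIOS.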

    This update also fixes a race condition between the bmc_handler and
    power_handler by moving the final power state update in the power
    handler to the power done phase.
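The commit message does not describe the race in detail. The sketch below shows one plausible shape of it and should be read as an assumption, not as the actual bmc_handler/power_handler code. `Host`, `FakeBmc`, `bmc_audit`, `power_on_handler`, and `run` are hypothetical. The power handler is written as a generator, so each `yield` marks a point where another handler may run. If the handler records the final power state early, a BMC audit that runs before the command takes effect can overwrite it with a stale value. Recording the state only in the done phase makes the handler's write the last one.

```python
from dataclasses import dataclass


@dataclass
class Host:
    power_state: str = "off"  # power state as recorded by maintenance


class FakeBmc:
    """Hypothetical BMC whose reported state lags a power-on command."""

    def __init__(self):
        self.actual = "off"

    def query(self):
        return self.actual


def bmc_audit(host, bmc):
    # Hypothetical bmc_handler audit: copy whatever the BMC reports.
    host.power_state = bmc.query()


def power_on_handler(host, bmc, update_in_done_phase):
    # Issue phase: the power-on command is sent.
    if not update_in_done_phase:
        host.power_state = "on"  # early update, exposed to the race
    yield "issue"
    # Wait phase: the command takes effect on the server.
    bmc.actual = "on"
    yield "wait"
    # Done phase: the final state is recorded only here in the fixed form.
    if update_in_done_phase:
        host.power_state = "on"
    yield "done"


def run(update_in_done_phase):
    host, bmc = Host(), FakeBmc()
    fsm = power_on_handler(host, bmc, update_in_done_phase)
    next(fsm)             # issue phase
    bmc_audit(host, bmc)  # audit interleaves before power-on completes
    next(fsm)             # wait phase
    next(fsm)             # done phase
    return host.power_state
```

In this interleaving, `run(False)` leaves the host recorded as `"off"` even though it is on. `run(True)` ends with `"on"`.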

    Test Plan:

    Verify all new reinstall failure path handling via fault insertion testing
    Verify reinstall of powered-off host
    Verify reinstall of powered-on host
    Verify reinstall of Wildcat server with IPMI
    Verify reinstall of Supermicro server with IPMI and Redfish
    Verify reinstall of Ironpass server with IPMI
    Verify reinstall of WolfPass server with Redfish and IPMI
    Verify reinstall of Dell server with IPMI
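The first test plan item, fault insertion, can be illustrated with a small harness. This is a minimal sketch that assumes the new reinstall sequence is a fixed list of steps. `STEPS`, `reinstall`, `fail_at`, and `fault_insertion_sweep` are hypothetical and are not taken from mtce. The harness forces each step to fail in turn and records where the sequence stopped. A correct failure path stops at the injected step and never runs a later step, such as setting the boot device when the power-off was not confirmed.

```python
STEPS = ("power_off", "confirm_off", "set_boot_pxe", "power_on")


def reinstall(fail_at=None):
    """Hypothetical reinstall sequence with a fault-insertion hook.

    fail_at names a step whose command is forced to fail.
    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    completed = []
    for step in STEPS:
        if step == fail_at:
            return completed, step  # abort: later steps must not run
        completed.append(step)
    return completed, None


def fault_insertion_sweep():
    # Inject a failure at every step and record how the sequence ended.
    return {step: reinstall(fail_at=step) for step in STEPS}
```

For example, `reinstall(fail_at="confirm_off")` returns `(["power_off"], "confirm_off")`, so the boot device is never touched.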

    Over 30 reinstalls were performed across all server types, with the
    host initially powered on and initially powered off, using both IPMI
    and Redfish (where supported).

    Change-Id: Iefb17e9aa76c45f2ceadf83f23b1231ae82f000f
    Closes-Bug: 1862065
    Signed-off-by: Eric MacDonald <email address hidden>