Comment 6 for bug 2031945

OpenStack Infra (hudson-openstack) wrote: Fix merged to metal (master)

Reviewed: https://review.opendev.org/c/starlingx/metal/+/906748
Committed: https://opendev.org/starlingx/metal/commit/50dc29f6c025de0b9dfea3196cf3bedff8c36908
Submitter: "Zuul (22348)"
Branch: master

commit 50dc29f6c025de0b9dfea3196cf3bedff8c36908
Author: Eric Macdonald <email address hidden>
Date: Mon Sep 18 18:48:56 2023 +0000

    Improve maintenance power/reset control command retry handling

    This update improves maintenance power on/off and reset command
    handling and makes it consistent in how retries are done and in
    when the graceful and immediate commands are used.

    This update maintains the 10 retries for both power-on
    and power-off commands and increases the number of retries
    for the reset command from 5 to 10 to line up with the
    power operation commands.

    This update also ensures that the first 5 retries use the graceful
    command while the last 5 use the immediate command.
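    The retry split described above can be sketched as a small schedule
    function. This is a minimal Python illustration, not the maintenance
    code itself. It assumes that retry 0 is the initial try, which the
    test plan says also uses the graceful command. The names
    MAX_RETRIES, GRACEFUL_RETRIES, command_mode and command_schedule
    are hypothetical.

```python
# Hypothetical constants mirroring the counts stated in this commit message.
MAX_RETRIES = 10       # retries after the initial try
GRACEFUL_RETRIES = 5   # retries (after the initial try) that stay graceful


def command_mode(retry: int) -> str:
    """Return 'graceful' or 'immediate' for one attempt.

    Assumption: retry == 0 is the initial try; 1..MAX_RETRIES are retries.
    """
    if not 0 <= retry <= MAX_RETRIES:
        raise ValueError(f"retry {retry} is outside 0..{MAX_RETRIES}")
    return "graceful" if retry <= GRACEFUL_RETRIES else "immediate"


def command_schedule() -> list:
    """Command mode for every attempt, initial try included."""
    return [command_mode(r) for r in range(MAX_RETRIES + 1)]


if __name__ == "__main__":
    print(command_schedule())
```

    Under these assumptions, the initial try and retries 1 through 5 are
    graceful, and retries 6 through 10 are immediate.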

    This update also removes a power-on handling case that could have
    led to a stuck state. That case was virtually impossible to hit
    because it required a specific sequence of intermittent command
    failures, but its handling was fixed anyway.

    Issues have been seen with power-off handling on some servers,
    which are suspected to need more time to power off. This update
    therefore introduces a 30 second delay after a power-off command
    before the power status query is issued. This gives the server
    time to power off before the power-off command is retried.
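    The power-off flow with the post-command delay can be sketched as
    follows. This is a minimal Python illustration under stated
    assumptions, not the actual maintenance implementation. The send
    and query callables stand in for the IPMI or Redfish operations. A
    failed command is assumed to move straight to the next retry
    without the delay, and a failed status query is treated as "not
    off". All names, including power_off and
    POWER_OFF_QUERY_DELAY_SECS, are hypothetical.

```python
import time
from typing import Callable

POWER_OFF_QUERY_DELAY_SECS = 30  # delay stated in this commit message
MAX_RETRIES = 10                 # retries after the initial try
GRACEFUL_RETRIES = 5             # retries that still use the graceful command


def power_off(send_cmd: Callable[[str], bool],
              query_is_off: Callable[[], bool],
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the retry number that succeeded; raise if all attempts fail."""
    for retry in range(MAX_RETRIES + 1):
        mode = "graceful" if retry <= GRACEFUL_RETRIES else "immediate"
        # Assumption: a rejected command skips the delay and is retried.
        if send_cmd(mode):
            # Give the server time to power off before querying its state.
            sleep(POWER_OFF_QUERY_DELAY_SECS)
            # Assumption: a failed query is reported as False (not off).
            if query_is_off():
                return retry
    raise RuntimeError("power-off failed after maximum retries")
```

    Injecting sleep keeps the sketch testable without waiting 30 seconds
    per attempt.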

    Test Plan: Both IPMI and Redfish

    PASS: Verify power on/off and reset handling support up to 10 retries
    PASS: Verify graceful command is used for the first power on/off
          or reset try and the first 5 retries
    PASS: Verify immediate command is used for the final 5 retries
    PASS: Verify reset handling with/without retries (none/mid/max)
    PASS: Verify power-on handling with/without retries (none/mid/max)
    PASS: Verify power-off handling with/without retries (none/mid/max)
    PASS: Verify power status command failure handling for power on/off
    NOTE: FIT (fault insertion testing) was used to create retry scenarios

    PASS: Verify power-off inter-retry delay feature
    PASS: Verify 30 second power-off to power query delay
    PASS: Verify Redfish power/reset commands used are logged by default
    PASS: Verify power-off/on and reset logging

    Regression:

    PASS: Verify power-on/off and reset handling without retries
    PASS: Verify power-off handling when power is already off
    PASS: Verify power-on handling when power is already on

    Closes-Bug: 2031945
    Signed-off-by: Eric Macdonald <email address hidden>
    Change-Id: Ie39326bcb205702df48ff9dd090f461c7110dd36