Comment 2 for bug 1884556

OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/737558
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Submitter: Zuul
Branch: master

commit 4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Author: Eric MacDonald <email address hidden>
Date: Tue Jun 23 09:23:51 2020 -0400

    Force heartbeat period reset on mtcAgent process startup

    In the case of a Multi Node Failure Avoidance (MNFA) event,
    mtcAgent (Maintenance) sends a 'back-off' request to the
    hbsAgent (Heartbeat) while there appears to be a networking
    issue that affects a number of hosts.

    This 'back-off' request tells the heartbeat service to slow
    down by a factor of 4; for example, a 100 ms period would
    change to a 400 ms period while in MNFA mode. When the MNFA
    condition resolves, the mtcAgent sends a heartbeat 'recovery'
    command to the heartbeat service telling it to restore the
    heartbeat interval to the configured interval.

    However, if the mtcAgent process is restarted while in
    MNFA mode, the knowledge that the heartbeat service was
    running at a reduced rate is lost, so the configured rate
    is never restored.

    This update forces the heartbeat rate to be set back to the
    configured rate when the mtcAgent starts up.

    Note that MNFA mode is not and should not be preserved over
    mtcAgent process restart. If after restart a MNFA event.

    Change-Id: I254ef86c453cb2d40cbeda859bd7477ac28942bc
    Closes-Bug: 1884556
    Signed-off-by: Eric MacDonald <email address hidden>
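
The behaviour described in the commit message can be illustrated with a small Python model. This is only a sketch and not the mtcAgent or hbsAgent code: the class names, method names, and the in-process call between the two objects are hypothetical stand-ins for the real daemons and their messaging. Only the factor of 4 and the 100 ms example period come from the commit message.

```python
# Factor taken from the commit message ("slow down by a factor of 4").
MNFA_BACKOFF_FACTOR = 4


class HeartbeatService:
    """Hypothetical stand-in for hbsAgent: holds the active heartbeat period."""

    def __init__(self, configured_period_ms):
        self.configured_period_ms = configured_period_ms
        self.period_ms = configured_period_ms

    def backoff(self):
        # 'back-off' request: run slower while MNFA is active.
        self.period_ms = self.configured_period_ms * MNFA_BACKOFF_FACTOR

    def recover(self):
        # 'recovery' command: return to the configured period.
        self.period_ms = self.configured_period_ms


class MaintenanceAgent:
    """Hypothetical stand-in for mtcAgent: MNFA state lives only in memory."""

    def __init__(self, heartbeat):
        self.heartbeat = heartbeat
        # MNFA mode is not preserved across a restart.
        self.in_mnfa = False
        # The fix: always force the configured period on startup, because
        # a previous instance may have exited while back-off was in effect.
        self.heartbeat.recover()

    def enter_mnfa(self):
        self.in_mnfa = True
        self.heartbeat.backoff()

    def exit_mnfa(self):
        self.in_mnfa = False
        self.heartbeat.recover()


# The heartbeat service outlives the maintenance agent in this model.
hbs = HeartbeatService(configured_period_ms=100)
mtc = MaintenanceAgent(hbs)

mtc.enter_mnfa()
period_during_mnfa = hbs.period_ms

# Simulate an mtcAgent restart while still in MNFA mode.
mtc = MaintenanceAgent(hbs)
period_after_restart = hbs.period_ms
```

Without the `recover()` call in the constructor, the model would leave the heartbeat service at the backed-off period after the restart, which is the failure this bug reports.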