StarlingX

Bug #1884556
Comment #2

OpenStack Infra (hudson-openstack) wrote on 2020-06-24: Fix merged to metal (master)

Reviewed: https://review.opendev.org/737558
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Submitter: Zuul
Branch: master

commit 4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Author: Eric MacDonald <email address hidden>
Date: Tue Jun 23 09:23:51 2020 -0400

Force heartbeat period reset on mtcAgent process startup

    In the case of a Multi Node Failure Avoidance (MNFA) event,
    mtcAgent (Maintenance) sends a 'back-off' request to the
    hbsAgent (Heartbeat) while there appears to be a networking
    issue that affects a number of hosts.

    This 'back-off' request tells the heartbeat service to slow
    down by a factor of 4; for example, a 100 ms period would
    change to a 400 ms period while in MNFA mode. When the MNFA
    condition resolves, the mtcAgent sends a heartbeat 'recovery'
    command to the heartbeat service telling it to restore the
    heartbeat interval to the configured interval.

    However, if the mtcAgent process is 'restarted' while in
    MNFA mode, the knowledge that the heartbeat service was
    running at a reduced rate is lost, so the configured rate
    is never restored.

    This update forces the heartbeat rate to be set back to the
    configured rate when the mtcAgent starts up.

    Note that MNFA mode is not and should not be preserved over
    mtcAgent process restart. If after restart a MNFA event.

    Change-Id: I254ef86c453cb2d40cbeda859bd7477ac28942bc
    Closes-Bug: 1884556
    Signed-off-by: Eric MacDonald <email address hidden>
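
The behaviour described in the commit message can be illustrated with a minimal Python sketch. It is not the metal code: the class names, method names, and the `reset_on_startup` flag are hypothetical, chosen only to model the back-off by a factor of 4, the 'recovery' command, and the forced reset when the maintenance process starts.

```python
from dataclasses import dataclass

# From the commit message: back-off slows the heartbeat by a factor of 4.
BACKOFF_FACTOR = 4


@dataclass
class HeartbeatService:
    """Hypothetical stand-in for hbsAgent; it keeps running across mtcAgent restarts."""
    configured_period_ms: int
    period_ms: int = 0

    def __post_init__(self):
        # Start at the configured period.
        self.period_ms = self.configured_period_ms

    def backoff(self):
        # 'back-off' request: e.g. 100 ms becomes 400 ms.
        self.period_ms = self.configured_period_ms * BACKOFF_FACTOR

    def recover(self):
        # 'recovery' command: restore the configured period.
        self.period_ms = self.configured_period_ms


class MaintenanceAgent:
    """Hypothetical stand-in for mtcAgent."""

    def __init__(self, hbs, reset_on_startup=True):
        self.hbs = hbs
        # MNFA state is deliberately not preserved over a restart.
        self.mnfa_active = False
        if reset_on_startup:
            # Models the fix: force the heartbeat period back on startup.
            hbs.recover()

    def enter_mnfa(self):
        self.mnfa_active = True
        self.hbs.backoff()

    def exit_mnfa(self):
        self.mnfa_active = False
        self.hbs.recover()


def simulate_restart_during_mnfa(reset_on_startup):
    """Return the heartbeat period after the agent restarts while in MNFA mode."""
    hbs = HeartbeatService(configured_period_ms=100)
    agent = MaintenanceAgent(hbs, reset_on_startup)
    agent.enter_mnfa()
    # Restart: a fresh agent instance with no memory of the back-off.
    agent = MaintenanceAgent(hbs, reset_on_startup)
    return hbs.period_ms
```

In this model, `simulate_restart_during_mnfa(False)` leaves the service at the backed-off 400 ms period, while `simulate_restart_during_mnfa(True)` returns it to the configured 100 ms.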