commit 4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Author: Eric MacDonald <email address hidden>
Date: Tue Jun 23 09:23:51 2020 -0400
Force heartbeat period reset on mtcAgent process startup
In the case of a Multi Node Failure Avoidance (MNFA) event,
mtcAgent (Maintenance) sends a 'back-off' request to the
hbsAgent (Heartbeat) while there appears to be a networking
issue that affects a number of hosts.
This 'back-off' request tells the heartbeat service to slow
down by a factor of 4; for example, a 100 ms period would
change to a 400 ms period while in MNFA mode. When the MNFA
condition resolves, the mtcAgent sends a heartbeat 'recovery'
command to the heartbeat service telling it to restore the
heartbeat interval to the configured interval.
However, if the mtcAgent process is restarted while in
MNFA mode, the knowledge that the heartbeat service was
running at a reduced rate is lost, and the heartbeat service
is left running at the reduced rate.
This update forces the heartbeat rate to be set back to the
configured rate when the mtcAgent starts up.
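
The behaviour described above can be illustrated with a small model. This is only a sketch, not the actual mtce code: the class names, method names, and the reset_on_startup flag are hypothetical, and the real mtcAgent and hbsAgent exchange commands as separate processes rather than through direct method calls. Only the factor of 4 and the 100 ms example come from the commit message.

```python
from dataclasses import dataclass

# From the commit message: MNFA back-off slows the heartbeat by a factor of 4.
MNFA_BACKOFF_FACTOR = 4


@dataclass
class HeartbeatService:
    """Hypothetical stand-in for hbsAgent; it outlives mtcAgent restarts."""
    configured_period_ms: int
    period_ms: int = 0

    def __post_init__(self):
        # Start out at the configured rate.
        self.period_ms = self.configured_period_ms

    def backoff(self):
        # 'back-off' request: stretch the period while in MNFA mode.
        self.period_ms = self.configured_period_ms * MNFA_BACKOFF_FACTOR

    def recover(self):
        # 'recovery' command: return to the configured period.
        self.period_ms = self.configured_period_ms


class MaintenanceAgent:
    """Hypothetical stand-in for mtcAgent; MNFA state is in-memory only."""

    def __init__(self, hbs, reset_on_startup=True):
        self.hbs = hbs
        self.mnfa_active = False  # not preserved over a process restart
        if reset_on_startup:
            # The fix: force the configured heartbeat rate at startup,
            # whatever state the previous mtcAgent instance left behind.
            self.hbs.recover()

    def mnfa_enter(self):
        self.mnfa_active = True
        self.hbs.backoff()

    def mnfa_exit(self):
        self.mnfa_active = False
        self.hbs.recover()


hbs = HeartbeatService(configured_period_ms=100)
MaintenanceAgent(hbs).mnfa_enter()              # period is now 400 ms

MaintenanceAgent(hbs, reset_on_startup=False)   # restart without the fix
period_without_fix = hbs.period_ms              # still 400 ms

MaintenanceAgent(hbs)                           # restart with the fix
period_with_fix = hbs.period_ms                 # back to 100 ms
```

Constructing a new MaintenanceAgent models a process restart: the new instance has no memory of MNFA mode, so without the startup reset nothing ever sends the 'recovery' command.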
Note that MNFA mode is not and should not be preserved over
mtcAgent process restart. If after restart a MNFA event.
Change-Id: I254ef86c453cb2d40cbeda859bd7477ac28942bc
Closes-Bug: 1884556
Signed-off-by: Eric MacDonald <email address hidden>
Reviewed: https://review.opendev.org/737558
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=4267d467869fcdbf2529bc1afa3d1c9958a2a6da
Submitter: Zuul
Branch: master