commit d863aea1729f0aa6cb17bbf8aec5f9fd3e9cc371
Author: Eric MacDonald <email address hidden>
Date: Fri Jun 16 18:11:53 2023 +0000
Increase mtce host offline threshold to handle slow host shutdown
Mtce polls/queries the remote host for mtcAlive messages
for 42 x 100 ms intervals over unlock or host failed cases.
Absence of mtcAlive during this (~5 sec) period indicates
the node is offline.
However, in the rare case where shutdown is slow, 5 seconds
is not long enough. Cases have been seen where a 7 or 8
second wait is required to properly declare the host offline.
To avoid the rare transient 200.004 host alarm over an
unlock operation, this update increases the mtce host
offline window from 5 to 10 seconds (approx) by modifying
the mtce configuration file offline threshold from 42 to 90.
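The counting described above can be sketched as follows. This is only an illustration under stated assumptions, not the mtce implementation: the function name, the constant name, and the choice to reset the miss counter to zero when an mtcAlive arrives are hypothetical readings of "polls for N x 100 ms intervals" and "algorithm restarts".

```python
from typing import Iterable, Optional

# Poll interval taken from the commit message (100 ms); the name is hypothetical.
OFFLINE_PERIOD_MS = 100


def polls_until_offline(mtc_alive_seen: Iterable[bool],
                        threshold: int) -> Optional[int]:
    """Return the 1-based poll number at which the host would be declared
    offline, or None if the threshold is never reached.

    mtc_alive_seen yields one boolean per poll interval: True if an
    mtcAlive message was received during that interval.
    """
    missed = 0
    for poll, alive in enumerate(mtc_alive_seen, start=1):
        if alive:
            # Assumed "challenge" behaviour: any mtcAlive restarts the count.
            missed = 0
        else:
            missed += 1
            if missed >= threshold:
                return poll
    return None


def window_seconds(threshold: int) -> float:
    """Length of an unchallenged offline window in seconds."""
    return threshold * OFFLINE_PERIOD_MS / 1000


if __name__ == "__main__":
    for threshold in (42, 90):
        print(threshold, "polls ->", window_seconds(threshold), "sec")
```

Under these assumptions, a host that goes silent immediately is declared offline after exactly `threshold` polls, while a single mtcAlive part way through pushes the declaration out by a full new window.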
Test Plan:
PASS: Verify the unchallenged failed-to-offline period is ~10 secs
PASS: Verify algorithm restarts if there is mtcAlive received
anytime during the polls/queries (challenge) window.
PASS: Verify challenge handling leads to a longer but successful offline declaration.
PASS: Verify above handling for both unlock and spontaneous
failure handling cases.
Closes-Bug: 2024249
Change-Id: Ice41ed611b4ba71d9cf8edbfe98da4b65dcd05cf
Signed-off-by: Eric MacDonald <email address hidden>
Reviewed: https://review.opendev.org/c/starlingx/metal/+/886289
Committed: https://opendev.org/starlingx/metal/commit/d863aea1729f0aa6cb17bbf8aec5f9fd3e9cc371
Submitter: "Zuul (22348)"
Branch: master