commit ecd744ba0a37a69a0c048d0b2fad098147450f7e
Author: John Kung <email address hidden>
Date: Fri Aug 6 14:49:43 2021 -0500
Handle kube ApiException during collectd platform monitoring
During stress tests or high platform load, the kube-apiserver may
respond with a kube ApiException.
As platform monitoring of CPU and memory should not be affected by an
unresponsive kube-apiserver, handle the kube ApiException and allow
the remaining platform resource utilization monitoring to proceed.
This can help identify the issue by allowing the platform alarms to be
raised (e.g. 100.101 Platform CPU threshold exceeded,
100.103 Memory threshold exceeded).
Verified:
o Platform CPU Alarm is raised with stress test
o Platform CPU Alarm is raised with stress test
  and intermittent ApiException
o Memory Alarm is raised with stress test
o Memory Alarm is raised with stress test
  and intermittent ApiException
o the above alarm conditions are cleared after
  debounce when stress condition is removed
Closes-Bug: 1939172
Signed-off-by: John Kung <email address hidden>
Change-Id: I2c9c39a390af1d7ae752ad00db18384479cf6e99
Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/803803
Committed: https://opendev.org/starlingx/monitoring/commit/ecd744ba0a37a69a0c048d0b2fad098147450f7e
Submitter: "Zuul (22348)"
Branch: master
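
The handling described in the commit message can be illustrated with a minimal sketch. This is not the code from the change itself: the function name collect_platform_usage, its callable parameters, and the local ApiException class are hypothetical stand-ins (the real plugin would catch the ApiException raised by the kubernetes Python client). The sketch only shows the idea that a failed Kubernetes query is logged and skipped so that CPU and memory sampling still runs.

```python
import logging

log = logging.getLogger("platform_monitor")


class ApiException(Exception):
    """Hypothetical stand-in for the kubernetes client's ApiException."""


def collect_platform_usage(list_pods, read_cpu, read_memory):
    """Sample platform usage even if the Kubernetes query fails.

    All three arguments are assumed callables supplied by the caller:
    list_pods may raise ApiException; read_cpu and read_memory return
    utilization percentages from local sources.
    """
    pods = []
    try:
        pods = list_pods()
    except ApiException as exc:
        # Log and continue: pod-level data is skipped for this interval,
        # but platform CPU and memory are still sampled below.
        log.warning("kube ApiException, skipping pod data: %s", exc)

    return {
        "pods": pods,
        "cpu_percent": read_cpu(),
        "memory_percent": read_memory(),
    }
```

With a list_pods callable that raises ApiException, the function still returns the CPU and memory readings, which is what allows threshold alarms such as 100.101 and 100.103 to be evaluated during an API outage.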