Comment 2 for bug 1939172

OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/803803
Committed: https://opendev.org/starlingx/monitoring/commit/ecd744ba0a37a69a0c048d0b2fad098147450f7e
Submitter: "Zuul (22348)"
Branch: master

commit ecd744ba0a37a69a0c048d0b2fad098147450f7e
Author: John Kung <email address hidden>
Date: Fri Aug 6 14:49:43 2021 -0500

    Handle kube ApiException during collectd platform monitoring

    During a stress test or high platform load it is possible that the
    kube-apiserver responds with a kube ApiException.

    As platform monitoring of CPU and memory should not be affected by
    an unresponsive kube-apiserver, allow the kube ApiException to be
    handled and the remaining platform resource utilization monitoring
    to proceed.

    This could help identify the issue by allowing the platform alarms
    to be raised (e.g. 100.101 Platform CPU threshold exceeded,
    100.103 Memory threshold exceeded).

    Verified:
      o Platform CPU Alarm is raised with stress test
      o Platform CPU Alarm is raised with stress test
        and intermittent ApiException
      o Memory Alarm is raised with stress test
      o Memory Alarm is raised with stress test
        and intermittent ApiException
      o the above alarm conditions are cleared after
        debounce when stress condition is removed

    Closes-Bug: 1939172
    Signed-off-by: John Kung <email address hidden>
    Change-Id: I2c9c39a390af1d7ae752ad00db18384479cf6e99
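
The pattern described in the commit message can be sketched roughly as below. This is only an illustration, not the merged change: `read_platform_usage`, `list_pods`, `read_host_cpu_percent` and `failing_list_pods` are hypothetical names, and the stand-in `ApiException` class is used only if the kubernetes Python client is not installed. The idea is that a failed kube-apiserver query is caught and logged, and the host-level reading that drives the platform alarms is still taken.

```python
try:
    # Exception raised by the Kubernetes Python client on API errors.
    from kubernetes.client.rest import ApiException
except ImportError:
    class ApiException(Exception):
        """Stand-in used only when the kubernetes package is absent."""

        def __init__(self, status=None, reason=None):
            super().__init__(reason)
            self.status = status
            self.reason = reason


def read_platform_usage(list_pods, read_host_cpu_percent):
    """Return (cpu_percent, pods), tolerating a failing kube-apiserver.

    Both arguments are hypothetical callables: one queries the
    kube-apiserver, the other reads host CPU usage locally.
    """
    try:
        pods = list_pods()
    except ApiException as exc:
        # The kube query failed; note it and carry on so that the
        # platform reading below is still collected this cycle.
        print("kube ApiException ignored: %s" % exc.reason)
        pods = None

    # Host-level monitoring does not depend on the kube-apiserver.
    cpu_percent = read_host_cpu_percent()
    return cpu_percent, pods


def failing_list_pods():
    """Simulate an overloaded kube-apiserver."""
    raise ApiException(status=503, reason="Service Unavailable")


cpu, pods = read_platform_usage(failing_list_pods, lambda: 97.5)
```

Here the CPU value is still returned even though the pod query failed, so a threshold check on it could raise the platform CPU alarm.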