Comment 4 for bug 1793314

OpenStack Infra (hudson-openstack) wrote: Fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/604183
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=5142fac49806c8b823c50be52119c878841f0955
Submitter: Zuul
Branch: master

commit 5142fac49806c8b823c50be52119c878841f0955
Author: Eric MacDonald <email address hidden>
Date: Thu Sep 20 14:21:32 2018 -0400

    Make collectd alarm notifier retry alarm clear attempts that fail

    When the StarlingX collectd alarm notification handler calls the Fault
    Manager (FM) to clear an alarm and that request fails, for example due
    to a concurrent swact operation, the clear is not retried and the alarm
    can become stuck.

    The alarm then remains stuck until the same alarm is asserted again and
    a subsequent deassertion leads to a successful clear.

    The fix is to 'return' from the alarm clear failure path before the
    alarm notifier's alarm manager control structure is updated with the
    clear state. Because that state is left unchanged, the clear is
    automatically retried on the next audit interval.

    Change-Id: Iddf4e0e7b99eab0bf0748230a25851419e7c06fa
    Closes-Bug: 1793314
    Signed-off-by: Eric MacDonald <email address hidden>
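The retry approach described in this commit (return early on a failed clear so the recorded state stays stale and the periodic audit tries again) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `FaultManagerStub`, `AlarmObject`, `handle_clear`, `audit`, and the `recorded`/`desired` fields are hypothetical stand-ins, not the actual StarlingX collectd plugin code or the FM API.

```python
# Hypothetical sketch of the "return on clear failure" retry pattern.
# None of these names come from the StarlingX code base.


class FaultManagerStub:
    """Stand-in for an FM client that fails the first `failures` clear requests."""

    def __init__(self, failures=0):
        self.failures = failures
        self.active = set()  # alarm ids FM currently holds as raised

    def clear(self, alarm_id):
        if self.failures > 0:
            self.failures -= 1
            return False  # simulate a failed request, e.g. during a swact
        self.active.discard(alarm_id)
        return True


class AlarmObject:
    """Per-alarm control structure kept by the notifier (hypothetical)."""

    def __init__(self, alarm_id):
        self.alarm_id = alarm_id
        self.recorded = "major"  # state the notifier believes FM holds
        self.desired = "major"   # state implied by the latest reading


def handle_clear(fm, obj):
    """Request a clear; only record it if FM accepted the request."""
    obj.desired = "clear"
    if not fm.clear(obj.alarm_id):
        # The fix: return without updating obj.recorded, so the mismatch
        # between desired and recorded state survives for the audit.
        return False
    obj.recorded = "clear"
    return True


def audit(fm, objects):
    """Periodic audit: retry any clear that has not been recorded yet."""
    for obj in objects:
        if obj.desired == "clear" and obj.recorded != "clear":
            handle_clear(fm, obj)
```

In a simulated run where the first clear fails, `handle_clear` returns `False` and leaves `obj.recorded` at `"major"`, so the next call to `audit` retries and completes the clear. If the handler instead set `obj.recorded = "clear"` unconditionally, the audit would see nothing to do and the alarm would stay raised in FM, which is the stuck-alarm behavior the commit describes.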