failcounts never expire

Bug #1802310 reported by Andrea Ieri
This bug affects 1 person
Affects: OpenStack HA Cluster Charm
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

failure-timeout is currently not set on any monitored resource. As a result, failcounts accumulate indefinitely, even when the error condition that triggered them is long gone.
I think we should set a reasonable default expiration timeout, such as 1 minute.

Ryan Beisner (1chb1n) wrote :

Agree. I think failure-timeout as a new charm config option would make sense. We should have more conversation though about the default value.

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_failure_response.html
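
For illustration only, here is a minimal Python sketch of how a charm config value could be turned into a cluster-wide resource default. The helper name `rsc_defaults_command` is hypothetical, and using crmsh's `rsc_defaults` for this is an assumption; the thread does not show how the charm actually applies the setting.

```python
import shlex


def rsc_defaults_command(failure_timeout_seconds: int) -> str:
    """Build (but do not run) a crmsh command that sets failure-timeout
    as a default meta attribute for all resources.

    Hypothetical helper: the real charm may apply the option differently.
    """
    if failure_timeout_seconds < 0:
        raise ValueError("failure-timeout must be non-negative")
    args = [
        "crm", "configure", "rsc_defaults",
        f"failure-timeout={failure_timeout_seconds}",
    ]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Example: a 3 minute timeout, as proposed later in this report.
print(rsc_defaults_command(180))
```

Returning the command as a string keeps the sketch free of side effects; a value of 0 would leave expiry disabled.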

Changed in charm-hacluster:
importance: Undecided → Medium
David Ames (thedac)
Changed in charm-hacluster:
status: New → Triaged
milestone: none → 19.04
Andrea Ieri (aieri) wrote :

Here is a patch for setting failure-timeout to 3 minutes by default: https://review.openstack.org/#/c/624720/

David Ames (thedac)
Changed in charm-hacluster:
milestone: 19.04 → 19.07
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-hacluster (master)

Reviewed: https://review.opendev.org/624720
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=e28f8a9adc9f8ac1b3c1462d4f0724a4ddbc2257
Submitter: Zuul
Branch: master

commit e28f8a9adc9f8ac1b3c1462d4f0724a4ddbc2257
Author: Andrea Ieri <email address hidden>
Date: Wed Dec 12 15:32:59 2018 +0100

    Enable custom failure-timeout configuration

    As explained here[0], setting failure-timeout means that the cib will 'forget'
    that a resource agent action failed by setting failcount to 0:
    - if $failure-timeout seconds have elapsed from the last failure
    - if an event wakes up the policy engine (i.e. at the global resource
      recheck in an idle cluster)

    By default the failure-timeout will be set to 0, which disables the feature,
    however this change allows for tuning.

    [0] https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_failure_response

    Change-Id: Ia958a8c5472547c7cf0cb4ecd7e70cb226074b88
    Closes-Bug: #1802310
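
The two conditions in the commit message can be sketched as a small Python model. This is a simplified illustration, not Pacemaker's implementation: the class `ResourceFailures`, its method names, and the `>=` comparison are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ResourceFailures:
    """Toy model of one resource's failcount (hypothetical, for illustration)."""
    failure_timeout: int       # seconds; 0 disables expiry, as in the commit
    failcount: int = 0
    last_failure: float = 0.0  # timestamp of the most recent failure

    def record_failure(self, now: float) -> None:
        self.failcount += 1
        self.last_failure = now

    def policy_engine_run(self, now: float) -> None:
        """Expiry is only evaluated when the policy engine wakes up."""
        if self.failure_timeout <= 0 or self.failcount == 0:
            return
        if now - self.last_failure >= self.failure_timeout:
            self.failcount = 0


rsc = ResourceFailures(failure_timeout=180)
rsc.record_failure(now=0)
rsc.policy_engine_run(now=100)  # timeout not yet elapsed: failcount stays 1
rsc.policy_engine_run(now=900)  # e.g. a periodic recheck: failcount resets to 0
```

Note that in this model nothing happens at the moment the timeout elapses; the failcount only clears on the next policy engine run, which is why an idle cluster may keep a stale failcount until the global recheck.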

Changed in charm-hacluster:
status: Triaged → Fix Committed
David Ames (thedac)
Changed in charm-hacluster:
status: Fix Committed → Fix Released