Improve pacemaker configuration to react on events better

Bug #1517388 reported by Bogdan Dobrelya
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Bogdan Dobrelya

Bug Description

The current value of cluster-recheck-interval (the 15 min default) is suboptimal.

OpenStack HA guide [0] mentions that:
"Pacemaker uses an event-driven approach to cluster state processing. However, certain Pacemaker actions occur at a configurable interval, cluster-recheck-interval, which defaults to 15 minutes. It is usually prudent to reduce this to a shorter interval, such as 5 or 3 minutes."

Clusterlabs pacemaker guide [1] mentions that:
"If you rely on time based rules, it is essential that you set the cluster-recheck-interval option."

There is also an interesting blog post [2] about the correlation between cluster-recheck-interval and failure-timeout.

[0] http://docs.openstack.org/high-availability-guide/content/_setting_basic_cluster_properties.html
[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-migration.html#ftn.id718985
[2] http://blog.kennyrasschaert.be/blog/2013/12/18/pacemaker-high-failability/

Given that, and bearing in mind that Fuel configures failure-timeout values for resources in the [30; 180] sec range,
cluster-recheck-interval should be set to a value slightly above 180 sec, such as 190.
That should make Pacemaker process some events for the defined resources much faster, up to roughly 5 times (900 sec / 190 sec ≈ 4.7).
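
To make the arithmetic behind that estimate explicit, here is a minimal sketch under a simplified model: it assumes an expired failure-timeout is only acted upon at the next recheck tick and that no other cluster event triggers a transition earlier. This is not Pacemaker's actual scheduling logic, and the function and constant names are hypothetical, chosen for illustration only.

```python
# Simplified model (an assumption, not Pacemaker's real scheduler):
# an expired failure-timeout is only noticed at the next recheck tick,
# unless some other cluster event triggers a transition earlier.

DEFAULT_RECHECK_S = 15 * 60      # 15 min default quoted in this report
PROPOSED_RECHECK_S = 190         # value proposed in this report
FAILURE_TIMEOUTS_S = (30, 180)   # bounds of the range Fuel configures


def worst_case_cleanup_delay(failure_timeout_s: int, recheck_interval_s: int) -> int:
    """Upper bound, in seconds, from a failure until its expiry is processed.

    The failure-timeout expires after failure_timeout_s, and the next
    recheck tick arrives at most one full recheck interval after that.
    """
    return failure_timeout_s + recheck_interval_s


def recheck_speedup(old_interval_s: int, new_interval_s: int) -> float:
    """Ratio between the old and the new recheck interval."""
    return old_interval_s / new_interval_s


if __name__ == "__main__":
    for ft in FAILURE_TIMEOUTS_S:
        old = worst_case_cleanup_delay(ft, DEFAULT_RECHECK_S)
        new = worst_case_cleanup_delay(ft, PROPOSED_RECHECK_S)
        print(f"failure-timeout={ft}s: worst case {old}s -> {new}s")
    ratio = recheck_speedup(DEFAULT_RECHECK_S, PROPOSED_RECHECK_S)
    print(f"recheck interval ratio: {ratio:.2f}")
```

Under this model the recheck interval dominates the worst case for every failure-timeout in the range, which is why shortening it matters more than tuning the individual timeouts.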

Changed in fuel:
importance: Undecided → High
milestone: none → 8.0
assignee: nobody → Bogdan Dobrelya (bogdando)
tags: added: area-library ha pacemaker
Ilya Kutukov (ikutukov)
Changed in fuel:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246912

Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov)
tags: added: feature
Bogdan Dobrelya (bogdando) wrote :

This is not a feature but a configuration change, so it can be easily backported.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/246912
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=372b37123065b8a267c9ea9f2305c3ff404acff9
Submitter: Jenkins
Branch: master

commit 372b37123065b8a267c9ea9f2305c3ff404acff9
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Nov 18 13:14:00 2015 +0100

    Configure corosync cluster-recheck-interval

    Add the cluster_recheck_interval param with
    a 190 sec default, configurable via hiera.
    Add rspec and noop tests.

    Closes-bug: #1517388

    Change-Id: I82b7220e24282d1fbda69ed7d3788e2bdc0afcfc
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released