Fuel for OpenStack

Improve Pacemaker configuration to react to events better

Bug #1517388 reported by Bogdan Dobrelya on 2015-11-18

This bug affects 1 person

Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  High        Bogdan Dobrelya  Fuel for OpenStack 8.0

Bug Description

The current value of cluster-recheck-interval (the Pacemaker default of 15 min) is suboptimal.

The OpenStack HA guide [0] says:
"Pacemaker uses an event-driven approach to cluster state processing. However, certain Pacemaker actions occur at a configurable interval, cluster-recheck-interval, which defaults to 15 minutes. It is usually prudent to reduce this to a shorter interval, such as 5 or 3 minutes."

The ClusterLabs Pacemaker guide [1] says:
"If you rely on time based rules, it is essential that you set the cluster-recheck-interval option."

There is also an interesting blog post [2] about the correlation between cluster-recheck-interval and failure-timeout.

[0] http://docs.openstack.org/high-availability-guide/content/_setting_basic_cluster_properties.html
[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-migration.html#ftn.id718985
[2] http://blog.kennyrasschaert.be/blog/2013/12/18/pacemaker-high-failability/

Given that, and bearing in mind that Fuel configures failure-timeout values for resources in the range of 30 to 180 sec,
cluster-recheck-interval should be changed to a value just above 180 sec, such as 190 sec.
That should make Pacemaker process some events for the defined resources much faster: up to about 5 times, since 900 sec / 190 sec ≈ 4.7.
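A rough way to see where the "up to x5" figure comes from is to model an otherwise idle cluster in which an expired failure-timeout is only acted on at the next periodic recheck. This is a simplified model written for illustration, not Pacemaker code: the function names are hypothetical, and it assumes recheck ticks fire strictly every cluster-recheck-interval seconds with an arbitrary phase relative to the failure. Under that assumption the fail count is cleared at most failure-timeout + cluster-recheck-interval seconds after the failure.

```python
import math

# Pacemaker's default cluster-recheck-interval is 15 minutes.
DEFAULT_RECHECK_S = 15 * 60
# The value proposed in this bug report.
PROPOSED_RECHECK_S = 190
# failure-timeout range that Fuel configures for its resources (per this report).
FAILURE_TIMEOUTS_S = (30, 180)


def expiry_time(failure_timeout_s: float, recheck_s: float, phase_s: float = 0.0) -> float:
    """Seconds after a failure until its fail count is actually cleared.

    Hypothetical, simplified model (an assumption, not Pacemaker's code):
    the cluster is otherwise idle, so the expired failure-timeout is only
    noticed on a periodic recheck tick at phase_s + k * recheck_s, k >= 0.
    """
    if not 0 <= phase_s < recheck_s:
        raise ValueError("phase_s must be in [0, recheck_s)")
    # First tick at or after the moment the failure-timeout elapses.
    k = max(0, math.ceil((failure_timeout_s - phase_s) / recheck_s))
    return phase_s + k * recheck_s


def worst_case_expiry(failure_timeout_s: float, recheck_s: float, samples: int = 1000) -> float:
    """Largest expiry_time over evenly spaced tick phases in [0, recheck_s)."""
    return max(
        expiry_time(failure_timeout_s, recheck_s, recheck_s * i / samples)
        for i in range(samples)
    )


if __name__ == "__main__":
    for recheck in (DEFAULT_RECHECK_S, PROPOSED_RECHECK_S):
        for timeout in FAILURE_TIMEOUTS_S:
            worst = worst_case_expiry(timeout, recheck)
            print(f"recheck={recheck:>4}s failure-timeout={timeout:>3}s "
                  f"worst-case expiry ~{worst:.0f}s (bound {timeout + recheck}s)")
    print(f"recheck-interval ratio: {DEFAULT_RECHECK_S / PROPOSED_RECHECK_S:.1f}x")
```

Under this model, the default 900 s interval gives a worst case of roughly 930 to 1080 s for 30 to 180 s timeouts, while 190 s brings it down to roughly 220 to 370 s. The recheck component alone shrinks by 900 / 190 ≈ 4.7; the total delay improves somewhat less because failure-timeout itself is unchanged. Any other cluster event that triggers a policy-engine run would clear the fail count earlier, so these figures are upper bounds under the stated assumption.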

Bogdan Dobrelya (bogdando) on 2015-11-18

Changed in fuel:
importance:	Undecided → High
milestone:	none → 8.0
assignee:	nobody → Bogdan Dobrelya (bogdando)
tags:	added: area-library ha pacemaker

Ilya Kutukov (ikutukov) on 2015-11-18

Changed in fuel:
status:	New → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2015-11-18: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246912

Changed in fuel:
status:	Confirmed → In Progress

Dmitry Pyzhov (dpyzhov) on 2015-11-18

tags:	added: feature

Bogdan Dobrelya (bogdando) wrote on 2015-11-18:

This is not a feature but a configuration change; it can easily be backported.

OpenStack Infra (hudson-openstack) wrote on 2015-11-18: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/246912
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=372b37123065b8a267c9ea9f2305c3ff404acff9
Submitter: Jenkins
Branch: master

commit 372b37123065b8a267c9ea9f2305c3ff404acff9
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Nov 18 13:14:00 2015 +0100

Configure corosync cluster-recheck-interval

    Add the cluster_recheck_interval param with
    a 190 sec default, configurable via hiera.
    Add rspec and noop tests.

Closes-bug: #1517388

Change-Id: I82b7220e24282d1fbda69ed7d3788e2bdc0afcfc
Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status:	In Progress → Fix Committed

Aleksei Stepanov (penguinolog) on 2015-11-19

tags:	added: on-verification

Aleksei Stepanov (penguinolog) on 2015-11-24

tags:	removed: on-verification
Changed in fuel:
status:	Fix Committed → Fix Released
