OpenStack Heat

watch timers can't be disabled for multiple engines

Bug #1322128 reported by Steven Hardy on 2014-05-22

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Heat	Fix Released	Medium	Angus Salkeld	OpenStack Heat 2014.2 "juno"

Bug Description

There are two issues with the current implementation of watch rule timers, which trigger periodic evaluation of watch rules (which are used by the CWLiteAlarm resource).

Long term we need to remove all the watch rule code and mandate use of ceilometer instead, but short term we have the following issues:

1. The watch tasks are started after the fork when multiple worker processes are specified. In practice this appears not to result in duplicate tasks, because of the global StackWatch object and common ThreadGroupManager, but it's suboptimal and could have undesired side-effects - we only ever want one watch task, as StackWatch doesn't use the stack lock.

2. There is no way to globally disable the watch tasks. This is an issue for multi-engine deployments, where due to the aforementioned lack of locking, the engines will race each other running the same watch tasks for every stack. The solution is to globally disable watch tasks and mandate that only ceilometer alarms be used in a scaled-out deployment (which makes sense anyway, given the crude and unscalable implementation of heat internal alarming).

Currently, if someone creates a stack containing a CWLiteAlarm resource when running multiple engines, things will appear to work, but are highly likely to be racy, so we should add a switch to disable watch tasks and document that they should be disabled when running more than one heat-engine process.
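The switch requested above can be sketched roughly as follows. This is a minimal illustration only, not Heat's code: start_watch_timers, its arguments, and the add_timer callback are hypothetical names chosen for this example. The idea is simply that an engine started with the switch off registers no watch timers, so several engines cannot race on the same stacks.

```python
# Hypothetical sketch: none of these names are Heat's real API.


def start_watch_timers(stack_ids, enable_cloud_watch_lite, add_timer):
    """Register one periodic watch timer per stack unless globally disabled.

    Returns the list of stack ids for which a timer was registered.
    """
    if not enable_cloud_watch_lite:
        # Switch is off: this engine registers no watch timers at all.
        return []
    started = []
    for stack_id in stack_ids:
        add_timer(stack_id)
        started.append(stack_id)
    return started


# Record which timers would have been registered.
registered = []

# Single-engine deployment: watch timers are allowed.
single_engine = start_watch_timers(
    ["stack-a", "stack-b"], True, registered.append)

# Multi-engine deployment: the switch is off, so nothing is registered.
multi_engine = start_watch_timers(
    ["stack-a", "stack-b"], False, registered.append)
```

With the switch off, an existing CWLiteAlarm resource would no longer be evaluated periodically by that engine, which is why the description pairs the switch with documentation telling operators to use ceilometer alarms instead.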

OpenStack Infra (hudson-openstack) on 2014-05-22

Changed in heat:
assignee:	nobody → Steven Hardy (shardy)
status:	New → In Progress

Steven Hardy (shardy) on 2014-07-22

Changed in heat:
milestone:	none → juno-3

Visnusaran Murugan (visnusaran-murugan) wrote on 2014-08-05:

Hi shardy,

The ThreadGroupManager created in create_periodic_tasks, which is triggered first, is overwritten by the service start's own ThreadGroupManager. I guess there needs to be a check in EngineService.start before creating the ThreadGroupManager again. StackWatch has its own disconnected ThreadGroupManager.

Visnusaran Murugan (visnusaran-murugan) wrote on 2014-08-05:

Anyway, the described problem exists for a multi-engine setup; my earlier comment is about not having a separate ThreadGroupManager explicitly for StackWatch. The manager created using create_periodic_tasks should either be part of the StackWatch class or not be part of EngineService (self.thread_group_mgr = ThreadGroupManager()).
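The overwrite described in these two comments can be illustrated with a small stand-alone sketch. The classes below are simplified stand-ins written for this example, not Heat's actual ThreadGroupManager or EngineService; start_overwriting and start_guarded are hypothetical method names that contrast the reported behaviour with the suggested check.

```python
class ThreadGroupManager:
    """Stand-in for Heat's ThreadGroupManager; it only tracks timer names."""

    def __init__(self):
        self.timers = []

    def add_timer(self, name):
        self.timers.append(name)


class EngineServiceSketch:
    """Hypothetical, simplified model of the service described above."""

    def __init__(self):
        self.thread_group_mgr = None

    def create_periodic_tasks(self):
        # Triggered first: creates a manager and registers the watch task.
        self.thread_group_mgr = ThreadGroupManager()
        self.thread_group_mgr.add_timer("watch-task")

    def start_overwriting(self):
        # Reported behaviour: the manager is replaced unconditionally,
        # so the service loses its reference to the watch task's manager.
        self.thread_group_mgr = ThreadGroupManager()

    def start_guarded(self):
        # Suggested check: only create a manager if none exists yet.
        if self.thread_group_mgr is None:
            self.thread_group_mgr = ThreadGroupManager()


overwritten = EngineServiceSketch()
overwritten.create_periodic_tasks()
overwritten.start_overwriting()

guarded = EngineServiceSketch()
guarded.create_periodic_tasks()
guarded.start_guarded()
```

In the first case the service ends up holding an empty manager, while in the guarded case the manager that owns the watch task is kept.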

Thierry Carrez (ttx) on 2014-09-03

Changed in heat:
milestone:	juno-3 → juno-rc1

Zane Bitter (zaneb) on 2014-09-17

Changed in heat:
importance:	Undecided → Medium

Steve Baker (steve-stevebaker) on 2014-09-19

Changed in heat:
assignee:	Steven Hardy (shardy) → nobody

Angus Salkeld (asalkeld) wrote on 2014-09-22:

I think part one of this is done: https://github.com/openstack/heat/commit/507555a585d22a6dc276344b7b97fa9a015e81e5
(Steven can you confirm?)

Changed in heat:
assignee:	nobody → Angus Salkeld (asalkeld)

Angus Salkeld (asalkeld) wrote on 2014-09-22:

I'll take a look at an option to disable cloud watch "lite".

OpenStack Infra (hudson-openstack) wrote on 2014-09-22: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123039

OpenStack Infra (hudson-openstack) wrote on 2014-09-27: Fix merged to heat (master)

Reviewed: https://review.openstack.org/123039
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=33eb87b3e2d8da3ba11901c4762c19b0fc740dab
Submitter: Jenkins
Branch: master

commit 33eb87b3e2d8da3ba11901c4762c19b0fc740dab
Author: Angus Salkeld <email address hidden>
Date: Thu Sep 25 08:26:36 2014 +1000

Add an option to disable cloud watch lite

This also adds a deprecation warning.
This also changes the default to use Ceilometer.

    Release message:
    Anyone deploying Heat should not be using OS::Heat::CWLiteAlarm, but
    OS::Ceilometer::Alarm.
    CWLiteAlarm should be explicitly disabled in /etc/heat/heat.conf by
    setting "enable_cloud_watch_lite=false". This will stop Heat from
    running a periodic task check for alarms.

    DocImpact
    Change-Id: I2a10c14772bdafc001e211d7e94502ac1f6b32b1
    Closes-bug: #1322128

Changed in heat:
status:	In Progress → Fix Committed
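
As a rough illustration of the setting named in the release message, the sketch below parses a sample configuration fragment and reads the flag as a boolean. Two assumptions are made here: that the option sits in the [DEFAULT] section of heat.conf, which the commit message does not state, and that Python's standard configparser is an acceptable stand-in for demonstration purposes (Heat itself reads its configuration through oslo.config).

```python
import configparser

# Sample fragment only. The [DEFAULT] section placement is an assumption;
# the option name and value come from the release message above.
SAMPLE_HEAT_CONF = """\
[DEFAULT]
enable_cloud_watch_lite=false
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_HEAT_CONF)

# getboolean() accepts true/false, yes/no, on/off and 1/0.
watch_lite_enabled = parser.getboolean("DEFAULT", "enable_cloud_watch_lite")
print(watch_lite_enabled)
```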

Thierry Carrez (ttx) on 2014-10-02

Changed in heat:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in heat:
milestone:	juno-rc1 → 2014.2

Related blueprints

Deprecate WatchRule implementation