watch timers can't be disabled for multiple engines

Bug #1322128 reported by Steven Hardy
This bug affects 2 people
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Angus Salkeld

Bug Description

There are two issues with the current implementation of watch rule timers, which trigger periodic evaluation of watch rules (which are used by the CWLiteAlarm resource).

Long term, we need to remove all the watch rule code and mandate the use of ceilometer instead, but short term we have the following issues:

1. The watch tasks are started after the fork when multiple worker processes are specified. In practice this does not appear to result in duplicate tasks, because of the global StackWatch object and common ThreadGroupManager, but it's suboptimal and could have undesired side-effects: we only ever want one watch task, as StackWatch doesn't use the stack lock.

2. There is no way to globally disable the watch tasks. This is an issue for multi-engine deployments, where, due to the aforementioned lack of locking, the engines will race each other running the same watch tasks for every stack. The solution is to globally disable watch tasks and mandate that only ceilometer alarms be used in a scaled-out deployment (which makes sense anyway, given the crude and unscalable implementation of heat's internal alarming).

Currently, if someone creates a stack containing a CWLiteAlarm resource when running multiple engines, things will appear to work, but are highly likely to be racy, so we should add a switch to disable watch tasks and document that they should be disabled when running more than one heat-engine process.
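
To make the multi-engine race concrete, here is a minimal sketch in plain Python. It is not Heat code: `WatchRule`, `evaluate_rules` and `run_engines` are hypothetical names invented for illustration, and the engines run sequentially rather than concurrently. It only assumes what is described above: each engine runs its own periodic evaluation over the same rules, and no lock is taken.

```python
from collections import Counter


class WatchRule:
    """Hypothetical stand-in for a watch rule; not Heat's real class."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold


def evaluate_rules(engine_id, rules, metric_value, actions):
    """One periodic pass by one engine. No lock is taken before acting."""
    for rule in rules:
        if metric_value > rule.threshold:
            # Every engine that sees the breach records its own alarm action.
            actions.append((engine_id, rule.name))


def run_engines(num_engines, rules, metric_value):
    """Simulate each engine running the same periodic watch task once."""
    actions = []
    for engine_id in range(num_engines):
        evaluate_rules(engine_id, rules, metric_value, actions)
    # Count how many times each rule's action fired across all engines.
    return Counter(name for _, name in actions)


rules = [WatchRule("cpu_high", threshold=80)]
single = run_engines(1, rules, metric_value=95)
multi = run_engines(3, rules, metric_value=95)
print(single["cpu_high"], multi["cpu_high"])
```

With one engine the alarm action fires once per breach; with three engines it fires three times, which is the duplicated work a global disable switch is meant to avoid.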

Changed in heat:
assignee: nobody → Steven Hardy (shardy)
status: New → In Progress
Steven Hardy (shardy)
Changed in heat:
milestone: none → juno-3
Visnusaran Murugan (visnusaran-murugan) wrote :

Hi shardy,

The ThreadGroupManager created in create_periodic_tasks, which is triggered first, is overwritten by service start's own ThreadGroupManager. I guess there needs to be a check in EngineService.start before creating a ThreadGroupManager again. StackWatch has its own disconnected ThreadGroupManager.

Visnusaran Murugan (visnusaran-murugan) wrote :

Anyway, the described problem exists for a multi-engine setup, and my earlier comment is related to not having a separate ThreadGroupManager explicitly for StackWatch. The manager created using create_periodic_tasks should either be part of the StackWatch class or not be part of EngineService (self.thread_group_mgr = ThreadGroupManager()).

Thierry Carrez (ttx)
Changed in heat:
milestone: juno-3 → juno-rc1
Zane Bitter (zaneb)
Changed in heat:
importance: Undecided → Medium
Changed in heat:
assignee: Steven Hardy (shardy) → nobody
Angus Salkeld (asalkeld) wrote :

I think part one of this is done: https://github.com/openstack/heat/commit/507555a585d22a6dc276344b7b97fa9a015e81e5
(Steven can you confirm?)

Changed in heat:
assignee: nobody → Angus Salkeld (asalkeld)
Angus Salkeld (asalkeld) wrote :

I'll take a look at an option to disable cloud watch "lite".

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123039

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/123039
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=33eb87b3e2d8da3ba11901c4762c19b0fc740dab
Submitter: Jenkins
Branch: master

commit 33eb87b3e2d8da3ba11901c4762c19b0fc740dab
Author: Angus Salkeld <email address hidden>
Date: Thu Sep 25 08:26:36 2014 +1000

    Add an option to disable cloud watch lite

    This also adds a deprecation warning.
    This also changes the default to use Ceilometer.

    Release message:
    Anyone deploying Heat should not be using OS::Heat::CWLiteAlarm, but
    OS::Ceilometer::Alarm.
    CWLiteAlarm should be explictly disabled in /etc/heat/heat.conf by
    setting "enable_cloud_watch_lite=false". This will stop Heat from
    running a period task check for alarms.

    DocImpact
    Change-Id: I2a10c14772bdafc001e211d7e94502ac1f6b32b1
    Closes-bug: #1322128
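
As a rough illustration of how a switch like the one in the release message could gate the periodic task, here is a sketch using only the Python standard library. It is not Heat's implementation. The option name `enable_cloud_watch_lite` and the `/etc/heat/heat.conf` file come from the commit message; placing the option under `[DEFAULT]`, falling back to `True` when it is unset, and the names `watch_tasks_enabled` and `start_engine` are assumptions made for this example.

```python
import configparser

# Example heat.conf fragment; the [DEFAULT] section is an assumption.
HEAT_CONF = """
[DEFAULT]
enable_cloud_watch_lite = false
"""


def watch_tasks_enabled(conf_text):
    """Return whether the watch-rule timer should run for this config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback=True is an assumption for this sketch, not a documented default.
    return parser.getboolean(
        "DEFAULT", "enable_cloud_watch_lite", fallback=True
    )


def start_engine(conf_text):
    """Hypothetical engine start-up: list the task groups that would start."""
    started = ["stack-operations"]
    if watch_tasks_enabled(conf_text):
        started.append("watch-rule-timer")
    return started


print(start_engine(HEAT_CONF))
```

With the option set to false, only the stack-operation tasks would be started, so no engine runs the periodic alarm check.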

Changed in heat:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-rc1 → 2014.2