Heat should retry on DBConnectionError when releasing lock

Bug #1555840 reported by Zane Bitter
This bug affects 2 people
Affects: OpenStack Heat
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

When Heat gets a DBConnectionError because the database is down, this is generally treated as an error and whatever write we were trying to do does not occur. This is probably the right behaviour in many cases, but there are circumstances where it can be problematic.

Specifically, when releasing a stack lock at the end of an operation, Heat should probably retry releasing the lock until the database comes back online. If it instead lets the exception bubble up and kill the thread, the engine not only retains the lock but also remains alive, so no other engine can steal it. (This can be worked around comparatively easily by restarting the engine that holds the lock, but when it happens it is not at all obvious that this is what you need to do.)
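
A rough sketch of the retry behaviour proposed above, not Heat's actual code. The function name release_with_retry, its parameters, and the backoff values are all hypothetical, and the local DBConnectionError class is only a stand-in for the real exception (assumed to be oslo_db.exception.DBConnectionError) so that the example runs on its own:

```python
import time


class DBConnectionError(Exception):
    """Stand-in for the real DB connection exception (assumption)."""


def release_with_retry(release, retry_on=(DBConnectionError,),
                       initial_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call release() until it succeeds, backing off between attempts.

    Only exceptions listed in retry_on are retried; anything else
    propagates. Returns the number of attempts that were needed.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            release()
            return attempts
        except retry_on:
            # Database unreachable: keep the thread alive and try again
            # instead of dying with the lock still held.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Retrying without an upper bound is deliberate in this sketch: giving up would leave the lock held by a live engine, which is exactly the situation described above. A real implementation would probably also want to log each failed attempt.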

Zane Bitter (zaneb) wrote :
Rabi Mishra (rabi)
Changed in heat:
milestone: newton-1 → newton-2
Thomas Herve (therve)
Changed in heat:
milestone: newton-2 → newton-3
huangtianhua (huangtianhua) wrote :
Zane Bitter (zaneb) wrote :

Yep, looks like the same issue.

Thomas Herve (therve)
Changed in heat:
milestone: newton-3 → next
Chris Suttles (killface007) wrote :

Does the reset stack in https://bugs.launchpad.net/bugs/1561214 address this?

Zane Bitter (zaneb) wrote :

Chris, probably not. If an engine is still holding the lock and is still alive, other engines will be unable to steal it. Restarting heat-engine would both enable the lock to be stolen and reset any in-progress stacks to failed at startup.
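
To illustrate why a live holder blocks everyone else, here is a toy model of the stealing rule described above. It is not Heat's locking code: the ToyStackLock class, its acquire method, and the is_alive callback are hypothetical names invented for this sketch.

```python
class ToyStackLock:
    """Toy lock that can be stolen only from a dead holder."""

    def __init__(self):
        self.holder = None  # id of the engine currently holding the lock

    def acquire(self, engine_id, is_alive):
        """Return True if engine_id holds the lock after this call."""
        if self.holder is None or self.holder == engine_id:
            self.holder = engine_id
            return True
        if not is_alive(self.holder):
            # The holder is gone, so the lock may be stolen.
            self.holder = engine_id
            return True
        # The holder is still alive: the lock cannot be stolen, even if
        # the thread that took it has already died.
        return False
```

In this model, an engine whose worker thread died on a DBConnectionError still answers is_alive with True, so acquire keeps returning False for every other engine until the holder is restarted.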
