heat down instance create build bug

Bug #1368954 reported by Kevin Fox
This bug affects 1 person
Affects: OpenStack DBaaS (Trove)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

If Heat is not responding during a Trove instance create, the instance gets stuck in BUILD state forever and never recovers.

There also no longer seems to be any way to delete an instance in this state.

Denis M. (dmakogon) wrote :

I'd suggest checking the service catalog (from the context) for a Heat endpoint; if there is none, raise an exception and mark the instance as FAILED.
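A minimal sketch of that check, assuming the catalog is a list of service dicts in the Keystone style, each with a "type" and a list of "endpoints". The names `find_endpoint`, `create_instance`, and `EndpointNotFound` are hypothetical and are not Trove's code; the instance is modelled as a plain dict.

```python
class EndpointNotFound(Exception):
    """Raised when the service catalog has no usable endpoint."""


def find_endpoint(service_catalog, service_type="orchestration"):
    """Return the first endpoint URL for service_type, or raise."""
    for service in service_catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            # Catalog layouts differ between Keystone API versions;
            # both keys are checked here as an assumption.
            url = endpoint.get("publicURL") or endpoint.get("url")
            if url:
                return url
    raise EndpointNotFound(f"no {service_type} endpoint in service catalog")


def create_instance(instance, service_catalog):
    """Mark the instance FAILED instead of leaving it in BUILD."""
    try:
        instance["heat_url"] = find_endpoint(service_catalog)
    except EndpointNotFound:
        instance["status"] = "FAILED"
        raise
    instance["status"] = "BUILD"
    return instance
```

This only catches a missing endpoint. It does not detect an endpoint that is registered but not answering.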

Kevin Fox (kevpn) wrote :

No, the endpoint is there. The Heat API was not responding at the time of Trove instance creation; I had to restart heat-api/engine before it became responsive again. The same issue can also happen if there is a problem with network connectivity. Trove should not get stuck forever in BUILD state with no way to fix it short of manipulating the database.

Denis M. (dmakogon) wrote :

In that case you have to verify that the API is responsive: just call 'stack list' before creating a new stack.
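A rough sketch of such a pre-flight check follows. `list_stacks` and `create_stack` are hypothetical zero-argument callables standing in for whatever client calls perform 'stack list' and 'stack create'; they are not real heatclient signatures, and the instance is again a plain dict.

```python
def heat_is_responsive(list_stacks):
    """Return True if a cheap read-only call to Heat succeeds."""
    try:
        list_stacks()
    except Exception:
        # Any failure (timeout, connection refused, HTTP error) counts
        # as "not responsive" in this sketch.
        return False
    return True


def create_stack_if_heat_up(list_stacks, create_stack, instance):
    """Create the stack only if the probe succeeds; otherwise mark ERROR."""
    if not heat_is_responsive(list_stacks):
        instance["status"] = "ERROR"
        return None
    return create_stack()
```

The probe is inherently racy: Heat can stop responding between the 'stack list' call and the 'stack create' call, so the create call still needs its own error handling.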

Denis M. (dmakogon) wrote :

But in general this looks like a deployment problem, not a Trove problem at all. You have to verify that the Heat API is reachable from the Trove API host. So I think this is not a bug.

Kevin Fox (kevpn) wrote :

So, if Heat is down, Trove should let an instance get stuck in BUILD state forever, never let the user use or delete the instance, and still dock their quota for a nonfunctional instance? That sounds like a bug to me. As a user, I don't expect to have to check on Heat before launching a Trove instance. I'd expect the instance to go into ERROR state on the failure, so I can delete it. That it doesn't is a bug.

Denis M. (dmakogon) wrote :

In a production deployment you don't have only one Heat API service running (there are several behind load balancers). So I still maintain that this is not a Trove bug; it's a deployment mistake. Nothing else.

Changed in trove:
status: New → Invalid
Amrith Kumar (amrith) wrote :

Kevin Fox, please update with additional information to help debug this or close this bug. Thanks!

Changed in trove:
status: Invalid → Incomplete
Kevin Fox (kevpn) wrote :

Simple test case.
 * configure trove to use heat.
 * stop the heat api endpoint.
 * use trove to launch an instance.
 * wait a few seconds.
 * start the heat api endpoint.

You now have a Trove instance stuck in status=BUILD forever. At the very least, it should error out at some point. Better would be for it to retry the heat stack-create several times before giving up.
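The retry-then-give-up behaviour suggested here could look roughly like the sketch below. `create_stack` is a hypothetical callable standing in for the Heat stack-create call, and the attempt count, delay, and exponential backoff are illustrative assumptions rather than Trove settings.

```python
import time


def create_stack_with_retry(create_stack, instance, attempts=3, delay=1.0,
                            sleep=time.sleep):
    """Call create_stack up to `attempts` times, then mark the instance ERROR."""
    for attempt in range(1, attempts + 1):
        try:
            stack_id = create_stack()
        except Exception:
            if attempt == attempts:
                # Out of retries: surface the failure so the user can
                # delete the instance instead of leaving it in BUILD.
                instance["status"] = "ERROR"
                return None
            # Exponential backoff: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
        else:
            return stack_id
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting.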

I have yet to figure out a way to unstick it or delete the instance in trove without mucking around in the database.

Nikhil Manchanda (slicknik) wrote :

So we have an instance usage_timeout, which configures how long an instance can stay in BUILD before it is marked as ERROR (so that the user can go and clean it up).

If this is not happening in the case when we use heat to provision an instance, that seems like a bug to me.
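To illustrate the intended effect of such a timeout, here is a minimal sketch that sweeps a list of instances and moves any that have sat in BUILD longer than `usage_timeout` seconds to ERROR. The dict layout and the `expire_stuck_builds` helper are hypothetical; this is not how Trove implements usage_timeout internally.

```python
import time


def expire_stuck_builds(instances, usage_timeout, now=None):
    """Mark instances stuck in BUILD past usage_timeout as ERROR.

    Each instance is a dict with "id", "status", and "created" (epoch
    seconds). Returns the ids of the instances that were expired.
    """
    now = time.time() if now is None else now
    expired = []
    for inst in instances:
        if inst["status"] == "BUILD" and now - inst["created"] > usage_timeout:
            inst["status"] = "ERROR"
            expired.append(inst["id"])
    return expired
```

Once an instance is in ERROR, the user can delete it and reclaim the quota, which is the outcome asked for earlier in this report.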

Changed in trove:
status: Incomplete → Triaged
importance: Undecided → Medium
milestone: none → kilo-1
Kevin Fox (kevpn) wrote :

Yeah, that would be great. It would also solve a similar problem I just ran into. I tried launching a slave MySQL instance off of my existing working instance. The master went to BACKUP state, and then the process died since I don't have Swift enabled on my test cloud. So I have two instances now: one stuck in BACKUP for the last 3 days, and a second one, the slave, stuck in BUILD state. Both are undeletable.

Changed in trove:
milestone: kilo-1 → kilo-2
Changed in trove:
milestone: kilo-2 → kilo-3
Changed in trove:
milestone: kilo-3 → ongoing
Amrith Kumar (amrith) wrote :

Heat support (Trove using Heat to provision instances) is being eliminated.

Changed in trove:
status: Triaged → Won't Fix