Comment 3 for bug 1714248

Mark Goddard (mgoddard) wrote :

Staring at the nova code a little longer, I think I've pieced together what happened.

* An instance was aborted during creation.
* Destroying the instance failed because the node was locked (possibly due to a long-running neutron port update), and the retry mechanism exhausted its attempts.
* Shortly afterwards, during the compute service's update_available_resource periodic task, the compute node was determined to be an orphan, and deleted.
* Deleting the resource provider for the compute node failed because allocations still existed from the instance that wasn't cleaned up.

This raises a question: why was the compute node seen as orphaned? It happened because the ironic virt driver did not include the node in the list returned by get_available_nodes(). I suspect this is because the ironic node still had an instance_uuid set, but that instance was not mapped to the compute host.
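
To make the sequence concrete, here is a rough, standalone sketch of the logic as I understand it. This is not nova code: FakePlacement, ProviderInUse, find_orphans and reap_orphans are hypothetical names invented for illustration, and the node names are made up. It only models the two behaviours described above: a compute node in the DB that the driver does not report is treated as an orphan, and deleting its resource provider fails while allocations still exist.

```python
from dataclasses import dataclass, field


class ProviderInUse(Exception):
    """Hypothetical error: the resource provider still has allocations."""


@dataclass
class FakePlacement:
    # Hypothetical stand-in for placement: provider name -> consumer UUIDs.
    allocations: dict = field(default_factory=dict)

    def delete_provider(self, name):
        # Refuse to delete a provider that still has allocations against it.
        if self.allocations.get(name):
            raise ProviderInUse(name)
        self.allocations.pop(name, None)


def find_orphans(nodes_in_db, nodes_from_driver):
    """Return DB compute nodes that the virt driver no longer reports."""
    reported = set(nodes_from_driver)
    return [node for node in nodes_in_db if node not in reported]


def reap_orphans(nodes_in_db, nodes_from_driver, placement):
    """Try to delete each orphan's provider; return (deleted, failed)."""
    deleted, failed = [], []
    for node in find_orphans(nodes_in_db, nodes_from_driver):
        try:
            placement.delete_provider(node)
            deleted.append(node)
        except ProviderInUse:
            failed.append(node)
    return deleted, failed


# node-1 still has an allocation from the instance that was never cleaned up.
placement = FakePlacement(
    allocations={"node-1": {"instance-a"}, "node-2": set(), "node-3": set()}
)
# The driver only reports node-3, so node-1 and node-2 look orphaned.
deleted, failed = reap_orphans(
    ["node-1", "node-2", "node-3"], ["node-3"], placement
)
```

In this toy run node-2 is removed cleanly, while node-1 ends up in the failed list because of the leftover allocation, which mirrors the stale resource provider seen in this bug.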

Another point worth mentioning is that I ended up deleting the stale resource provider in the DB, after which the compute service created a new one and things returned to normal.