Instances stuck in DELETING state if delete fails

Bug #1543511 reported by Radomir Dopieralski on 2016-02-09
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: Unassigned

Bug Description

In Liberty, when a transient error (such as a lost connection to the database) occurs while the compute service is deleting an instance, that instance gets stuck in the DELETING state and cannot be deleted anymore. This persists until the compute service is restarted: during initialization all pending deletes are retried, and the delete then finishes.

It would be better if the service restart wasn't required.

jichenjc (jichenjc) wrote :

You might try:

[root@lljcma mnadmin] # nova reset-state
usage: nova reset-state [--active] <server> [<server> ...]
error: too few arguments
Try 'nova help reset-state' for more information.

Also, can you provide the call back for reference?

Changed in nova:
status: New → Incomplete
Radomir Dopieralski (deshipu) wrote :

Yes, I know you can recover such instances manually, but the whole point is that it would be nice if they recovered automatically after a certain timeout. I intend to work on this.

What do you mean by "call back"?

jichenjc (jichenjc) wrote :

Sorry, that was a typo: I meant the traceback of the exception the error led to.
Usually this kind of failure is handled by nova itself; in most such cases we simply didn't include the right exception
in the catch list, so the automatic revert didn't take effect.
If you provide the traceback of the exception, it will be much easier to see why the automatic revert didn't work. Thanks.

Radomir Dopieralski (deshipu) wrote :

The error is a lost connection to RabbitMQ, and no, you can't just catch it and recover, because then you have no connection to the conductor, and therefore no access to the database to update the state.
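The automatic-revert idea discussed above, and the reason it cannot help when the connection itself is gone, can be sketched as follows. This is a minimal, self-contained illustration, not nova code: `FakeConductor`, `ConnectionLost`, `reverts_task_state` and `delete_instance` are hypothetical stand-ins, assuming (as described in this thread) that the compute service can only persist state through the conductor.

```python
import functools
from dataclasses import dataclass
from typing import Optional


class ConnectionLost(Exception):
    """Hypothetical stand-in for a lost message-bus (RabbitMQ) connection."""


@dataclass
class Instance:
    uuid: str
    vm_state: str = "active"
    task_state: Optional[str] = None


class FakeConductor:
    """Hypothetical conductor: the compute service's only path to the database."""

    def __init__(self) -> None:
        self.connected = True
        self.db = {}  # uuid -> (vm_state, task_state) as persisted

    def save(self, instance: Instance) -> None:
        if not self.connected:
            raise ConnectionLost("no connection to conductor")
        self.db[instance.uuid] = (instance.vm_state, instance.task_state)


def reverts_task_state(func):
    """Hypothetical decorator: on any exception, try to clear task_state."""
    @functools.wraps(func)
    def wrapper(conductor, instance, *args, **kwargs):
        try:
            return func(conductor, instance, *args, **kwargs)
        except Exception:
            instance.task_state = None
            try:
                conductor.save(instance)
            except ConnectionLost:
                # The revert needs the database too, so it fails as well;
                # the persisted record still says "deleting".
                pass
            raise  # re-raise the original error from func
    return wrapper


@reverts_task_state
def delete_instance(conductor, instance):
    instance.task_state = "deleting"
    conductor.save(instance)
    conductor.connected = False  # simulate losing the connection mid-delete
    raise ConnectionLost("lost connection during delete")
```

After `delete_instance` raises, the in-memory object has `task_state=None`, but the persisted record is still `("active", "deleting")`: catching the right exception is not enough when the revert path depends on the same lost connection.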

Sean Dague (sdague) wrote :

I think it would be fine to have another periodic task to handle stuck deleting instances.
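A periodic task that retries deletes stuck for longer than some timeout could look roughly like the sketch below. It is an illustration under stated assumptions, not a proposed patch: `retry_stuck_deletes`, `STUCK_DELETE_TIMEOUT` and the `Instance` fields are hypothetical, the 600-second default is arbitrary, and how the function gets scheduled (nova's own periodic-task machinery) is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

STUCK_DELETE_TIMEOUT = 600  # seconds; hypothetical, arbitrary default


@dataclass
class Instance:
    uuid: str
    task_state: Optional[str] = None
    updated_at: float = field(default_factory=time.time)  # last state change


def retry_stuck_deletes(instances: List[Instance],
                        delete: Callable[[Instance], None],
                        now: Optional[float] = None,
                        timeout: float = STUCK_DELETE_TIMEOUT) -> List[str]:
    """Retry deletes for instances stuck in DELETING longer than `timeout`.

    Returns the UUIDs for which a retry was attempted.
    """
    now = time.time() if now is None else now
    retried = []
    for inst in instances:
        if inst.task_state != "deleting":
            continue
        if now - inst.updated_at < timeout:
            continue  # a delete may still legitimately be in progress
        try:
            delete(inst)
        except Exception:
            # Leave it in DELETING; the next periodic run will try again.
            pass
        retried.append(inst.uuid)
    return retried
```

The timeout is the key design choice: without it, the task could race a delete that is still running normally, while with it the instance recovers on a later run instead of waiting for a compute service restart.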

Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → Low
status: Confirmed → Triaged
Changed in nova:
assignee: nobody → Mohammed Ashraf (mohammed-asharaf)
Changed in nova:
status: Triaged → In Progress
Changed in nova:
assignee: Mohammed Ashraf (mohammed-asharaf) → nobody
Changed in nova:
status: In Progress → Confirmed
Rajesh Tailor (ratailor) on 2016-03-14
Changed in nova:
assignee: nobody → Rajesh Tailor (ratailor)

Fix proposed to branch: master
Review: https://review.openstack.org/294491

Changed in nova:
status: Confirmed → In Progress

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/294491
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Rajesh Tailor (ratailor) → nobody