Improve instance state recovery for Compute service failure during Create Server

Bug #1072734 reported by Rohit Karajgi
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Scenario:

The compute service spawns an instance but crashes just before the instance's state is updated to Active in the database. By that point the instance has already started running on the hypervisor.

In this situation, the recovery of the instance requires admin intervention:

- When the compute service resumes, the check_instance_build_time periodic task sets the VM state to Error, while the task state remains Spawning.
- To recover the instance, the admin must then reset the instance's state to Active (which also resets the task state to None).

The instance is then usable. The sync power state periodic task eventually sets the power state to Running.
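The sequence above can be modelled as a small toy simulation. This is only a sketch of the state transitions described in this report, not nova code: the `Instance` class, the lowercase state strings, and the helpers `admin_reset_state` and `sync_power_state` are hypothetical stand-ins, and only the name `check_instance_build_time` comes from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Toy model of the three state fields mentioned in the report."""
    vm_state: str = "building"
    task_state: Optional[str] = "spawning"
    power_state: str = "nostate"


def check_instance_build_time(inst: Instance, build_timed_out: bool) -> None:
    # Assumed behaviour: a build that exceeded its timeout is marked Error.
    # The task state is left untouched, as observed in the report.
    if inst.vm_state == "building" and build_timed_out:
        inst.vm_state = "error"


def admin_reset_state(inst: Instance) -> None:
    # Hypothetical stand-in for the manual admin reset to Active,
    # which also clears the task state.
    inst.vm_state = "active"
    inst.task_state = None


def sync_power_state(inst: Instance, hypervisor_power_state: str) -> None:
    # Hypothetical stand-in for the periodic power state sync.
    inst.power_state = hypervisor_power_state


# Walk through the scenario: crash during spawn, restart, manual recovery.
inst = Instance()
check_instance_build_time(inst, build_timed_out=True)
state_after_restart = (inst.vm_state, inst.task_state)
admin_reset_state(inst)
sync_power_state(inst, "running")
```

After the walk-through, `state_after_restart` holds the inconsistent Error/Spawning pair that the report complains about, and `inst` ends up Active with power state Running only because of the manual reset.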

However, this is a tedious workflow that requires admin intervention; it should be handled in the code.
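One way the requested automatic handling might look is sketched below. This is an illustration under stated assumptions, not the approach of any proposed patch: `InstanceState`, `recover_interrupted_spawn`, and the lowercase state strings are hypothetical, and a real fix would have to work against nova's own instance objects and virt driver.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class InstanceState:
    """Hypothetical snapshot of an instance's state fields."""
    vm_state: str
    task_state: Optional[str]
    power_state: str


def recover_interrupted_spawn(state: InstanceState,
                              hypervisor_power_state: str) -> InstanceState:
    """Return the state an instance should have after a service restart.

    Assumption: if the database still says building/spawning but the
    hypervisor reports the guest as running, the spawn actually succeeded
    and only the final database update was lost.
    """
    interrupted = (state.vm_state == "building"
                   and state.task_state == "spawning")
    if interrupted and hypervisor_power_state == "running":
        return replace(state, vm_state="active", task_state=None,
                       power_state="running")
    # In every other case, leave the state alone for the existing checks.
    return state


stale = InstanceState("building", "spawning", "nostate")
recovered = recover_interrupted_spawn(stale, "running")
untouched = recover_interrupted_spawn(stale, "shutdown")
```

The check is deliberately narrow: it promotes the instance to Active only when the hypervisor confirms the guest is running, so a spawn that really failed still falls through to the existing timeout handling.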

Michael Still (mikal)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Grzegorz Grasza (xek)
Changed in nova:
assignee: nobody → Grzegorz Grasza (xek)
status: Triaged → In Progress
Grzegorz Grasza (xek) wrote :

To reproduce the error, I stopped the compute service inside the _update_instance_after_spawn method.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/167281

wuhao (wuhao) wrote :

Grzegorz Grasza,

Is this work still in progress?

I wonder if it's ok to implement this work after your patch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still on branch: master
Review: https://review.openstack.org/167281
Reason: This patch has been stalled for a long time, so I am abandoning it. Please feel free to restore it when the code is ready for review.

Grzegorz Grasza (xek)
Changed in nova:
assignee: Grzegorz Grasza (xek) → nobody
Changed in nova:
status: In Progress → Confirmed
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired