Recover from Build state on compute manager start-up

Bug #1197024 reported by Phil Day
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: David McNally

Bug Description

If a compute manager is stopped or fails during a build operation, the instance is left stuck with vm_state=BUILDING.

During restart, the one thing we can be sure of for such instances is that, provided the task state is not SCHEDULING (i.e. the request is not still on the queue), there is no thread running for them. (We might need to add another task_state that is set as early as possible on the compute manager.)
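The start-up rule described above reduces to a simple predicate over an instance's two state fields. The following is a minimal sketch, assuming plain lowercase strings for the states; the constant values and the function name `is_orphaned_build` are hypothetical illustrations, not nova's actual code (nova keeps its real constants in `nova.compute.vm_states` and `nova.compute.task_states`).

```python
from typing import Optional

# Hypothetical constants assumed to mirror the state names used in this report.
BUILDING = "building"
SCHEDULING = "scheduling"


def is_orphaned_build(vm_state: str, task_state: Optional[str]) -> bool:
    """Return True if an instance seen at compute start-up is stuck in
    BUILDING with no thread that could still be working on it.

    A task_state of SCHEDULING means the request may still be on the
    queue, so such instances are deliberately left alone.
    """
    return vm_state == BUILDING and task_state != SCHEDULING
```

For example, an instance found with `vm_state="building"` and `task_state="spawning"` would be treated as orphaned, while one still in `"scheduling"` would not.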

In this case it should be possible to treat instances in this state as if they had failed to spawn, and either put them into an ERROR state or, even better, tidy up and send them back to the scheduler as if the spawn had failed.
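The two handling options can be sketched as a single recovery step that prefers rescheduling and falls back to ERROR. This is a hypothetical illustration only: the `Instance` dataclass, the `reschedule` callback, and the `"error"` state string are assumptions, and the callback is assumed to do the tidy-up and re-queue the request itself. The caller is assumed to have already decided that the instance is orphaned.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ERROR = "error"  # hypothetical stand-in for nova's ERROR vm_state


@dataclass
class Instance:
    """Hypothetical stand-in for a nova instance record."""
    uuid: str
    vm_state: str
    task_state: Optional[str]


def recover_failed_build(
    instance: Instance,
    reschedule: Optional[Callable[[Instance], bool]] = None,
) -> str:
    """Treat an orphaned BUILDING instance as a failed spawn.

    Returns "rescheduled" if the callback accepted the instance,
    otherwise puts the instance into ERROR and returns "error".
    """
    if reschedule is not None and reschedule(instance):
        # The callback is assumed to have tidied up local resources and
        # sent the request back to the scheduler.
        return "rescheduled"
    # Fallback: no thread owns this instance any more, so mark it ERROR
    # and clear the stale task_state.
    instance.vm_state = ERROR
    instance.task_state = None
    return "error"
```

Making rescheduling optional keeps the simpler ERROR behaviour available when no reschedule path exists or the scheduler declines the request.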

Michael Still (mikal)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Changed in nova:
assignee: nobody → David McNally (dave-mcnally)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/47836

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/47836
Committed: http://github.com/openstack/nova/commit/0e3d2622ae7e1fce85cda1e7451bcc932d116fbf
Submitter: Jenkins
Branch: master

commit 0e3d2622ae7e1fce85cda1e7451bcc932d116fbf
Author: David McNally <email address hidden>
Date: Mon Sep 23 14:31:15 2013 +0100

    Recover from build state on compute manager start-up

    If a compute manager is stopped or fails during a build operation,
    the instance is left stuck with vm_state=BUILDING.

    During restart, the one thing we can be sure of for such instances
    is that, provided the task state is not SCHEDULING (i.e. the request
    is not still on the queue), there is no thread running for this
    instance.

    In this case we can treat instances in this state as if they had
    failed to spawn and put them into an ERROR state.

    Closes-Bug: 1197024
    Related to blueprint recover-stuck-state

    Change-Id: I7b116d8154036c43b0feb23f5669f52358408d2b

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
tags: added: havana-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-1 → 2014.1