Failed VMs not set to error state on exception

Bug #1182056 reported by moorryan
This bug affects 4 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        moorryan
Grizzly                   Fix Released  High        Ruby Loo

Bug Description

We've identified that when an instance build fails because it has exceeded the number of retries (i.e. it has been tried on 3 different hosts), the upstream logic in the scheduler does not correctly set the VM state to ERROR.

There is an exception handler for NoValidHost in manager.run_instance(), but it relies on request_spec[instance_uuids] to determine which instances to put into the ERROR state, and schedule_run_instance removes this value (as it is normally about to split the request up into several separate requests).
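
For illustration only, here is a minimal, self-contained sketch of the interaction described above. The class and method names echo the ones mentioned in this report (FilterScheduler, schedule_run_instance, _schedule, run_instance, NoValidHost), but the bodies and the fake uuid are paraphrased stand-ins, not the actual nova code:

# Sketch of the pre-fix failure mode; illustrative only, not nova source.

class NoValidHost(Exception):
    pass


class FilterScheduler(object):
    def _schedule(self, request_spec, num_instances):
        # Stand-in: pretend every retry has been exhausted.
        raise NoValidHost("Exceeded max scheduling attempts")

    def schedule_run_instance(self, request_spec):
        # Pre-fix behaviour: 'instance_uuids' is popped *before* _schedule()
        # runs, so the key is gone once NoValidHost propagates upward.
        instance_uuids = request_spec.pop('instance_uuids')
        return self._schedule(request_spec, len(instance_uuids))


class SchedulerManager(object):
    def __init__(self):
        self.driver = FilterScheduler()

    def run_instance(self, request_spec):
        try:
            self.driver.schedule_run_instance(request_spec)
        except NoValidHost:
            # The handler needs the uuids to flip the instances to ERROR,
            # but the scheduler has already removed them from request_spec.
            uuids = request_spec.get('instance_uuids', [])
            if not uuids:
                print("no instance_uuids left; instances stay in a build state")
            for uuid in uuids:
                print("setting %s to ERROR" % uuid)


if __name__ == "__main__":
    SchedulerManager().run_instance({'instance_uuids': ['fake-uuid-1']})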

moorryan (moorryan)
Changed in nova:
assignee: nobody → moorryan (moorryan)
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/29780

Changed in nova:
status: New → In Progress
tags: added: grizzly-backport-potential
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/29780
Committed: http://github.com/openstack/nova/commit/aefc28dd481354edd0f3b5aec18db006680b2ffe
Submitter: Jenkins
Branch: master

commit aefc28dd481354edd0f3b5aec18db006680b2ffe
Author: Ryan Moore <email address hidden>
Date: Mon May 20 14:56:55 2013 +0100

    set ERROR state when scheduler hits max attempts

    Presently when scheduler raises NoValidHost due to max attempts
    being reached, the instance remains in a build state.
    Exception handler for NoValidHost in manager.run_instance() needs
    request_spec[instance_uuids] to know which host to put into an
    error state in _set_vm_state_and_notify().
    schedule_run_instances() was popping instance_uuids from the
    request_spec prior to a call to _schedule().
    Changed pop of instance_uuids prior to call to _schedule() to be a get.
    Added pop of instance_uuids to beneath call to _schedule() as
    individual creates do not need them.

    Change-Id: I9654820e01d5611763e9e673f15f46b947d09e6d
    Fixes: bug #1182056
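
Read alongside the commit message above, the shape of the change is roughly the following. This is a paraphrased, self-contained sketch (the function names and fake uuid are illustrative assumptions), not the merged diff; see the linked review for the real change:

# Paraphrase of the fix described above (pop -> get before _schedule(),
# pop moved below it); illustrative only, not the actual nova code.

class NoValidHost(Exception):
    pass


def _schedule(request_spec, num_instances):
    # Stand-in for the real scheduling pass; may raise NoValidHost.
    raise NoValidHost("Exceeded max scheduling attempts")


def schedule_run_instance(request_spec):
    # Fixed behaviour: read the uuids with get() so the key survives in
    # request_spec if _schedule() raises.
    instance_uuids = request_spec.get('instance_uuids')
    weighed_hosts = _schedule(request_spec, len(instance_uuids))

    # Only after scheduling has succeeded are the uuids popped, because
    # the per-instance creates that follow do not need them.
    request_spec.pop('instance_uuids', None)
    return weighed_hosts


if __name__ == "__main__":
    spec = {'instance_uuids': ['fake-uuid-1']}
    try:
        schedule_run_instance(spec)
    except NoValidHost:
        # The manager's exception handler can still see the uuids and
        # move the corresponding instances to ERROR.
        print("uuids still available: %s" % spec.get('instance_uuids'))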

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/30892

OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/30892
Committed: http://github.com/openstack/nova/commit/7726dae1c3720477191cfdb239fdd2c5d0952285
Submitter: Jenkins
Branch: stable/grizzly

commit 7726dae1c3720477191cfdb239fdd2c5d0952285
Author: Ryan Moore <email address hidden>
Date: Mon May 20 14:56:55 2013 +0100

    set ERROR state when scheduler hits max attempts

    Presently when scheduler raises NoValidHost due to max attempts
    being reached, the instance remains in a build state.
    Exception handler for NoValidHost in manager.run_instance() needs
    request_spec[instance_uuids] to know which host to put into an
    error state in _set_vm_state_and_notify().
    schedule_run_instances() was popping instance_uuids from the
    request_spec prior to a call to _schedule().
    Changed pop of instance_uuids prior to call to _schedule() to be a get.
    Added pop of instance_uuids to beneath call to _schedule() as
    individual creates do not need them.

    Conflicts:
     nova/scheduler/filter_scheduler.py

    Change-Id: I9654820e01d5611763e9e673f15f46b947d09e6d
    Fixes: bug #1182056
    (cherry picked from commit aefc28dd481354edd0f3b5aec18db006680b2ffe)

Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-1 → 2013.2