If no hosts found during resize, scheduler will leave instance stuck in RESIZE

Bug #928521 reported by Johannes Erdfelt
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Andrew Clay Shafer

Bug Description

While doing a resize to a very large instance type, the scheduler was left with no hosts to schedule to:

2012-02-07 21:55:33,247 WARNING nova.scheduler.manager [-] Failed to schedule_prep_resize: No valid host was found.

Unfortunately, this left the instance stuck in RESIZE.

Nirmal Ranganathan (rnirmal) wrote :

This and a few other states are interesting because they are in a fungible state: it's not an error state, but a user request still could not be completed. So would this just be marked as ERROR even if the instance is alive, and let the user know with an asynchronous fault that the instance could not be resized?

I'm not sure how such states are currently being handled in nova.

Brian Waldon (bcwaldon)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
milestone: none → essex-4
Johannes Erdfelt (johannes.erdfelt) wrote :

That's a good point. I can confirm the instance was still running fine, just with the original instance type (as expected). It does seem odd to move to ERROR if it's still running. I think moving back to ACTIVE and adding an asynchronous fault would make most sense.

Changed in nova:
assignee: nobody → Andrew Clay Shafer (littleidea)
Thierry Carrez (ttx) wrote :

Looks like we won't have a fix in time for E4

Changed in nova:
milestone: essex-4 → essex-rc1
Mark Washenberger (markwash) wrote :

Andrew,

Have you already started working on this? I was playing around with a partial fix but I won't mess with it if you're already on your way.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/4798

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/4798
Committed: http://github.com/openstack/nova/commit/3d4213d1faa76179a6fafba653845ede1c73a7bb
Submitter: Jenkins
Branch: master

commit 3d4213d1faa76179a6fafba653845ede1c73a7bb
Author: Andrew Clay Shafer <email address hidden>
Date: Thu Mar 1 22:41:15 2012 -0500

    Reset instance to ACTIVE when no hosts found

    bug 928521

    modified nova/scheduler/manager.py to reset vm_state to ACTIVE and set
    task_state to None when prep_resize raises a NoHostsFound

    refactored run_instance and prep_resize so they don't go through
    _schedule and now must be implemented in driver

    Changed behavior to set vm_state to error on any other exception in
    prep_resize.

    Change behavior to change instance vm_state to ERROR on exceptions

    Added tests that the vm_state gets updated

    Added tests that schedule_prep_resize and schedule_run_instance
    have no implementation in the Driver base class

    Had to adjust methods and tests for Multi scheduler to reflect the
    new Scheduler contract

    Change-Id: Ibcac7ef0df3456793a2132beb7a711849510da80
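
The behavior described in the commit message can be illustrated with a minimal, self-contained sketch. This is not the code from nova/scheduler/manager.py: the class names (Driver, NoHostDriver, SchedulerManager), the dict-based instance, the state strings and the NoValidHost stand-in exception are all assumptions made for illustration. It only shows the intended state transitions: ACTIVE with no task state when no host is found, ERROR on any other exception, and a driver base class that leaves schedule_prep_resize unimplemented. Re-raising the unexpected exception is also an assumption of the sketch.

```python
class NoValidHost(Exception):
    """Hypothetical stand-in for the scheduler's 'No valid host was found' error."""


class Driver:
    """Hypothetical base class: scheduling methods must be provided by subclasses."""

    def schedule_prep_resize(self, instance, flavor):
        raise NotImplementedError("driver must implement schedule_prep_resize")


class NoHostDriver(Driver):
    """Hypothetical driver that never finds a host, as in the reported bug."""

    def schedule_prep_resize(self, instance, flavor):
        raise NoValidHost("No valid host was found.")


class SchedulerManager:
    """Sketch of a manager that keeps instance state consistent on failure."""

    def __init__(self, driver):
        self.driver = driver

    def prep_resize(self, instance, flavor):
        try:
            return self.driver.schedule_prep_resize(instance, flavor)
        except NoValidHost:
            # The instance is still running with its original type, so put it
            # back to ACTIVE and clear the in-progress task instead of leaving
            # it stuck in the resize state.
            instance["vm_state"] = "active"
            instance["task_state"] = None
            return None
        except Exception:
            # Any other failure marks the instance as ERROR. The exception is
            # re-raised here so the caller still sees it (a sketch assumption).
            instance["vm_state"] = "error"
            raise


# Assumed starting state for an instance in the middle of a resize request.
instance = {"vm_state": "resizing", "task_state": "resize_prep"}
SchedulerManager(NoHostDriver()).prep_resize(instance, "very-large-flavor")
print(instance)
```

In this sketch the failed resize leaves the instance usable again, which matches the outcome discussed in the comments above: the instance keeps running with its original instance type and is not reported as stuck or broken.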

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-rc1 → 2012.1