Scheduler doesn't release already-allocated HostState resources after a multiple-instance create fails

Bug #1408859 reported by Rui Chen
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   Low          Rui Chen

Bug Description

We multiple-create 3 instances, but the host only has enough resources for 1 instance.
nova-scheduler consumes the resources of the selected host for the first instance in select_destinations.
After the multiple create fails, we try to boot 1 instance with the same flavor. The host has
enough resources to boot it, but nova-scheduler raises 'No Valid Host'. Worse, the
host resource tracker only updates the compute node in the DB when the host's resources have changed, so the ComputeNode's update time in the DB stays older than the update time in the scheduler cache, and the scheduler cache can't be refreshed. In this case, the host will never be selected again.
We need to release the host resources when a multiple create fails.
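
The stale-cache behaviour described above can be reproduced with a minimal sketch. This is not nova's code: `ComputeNodeRecord`, the simplified `HostState`, and the `consume` helper are hypothetical stand-ins, and the only behaviour assumed is the one stated in the description (the cached host state is refreshed only when the DB row is at least as new as the cache).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ComputeNodeRecord:
    """Hypothetical stand-in for a compute node row in the DB."""
    free_ram_mb: int
    updated_at: datetime


class HostState:
    """Simplified in-memory view of a host kept by the scheduler."""

    def __init__(self) -> None:
        self.free_ram_mb = 0
        self.updated: Optional[datetime] = None

    def update_from_compute_node(self, node: ComputeNodeRecord) -> None:
        # Skip the refresh when the cached view is newer than the DB row.
        if self.updated is not None and self.updated > node.updated_at:
            return
        self.free_ram_mb = node.free_ram_mb
        self.updated = node.updated_at

    def consume(self, ram_mb: int, now: datetime) -> None:
        # Claim resources in memory while selecting a host.
        self.free_ram_mb -= ram_mb
        self.updated = now


t0 = datetime(2015, 1, 1, 12, 0, 0)
node = ComputeNodeRecord(free_ram_mb=2048, updated_at=t0)
host = HostState()
host.update_from_compute_node(node)

# Multiple-create of 3 x 2048 MB: the first instance fits and is consumed,
# the second finds no host, so the whole request fails.
host.consume(2048, now=t0 + timedelta(seconds=5))

# Nothing was booted, so the DB row is unchanged and looks older than the cache.
host.update_from_compute_node(node)
print(host.free_ram_mb)  # 0: the consumed RAM is never given back
```

Because the DB row is only rewritten when the host's resources change, every later refresh in this sketch takes the early return, which matches the "never selected again" symptom.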

Tags: scheduler
Rui Chen (kiwik-chenrui)
description: updated
Changed in nova:
assignee: nobody → Rui Chen (kiwik-chenrui)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/147048

Changed in nova:
status: New → In Progress
Rui Chen (kiwik-chenrui)
tags: added: scheduler
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: Rui Chen (kiwik-chenrui) → Jay Pipes (jaypipes)
Jay Pipes (jaypipes) wrote :

Rui, are you using the caching filter scheduler by any chance? I can't see how this could happen with the regular filter scheduler because on each call to select_destinations(), the entire HostState list in the HostStateManager is completely rebuilt with a fresh SELECT * FROM compute_nodes query:

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L535

-jay

Changed in nova:
assignee: Jay Pipes (jaypipes) → Rui Chen (kiwik-chenrui)
Rui Chen (kiwik-chenrui) wrote :

Hi Jay:

    I used the regular filter scheduler, and I described the reason in detail in the comments on the last patch. Please have a look when you have time, thanks.

    https://review.openstack.org/#/c/147048/

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/147048
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=800c5980130568379019c61b24e5c752bfa11fff
Submitter: Jenkins
Branch: master

commit 800c5980130568379019c61b24e5c752bfa11fff
Author: Rui Chen <email address hidden>
Date: Mon Apr 27 19:04:52 2015 +0800

    Fix scheduler issue when multiple-create failed

    If multiple creating failed, set the updated time of
    selected HostState to None so that these HostStates are
    refreshed according to database in next schedule, and
    release the resource consumed by instance in the process
    of selecting host.

    Change-Id: I70b3272b7dc3d29f39bd8c2d8fed362cf497c887
    Closes-Bug: #1408859
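
The approach in the commit message (reset the update time of the selected HostStates so that the next scheduling pass reloads them from the database) can be sketched as follows. This is an illustration under assumptions, not the merged patch: the simplified `HostState`, its `refresh` and `consume` methods, and this `select_destinations` signature are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


class NoValidHost(Exception):
    """Raised when no host can fit the next instance."""


class HostState:
    def __init__(self, free_ram_mb: int, updated: Optional[datetime]) -> None:
        self.free_ram_mb = free_ram_mb
        self.updated = updated

    def refresh(self, db_free_ram_mb: int, db_updated_at: datetime) -> None:
        # Only reload when the cache is not newer than the DB row.
        if self.updated is not None and self.updated > db_updated_at:
            return
        self.free_ram_mb = db_free_ram_mb
        self.updated = db_updated_at

    def consume(self, ram_mb: int, now: datetime) -> None:
        self.free_ram_mb -= ram_mb
        self.updated = now


def select_destinations(hosts: List[HostState], ram_mb: int, count: int,
                        now: datetime) -> List[HostState]:
    selected: List[HostState] = []
    try:
        for _ in range(count):
            fitting = [h for h in hosts if h.free_ram_mb >= ram_mb]
            if not fitting:
                raise NoValidHost()
            fitting[0].consume(ram_mb, now)
            selected.append(fitting[0])
    except NoValidHost:
        # Forget the update time so the next refresh reloads from the DB,
        # which discards the resources consumed during this failed request.
        for h in selected:
            h.updated = None
        raise
    return selected


db_time = datetime(2015, 4, 27, 19, 0, 0)
host = HostState(free_ram_mb=2048, updated=db_time)

try:
    select_destinations([host], ram_mb=2048, count=3,
                        now=db_time + timedelta(seconds=5))
except NoValidHost:
    pass

# Next scheduling pass: the DB row is unchanged, but the refresh now applies.
host.refresh(db_free_ram_mb=2048, db_updated_at=db_time)
retry = select_destinations([host], ram_mb=2048, count=1,
                            now=db_time + timedelta(seconds=10))
print(len(retry))  # 1
```

Resetting the timestamp rather than adding the consumed amounts back keeps the database as the single source of truth for the host's free resources.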

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0