broken instances are considered to be consuming resources

Bug #1012822 reported by Alexej Ababilov
This bug affects 7 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

nova/scheduler/host_manager.py uses a simple model to determine whether a host has enough resources to run a new instance: it simply adds up the resources of all instances that are scheduled for that host. However, these instances can be broken (in ERROR state). They may not exist at all, in which case they don't consume resources. An instance can be broken if there are no free networks, if its images are improper, or even, as it happens, due to an RPC timeout in nova-compute. These instances cannot be revived and should not be taken into account.
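To make the complaint concrete, here is a minimal sketch of the accounting model described above. It is not nova's actual code: `Instance`, `HostState`, `free_resources`, and the `skip_error` flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    memory_mb: int
    root_gb: int
    vm_state: str  # e.g. "active", "building", "error"


@dataclass
class HostState:
    total_memory_mb: int
    total_disk_gb: int


def free_resources(host, instances, skip_error=False):
    """Return (free_memory_mb, free_disk_gb) for a host.

    With skip_error=False every scheduled instance is counted, which is
    the behaviour this report describes. With skip_error=True, instances
    in the "error" state are left out, which is what the report proposes.
    """
    counted = [
        inst for inst in instances
        if not (skip_error and inst.vm_state == "error")
    ]
    used_memory = sum(inst.memory_mb for inst in counted)
    used_disk = sum(inst.root_gb for inst in counted)
    return host.total_memory_mb - used_memory, host.total_disk_gb - used_disk


host = HostState(total_memory_mb=8192, total_disk_gb=100)
instances = [
    Instance(memory_mb=2048, root_gb=20, vm_state="active"),
    Instance(memory_mb=4096, root_gb=40, vm_state="error"),
]

print(free_resources(host, instances))                   # counts the broken instance
print(free_resources(host, instances, skip_error=True))  # ignores the broken instance
```

In this toy example the host looks much fuller when the broken instance is counted, so a request that would actually fit could be rejected.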

Tags: scheduler
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
tags: added: scheduler
wangpan (hzwangpan) wrote :

I have a question: if the instance goes into ERROR state during spawn in the hypervisor, in other words after the disk/image has already been created (so disk resources presumably need to be counted as consumed), how do we deal with this situation?

Rohan (kanaderohan)
Changed in nova:
assignee: nobody → Rohan (kanaderohan)
Rohan (kanaderohan) wrote :

1st scenario: During instance create, the scheduler updates its compute-node-level data before the request to spawn even reaches the compute node. So if the instance is in ERROR state during these activities, I think we can skip counting the resources consumed by that instance in the host manager update call.

2nd scenario (pointed out by wangpan): The instance is being spawned by the hypervisor, the disk has been created, then something goes wrong and the instance goes into ERROR state. This case should be handled by the periodic resource tracker and not the host manager, since the host manager has no idea what has happened at the hypervisor.

Do these approaches sound alright?
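A rough sketch of how the two scenarios split the work, under stated assumptions: the `disk_on_hypervisor` flag and both function names are hypothetical and are not part of nova. The flag stands in for whatever the compute node can observe locally, which the scheduler cannot.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    root_gb: int
    vm_state: str             # e.g. "active", "error"
    disk_on_hypervisor: bool  # hypothetical: known only on the compute node


def host_manager_disk_gb(instances):
    """Scheduler-side estimate (1st scenario): skip ERROR instances."""
    return sum(i.root_gb for i in instances if i.vm_state != "error")


def tracker_disk_gb(instances):
    """Compute-side periodic audit (2nd scenario): count every disk that
    actually exists on the hypervisor, whatever the instance state."""
    return sum(i.root_gb for i in instances if i.disk_on_hypervisor)


instances = [
    Instance(root_gb=20, vm_state="active", disk_on_hypervisor=True),
    Instance(root_gb=40, vm_state="error", disk_on_hypervisor=False),  # failed before spawn
    Instance(root_gb=10, vm_state="error", disk_on_hypervisor=True),   # failed after disk creation
]

print(host_manager_disk_gb(instances))
print(tracker_disk_gb(instances))
```

The two numbers differ by exactly the disk left behind by the instance that failed mid-spawn, which is the case wangpan asked about.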

Changed in nova:
assignee: Rohan (kanaderohan) → nobody
Changed in nova:
assignee: nobody → Sumant Murke (sumant-murke)
Changed in nova:
assignee: Sumant Murke (sumant-murke) → nobody
AMIT KUMAR (maurya0092)
Changed in nova:
assignee: nobody → AMIT KUMAR (maurya0092)
stgleb (gstepanov) wrote :

I think we need a clear way to decide whether a resource is acquired or not. Since we cannot do that, the more reliable approach is to consider all resources as acquired (the bad scenario) and to delegate responsibility for reclaiming the resources of instances in ERROR state to the cloud operator.
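A minimal sketch of this conservative approach, assuming hypothetical names throughout (`Instance`, `conservative_usage`, and `reclaim_report` are not nova APIs): the scheduler keeps counting everything, and the operator gets a list of ERROR instances whose resources could be reclaimed after inspection.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    memory_mb: int
    root_gb: int
    vm_state: str  # e.g. "active", "error"


def conservative_usage(instances):
    """Count every instance as consuming resources, ERROR or not."""
    return (
        sum(i.memory_mb for i in instances),
        sum(i.root_gb for i in instances),
    )


def reclaim_report(instances):
    """List ERROR instances and the resources they still hold, so an
    operator can decide which ones to clean up."""
    return [
        (i.uuid, i.memory_mb, i.root_gb)
        for i in instances
        if i.vm_state == "error"
    ]


instances = [
    Instance("inst-a", memory_mb=2048, root_gb=20, vm_state="active"),
    Instance("inst-b", memory_mb=4096, root_gb=40, vm_state="error"),
]

print(conservative_usage(instances))
for uuid, memory_mb, root_gb in reclaim_report(instances):
    print(f"{uuid}: {memory_mb} MB RAM, {root_gb} GB disk held in ERROR state")
```

The trade-off is that capacity may be underestimated until the operator acts, but the scheduler never places an instance on resources that are in fact still in use.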

Changed in nova:
assignee: AMIT KUMAR (maurya0092) → nobody
Chris Dent (cdent) wrote :

I agree with gstepanov: it is better to be conservative about these failures. It's hard to be sure about the state of the instance, so it is better to clear up the error states later, with some administrative oversight.

So is this even a bug?

Chris Dent (cdent)
Changed in nova:
status: Confirmed → Invalid