OpenStack Compute (nova)

broken instances are considered to be consuming resources

Bug #1012822 reported by Alexej Ababilov on 2012-06-13

This bug affects 7 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Invalid	Medium	Unassigned

Bug Description

nova/scheduler/host_manager.py uses a simple model to determine whether a host has enough resources to run a new instance: it adds up the resources of all instances scheduled to that host. However, some of these instances can be broken (in ERROR state). They may not exist on the host at all, in which case they do not consume resources. An instance can end up broken if there are no free networks, if its images are improper, or even because of an RPC timeout in nova-compute, which does happen. Such instances cannot be revived and should not be taken into account.
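
To make the complaint concrete, here is a minimal sketch of the accounting model described above. It is not nova's actual code: the `Instance` and `Host` classes, the `free_resources` function, and the `skip_error` flag are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an instance record.
    memory_mb: int
    root_gb: int
    vm_state: str  # e.g. "active", "building", "error"


@dataclass
class Host:
    # Hypothetical, simplified stand-in for a compute host.
    total_memory_mb: int
    total_disk_gb: int


def free_resources(host, instances, skip_error=False):
    """Return (free_memory_mb, free_disk_gb) for a host.

    With skip_error=False every scheduled instance is charged to the
    host, which is the behaviour the report describes. With
    skip_error=True instances in the "error" state are not charged,
    which is the behaviour the report asks for.
    """
    counted = [
        inst for inst in instances
        if not (skip_error and inst.vm_state == "error")
    ]
    used_memory = sum(inst.memory_mb for inst in counted)
    used_disk = sum(inst.root_gb for inst in counted)
    return host.total_memory_mb - used_memory, host.total_disk_gb - used_disk


host = Host(total_memory_mb=8192, total_disk_gb=100)
instances = [
    Instance(memory_mb=2048, root_gb=20, vm_state="active"),
    Instance(memory_mb=4096, root_gb=40, vm_state="error"),
]

reported = free_resources(host, instances)
requested = free_resources(host, instances, skip_error=True)
```

In this toy setup the first call charges the errored instance to the host and the second does not, so a new 4096 MB instance would be rejected under the first model and accepted under the second.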

Tags:

Thierry Carrez (ttx) on 2012-06-18

Changed in nova:
importance:	Undecided → Medium
status:	New → Confirmed
tags:	added: scheduler

wangpan (hzwangpan) wrote on 2013-03-22:

I have a question: if the instance goes into ERROR state during spawn in the hypervisor, in other words after the disk/image has already been created (so disk resources are presumably consumed?), how do we deal with this situation?

Rohan (kanaderohan) on 2013-11-26

Changed in nova:
assignee:	nobody → Rohan (kanaderohan)

Rohan (kanaderohan) wrote on 2013-12-04:

1st scenario: During instance create, the scheduler updates its compute-node-level data before the request to spawn even reaches the compute node. So if the instance is in ERROR state at that point, I think we can skip counting the resources consumed by that instance in the host manager update call.

2nd scenario (pointed out by wangpan): The instance is being spawned by the hypervisor, the disk has been created, then something goes wrong and the instance goes into ERROR state. This case should be handled by the periodic resource tracker and not by the host_manager, since the host manager has no idea what has happened at the hypervisor.

Do these approaches sound alright?
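
The split between the two scenarios can be sketched roughly as follows. This is an illustration only, under stated assumptions: `InstanceRecord`, `scheduler_should_count`, `tracker_disk_usage_gb`, and the `hypervisor_disk_gb` mapping are hypothetical and do not correspond to real nova APIs.

```python
from dataclasses import dataclass


@dataclass
class InstanceRecord:
    # Hypothetical, simplified instance record.
    uuid: str
    vm_state: str  # e.g. "active", "building", "error"
    root_gb: int   # disk requested by the flavor


def scheduler_should_count(inst):
    """Scenario 1 (scheduler side).

    The scheduler has no view of the hypervisor, so the only signal it
    can use is the recorded state: an instance already in "error" is
    not charged to the host.
    """
    return inst.vm_state != "error"


def tracker_disk_usage_gb(instances, hypervisor_disk_gb):
    """Scenario 2 (periodic audit on the compute node).

    hypervisor_disk_gb maps instance uuid -> disk actually present on
    the hypervisor (an assumed input for this sketch). An errored
    instance is charged only for what is really on disk; any other
    instance is charged its requested size.
    """
    total = 0
    for inst in instances:
        if inst.vm_state == "error":
            total += hypervisor_disk_gb.get(inst.uuid, 0)
        else:
            total += inst.root_gb
    return total


records = [
    InstanceRecord(uuid="a", vm_state="active", root_gb=20),
    InstanceRecord(uuid="b", vm_state="error", root_gb=40),  # disk was created
    InstanceRecord(uuid="c", vm_state="error", root_gb=10),  # never reached the host
]
on_disk = {"a": 20, "b": 40}

counted_by_scheduler = [r.uuid for r in records if scheduler_should_count(r)]
audited_disk_gb = tracker_disk_usage_gb(records, on_disk)
```

Here the scheduler-side check drops both errored instances, while the audit still charges instance "b" for the disk it left behind, which is the case wangpan raised.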

Davanum Srinivas (DIMS) (dims-v) on 2015-03-16

Changed in nova:
assignee:	Rohan (kanaderohan) → nobody

Sumant Murke (sumant-murke) on 2015-11-02

Changed in nova:
assignee:	nobody → Sumant Murke (sumant-murke)

Sumant Murke (sumant-murke) on 2015-11-04

Changed in nova:
assignee:	Sumant Murke (sumant-murke) → nobody

AMIT KUMAR (maurya0092) on 2016-02-11

Changed in nova:
assignee:	nobody → AMIT KUMAR (maurya0092)

stgleb (gstepanov) wrote on 2016-02-26:

I think we need a clear way to decide whether a resource counts as acquired or not. Since we cannot do that, it is more reliable to treat all resources as acquired (the bad scenario) and to delegate responsibility for reclaiming the resources of instances in ERROR state to the cloud operator.

Davanum Srinivas (DIMS) (dims-v) on 2016-03-04

Changed in nova:
assignee:	AMIT KUMAR (maurya0092) → nobody

Chris Dent (cdent) wrote on 2016-03-09:

I agree with gstepanov: It is better to be conservative about these failures. It's hard to be sure about the state of the instance so better to clear up the error states later, with some administrative oversight.

So is this even a bug?

Chris Dent (cdent) on 2016-03-15

Changed in nova:
status:	Confirmed → Invalid
