resource tracking is incorrect for ironic

Bug #1402658 reported by Paul Murray
This bug affects 2 people
Affects                    Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)   Expired  Undecided   Unassigned

Bug Description

Ironic nodes can only be assigned to a single instance, so resource allocation is "all or nothing": once an instance is spawned on a node, all of its resources become unavailable, no matter how much the flavor requested. The resource tracker's view does not match this behavior, which leads to attempts to spawn instances that cannot be accommodated.

More specifically, when an instance is sent to a compute manager to build (in ComputeManager.build_and_run_instance()), the resource tracker for the target node tests whether the node has sufficient resources to accept the instance (in ResourceTracker.instance_claim()). This test checks whether the amount of each resource available on the host (RAM, disk, CPU, etc.) is at least the amount requested. If the instance passes this test, the resource tracker deducts the requested amount from the amount available, which may leave some resources still available.

If a subsequent instance is sent to that node, the resource tracker will accept it as long as the amount of each resource it requests does not exceed the remaining amount available.
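To make the failure concrete, here is a minimal sketch of the generic test-then-deduct behaviour described above. It is a simplified, hypothetical model (the names Resources and DeductingTracker and the node sizes are illustrative, not Nova code): it shows how two small flavors both "fit" on one node even though an ironic node can only ever host one instance.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Hypothetical bundle of what a flavor requests or a node offers."""
    ram_mb: int
    disk_gb: int
    vcpus: int


class DeductingTracker:
    """Hypothetical, simplified model of the generic claim logic:
    test whether the request fits, then subtract it from what is free."""

    def __init__(self, total: Resources) -> None:
        # Start with everything free (copy so the caller's object is untouched).
        self.free = Resources(total.ram_mb, total.disk_gb, total.vcpus)

    def claim(self, req: Resources) -> bool:
        # Reject the instance if any requested amount exceeds what is free.
        if (req.ram_mb > self.free.ram_mb
                or req.disk_gb > self.free.disk_gb
                or req.vcpus > self.free.vcpus):
            return False
        # Otherwise deduct only what the flavor asked for; the rest stays free.
        self.free.ram_mb -= req.ram_mb
        self.free.disk_gb -= req.disk_gb
        self.free.vcpus -= req.vcpus
        return True


# A bare-metal node with 32 GB RAM, 500 GB disk, 16 CPUs (illustrative numbers).
node = DeductingTracker(Resources(ram_mb=32768, disk_gb=500, vcpus=16))
small = Resources(ram_mb=4096, disk_gb=40, vcpus=2)

print(node.claim(small))  # True: the first instance fits
print(node.claim(small))  # True: the second also "fits", although an ironic
                          # node can host only one instance
```

The second claim succeeds because the tracker still believes 28 GB of RAM is free, which is exactly the mismatch this report describes.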

In ironic, spawning an instance leaves no resources available on that node, so the resource tracker has the wrong view of how much resource is available there.

Note that the ironic driver is complemented by an ironic-specific host manager in the scheduler, whose consume method does handle ironic nodes correctly. However, the values it computes are later overwritten by the values reported by the resource tracker.
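The overwrite can be illustrated with a small hypothetical model of the scheduler's per-node state (SketchHostState and its methods are illustrative names, not the real ironic host manager API, and only RAM is modelled). The scheduler-side consume zeroes the node, but the next refresh replaces that with the resource tracker's figure.

```python
from dataclasses import dataclass


@dataclass
class SketchHostState:
    """Hypothetical, heavily simplified scheduler-side view of one node."""
    free_ram_mb: int

    def consume_whole_node(self) -> None:
        # Ironic-style consume: a placed instance uses the entire node.
        self.free_ram_mb = 0

    def update_from_tracker(self, tracker_free_ram_mb: int) -> None:
        # A later refresh replaces the scheduler's value with whatever the
        # compute node's resource tracker reported.
        self.free_ram_mb = tracker_free_ram_mb


state = SketchHostState(free_ram_mb=32768)
state.consume_whole_node()
print(state.free_ram_mb)  # 0
# The tracker deducted only a 4096 MB flavor, so it reports 28672 MB free.
state.update_from_tracker(32768 - 4096)
print(state.free_ram_mb)  # 28672
```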

Also note that the get_available_resource() method of the ironic virt driver returns the correct resource availability to the resource tracker, but the resource tracker uses only the resource totals and recalculates availability itself, overriding the driver's values as well.
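The difference between trusting the driver's reported usage and recomputing it from totals can be sketched as follows. This is a hypothetical simplification: the helper functions are illustrative, and although the memory_mb / memory_mb_used keys are modelled on the driver's resource dictionary, they should be read as illustrative too. Only RAM is shown.

```python
def ironic_style_report(total_ram_mb: int, provisioned: bool) -> dict:
    """Hypothetical driver report: a provisioned node is reported fully used."""
    return {
        "memory_mb": total_ram_mb,
        "memory_mb_used": total_ram_mb if provisioned else 0,
    }


def free_as_reported(report: dict) -> int:
    # Availability as the driver itself describes it.
    return report["memory_mb"] - report["memory_mb_used"]


def free_as_recomputed(report: dict, instance_ram_mb: list) -> int:
    # Tracker-style recomputation: keep only the total from the driver and
    # rebuild usage from the flavors of the instances on the node,
    # ignoring the driver's memory_mb_used.
    return report["memory_mb"] - sum(instance_ram_mb)


report = ironic_style_report(32768, provisioned=True)
print(free_as_reported(report))            # 0
print(free_as_recomputed(report, [4096]))  # 28672
```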

Resource tracking works with ironic only if the exact match filters are used, and even then it works only by coincidence. Resource consumption in the resource tracker should be specialised to handle ironic nodes correctly.
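Why exact matching happens to work can be seen in a short sketch. The exact_match function below is a hypothetical stand-in for exact-match filtering, not the real filter classes, and resources are modelled as plain (ram_mb, disk_gb, vcpus) tuples. A flavor smaller than the node is rejected up front, and a flavor equal to the node leaves exactly zero after the generic deduction, so the tracker's arithmetic coincides with the all-or-nothing reality.

```python
from typing import Tuple

# (ram_mb, disk_gb, vcpus)
Amounts = Tuple[int, int, int]


def exact_match(free: Amounts, flavor: Amounts) -> bool:
    # Hypothetical exact-match check: the node passes only if every free
    # amount equals the requested amount.
    return free == flavor


def deduct(free: Amounts, flavor: Amounts) -> Amounts:
    # The generic tracker behaviour: subtract what the flavor requested.
    return (free[0] - flavor[0], free[1] - flavor[1], free[2] - flavor[2])


node_free: Amounts = (32768, 500, 16)
whole_node_flavor: Amounts = (32768, 500, 16)
small_flavor: Amounts = (4096, 40, 2)

print(exact_match(node_free, small_flavor))       # False: rejected up front
print(exact_match(node_free, whole_node_flavor))  # True
node_free = deduct(node_free, whole_node_flavor)
print(node_free)                                  # (0, 0, 0)
print(exact_match(node_free, whole_node_flavor))  # False: node now looks full
```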

Paul Murray (pmurray)
description: updated
melanie witt (melwitt)
tags: added: ironic
Alex Xu (xuhj) wrote :

I saw the same problem and tested it; the problem can be reproduced.

Changed in nova:
status: New → Triaged
Alex Xu (xuhj)
Changed in nova:
assignee: nobody → Alex Xu (xuhj)
Alex Xu (xuhj) wrote :
Changed in nova:
assignee: Alex Xu (xuhj) → nobody
Paul Murray (pmurray) wrote :

Wrong spec @alex - https://review.openstack.org/#/c/127610/ is the request object spec. That doesn't deal with how resource availability is calculated. The spec that deals with resources is https://review.openstack.org/#/c/127609/

However, the resource objects spec doesn't fix this in itself - it just provides a good place to fix it.

Changed in nova:
importance: Undecided → Low
Sean Dague (sdague)
tags: added: scheduler
Changed in nova:
status: Triaged → Confirmed
Michael Davies (mrda) wrote :

This was discussed on 12-Aug at the Ironic Midcycle in Seattle. It was suggested that this interface return "Yes there are resources available" or "No resources available" (i.e. -1 or 0), which is sufficient for Ironic scheduling. This is covered by https://review.openstack.org/#/c/194453/
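The all-or-nothing behaviour suggested at the midcycle can be sketched as a hypothetical per-node tracker. WholeNodeTracker and its methods are illustrative names, not the interface proposed in the linked review: once an instance is claimed, the node reports no free resources at all, and has_capacity() gives the simple yes/no answer.

```python
from typing import Optional, Tuple

Amounts = Tuple[int, int, int]  # (ram_mb, disk_gb, vcpus)


class WholeNodeTracker:
    """Hypothetical all-or-nothing tracker for one bare-metal node."""

    def __init__(self, total: Amounts) -> None:
        self.total = total
        self.instance: Optional[str] = None

    def has_capacity(self) -> bool:
        # The yes/no answer: a node is either entirely free or entirely used.
        return self.instance is None

    def free(self) -> Amounts:
        return self.total if self.instance is None else (0, 0, 0)

    def claim(self, instance_id: str, flavor: Amounts) -> bool:
        # Refuse if the node already hosts an instance or is too small.
        if not self.has_capacity():
            return False
        if any(req > tot for req, tot in zip(flavor, self.total)):
            return False
        # Record the instance; the whole node is now consumed regardless
        # of how much the flavor asked for.
        self.instance = instance_id
        return True


node = WholeNodeTracker((32768, 500, 16))
print(node.claim("inst-1", (4096, 40, 2)))  # True
print(node.free())                          # (0, 0, 0)
print(node.claim("inst-2", (4096, 40, 2)))  # False
```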

Chris Dent (cdent) wrote :

Is this bug active, in the sense that the current state of the code reflects the situation?

Michael Davies (mrda) wrote :

This bug should be investigated as I think it's still valid.

Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Low → Undecided
status: Confirmed → Expired