unavailable ironic nodes being scheduled to
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Medium | Jesse J. Cook | | |
Mitaka | Fix Released | Medium | Jay Faulkner | | |
Bug Description
When the compute resource tracker checks nodes, the ironic driver checks each node against a list of states that it should return resources for. This prevents nova from scheduling to nodes that are in ironic states where they are not available, such as during our cleaning process.
The logic around this check ( https:/
The problem is when you have an orphaned instance on your node, one which ironic sees as present but nova does not (usually nova lists it as having been deleted).
The instance detection will return true, causing the memory_mb_used and memory_mb values to be set to the retrieved value from instance_
The check for _node_resources
Once the resource tracker calls _update_
Ironic will then fail the build attempt because it shows an instance already associated with the node.
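The sequence described above can be modelled with a small sketch. It is a simplified illustration, not nova's actual code: the function names `report_node_resources` and `tracker_free_memory`, the dict-based node, and its field names are all assumptions made for the example. It assumes the instance check runs first and the unavailability check is skipped once an instance is found, and that the resource tracker then recomputes usage only from instances nova knows about.

```python
def report_node_resources(node):
    """Hypothetical stand-in for the driver's per-node resource report."""
    memory_mb = node["memory_mb"]
    memory_mb_used = 0
    if node["instance_uuid"] is not None:
        # An instance is detected on the node: report it as fully used.
        memory_mb_used = memory_mb
    elif node["unavailable"]:
        # Only reached when no instance is associated with the node.
        memory_mb = 0
    return {"memory_mb": memory_mb, "memory_mb_used": memory_mb_used}


def tracker_free_memory(resources, nova_instances):
    """Hypothetical stand-in for the tracker recomputing usage from nova's records."""
    used = sum(inst["memory_mb"] for inst in nova_instances)
    return resources["memory_mb"] - used


# Ironic sees an instance on the node, but nova has no record of it (orphan).
orphaned_node = {"memory_mb": 4096, "instance_uuid": "orphan", "unavailable": True}
resources = report_node_resources(orphaned_node)
free_mb = tracker_free_memory(resources, nova_instances=[])
print(free_mb)  # 4096: the node looks completely free to the scheduler
```

In this model the driver's own "used" value is discarded once usage is recomputed from nova's instance list, which is why the orphaned node ends up looking schedulable.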
Changed in nova: | |
assignee: | nobody → Mark Silence (madasi) |
tags: | added: ironic |
Changed in nova: | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in nova: | |
assignee: | Mark Silence (madasi) → nobody |
Changed in nova: | |
assignee: | nobody → Jesse J. Cook (jesse-j-cook) |
tags: | added: liberty-backport-potential mitaka-backport-potential |
tags: | added: ironic removed: mitaka-backport-potential |
My initial thought was to swap the logical order of the _node_resources_used and _node_resources_unavailable checks, so that the check for instances only happens after we check for unavailable conditions. However, I think that would cause the same situation as bug #1502177: if you have a maintenance node with an active instance that nova does know about, nova would set the usage itself from the instance record, subtract it from the 0 total resources we sent due to the maintenance state, and report negative free space.
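The negative free space concern can be shown with a little arithmetic. This is a rough sketch under assumed names (`report_swapped_order`, the `maintenance` flag, and the dict layout are hypothetical, not nova or ironic code); it only models the swapped ordering and the subtraction described above.

```python
def report_swapped_order(node):
    """Hypothetical driver report with the unavailability check done first."""
    if node["maintenance"]:
        # Unavailable node: report zero total resources.
        return {"memory_mb": 0, "memory_mb_used": 0}
    used = node["memory_mb"] if node["instance_uuid"] is not None else 0
    return {"memory_mb": node["memory_mb"], "memory_mb_used": used}


# A node in maintenance that hosts an instance nova does know about.
maintenance_node = {"memory_mb": 4096, "instance_uuid": "known", "maintenance": True}
known_instances = [{"uuid": "known", "memory_mb": 4096}]

reported = report_swapped_order(maintenance_node)
# nova sets usage from its own instance record and subtracts it from the total.
free_memory_mb = reported["memory_mb"] - sum(i["memory_mb"] for i in known_instances)
print(free_memory_mb)  # -4096
```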
It looks like if the ironic driver implemented the get_per_instance_usage() call, the compute's resource tracker would properly account for orphaned instances and stop reporting them as available capacity. However, I think we would need to pass an ironic node identifier, since this call probably addresses a single compute under nova's one compute == one host assumption. That would mean changing the function signature and thus the driver API, which is not a trivial change.
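To make the idea concrete, here is a minimal sketch of what node-scoped orphan accounting might look like. Everything in it is hypothetical: `per_instance_usage_for_node` and `free_memory_with_orphans` are not existing nova or ironic functions, and the per-node argument is exactly the kind of signature change discussed above. The usage mapping (instance UUID to a dict with `memory_mb`) is an assumed shape chosen for the example.

```python
def per_instance_usage_for_node(node):
    """Hypothetical node-scoped variant: usage keyed by instance UUID."""
    if node["instance_uuid"] is None:
        return {}
    return {node["instance_uuid"]: {"memory_mb": node["memory_mb"]}}


def free_memory_with_orphans(node, nova_instance_uuids):
    """Subtract usage for instances nova tracks and for orphans it does not."""
    usage = per_instance_usage_for_node(node)
    tracked = sum(u["memory_mb"] for uuid, u in usage.items()
                  if uuid in nova_instance_uuids)
    orphaned = sum(u["memory_mb"] for uuid, u in usage.items()
                   if uuid not in nova_instance_uuids)
    return node["memory_mb"] - tracked - orphaned


node_with_orphan = {"memory_mb": 4096, "instance_uuid": "orphan"}
empty_node = {"memory_mb": 4096, "instance_uuid": None}

print(free_memory_with_orphans(node_with_orphan, nova_instance_uuids=set()))  # 0
print(free_memory_with_orphans(empty_node, nova_instance_uuids=set()))        # 4096
```

With the orphan counted, the node no longer appears as free capacity even though nova has no record of the instance.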
Trying to see what the best way to do this is.