PCI devices claimed on compute node during _claim_test()
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Jay Pipes |
Bug Description
The nova.compute.
def __init__(self, context, instance, tracker, resources, overhead=None,
<snip>
# Check claim at constructor to avoid mess code
# Raise exception ComputeResource
If we take a look at _claim_test(), we see pretty clearly that resources are NOT supposed to be actually claimed -- instead, the method should only *check* to see if the request can be fulfilled:
def _claim_test(self, resources, limits=None):
    """Test if this claim can be satisfied given available resources and
    optional oversubscription limits
    This should be called before the compute node actually consumes the
    resources required to execute the claim.
    :param resources: available local compute node resources
    :returns: Return true if resources are available to claim.
    """
    <snip>
    reasons = [self._
    reasons = reasons + self._test_
    reasons = [r for r in reasons if r is not None]
    if len(reasons) > 0:
        raise exception.
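The check-only contract described in that docstring can be illustrated with a small standalone sketch. Everything in it (ClaimFailed, check_amount, run_claim_test, and the dictionary keys) is a hypothetical stand-in chosen for illustration, not nova's actual code. Each check only reads the resource data and returns either a reason string or None, and the reasons are aggregated in the same way as in the snippet above:

```python
# Hypothetical, simplified stand-ins for illustration; not nova's actual code.
class ClaimFailed(Exception):
    """Raised when one or more resource checks fail."""


def check_amount(kind, unit, total, used, requested, limit=None):
    """Return a reason string if the request does not fit, else None.

    This only reads its arguments; it never changes any resource state.
    """
    if limit is None:
        limit = total
    free = limit - used
    if requested > free:
        return "Free %s %d %s < requested %d %s" % (
            kind, free, unit, requested, unit)
    return None


def run_claim_test(resources, requested, limits=None):
    """Check whether the request fits; raise ClaimFailed if it does not."""
    limits = limits or {}
    reasons = [
        check_amount("memory", "MB",
                     resources["memory_mb"], resources["memory_mb_used"],
                     requested["memory_mb"], limits.get("memory_mb")),
        check_amount("disk", "GB",
                     resources["local_gb"], resources["local_gb_used"],
                     requested["local_gb"], limits.get("local_gb")),
    ]
    # Keep only the checks that reported a problem.
    reasons = [r for r in reasons if r is not None]
    if len(reasons) > 0:
        raise ClaimFailed("; ".join(reasons))
    return True
```

Because no check has side effects, a failed test leaves the resource data exactly as it was, which is the property this bug report expects of _claim_test().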
Unfortunately, the PCI devices are *actually* claimed in the _test_pci() method:
def _test_pci(self):
    if pci_requests.
        devs = self.tracker.
        if not devs:
What this means is that if an instance with PCI requests that the compute host can satisfy is launched on a compute node that lacks, say, enough available RAM, the Claim will raise ComputeResource
devs = self.tracker.
The above code actually marks one or more PCI devices on the compute host as claimed for the instance. This introduces inconsistent state into the system. Making things worse is the fact that the nova.pci.
tags: | added: pci |
Changed in nova: | |
assignee: | nobody → Jay Pipes (jaypipes) |
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Low |
tags: | added: mitaka-rc-potential |
tags: | added: resource-tracker removed: mitaka-rc-potential |
This really is a data/state-corruption bug, since it erroneously consumes PCI devices on the host even when the Claim does not succeed.
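The failure mode can be reproduced with a toy model. The sketch below is a minimal illustration under stated assumptions: PciPool, buggy_claim, and safe_claim are hypothetical names invented for this example and do not reflect nova's PCI tracker API or the fix that was actually released. It contrasts a test step that consumes devices as a side effect with one that only checks availability and consumes after every check has passed:

```python
# Hypothetical toy model for illustration; not nova's PCI tracker.
class ClaimFailed(Exception):
    """Raised when a claim cannot be satisfied."""


class PciPool:
    def __init__(self, addresses):
        self.free = list(addresses)
        self.claimed = {}  # instance id -> list of device addresses

    def supports(self, count):
        """Read-only check: are enough free devices available?"""
        return len(self.free) >= count

    def consume(self, instance_id, count):
        """Mark devices as claimed for the instance (mutates the pool)."""
        if not self.supports(count):
            return None
        devs = [self.free.pop() for _ in range(count)]
        self.claimed.setdefault(instance_id, []).extend(devs)
        return devs


def buggy_claim(pool, instance_id, pci_count, ram_free_mb, ram_requested_mb):
    reasons = []
    if ram_requested_mb > ram_free_mb:
        reasons.append("not enough RAM")
    # Bug pattern: the "test" step consumes devices as a side effect.
    if pci_count and not pool.consume(instance_id, pci_count):
        reasons.append("PCI requests cannot be satisfied")
    if reasons:
        # The devices consumed above are never released on this path.
        raise ClaimFailed("; ".join(reasons))


def safe_claim(pool, instance_id, pci_count, ram_free_mb, ram_requested_mb):
    reasons = []
    if ram_requested_mb > ram_free_mb:
        reasons.append("not enough RAM")
    # The test step only checks availability.
    if pci_count and not pool.supports(pci_count):
        reasons.append("PCI requests cannot be satisfied")
    if reasons:
        raise ClaimFailed("; ".join(reasons))
    # Consume only after every check has passed.
    if pci_count:
        pool.consume(instance_id, pci_count)
```

With buggy_claim, a claim that fails on RAM still leaves a device recorded against the instance. With safe_claim, the same failing request leaves the pool untouched.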