Nova scheduler attempts to re-assign currently in-use SR-IOV VF to new VM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | New | Undecided | Unassigned |
Bug Description
Running a small cluster with 16 compute nodes and 3 controller nodes on OpenStack Queens using SR-IOV VFs. From time to time, it appears that the Nova scheduler loses track of some of the PCI devices (VFs) that are actively mapped into servers. We don't know exactly when this occurs and we cannot trigger it on demand, but it occurs on a number of the compute nodes over time. Restarting the given compute node resolves the issue.
The problem manifests as the following errors:
/var/log/
The compute nodes in question are configured with the following PCI whitelist:
[pci]
passthrough_
Note that, unlike in similar bug reports, there haven't been any changes to the whitelist that would be likely to cause this. It just seems to develop over time.
===== Versions =====
Compute nodes:
ii nova-common 2:17.0.6-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:17.0.6-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:17.0.6-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-
Controller nodes:
ii nova-api 2:17.0.9-0ubuntu1 all OpenStack Compute - API frontend
ii nova-common 2:17.0.9-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:17.0.9-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:17.0.9-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-
ii nova-conductor 2:17.0.9-0ubuntu1 all OpenStack Compute - conductor service
ii nova-consoleauth 2:17.0.9-0ubuntu1 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:17.0.9-0ubuntu1 all OpenStack Compute - NoVNC proxy
ii nova-placement-api 2:17.0.9-0ubuntu1 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:17.0.9-0ubuntu1 all OpenStack Compute - virtual machine scheduler
ii nova-serialproxy 2:17.0.9-0ubuntu1 all OpenStack Compute - serial proxy
ii nova-xvpvncproxy 2:17.0.9-0ubuntu1 all OpenStack Compute - XVP VNC proxy
tags: added: pci resource-tracker scheduler
I'm assuming you've seen bug 1633120, which sounds very similar to this and is already fixed in a stable queens release: https://review.opendev.org/#/c/635072/
It looks like you don't have that fix, though, since it was released in 17.0.10 and you've got 17.0.9 on the controllers (and 17.0.6 on the compute nodes). I'd say pick up that fix and see if it solves your problem.
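
For anyone wanting to check their own nodes, here is a minimal sketch of the version comparison described above. It assumes the fix first shipped in upstream 17.0.10, as stated in this comment, and that the package versions look like the dpkg strings in the report. The helper names `upstream_version` and `has_fix` are made up for illustration, and the parsing is a simplification that only handles purely numeric upstream versions, not full Debian version ordering.

```python
import re

# Upstream release that carries the fix, per the comment above (assumption
# taken from this thread, not verified against the package changelog).
FIX_VERSION = (17, 0, 10)


def upstream_version(deb_version):
    """Return the numeric upstream part of a dpkg version string as a tuple.

    Simplified: drops the optional epoch ("2:") and the Debian revision
    ("-0ubuntu1"), then reads the remaining dotted numbers.
    """
    without_epoch = deb_version.split(":", 1)[-1]
    upstream = without_epoch.rsplit("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


def has_fix(deb_version):
    """True if the package's upstream version is at or past FIX_VERSION."""
    return upstream_version(deb_version) >= FIX_VERSION


# Versions copied from the report's package listing.
installed = {
    "compute nodes": "2:17.0.6-0ubuntu1",
    "controller nodes": "2:17.0.9-0ubuntu1",
}

for role, version in installed.items():
    status = "has the fix" if has_fix(version) else "needs an upgrade"
    print(f"{role}: {version} {status}")
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would rank "17.0.9" above "17.0.10".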