Comment 3 for bug 1892361

Hemanth Nakkina (hemanth-n) wrote :

Able to simulate the error scenario with the following steps:

1. Deployed an OpenStack Queens environment with a single compute node that has one SRIOV NIC.
2. Confirmed that the nova.pci_devices table has correct entries for all the devices.
3. Stopped the nova-compute service
4. Changed the dev_type of the device from type-PF to type-PCI in the nova.pci_devices table
   (this simulates a scenario where the device was initially registered without SRIOV capabilities)
5. Started the nova-compute service
6. Verified the nova.pci_devices table.
   The dev_type of the device is set back from type-PCI to type-PF in the database (as expected)
   by the resource updates from the hypervisor.
7. Launched an instance with an SRIOV port. The launch fails with the same error as mentioned in the bug report.

Looking at the logs, the PCI stats pools are not updated during step 5:
INFO nova.compute.resource_tracker [req-6b15330f-2874-464a-b49b-452e177bb38f - - - - -] Final resource view: name=test.compute phys_ram=64355MB used_ram=512MB phys_disk=365GB used_disk=0GB total_vcpus=28 used_vcpus=0 pci_stats=[PciDevicePool(count=63,numa_node=0,product_id='1515',tags={dev_type='type-VF',physical_network='physnet1'},vendor_id='8086'), PciDevicePool(count=1,numa_node=0,product_id='1528',tags={dev_type='type-PCI',physical_network='physnet1'},vendor_id='8086')]
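
To make the pool contents easier to read, here is a small sketch that pulls the product_id, dev_type and count of each pool out of a "Final resource view" line. It is only a log-reading helper for this comment, not part of nova; the regular expression and the function name summarize_pools are my own and assume the field order seen in the line above.

```python
import re

# pci_stats portion of the "Final resource view" line quoted above.
LOG_LINE = (
    "pci_stats=[PciDevicePool(count=63,numa_node=0,product_id='1515',"
    "tags={dev_type='type-VF',physical_network='physnet1'},vendor_id='8086'), "
    "PciDevicePool(count=1,numa_node=0,product_id='1528',"
    "tags={dev_type='type-PCI',physical_network='physnet1'},vendor_id='8086')]"
)

# Assumes the field order shown in the log: count, ..., product_id, tags={dev_type=...
POOL_RE = re.compile(
    r"PciDevicePool\(count=(?P<count>\d+),.*?"
    r"product_id='(?P<product_id>\w+)',"
    r"tags=\{dev_type='(?P<dev_type>[\w-]+)'"
)


def summarize_pools(line):
    """Return (product_id, dev_type, count) for every pool found in the line."""
    return [
        (m["product_id"], m["dev_type"], int(m["count"]))
        for m in POOL_RE.finditer(line)
    ]


pools = summarize_pools(LOG_LINE)
for product_id, dev_type, count in pools:
    print(product_id, dev_type, count)
```

The second pool (product_id 1528) is the PF, and it is still reported with dev_type type-PCI.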

The device is listed in the pool with dev_type type-PCI, whereas it is expected to be in a pool with dev_type type-PF. The pools are updated only when a device address is added or removed, not when the dev_type of an existing device changes.
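
The behaviour described above can be illustrated with a toy model. This is not nova's code: the class SimplePciStats, its methods, and the PCI address 0000:03:00.0 are hypothetical and only mimic the "add/remove by address" rule observed in the logs. The vendor_id and product_id values are taken from the log line.

```python
from collections import defaultdict


class SimplePciStats:
    """Toy stand-in for the PCI stats pools (hypothetical, not nova's class)."""

    def __init__(self):
        # Pool key (vendor_id, product_id, dev_type) -> set of device addresses.
        self.pools = defaultdict(set)
        self.known = set()

    def sync(self, devices):
        """Update pools, reacting only to added or removed addresses."""
        current = {dev["address"]: dev for dev in devices}
        for addr in set(current) - self.known:
            dev = current[addr]
            key = (dev["vendor_id"], dev["product_id"], dev["dev_type"])
            self.pools[key].add(addr)
        for addr in self.known - set(current):
            for members in self.pools.values():
                members.discard(addr)
        # A dev_type change on an already known address is never looked at.
        self.known = set(current)

    def pool_types(self, address):
        """Return the dev_type of every pool that contains the address."""
        return [key[2] for key, members in self.pools.items() if address in members]


stats = SimplePciStats()
# Device as loaded from the database at service start (step 5); address is made up.
pf = {"address": "0000:03:00.0", "vendor_id": "8086",
      "product_id": "1528", "dev_type": "type-PCI"}
stats.sync([pf])

# The hypervisor resource update corrects the record (step 6) ...
pf["dev_type"] = "type-PF"
stats.sync([pf])

# ... but the address did not change, so the pool still carries the old dev_type.
stale_types = stats.pool_types(pf["address"])
print(stale_types)
```

In this model the device record says type-PF while the pool it belongs to still says type-PCI, which is the mismatch seen in the log.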

Since the dev_type of the pool is still type-PCI, the device remains eligible for selection, which results in the error.

In short, this case may arise when the NIC is initially registered without SRIOV capabilities, is later configured for SRIOV, and the nova-compute service is then restarted (before it gathers information from the periodic hypervisor resource updates).