Comment 1 for bug 1633120

Jon Proulx (jproulx) wrote : Re: Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance (openstack-mitaka)

I ran into a very similar issue with GPU passthrough (stable/mitaka from the Ubuntu Cloud Archive on 14.04).

In my case, a bug in my own configuration management removed (soft-deleted) the active devices from the nova DB. When the config was fixed, nova created new "available" records for all the devices, including the ones currently in use.

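I think that before creating a new record for a device, nova should check whether a "deleted" record already exists for it and undelete that record instead of creating a duplicate. If the undeleted record is assigned to an instance, nova should check whether that instance still exists: if it does, leave the device assigned; if it doesn't, mark the device as available in addition to undeleting it.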
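The proposal above could look roughly like the sketch below. It is a minimal illustration, not nova's actual code, and it runs standalone on Python's built-in sqlite3. Several pieces are assumptions: the simplified `pci_devices` schema (only the columns from the query in this comment), an `instances` table that holds only existing instances, the hypothetical `register_device()` hook called wherever nova would otherwise insert a new record, and the choice of (compute_node_id, address) as the key that identifies a device.

```python
import sqlite3

# Hypothetical, simplified schema: only the pci_devices columns from the
# query in this comment, plus an instances table that holds existing
# instances only. Soft-deleted pci_devices rows have deleted = id.
SCHEMA = """
CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    compute_node_id INTEGER NOT NULL,
    address TEXT NOT NULL,
    status TEXT NOT NULL,
    instance_uuid TEXT,
    deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE instances (uuid TEXT PRIMARY KEY);
"""

UUID = "9269391a-4ce4-4c8d-993d-5ad7a9c3879b"


def register_device(conn, compute_node_id, address):
    """Hypothetical hook: return the pci_devices id to use for a device that a
    compute node reports, reusing a soft-deleted row instead of duplicating it."""
    key = (compute_node_id, address)
    live = conn.execute(
        "SELECT id FROM pci_devices "
        "WHERE compute_node_id = ? AND address = ? AND deleted = 0",
        key,
    ).fetchone()
    if live is not None:
        return live[0]  # Already tracked; nothing to do.

    old = conn.execute(
        "SELECT id, instance_uuid FROM pci_devices "
        "WHERE compute_node_id = ? AND address = ? AND deleted != 0 "
        "ORDER BY id DESC LIMIT 1",
        key,
    ).fetchone()
    if old is None:
        # Genuinely new device: insert an 'available' row, as happens today.
        cur = conn.execute(
            "INSERT INTO pci_devices (compute_node_id, address, status) "
            "VALUES (?, ?, 'available')",
            key,
        )
        return cur.lastrowid

    dev_id, instance_uuid = old
    owner_exists = instance_uuid is not None and conn.execute(
        "SELECT 1 FROM instances WHERE uuid = ?", (instance_uuid,)
    ).fetchone() is not None
    if owner_exists:
        # The instance still uses the device: undelete, keep the allocation.
        conn.execute("UPDATE pci_devices SET deleted = 0 WHERE id = ?", (dev_id,))
    else:
        # No owner, or the owner is gone: undelete and mark it available.
        conn.execute(
            "UPDATE pci_devices SET deleted = 0, status = 'available', "
            "instance_uuid = NULL WHERE id = ?",
            (dev_id,),
        )
    return dev_id


def example_db(owner_alive):
    """Rebuild the soft-deleted row from the example DB state (row 4)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    if owner_alive:
        conn.execute("INSERT INTO instances (uuid) VALUES (?)", (UUID,))
    conn.execute(
        "INSERT INTO pci_devices VALUES (4, 90, '0000:09:00.0', 'allocated', ?, 4)",
        (UUID,),
    )
    return conn
```

With `example_db(owner_alive=True)`, calling `register_device(conn, 90, "0000:09:00.0")` undelete row 4 and leaves it allocated to the running instance, instead of adding a second "available" row for the same device.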

Example DB state:
> SELECT created_at,deleted_at,deleted,id,compute_node_id,address,status,instance_uuid FROM pci_devices WHERE address='0000:09:00.0';
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| created_at          | deleted_at          | deleted | id | compute_node_id | address      | status    | instance_uuid                        |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| 2016-07-06 00:12:30 | 2016-10-13 21:04:53 |       4 |  4 |              90 | 0000:09:00.0 | allocated | 9269391a-4ce4-4c8d-993d-5ad7a9c3879b |
| 2016-10-18 18:01:35 | NULL                |       0 | 12 |              90 | 0000:09:00.0 | available | NULL                                 |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+

In this case, instance 9269391a-4ce4-4c8d-993d-5ad7a9c3879b did exist and was using PCI device 09:00.0, but that assignment was recorded only in the deleted row (id 4). The live row (id 12) listed the device as available.

Only three of my devices were affected by this (and in use), so I could fix them by hand relatively easily. I wonder whether the SR-IOV issue has the same cause.
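To find devices in this state before fixing them by hand, one could look for a live "available" row that has a soft-deleted "allocated" twin with the same compute node and PCI address. The sketch below runs such a query against an in-memory SQLite copy of the two example rows. The schema is a simplified assumption, and although the SELECT uses only standard SQL joins, it has not been checked against a real nova MySQL database.

```python
import sqlite3

# Hypothetical, simplified pci_devices table: only the columns from the
# query above. Soft-deleted rows have deleted = id, as in the example.
SCHEMA = """
CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    compute_node_id INTEGER NOT NULL,
    address TEXT NOT NULL,
    status TEXT NOT NULL,
    instance_uuid TEXT,
    deleted INTEGER NOT NULL DEFAULT 0
);
"""

# A live 'available' row whose device also has a soft-deleted 'allocated'
# row on the same compute node is a candidate for double assignment.
DETECT_SQL = """
SELECT live.id, old.id, live.compute_node_id, live.address, old.instance_uuid
FROM pci_devices AS live
JOIN pci_devices AS old
  ON old.compute_node_id = live.compute_node_id
 AND old.address = live.address
WHERE live.deleted = 0 AND live.status = 'available'
  AND old.deleted != 0 AND old.status = 'allocated'
ORDER BY live.compute_node_id, live.address
"""

UUID = "9269391a-4ce4-4c8d-993d-5ad7a9c3879b"


def example_db():
    """Load the two rows from the example DB state into an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO pci_devices VALUES (?, ?, ?, ?, ?, ?)",
        [
            (4, 90, "0000:09:00.0", "allocated", UUID, 4),
            (12, 90, "0000:09:00.0", "available", None, 0),
        ],
    )
    return conn


def find_shadowed_allocations(conn):
    """Return (live_id, deleted_id, compute_node_id, address, instance_uuid)."""
    return conn.execute(DETECT_SQL).fetchall()
```

Each result pairs the live row with the soft-deleted one. Whether the listed instance still exists has to be checked separately before deciding which row to keep.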