[SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | sean mooney |
Ocata | Fix Committed | Medium | sean mooney |
Pike | Fix Committed | Medium | sean mooney |
Queens | Fix Committed | Medium | sean mooney |
Rocky | Fix Committed | Medium | sean mooney |
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Mitaka | Won't Fix | High | Unassigned |
Ocata | Fix Released | High | Unassigned |
Queens | Fix Released | Undecided | Unassigned |
Rocky | Fix Released | Undecided | Unassigned |
Stein | Fix Released | Undecided | Unassigned |
nova (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Won't Fix | High | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Cosmic | Fix Released | Undecided | Unassigned |
Disco | Fix Released | Undecided | Unassigned |
Eoan | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
This patch is required to prevent nova from accidentally marking pci_devices allocations as deleted when it incorrectly reads the PCI passthrough whitelist.
[Test Case]
* deploy openstack (any version that supports sriov)
* single compute configured for sriov with at least one device in pci_passthrough
* create a vm and attach sriov port
* remove device from pci_passthrough
* check that pci_devices allocations have not been marked as deleted
[Regression Potential]
None anticipated
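The database check in the last test step can be sketched standalone. A minimal sketch using sqlite3 as a stand-in for nova's MySQL database (column names taken from the pci_devices output later in this report; against a real deployment the same query would run on the nova DB):

```python
import sqlite3

# Stand-in for the nova database: only the pci_devices columns used in this
# report (the real check runs against nova's MySQL pci_devices table).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY, deleted INTEGER, address TEXT,
    status TEXT, instance_uuid TEXT)""")
# One VF allocated to a running instance.
db.execute("INSERT INTO pci_devices VALUES "
           "(4, 0, '0000:88:04.7', 'allocated', 'vm-a')")

# The check from the last test step: after the device is removed from the
# whitelist, no allocated row may have been flagged deleted.
bad = db.execute(
    "SELECT count(*) FROM pci_devices "
    "WHERE status = 'allocated' AND deleted != 0").fetchone()[0]
print("allocations wrongly marked deleted:", bad)  # 0 on a fixed nova
```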
-------
Upon trying to create a VM instance (say A) with one QAT VF, it fails with the following error: "Requested operation is not valid: PCI device 0000:88:04.7 is in use by driver QEMU, domain instance-00000081". Note that PCI device 0000:88:04.7 is already assigned to another VM (say B). We have installed the OpenStack Mitaka release on a CentOS 7 system. It has two Intel QAT devices, with 32 VFs available per QAT (DH895xCC) device. Out of the 64 VFs, only 8 are allocated to VM instances; the rest should be available.
However, the nova scheduler tries to assign an already-in-use SRIOV VF to the new instance, and the instance fails. It appears that the nova database is not tracking which VFs have already been taken. If I shut down VM B, then VM A boots up, and vice versa; the two instances cannot run simultaneously because of this issue.
We should always be able to create as many instances with the requested PCI devices as there are available VFs.
Please feel free to let me know if additional information is needed. Can anyone suggest why nova tries to assign a PCI device that has already been assigned? Is there any way to resolve this issue? Thank you in advance for your support and help.
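For reference, the behaviour being asked for amounts to the scheduler never handing out a VF that is already claimed. A minimal, hypothetical sketch of such a tracker (illustration only, not nova's actual PciDeviceStats code):

```python
class VFPool:
    """Hypothetical in-use tracker for SR-IOV VFs (illustration only)."""

    def __init__(self, addresses):
        self.free = set(addresses)
        self.claimed = {}  # PCI address -> instance uuid

    def claim(self, instance_uuid):
        # A VF leaves the free set the moment it is claimed, so it can
        # never be handed to a second instance.
        if not self.free:
            raise RuntimeError("no free VFs")
        addr = self.free.pop()
        self.claimed[addr] = instance_uuid
        return addr

    def release(self, addr):
        self.claimed.pop(addr, None)
        self.free.add(addr)

pool = VFPool(["0000:88:04.6", "0000:88:04.7"])
a = pool.claim("vm-a")
b = pool.claim("vm-b")
assert a != b  # two instances must never share a VF
```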
[root@localhost ~(keystone_admin)]# lspci -d:435
83:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
88:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# lspci -d:443 | grep "QAT Virtual Function" | wc -l
64
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# mysql -u root nova -e "SELECT hypervisor_
localhost 0000:88:04.7 e10a76f3-
localhost 0000:88:04.7 c3dbac90-
localhost 0000:88:04.7 c7f6adad-
localhost.
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# grep -r e10a76f3-
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# grep -r 0c3c11a5-
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
[root@localhost ~(keystone_admin)]#
On the controller, it appears there are duplicate PCI device entries in the database:
MariaDB [nova]> select hypervisor_
+---------------------+--------------+----------+
| hypervisor_hostname | address | count(*) |
+---------------------+--------------+----------+
| localhost | 0000:05:00.0 | 3 |
| localhost | 0000:05:00.1 | 3 |
| localhost | 0000:83:01.0 | 3 |
| localhost | 0000:83:01.1 | 3 |
| localhost | 0000:83:01.2 | 3 |
| localhost | 0000:83:01.3 | 3 |
| localhost | 0000:83:01.4 | 3 |
| localhost | 0000:83:01.5 | 3 |
| localhost | 0000:83:01.6 | 3 |
| localhost | 0000:83:01.7 | 3 |
| localhost | 0000:83:02.0 | 3 |
| localhost | 0000:83:02.1 | 3 |
| localhost | 0000:83:02.2 | 3 |
| localhost | 0000:83:02.3 | 3 |
| localhost | 0000:83:02.4 | 3 |
| localhost | 0000:83:02.5 | 3 |
| localhost | 0000:83:02.6 | 3 |
| localhost | 0000:83:02.7 | 3 |
| localhost | 0000:83:03.0 | 3 |
| localhost | 0000:83:03.1 | 3 |
| localhost | 0000:83:03.2 | 3 |
| localhost | 0000:83:03.3 | 3 |
| localhost | 0000:83:03.4 | 3 |
| localhost | 0000:83:03.5 | 3 |
| localhost | 0000:83:03.6 | 3 |
| localhost | 0000:83:03.7 | 3 |
| localhost | 0000:83:04.0 | 3 |
| localhost | 0000:83:04.1 | 3 |
| localhost | 0000:83:04.2 | 3 |
| localhost | 0000:83:04.3 | 3 |
| localhost | 0000:83:04.4 | 3 |
| localhost | 0000:83:04.5 | 3 |
| localhost | 0000:83:04.6 | 3 |
| localhost | 0000:83:04.7 | 3 |
| localhost | 0000:88:01.0 | 3 |
| localhost | 0000:88:01.1 | 3 |
| localhost | 0000:88:01.2 | 3 |
| localhost | 0000:88:01.3 | 3 |
| localhost | 0000:88:01.4 | 3 |
| localhost | 0000:88:01.5 | 3 |
| localhost | 0000:88:01.6 | 3 |
| localhost | 0000:88:01.7 | 3 |
| localhost | 0000:88:02.0 | 3 |
| localhost | 0000:88:02.1 | 3 |
| localhost | 0000:88:02.2 | 3 |
| localhost | 0000:88:02.3 | 3 |
| localhost | 0000:88:02.4 | 3 |
| localhost | 0000:88:02.5 | 3 |
| localhost | 0000:88:02.6 | 3 |
| localhost | 0000:88:02.7 | 3 |
| localhost | 0000:88:03.0 | 3 |
| localhost | 0000:88:03.1 | 3 |
| localhost | 0000:88:03.2 | 3 |
| localhost | 0000:88:03.3 | 3 |
| localhost | 0000:88:03.4 | 3 |
| localhost | 0000:88:03.5 | 3 |
| localhost | 0000:88:03.6 | 3 |
| localhost | 0000:88:03.7 | 3 |
| localhost | 0000:88:04.0 | 3 |
| localhost | 0000:88:04.1 | 3 |
| localhost | 0000:88:04.2 | 3 |
| localhost | 0000:88:04.3 | 3 |
| localhost | 0000:88:04.4 | 3 |
| localhost | 0000:88:04.5 | 3 |
| localhost | 0000:88:04.6 | 3 |
| localhost | 0000:88:04.7 | 3 |
+---------------------+--------------+----------+
66 rows in set (0.00 sec)
MariaDB [nova]>
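The query above is truncated in this report, but it evidently groups pci_devices rows by address. An equivalent duplicate-finding query, sketched here with sqlite3 so it can be run standalone (against nova it would target the MySQL pci_devices table; the sample rows are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pci_devices "
           "(hypervisor_hostname TEXT, address TEXT, deleted INTEGER)")
# Illustrative rows: one address with three records, one with a single record.
rows = [("localhost", "0000:88:04.7", 0),
        ("localhost", "0000:88:04.7", 1),
        ("localhost", "0000:88:04.7", 2),
        ("localhost", "0000:05:00.0", 0)]
db.executemany("INSERT INTO pci_devices VALUES (?, ?, ?)", rows)

# Addresses with more than one row, as in the MariaDB output above.
dupes = db.execute(
    "SELECT hypervisor_hostname, address, count(*) FROM pci_devices "
    "GROUP BY hypervisor_hostname, address HAVING count(*) > 1").fetchall()
print(dupes)  # [('localhost', '0000:88:04.7', 3)]
```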
Changed in nova: | |
status: | New → Confirmed |
tags: | added: pci |
summary: |
Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance (openstack-mitaka) → Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance |
Changed in nova: | |
importance: | Undecided → High |
assignee: | nobody → sean mooney (sean-k-mooney) |
status: | Confirmed → Triaged |
importance: | High → Medium |
summary: |
Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance → [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance |
description: | updated |
tags: | added: sts-sru-needed |
tags: |
added: sts-sru-done removed: sts-sru-needed |
I ran into a very similar issue with GPU passthrough (stable/mitaka from the Ubuntu cloud archive on 14.04).
In my case there was a config management bug on my end which removed the active devices from the nova DB; when the config was fixed, nova created new "available" records for all the devices, including the ones currently in use.
I think nova should check whether duplicate "deleted" records exist and undelete them, checking whether the assigned instance (if there is one) still exists: if it does, leave the device assigned; if it doesn't, mark the resource as available in addition to undeleting it.
example DB state:
> SELECT created_at, deleted_at, deleted, id, compute_node_id, address, status, instance_uuid FROM pci_devices WHERE address='0000:09:00.0';
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| created_at          | deleted_at          | deleted | id | compute_node_id | address      | status    | instance_uuid                        |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| 2016-07-06 00:12:30 | 2016-10-13 21:04:53 |       4 |  4 |              90 | 0000:09:00.0 | allocated | 9269391a-4ce4-4c8d-993d-5ad7a9c3879b |
| 2016-10-18 18:01:35 | NULL                |       0 | 12 |              90 | 0000:09:00.0 | available | NULL                                 |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
In this case instance 9269391a-4ce4-4c8d-993d-5ad7a9c3879b did exist and was using PCI 09:00.0, but it was associated with the deleted row.
I only had three devices which were affected by this (and in use), so I could fix them by hand relatively easily. I wonder if the SRIOV issue is the same.
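The undelete-and-reconcile approach suggested above could look roughly like this. A hypothetical sketch operating on the duplicate rows for one address, not the actual fix that landed in nova:

```python
def reconcile(rows, live_instances):
    """Reconcile duplicate pci_devices rows for one PCI address.

    rows: dicts with 'deleted', 'status', 'instance_uuid' keys.
    Hypothetical sketch: undelete a soft-deleted allocation; if its
    instance no longer exists, surface the device as available instead.
    """
    for row in rows:
        if row["deleted"] and row["status"] == "allocated":
            row["deleted"] = 0  # undelete the duplicate record
            if row["instance_uuid"] not in live_instances:
                # Instance is gone: free the device.
                row["status"] = "available"
                row["instance_uuid"] = None
    return rows

# The DB state from the comment above: a deleted-but-allocated row and a
# spurious available duplicate, while the instance still exists.
rows = [{"deleted": 4, "status": "allocated", "instance_uuid": "9269391a"},
        {"deleted": 0, "status": "available", "instance_uuid": None}]
fixed = reconcile(rows, live_instances={"9269391a"})
assert fixed[0]["deleted"] == 0 and fixed[0]["status"] == "allocated"
```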