[SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance
| Affects | Importance | Assigned to |
| --- | --- | --- |
| OpenStack Compute (nova) | Medium | sean mooney |
| Ocata | Medium | sean mooney |
| Pike | Medium | sean mooney |
| Queens | Medium | sean mooney |
| Rocky | Medium | sean mooney |
| Ubuntu Cloud Archive | Undecided | Unassigned |
| Mitaka | High | Unassigned |
| Ocata | High | Unassigned |
| Queens | Undecided | Unassigned |
| Rocky | Undecided | Unassigned |
| Stein | Undecided | Unassigned |
| nova (Ubuntu) | Undecided | Unassigned |
| Xenial | High | Unassigned |
| Bionic | Undecided | Unassigned |
| Cosmic | Undecided | Unassigned |
| Disco | Undecided | Unassigned |
| Eoan | Undecided | Unassigned |
Bug Description
[Impact]
This patch is required to prevent nova from accidentally marking pci_device allocations as deleted when it incorrectly reads the PCI passthrough whitelist.
[Test Case]
* deploy OpenStack (any version that supports SR-IOV)
* single compute node configured for SR-IOV with at least one device in the PCI passthrough whitelist
* create a VM and attach an SR-IOV port
* remove the device from the passthrough whitelist
* check that the pci_devices allocations have not been marked as deleted (one way to script this check is sketched below)
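A sketch of scripting that last check, under the assumption that the verifier has direct database access (the SQLAlchemy connection URL and credentials below are placeholders for the deployment's own):

```python
# Sketch: after removing the device from the whitelist and restarting
# nova-compute, allocated pci_devices rows must not be marked deleted.
# The connection URL below is a placeholder, not a real credential.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://nova:secret@controller/nova")
with engine.connect() as conn:
    rows = conn.execute(text(
        "SELECT address, status, deleted FROM pci_devices "
        "WHERE instance_uuid IS NOT NULL")).fetchall()

for address, status, deleted in rows:
    assert deleted == 0, f"{address} allocation was wrongly deleted"
    assert status == 'allocated', f"{address} unexpectedly {status}"
```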
[Regression Potential]
None anticipated
-------
Upon trying to create a VM instance (say A) with one QAT VF, it fails with the following error: "Requested operation is not valid: PCI device 0000:88:04.7 is in use by driver QEMU, domain instance-00000081". Note that PCI device 0000:88:04.7 is already assigned to another VM (say B). We have installed the openstack-mitaka release on a CentOS 7 system. It has two Intel QAT devices, with 32 VFs available per QAT/DH895xCC device. Out of the 64 VFs, only 8 are allocated to VM instances; the rest should be available.
But the nova scheduler tries to assign an already-in-use SRIOV VF to a new instance, and the instance fails. It appears that the nova database is not tracking which VFs have already been taken. If I shut down VM B, then VM A boots up, and vice versa; both VM instances cannot run simultaneously because of this issue.
We should always be able to create as many instances with the requested PCI devices as there are available VFs.
Please feel free to let me know if additional information is needed. Can anyone suggest why nova tries to assign a PCI device that has already been assigned? Is there any way to resolve this issue? Thank you in advance for your support and help.
[root@localhost ~(keystone_admin)]# lspci -d:435
83:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
88:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# lspci -d:443 | grep "QAT Virtual Function" | wc -l
64
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# mysql -u root nova -e "SELECT hypervisor_
localhost 0000:88:04.7 e10a76f3-
localhost 0000:88:04.7 c3dbac90-
localhost 0000:88:04.7 c7f6adad-
localhost.
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# grep -r e10a76f3-
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
[root@localhost ~(keystone_admin)]#
[root@localhost ~(keystone_admin)]# grep -r 0c3c11a5-
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
/etc/libvirt/
[root@localhost ~(keystone_admin)]#
On the controller, it appears there are duplicate PCI device entries in the database:
MariaDB [nova]> select hypervisor_hostname, address, count(*) from pci_devices group by hypervisor_hostname, address;
+---------------------+--------------+----------+
| hypervisor_hostname | address | count(*) |
+---------------------+--------------+----------+
| localhost | 0000:05:00.0 | 3 |
| localhost | 0000:05:00.1 | 3 |
| localhost | 0000:83:01.0 | 3 |
| localhost | 0000:83:01.1 | 3 |
| localhost | 0000:83:01.2 | 3 |
| localhost | 0000:83:01.3 | 3 |
| localhost | 0000:83:01.4 | 3 |
| localhost | 0000:83:01.5 | 3 |
| localhost | 0000:83:01.6 | 3 |
| localhost | 0000:83:01.7 | 3 |
| localhost | 0000:83:02.0 | 3 |
| localhost | 0000:83:02.1 | 3 |
| localhost | 0000:83:02.2 | 3 |
| localhost | 0000:83:02.3 | 3 |
| localhost | 0000:83:02.4 | 3 |
| localhost | 0000:83:02.5 | 3 |
| localhost | 0000:83:02.6 | 3 |
| localhost | 0000:83:02.7 | 3 |
| localhost | 0000:83:03.0 | 3 |
| localhost | 0000:83:03.1 | 3 |
| localhost | 0000:83:03.2 | 3 |
| localhost | 0000:83:03.3 | 3 |
| localhost | 0000:83:03.4 | 3 |
| localhost | 0000:83:03.5 | 3 |
| localhost | 0000:83:03.6 | 3 |
| localhost | 0000:83:03.7 | 3 |
| localhost | 0000:83:04.0 | 3 |
| localhost | 0000:83:04.1 | 3 |
| localhost | 0000:83:04.2 | 3 |
| localhost | 0000:83:04.3 | 3 |
| localhost | 0000:83:04.4 | 3 |
| localhost | 0000:83:04.5 | 3 |
| localhost | 0000:83:04.6 | 3 |
| localhost | 0000:83:04.7 | 3 |
| localhost | 0000:88:01.0 | 3 |
| localhost | 0000:88:01.1 | 3 |
| localhost | 0000:88:01.2 | 3 |
| localhost | 0000:88:01.3 | 3 |
| localhost | 0000:88:01.4 | 3 |
| localhost | 0000:88:01.5 | 3 |
| localhost | 0000:88:01.6 | 3 |
| localhost | 0000:88:01.7 | 3 |
| localhost | 0000:88:02.0 | 3 |
| localhost | 0000:88:02.1 | 3 |
| localhost | 0000:88:02.2 | 3 |
| localhost | 0000:88:02.3 | 3 |
| localhost | 0000:88:02.4 | 3 |
| localhost | 0000:88:02.5 | 3 |
| localhost | 0000:88:02.6 | 3 |
| localhost | 0000:88:02.7 | 3 |
| localhost | 0000:88:03.0 | 3 |
| localhost | 0000:88:03.1 | 3 |
| localhost | 0000:88:03.2 | 3 |
| localhost | 0000:88:03.3 | 3 |
| localhost | 0000:88:03.4 | 3 |
| localhost | 0000:88:03.5 | 3 |
| localhost | 0000:88:03.6 | 3 |
| localhost | 0000:88:03.7 | 3 |
| localhost | 0000:88:04.0 | 3 |
| localhost | 0000:88:04.1 | 3 |
| localhost | 0000:88:04.2 | 3 |
| localhost | 0000:88:04.3 | 3 |
| localhost | 0000:88:04.4 | 3 |
| localhost | 0000:88:04.5 | 3 |
| localhost | 0000:88:04.6 | 3 |
| localhost | 0000:88:04.7 | 3 |
+---------------------+--------------+----------+
66 rows in set (0.00 sec)
MariaDB [nova]>
Jon Proulx (jproulx) wrote : | #1 |
Changed in nova: | |
status: | New → Confirmed |
tags: | added: pci |
Frode Nordahl (fnordahl) wrote : | #2 |
I believe the current Nova PCI implementation is susceptible to becoming out of sync in multiple scenarios. We have seen this too, and suspect rows ended up in the 'deleted' state because of lost messages and/or other operational events occurring at the same time as instance life cycle events.
Tracking down all the places this disconnect might happen seems like an impossible task, and I believe we should focus on:
a) a means for the operator to force a refresh of PCI devices from a compute node that could easily be backported to previous OpenStack versions
b) improved handling of instance PCI attachments in the periodic refresh of compute nodes
summary: |
- Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance (openstack-mitaka)
+ Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance |
Sean Dague (sdague) wrote : Re: Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance | #3 |
Automatically discovered version mitaka in description. If this is incorrect, please update the description to include 'nova version: ...'
tags: | added: openstack-version.mitaka |
Matt Riedemann (mriedem) wrote : | #4 |
Seems to me that a very brute force way to prevent deleting allocated pci device records would be to raise an exception here:
That is, raise if self.instance_uuid is not None. I don't know what is setting the PciDevice.status to REMOVED/DELETED in the stack, but clearly it's wrong and we should guard against that.
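A minimal sketch of that guard, assuming it would live in the PciDevice removal path (the standalone class below only mirrors the relevant attribute names; it is not nova's implementation or the merged fix):

```python
# Illustrative guard only; attribute names mirror nova's PciDevice
# object, but this is a standalone sketch, not the actual nova code.
class PciDevice:
    def __init__(self, address, status='available', instance_uuid=None):
        self.address = address
        self.status = status
        self.instance_uuid = instance_uuid

    def remove(self):
        if self.instance_uuid is not None:
            # Refuse to delete a record that an instance still owns.
            raise RuntimeError(
                f"PCI device {self.address} is still allocated to "
                f"instance {self.instance_uuid}; refusing removal")
        self.status = 'removed'
```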
Matt Riedemann (mriedem) wrote : | #5 |
This might be where the status is changed to REMOVED:
https:/
Matt Riedemann (mriedem) wrote : | #6 |
Yup, the reporter of duplicate bug 1809040 reported seeing that warning:
https:/
2018-12-18 20:32:45.051 4961 WARNING nova.pci.manager [req-88cfd6bc-
which aligns with a deleted allocated pci device record:
https:/
So we should probably just change https:/
Changed in nova: | |
importance: | Undecided → High |
assignee: | nobody → sean mooney (sean-k-mooney) |
status: | Confirmed → Triaged |
importance: | High → Medium |
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | Triaged → In Progress |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 26c41eccade6412
Author: Sean Mooney <email address hidden>
Date: Wed Dec 19 19:40:05 2018 +0000
PCI: do not force remove allocated devices
In the ocata release the pci_passthrough_whitelist config option
was moved from the [DEFAULT] section of nova.conf
to the [pci] section and renamed to passthrough_whitelist.
On upgrading, if the operator chooses to migrate the config
value to the new section, it is not uncommon
to forget to rename the config value.
Similarly, if an operator is updating the whitelist and
mistypes the value, it can also lead to the whitelist
being ignored.
As a result of either error, the nova compute agent
would delete all database entries for a host regardless of
whether the pci device was in use by an instance. If this occurs,
the only recourse for an operator is to delete and recreate
the guest on that host after correcting the error, or to manually
restore the database from a backup or otherwise consistent state.
This change alters the _set_hvdevs function to not force
remove allocated or claimed devices if they are no longer
present in the pci whitelist.
Closes-Bug: #1633120
Change-Id: I6e871311a0fa10
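The shape of the fix, as a hedged sketch: when a device drops out of the (possibly misread) whitelist, for example because the operator left the old `[DEFAULT] pci_passthrough_whitelist` option in place instead of `[pci] passthrough_whitelist`, it is no longer force-removed while claimed or allocated. The function and constant names below are paraphrased, not copied from nova:

```python
# Simplified sketch of the behaviour described in the commit message;
# names are paraphrased and this is not the actual nova code.
import logging

LOG = logging.getLogger(__name__)

CLAIMED, ALLOCATED = 'claimed', 'allocated'

def set_hvdevs(tracked_devices, whitelisted_addresses):
    """Drop tracker entries for devices no longer whitelisted,
    except those an instance has claimed or been allocated."""
    for dev in tracked_devices:
        if dev['address'] in whitelisted_addresses:
            continue
        if dev['status'] in (CLAIMED, ALLOCATED):
            # Before the fix this device was force-removed, deleting a
            # live allocation; now we only warn and keep the record.
            LOG.warning("Unable to remove device with status %s and "
                        "address %s; device is in use.",
                        dev['status'], dev['address'])
        else:
            dev['status'] = 'removed'
```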
Changed in nova: | |
status: | In Progress → Fix Released |
Fix proposed to branch: stable/rocky
Review: https:/
Fix proposed to branch: stable/queens
Review: https:/
Fix proposed to branch: stable/pike
Review: https:/
Fix proposed to branch: stable/ocata
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/rocky
commit 9f9f372f33310ba
Author: Sean Mooney <email address hidden>
Date: Wed Dec 19 19:40:05 2018 +0000
PCI: do not force remove allocated devices
In the ocata release the pci_passthrough_whitelist config option
was moved from the [DEFAULT] section of nova.conf
to the [pci] section and renamed to passthrough_whitelist.
On upgrading, if the operator chooses to migrate the config
value to the new section, it is not uncommon
to forget to rename the config value.
Similarly, if an operator is updating the whitelist and
mistypes the value, it can also lead to the whitelist
being ignored.
As a result of either error, the nova compute agent
would delete all database entries for a host regardless of
whether the pci device was in use by an instance. If this occurs,
the only recourse for an operator is to delete and recreate
the guest on that host after correcting the error, or to manually
restore the database from a backup or otherwise consistent state.
This change alters the _set_hvdevs function to not force
remove allocated or claimed devices if they are no longer
present in the pci whitelist.
Closes-Bug: #1633120
Change-Id: I6e871311a0fa10
(cherry picked from commit 26c41eccade6412
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/queens
commit 955ecf26c57d1ee
Author: Sean Mooney <email address hidden>
Date: Wed Dec 19 19:40:05 2018 +0000
PCI: do not force remove allocated devices
In the ocata release the pci_passthrough_whitelist config option
was moved from the [DEFAULT] section of nova.conf
to the [pci] section and renamed to passthrough_whitelist.
On upgrading, if the operator chooses to migrate the config
value to the new section, it is not uncommon
to forget to rename the config value.
Similarly, if an operator is updating the whitelist and
mistypes the value, it can also lead to the whitelist
being ignored.
As a result of either error, the nova compute agent
would delete all database entries for a host regardless of
whether the pci device was in use by an instance. If this occurs,
the only recourse for an operator is to delete and recreate
the guest on that host after correcting the error, or to manually
restore the database from a backup or otherwise consistent state.
This change alters the _set_hvdevs function to not force
remove allocated or claimed devices if they are no longer
present in the pci whitelist.
Closes-Bug: #1633120
Change-Id: I6e871311a0fa10
(cherry picked from commit 26c41eccade6412
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/pike
commit 239bdd0fd243a97
Author: Sean Mooney <email address hidden>
Date: Wed Dec 19 19:40:05 2018 +0000
PCI: do not force remove allocated devices
In the ocata release the pci_passthrough_whitelist config option
was moved from the [DEFAULT] section of nova.conf
to the [pci] section and renamed to passthrough_whitelist.
On upgrading, if the operator chooses to migrate the config
value to the new section, it is not uncommon
to forget to rename the config value.
Similarly, if an operator is updating the whitelist and
mistypes the value, it can also lead to the whitelist
being ignored.
As a result of either error, the nova compute agent
would delete all database entries for a host regardless of
whether the pci device was in use by an instance. If this occurs,
the only recourse for an operator is to delete and recreate
the guest on that host after correcting the error, or to manually
restore the database from a backup or otherwise consistent state.
This change alters the _set_hvdevs function to not force
remove allocated or claimed devices if they are no longer
present in the pci whitelist.
Conflicts:
nova/
Closes-Bug: #1633120
Change-Id: I6e871311a0fa10
(cherry picked from commit 26c41eccade6412
This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
This issue was fixed in the openstack/nova 17.0.10 release.
This issue was fixed in the openstack/nova 18.2.0 release.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/ocata
commit 5c5a6b93a07b0b5
Author: Sean Mooney <email address hidden>
Date: Wed Dec 19 19:40:05 2018 +0000
PCI: do not force remove allocated devices
In the ocata release the pci_passthrough_whitelist config option
was moved from the [DEFAULT] section of nova.conf
to the [pci] section and renamed to passthrough_whitelist.
On upgrading, if the operator chooses to migrate the config
value to the new section, it is not uncommon
to forget to rename the config value.
Similarly, if an operator is updating the whitelist and
mistypes the value, it can also lead to the whitelist
being ignored.
As a result of either error, the nova compute agent
would delete all database entries for a host regardless of
whether the pci device was in use by an instance. If this occurs,
the only recourse for an operator is to delete and recreate
the guest on that host after correcting the error, or to manually
restore the database from a backup or otherwise consistent state.
This change alters the _set_hvdevs function to not force
remove allocated or claimed devices if they are no longer
present in the pci whitelist.
Conflicts:
nova/
Closes-Bug: #1633120
Change-Id: I6e871311a0fa10
(cherry picked from commit 26c41eccade6412
This issue was fixed in the openstack/nova 16.1.8 release.
summary: |
- Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance
+ [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance |
description: | updated |
tags: | added: sts-sru-needed |
Corey Bryant (corey.bryant) wrote : | #21 |
Cosmic is EOL so let's just fix this in Rocky.
Changed in nova (Ubuntu Cosmic): | |
status: | New → Won't Fix |
Corey Bryant (corey.bryant) wrote : | #22 |
This is fixed in Ubuntu in all packages > Ocata.
Changed in nova (Ubuntu Eoan): | |
status: | New → Fix Released |
Changed in nova (Ubuntu Disco): | |
status: | New → Fix Released |
Changed in nova (Ubuntu Cosmic): | |
status: | Won't Fix → Fix Released |
Changed in nova (Ubuntu Bionic): | |
status: | New → Fix Released |
Changed in nova (Ubuntu Xenial): | |
importance: | Undecided → High |
status: | New → Triaged |
Corey Bryant (corey.bryant) wrote : | #23 |
I'm not sure how much this is needed in Ubuntu Mitaka. It seems from the commit message this is triggered mostly by the movement of the pci_passthrough_whitelist config option to the [pci] section, which happened in Ocata.
Corey Bryant (corey.bryant) wrote : | #24 |
OK, I chatted with Edward Hope-Morley offline and he confirmed this is being hit on mitaka as well.
Hello Chinmaya, or anyone else affected,
Accepted nova into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:ocata-proposed
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done.
Further information regarding the verification process can be found at https:/
tags: | added: verification-ocata-needed |
Edward Hope-Morley (hopem) wrote : | #26 |
Xenial Ocata verified using [Test Case]
Test output: https:/
tags: |
added: verification-ocata-done removed: verification-ocata-needed |
Edward Hope-Morley (hopem) wrote : | #27 |
Mitaka not backportable so abandoning:
$ git-deps -e mitaka-eol 5c5a6b93a07b0b5
c2c3b97259258ee
$ git-deps -e mitaka-eol c2c3b97259258ee
a023c32c70b5ddb
74fbff886398912
e83842b80b73c45
1f259e2a9423a47
b01187eede3881f
49d9433c62d74f6
Changed in nova (Ubuntu Xenial): | |
status: | Triaged → Won't Fix |
The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Corey Bryant (corey.bryant) wrote : | #29 |
This bug was fixed in the package nova - 2:15.1.
---------------
nova (2:15.1.
.
* d/p/pci-
from upstream to prevent forced removal of allocated PCI devices
(LP: #1633120).
tags: |
added: sts-sru-done removed: sts-sru-needed |
I ran into a very similar issue with GPU passthrough (stable/mitaka from the Ubuntu cloud archive on 14.04).
In my case there was a config management bug on my end which removed the active devices from the nova DB; then, when the config was fixed, nova created new "available" records for all the devices, including the ones currently in use.
I think nova should check whether duplicate "deleted" records exist and undelete them, checking whether the assigned instance (if there is one) still exists: if it does, leave it assigned; if it doesn't, mark the resource as available in addition to undeleting. (A sketch of this idea follows the example below.)
example DB state:
> SELECT created_at, deleted_at, deleted, id, compute_node_id, address, status, instance_uuid FROM pci_devices WHERE address='0000:09:00.0';
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| created_at          | deleted_at          | deleted | id | compute_node_id | address      | status    | instance_uuid                        |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
| 2016-07-06 00:12:30 | 2016-10-13 21:04:53 |       4 |  4 |              90 | 0000:09:00.0 | allocated | 9269391a-4ce4-4c8d-993d-5ad7a9c3879b |
| 2016-10-18 18:01:35 | NULL                |       0 | 12 |              90 | 0000:09:00.0 | available | NULL                                 |
+---------------------+---------------------+---------+----+-----------------+--------------+-----------+--------------------------------------+
In this case instance ID 9269391a-4ce4-4c8d-993d-5ad7a9c3879b did exist and was using PCI 09:00.0, but it was associated with the deleted row.
I only had three devices which were affected by this (and in use), so I could fix it by hand relatively easily. I wonder if the SRIOV issue is the same.
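That reconciliation idea could look roughly like the following (purely illustrative: nova has no such helper, the rows are simplified dicts keyed like the pci_devices table, and instance_exists() stands in for an API or DB lookup):

```python
# Purely illustrative sketch of the undelete proposal above; nova does
# not ship this helper. Rows are simplified dicts keyed like the
# pci_devices table; instance_exists() is a stand-in lookup.
def reconcile_deleted_pci_rows(deleted_rows, duplicate_rows_by_address,
                               instance_exists):
    for row in deleted_rows:
        if row['instance_uuid'] is None:
            continue
        row['deleted'] = 0  # undelete the record
        if instance_exists(row['instance_uuid']):
            row['status'] = 'allocated'   # instance lives: keep it assigned
        else:
            row['status'] = 'available'   # instance gone: free the device
            row['instance_uuid'] = None
        # Drop the duplicate "available" row created for the same address.
        duplicate_rows_by_address.pop(row['address'], None)
```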