SRIOV instance gets type-PF interface, libvirt kvm fails
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Medium | Hemanth Nakkina | | |
Queens | Fix Released | Medium | Hemanth Nakkina | | |
Rocky | Fix Released | Medium | Hemanth Nakkina | | |
Stein | Fix Released | Medium | Hemanth Nakkina | | |
Train | Fix Released | Medium | Hemanth Nakkina | | |
Ussuri | Fix Released | Medium | Hemanth Nakkina | | |
Victoria | Fix Released | Medium | Hemanth Nakkina | | |
Ubuntu Cloud Archive | Fix Released | Medium | Unassigned | | |
Queens | Fix Released | Undecided | Unassigned | | |
Rocky | Fix Released | Undecided | Unassigned | | |
Stein | Fix Released | Undecided | Unassigned | | |
Train | Fix Released | Undecided | Unassigned | | |
Ussuri | Fix Released | Undecided | Unassigned | | |
Victoria | Fix Released | Medium | Unassigned | | |
nova (Ubuntu) | Fix Released | Medium | Unassigned | | |
Bionic | Fix Released | Undecided | Unassigned | | |
Focal | Fix Released | Undecided | Chris MacNaughton | | |
Groovy | Fix Released | Medium | Unassigned | | |
Hirsute | Fix Released | Medium | Unassigned | | |
Bug Description
When spawning an SR-IOV enabled instance on a newly deployed host, nova attempts to spawn it with a type-PF PCI device. This fails with the stack trace below.
After restarting neutron-sriov-agent and nova-compute services on the compute node and spawning an SR-IOV instance again, a type-VF pci device is selected, and instance spawning succeeds.
Stack trace:
2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging.
2020-08-20 08:29:09.561 7624 ERROR nova.compute.
nce: 9498ea75-
2020-08-20 08:29:09.561 7624 ERROR nova.compute.
2020-08-20 08:29:09.599 7624 INFO nova.compute.
To reproduce, bring up an instance with an SR-IOV port on a freshly deployed compute:
+ openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_
+ openstack server create --flavor ce6da933-
Observe that a PF is selected for the SR-IOV NIC.
From nova-compute.log:
<interface type='hostdev' managed='yes'>
<mac address=
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
</source>
<vlan>
<tag id='48'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...
2020-08-20 08:29:09.056 7624 DEBUG nova.virt.
vif_type=hw_veb ...
vif={"profile":
{"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
"ovs_
"address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46
"physical_
"id": "60b3001e-
"details": {"port_filter": false, "vlan": "48"}, "address": "98:03:
virt_type=kvm get_config /usr/lib/
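To compare the domain XML against `lspci` output or the `pci_slot` field, the host-side `<address>` under `<source>` can be converted to the usual `domain:bus:slot.function` form. The following is a minimal sketch using only the Python standard library; the function name `hostdev_pci_address` is hypothetical, and the sample XML leaves out the `<mac>` element because its value is truncated in the log above.

```python
import xml.etree.ElementTree as ET


def hostdev_pci_address(interface_xml: str) -> str:
    """Return the host PCI address (dddd:bb:ss.f) of a hostdev interface."""
    root = ET.fromstring(interface_xml)
    # The host device is the <address> nested under <source>; the other
    # <address> element describes the guest-side slot.
    addr = root.find("./source/address")
    if addr is None:
        raise ValueError("no <source><address> element found")
    domain = int(addr.get("domain"), 16)
    bus = int(addr.get("bus"), 16)
    slot = int(addr.get("slot"), 16)
    function = int(addr.get("function"), 16)
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


# Sample based on the failing interface definition (MAC element omitted).
SAMPLE = """
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
  </source>
  <vlan>
    <tag id='48'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
"""

if __name__ == "__main__":
    print(hostdev_pci_address(SAMPLE))
```

For the sample, the result matches the `pci_slot` value reported in the vif profile.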
Device is a PF:
# lspci | grep d8:00.1
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
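Besides reading the `lspci` description, the device type can be checked from sysfs on the compute node. The sketch below is an illustration only and is not how nova itself determines `dev_type`; it assumes the standard Linux sysfs layout, where a VF has a `physfn` link to its parent and an SR-IOV capable PF exposes `sriov_totalvfs`. The function name `pci_dev_type` and the `sysfs_root` parameter are hypothetical.

```python
import os


def pci_dev_type(address: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Classify a PCI device as 'type-PF', 'type-VF' or 'type-PCI' via sysfs."""
    dev_dir = os.path.join(sysfs_root, address)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(f"no such PCI device: {address}")
    # A virtual function carries a 'physfn' link pointing at its parent PF.
    if os.path.lexists(os.path.join(dev_dir, "physfn")):
        return "type-VF"
    # An SR-IOV capable physical function exposes 'sriov_totalvfs'.
    if os.path.exists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "type-PF"
    # Anything else is treated as a plain PCI device in this sketch.
    return "type-PCI"


if __name__ == "__main__":
    # Address taken from the failing instance's interface definition.
    try:
        print(pci_dev_type("0000:d8:00.1"))
    except FileNotFoundError as exc:
        print(exc)
```

The returned labels mirror the `dev_type` values stored in the nova `pci_devices` table, so the result can be compared directly against the database rows.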
Also, the nova pci_devices table has its dev_type correctly listed:
mysql> select compute_nodes.host, pci_devices.
+------
| host | created_at | compute_node_id | address | dev_type | status | dev_id |
+------
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:d8:00.1 | type-PF | available | pci_0000_d8_00_1 |
+------
Restarting services:
# systemctl status neutron-
# systemctl restart neutron-
Spawning an instance again, it gets allocated a type-VF port (and spawning succeeds):
<interface type='hostdev' managed='yes'>
<mac address=
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
</source>
<vlan>
<tag id='4'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
# lspci | grep d8:05.1
d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
After spawning an instance, the PF gets marked as "unavailable" in the nova db:
+------
| host | created_at | updated_at | instance_uuid | compute_node_id | address | dev_type | status | dev_id |
+------
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL | 95 | 0000:d8:00.1 | type-PF | unavailable | pci_0000_d8_00_1 |
+------
Software versions:
# dpkg -l | grep nova-common
ii nova-common 2:17.0.12-0ubuntu1 all OpenStack Compute - common files
# dpkg -l | grep libvirt0
ii libvirt0:amd64 4.0.0-1ubuntu8.17 amd64 library for interfacing with different virtualization systems
# lsb_release -r
Release: 18.04
+++++++
[Impact]
Spawning an SR-IOV instance fails on a newly deployed compute.
Nova attempts to spawn a PCI device of type type-PCI instead of type-VF.
This happened in an OpenStack Queens deployment.
[Test case]
1. Issue can be reproduced by following steps in comment #3
https:/
2. Install the package with fixed code
3. Confirm the bug has been fixed
Repeat the steps mentioned in comment #3 and check that the instance with the SR-IOV port is created successfully.
[Where problems could occur]
Upstream CI ran all the functional test cases that trigger this scenario.
Installing the new package will restart the nova-compute service.
description: updated
Changed in nova:
assignee: nobody → Hemanth Nakkina (hemanth-n)
Changed in nova (Ubuntu Hirsute):
status: New → Fix Released
Changed in nova (Ubuntu Groovy):
status: New → Fix Released
Changed in nova (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in nova (Ubuntu Groovy):
importance: Undecided → Medium
Changed in nova (Ubuntu Focal):
assignee: nobody → Chris MacNaughton (chris.macnaughton)
tags: added: verification-done-bionic removed: verification-needed-bionic
Subscribing field-high as this is a common source of issues with customer deploys (for fresh deploys and hardware repairs)