SRIOV instance gets type-PF interface, libvirt kvm fails

Bug #1892361 reported by Peter Sabaini on 2020-08-20
This bug affects 1 person
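+--------------------------+----------+------------+-------------------+
| Affects                  | Series   | Importance | Assigned to       |
+--------------------------+----------+------------+-------------------+
| OpenStack Compute (nova) |          | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Queens   | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Rocky    | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Stein    | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Train    | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Ussuri   | Medium     | Hemanth Nakkina   |
| OpenStack Compute (nova) | Victoria | Medium     | Hemanth Nakkina   |
| Ubuntu Cloud Archive     |          | Medium     | Unassigned        |
| Ubuntu Cloud Archive     | Queens   | Undecided  | Unassigned        |
| Ubuntu Cloud Archive     | Rocky    | Undecided  | Unassigned        |
| Ubuntu Cloud Archive     | Stein    | Undecided  | Unassigned        |
| Ubuntu Cloud Archive     | Train    | Undecided  | Unassigned        |
| Ubuntu Cloud Archive     | Ussuri   | Undecided  | Unassigned        |
| Ubuntu Cloud Archive     | Victoria | Medium     | Unassigned        |
| nova (Ubuntu)            |          | Medium     | Unassigned        |
| nova (Ubuntu)            | Bionic   | Undecided  | Unassigned        |
| nova (Ubuntu)            | Focal    | Undecided  | Chris MacNaughton |
| nova (Ubuntu)            | Groovy   | Medium     | Unassigned        |
| nova (Ubuntu)            | Hirsute  | Medium     | Unassigned        |
+--------------------------+----------+------------+-------------------+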

Bug Description

When spawning an SR-IOV enabled instance on a newly deployed host, nova attempts to spawn it with a type-PF PCI device. This fails with the stack trace below.

After restarting neutron-sriov-agent and nova-compute services on the compute node and spawning an SR-IOV instance again, a type-VF pci device is selected, and instance spawning succeeds.

Stack trace:
2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] yield resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] block_device_info=block_device_info)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure=True)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] post_xml_callback=post_xml_callback)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] guest.launch(pause=pause)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self._encoded_xml, errors='ignore')
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] return self._domain.createWithFlags(flags)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = execute(f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(c, e, tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = meth(*args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance

To reproduce, bring up an instance with an SR-IOV port on a freshly deployed compute:

+ openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
+ openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164

Observe that a PF is selected for the SR-IOV NIC.

From nova-compute.log:

    <interface type='hostdev' managed='yes'>
      <mac address='98:03:9b:61:22:e9'/>
      <source>
        <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
      </source>
      <vlan>
        <tag id='48'/>
      </vlan>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
...
2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b]
vif_type=hw_veb ...
vif={"profile":
  {"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
  "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [],
  "address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
  "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257",
  "physical_network": "physnet2", "mtu": 9000},
  "id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {},
  "details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null}
virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572

Device is a PF:

# lspci | grep d8:00.1
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

The nova pci_devices table also lists its dev_type correctly:

mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| host | created_at | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:d8:00.1 | type-PF | available | pci_0000_d8_00_1 |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+

Restarting services:

# systemctl status neutron-sriov-agent.service
# systemctl restart neutron-sriov-agent.service

Spawning an instance again, it gets allocated a type-VF port (and spawning succeeds):

    <interface type='hostdev' managed='yes'>
      <mac address='fa:16:3e:34:d2:99'/>
      <source>
        <address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
      </source>
      <vlan>
        <tag id='4'/>
      </vlan>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>

# lspci | grep d8:05.1
d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]

After spawning an instance, the PF gets marked as "unavailable" in the nova DB:

+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| host | created_at | updated_at | instance_uuid | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL | 95 | 0000:d8:00.1 | type-PF | unavailable | pci_0000_d8_00_1 |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
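The PF flipping to "unavailable" once one of its VFs is handed out can be illustrated with a small model. This is a hypothetical sketch, not nova's code: `PciDev`, `add_vf`, `allocate` and `free` are made-up names, and it assumes (as a simplification) that the PF becomes available again once none of its VFs are in use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified model of PF/VF bookkeeping (not nova's classes).
@dataclass(eq=False)
class PciDev:
    address: str
    dev_type: str  # "type-PF", "type-VF" or "type-PCI"
    status: str = "available"
    parent: Optional["PciDev"] = field(default=None, repr=False)
    children: List["PciDev"] = field(default_factory=list, repr=False)


def add_vf(pf: PciDev, address: str) -> PciDev:
    """Register a VF as a child of its parent PF."""
    vf = PciDev(address, "type-VF", parent=pf)
    pf.children.append(vf)
    return vf


def allocate(dev: PciDev) -> None:
    if dev.status != "available":
        raise ValueError(f"{dev.address} is {dev.status}")
    dev.status = "allocated"
    if dev.parent is not None:
        # A PF with a VF in use can no longer be passed through whole.
        dev.parent.status = "unavailable"
    for child in dev.children:
        # A PF passed through whole takes all of its VFs with it.
        child.status = "unavailable"


def free(dev: PciDev) -> None:
    dev.status = "available"
    for child in dev.children:
        child.status = "available"
    parent = dev.parent
    if parent is not None and all(c.status == "available" for c in parent.children):
        # Assumption: the PF is usable again once no VF is in use.
        parent.status = "available"


pf = PciDev("0000:d8:00.1", "type-PF")
vf = add_vf(pf, "0000:d8:05.1")
allocate(vf)
print(pf.status)  # unavailable
```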

Software versions:

# dpkg -l | grep nova-common
ii nova-common 2:17.0.12-0ubuntu1 all OpenStack Compute - common files
# dpkg -l | grep libvirt0
ii libvirt0:amd64 4.0.0-1ubuntu8.17 amd64 library for interfacing with different virtualization systems
# lsb_release -r
Release: 18.04

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Impact]

Spawning an SR-IOV instance fails on a newly deployed compute node.
Nova attempts to spawn the instance with a PCI device of type type-PCI (in this case actually a PF) instead of a type-VF device.

This was observed in an OpenStack Queens deployment.

[Test case]

1. The issue can be reproduced by following the steps in comment #3
   https://bugs.launchpad.net/nova/+bug/1892361/comments/3

2. Install the package with the fixed code.

3. Confirm that the bug has been fixed:
   repeat the steps mentioned in comment #3 and check that the instance with an SR-IOV port is created successfully.

[Where problems could occur]

Upstream CI ran all the functional test cases that trigger this scenario.
Installing the new package will restart the nova-compute service.

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Subscribing field-high as this is a common source of issues with customer deploys (for fresh deploys and hardware repairs)

description: updated
Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Additional data point: the behaviour changes when only restarting nova-compute -- the pci device allocation changes to a type-VF device.

However, spawning an instance still fails with VIF creation error in this case:

2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [req-39da5984-92fc-44d9-976d-2bac4279497e ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] Failed to allocate network(s): VirtualInterfaceCreateException: Virtual Interface creation failed
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] Traceback (most recent call last):
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] block_device_info=block_device_info)
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] destroy_disks_on_failure=True)
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5644, in _create_domain_and_network
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] raise exception.VirtualInterfaceCreateException()
2020-08-20 13:47:41.505 681090 ERROR nova.compute.manager [instance: 1b22a515-3ddc-4c5b-a5d6-537d98a4d990] VirtualInterfaceCreateException: Virtual Interface creation failed

Changed in nova:
assignee: nobody → Hemanth Nakkina (hemanth-n)
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

Able to simulate the error scenario with the following steps:

1. Deployed an OpenStack Queens environment with a single compute node having one SR-IOV NIC.
2. The nova.pci_devices table has all the device entries.
3. Stopped the nova-compute service.
4. Changed the dev_type of the type-PF device to type-PCI in the nova.pci_devices table
   (this simulates a scenario where the device is initially registered without SR-IOV capabilities).
5. Started the nova-compute service.
6. Verified the nova.pci_devices table.
   The device's dev_type is changed back to type-PF in the database (as expected):
   the resource update from the hypervisor modified the dev_type from type-PCI back to type-PF.
7. Launched an instance with an SR-IOV port. It fails with the same error as in the bug report.

Looking at the logs, the PCI stats pool is not updated during step 5.
INFO nova.compute.resource_tracker [req-6b15330f-2874-464a-b49b-452e177bb38f - - - - -] Final resource view: name=test.compute phys_ram=64355MB used_ram=512MB phys_disk=365GB used_disk=0GB total_vcpus=28 used_vcpus=0 pci_stats=[PciDevicePool(count=63,numa_node=0,product_id='1515',tags={dev_type='type-VF',physical_network='physnet1'},vendor_id='8086'), PciDevicePool(count=1,numa_node=0,product_id='1528',tags={dev_type='type-PCI',physical_network='physnet1'},vendor_id='8086')]

The device is listed under the pool with dev_type type-PCI, whereas it is expected to be in a pool with dev_type type-PF. The pools are only updated when a device address changes (i.e. when a device is added or removed).

Since the pool's dev_type is still type-PCI, the device remains eligible for selection, which results in the error.

In short, this can happen when the NIC is initially registered without SR-IOV capabilities, is then configured for SR-IOV, and nova-compute is restarted (before it gathers information from the periodic hypervisor resource updates).
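The failure mode described above can be reproduced in a simplified model of the in-memory PCI stats pools. This is a hypothetical sketch, not nova's implementation: `PciStats`, `pool_key` and `update_from_hypervisor` are made-up names, and the selection rule assumes that, when no device type is requested (as with `vnic_type=direct`), pools tagged `type-PF` are skipped while `type-PCI` pools stay eligible. The vendor/product IDs are taken from the `pci_vendor_info` in the VIF log above.

```python
from collections import defaultdict


def pool_key(dev):
    # In this sketch, pools are grouped by these tags.
    return (dev["vendor_id"], dev["product_id"], dev["dev_type"], dev["physical_network"])


class PciStats:
    def __init__(self):
        self.pools = defaultdict(list)  # pool key -> free device addresses

    def add_device(self, dev):
        self.pools[pool_key(dev)].append(dev["address"])

    def select(self, requested_dev_type=None):
        for (_vendor, _product, dev_type, _physnet), addrs in self.pools.items():
            # Assumption: PF pools are only eligible when a PF is explicitly requested.
            if dev_type == "type-PF" and requested_dev_type != "type-PF":
                continue
            if requested_dev_type is not None and dev_type != requested_dev_type:
                continue
            if addrs:
                return addrs[0]
        return None


def update_from_hypervisor(stats, known, reported):
    """Pre-fix behaviour: pools change only when a new address appears
    (removals are left out of this sketch)."""
    known_addrs = {d["address"] for d in known}
    for dev in reported:
        if dev["address"] not in known_addrs:
            stats.add_device(dev)


# Startup: the NIC was first registered without SR-IOV, i.e. as type-PCI.
stale = {"address": "0000:d8:00.1", "vendor_id": "15b3", "product_id": "1015",
         "dev_type": "type-PCI", "physical_network": "physnet2"}
stats = PciStats()
stats.add_device(stale)

# The hypervisor now reports the same address as a PF...
current = dict(stale, dev_type="type-PF")
update_from_hypervisor(stats, [stale], [current])

# ...but the pool is still tagged type-PCI, so the PF is handed out.
print(stats.select())  # 0000:d8:00.1
```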

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/749175

Changed in nova:
status: New → In Progress
Revision history for this message
sean mooney (sean-k-mooney) wrote :

openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port

Just an FYI: setting physical_network=physnet2 in the binding profile is not supported.

That is set by nova; users should never set it.

We get the physnet info from the network and then use that to update the binding profile, but the binding profile is intended to convey info from nova to neutron.
It's admin-only by default, and there is only one user/admin-settable field, which is for trusted VFs.
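To make the point above concrete, here is a hypothetical pre-flight check (not part of nova or the openstack CLI) that flags binding-profile keys a user should not set by hand. The nova-managed keys are the ones seen in the VIF `profile` in the log above; treating `trusted` as the only user-settable key follows the comment above and is an assumption of this sketch.

```python
# Keys nova fills in itself (as seen in the VIF profile in the log above).
NOVA_MANAGED_KEYS = {"pci_slot", "pci_vendor_info", "physical_network"}
# Assumption, per the comment above: the only user/admin-settable key is for trusted VFs.
USER_SETTABLE_KEYS = {"trusted"}


def check_binding_profile(profile):
    """Return a warning for every key that should not be set on the port by hand."""
    warnings = []
    for key in sorted(profile):
        if key in USER_SETTABLE_KEYS:
            continue
        if key in NOVA_MANAGED_KEYS:
            warnings.append(f"{key}: set by nova, do not set it on the port")
        else:
            warnings.append(f"{key}: not a user-settable binding:profile key")
    return warnings


print(check_binding_profile({"physical_network": "physnet2"}))
# ['physical_network: set by nova, do not set it on the port']
```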

Revision history for this message
sean mooney (sean-k-mooney) wrote :

What release of OpenStack was this observed in?

vnic_type=direct was allowed to select a PF in older releases, and I believe it is still valid for it to select type-PCI; I will have to look.

The change that added support for vnic_type=direct-physical broke backwards compatibility with the old behavior of direct meaning VF or PF, but type-PCI may be an exception to that.

And yes, the assessment in https://bugs.launchpad.net/nova/+bug/1892361/comments/3
is correct: this bug will only be hit if you updated the firmware or BIOS config of a device that was previously not capable of supporting SR-IOV to one that is.

This could also happen if you replaced a NIC that is not SR-IOV VF capable with one that is.
The workaround for that case is to remove the NIC, start the agent, add the new NIC and start the agent again.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I think this should really be set to low, as it's very unlikely to happen: it requires either a hardware change or a firmware update to trigger. With that said, I am going to triage this as medium, since it's not trivial for an operator to figure this out.

Changed in nova:
importance: Undecided → Medium
tags: added: compute pci resource-tracker
Revision history for this message
sean mooney (sean-k-mooney) wrote :

https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/sriov-pf-passthrough-neutron-port.html is the spec that tracked adding that support, which redefined the meaning of vnic_type=direct, by the way.

Before that, direct meant any of the 3 types.

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

FTR. physical_network=physnet2 was set to address issues with a specific SDN integration.

This was observed on a Queens cloud.

I suspect what we saw is not a hardware change or firmware upgrade, but the fact that the number of enabled VFs in sysfs (e.g. /sys/class/net/eno2/device/sriov_numvfs) is initially set to 0, and we enable the VFs during startup. Possibly the initial registration happened before the VFs were enabled.
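The state described here can be checked on the host by reading the standard Linux sysfs attributes of the PF. The sketch below is a hypothetical helper, not part of nova; `eno2` is the example interface from the comment, and `sysfs_root` is a parameter only so the function can be tested against a fake tree. A non-zero `sriov_totalvfs` with `sriov_numvfs` at 0 corresponds to "SR-IOV capable, but no VFs enabled yet".

```python
from pathlib import Path


def sriov_state(iface, sysfs_root="/sys"):
    """Return {'totalvfs': N, 'numvfs': M} for an SR-IOV PF, or None.

    Reads /sys/class/net/<iface>/device/sriov_totalvfs and sriov_numvfs;
    a device without sriov_totalvfs is treated as not SR-IOV capable.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    total = dev / "sriov_totalvfs"
    if not total.exists():
        return None
    return {
        "totalvfs": int(total.read_text().strip()),
        "numvfs": int((dev / "sriov_numvfs").read_text().strip()),
    }


state = sriov_state("eno2")
if state is not None and state["numvfs"] == 0:
    print("PF is SR-IOV capable but no VFs are enabled yet")
```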

Revision history for this message
sean mooney (sean-k-mooney) wrote :

No, we do not take /sys/class/net/eno2/device/sriov_numvfs into account when determining the type;
it's based solely on the PCI config space capabilities.

Changing /sys/class/net/eno2/device/sriov_numvfs from 0 to another value should not affect the PCIe capabilities.

That is a common misconception, however.
This is the logic that determines the device type:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7125-L7163

If you have a device that exposes /sys/class/net/eno2/device/sriov_numvfs
without the "virt_functions" capability, or a device where that capability changes based on the number
of VFs allocated, you should file a driver bug with your NIC vendor, as both behaviors would be incorrect.
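The rule described above (classify by PCI capabilities, not by `sriov_numvfs`) can be sketched against libvirt node-device XML. This is a simplified, hypothetical re-implementation for illustration, not the code at the linked lines; it assumes the libvirt layout in which a PF's `pci` capability carries a nested `virt_functions` capability and a VF carries a nested `phys_function` capability whose `address` points at its parent. The sample XML is made up, using the PF address from this report.

```python
import xml.etree.ElementTree as ET


def classify(nodedev_xml):
    """Return (dev_type, parent_address) for a libvirt PCI node device."""
    root = ET.fromstring(nodedev_xml)
    pci = root.find("./capability[@type='pci']")
    if pci is None:
        raise ValueError("not a PCI node device")
    for cap in pci.findall("capability"):
        if cap.get("type") == "virt_functions":
            return "type-PF", None
        if cap.get("type") == "phys_function":
            addr = cap.find("address")
            if addr is not None:
                # Format the parent as domain:bus:slot.function, e.g. 0000:d8:00.1
                parent = "{:04x}:{:02x}:{:02x}.{:x}".format(
                    *(int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function")))
                return "type-VF", parent
    # No SR-IOV capability at all: a plain PCI device.
    return "type-PCI", None


PF_XML = """<device><name>pci_0000_d8_00_1</name>
  <capability type='pci'>
    <domain>0</domain><bus>216</bus><slot>0</slot><function>1</function>
    <capability type='virt_functions' maxCount='8'/>
  </capability></device>"""

print(classify(PF_XML))  # ('type-PF', None)
```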

This predates Queens, but I don't think we will backport it further than Queens, so I have added the releases between master and Queens.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/749175
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b8695de6da56db42b83b9d9d4c330148766644be
Submitter: Zuul
Branch: master

commit b8695de6da56db42b83b9d9d4c330148766644be
Author: Hemanth Nakkina <email address hidden>
Date: Tue Sep 1 09:36:51 2020 +0530

    Update pci stat pools based on PCI device changes

    At start up of nova-compute service, the PCI stat pools are
    populated based on information in pci_devices table in Nova
    database. The pools are updated only when new device is added
    or removed but not on any device changes like device type.

    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).

    This patch looks for device type updates in existing devices
    and updates the pools accordingly.

    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361
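The idea of the fix, moving an existing device to the pool that matches its new type when the hypervisor reports a changed `dev_type`, can be sketched as follows. This is a hypothetical simplification, not the merged patch: `PciStats`, `pool_key` and `sync_devices` are made-up names, and pools are keyed only by vendor, product and device type.

```python
from collections import defaultdict


def pool_key(dev):
    return (dev["vendor_id"], dev["product_id"], dev["dev_type"])


class PciStats:
    def __init__(self):
        self.pools = defaultdict(list)  # pool key -> device addresses

    def add_device(self, dev):
        self.pools[pool_key(dev)].append(dev["address"])

    def remove_device(self, dev):
        key = pool_key(dev)
        self.pools[key].remove(dev["address"])
        if not self.pools[key]:
            del self.pools[key]


def sync_devices(stats, known, reported):
    """Reconcile pools with the hypervisor view.

    known and reported map PCI address -> device dict; known is updated
    in place to match reported.
    """
    for addr, dev in reported.items():
        old = known.get(addr)
        if old is None:
            stats.add_device(dev)          # new device
        elif old["dev_type"] != dev["dev_type"]:
            stats.remove_device(old)       # the fix: leave the stale pool...
            stats.add_device(dev)          # ...and join the matching one
        known[addr] = dev
    for addr in [a for a in known if a not in reported]:
        stats.remove_device(known.pop(addr))  # removed device


old = {"address": "0000:d8:00.1", "vendor_id": "15b3", "product_id": "1015",
       "dev_type": "type-PCI"}
stats = PciStats()
stats.add_device(old)
known = {old["address"]: old}
new = dict(old, dev_type="type-PF")
sync_devices(stats, known, {new["address"]: new})
print(sorted(stats.pools))  # [('15b3', '1015', 'type-PF')]
```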

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/761700

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/761701

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/761725

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/761727

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/761824

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/761825

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/761700
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b
Submitter: Zuul
Branch: stable/victoria

commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b
Author: Hemanth Nakkina <email address hidden>
Date: Tue Sep 1 09:36:51 2020 +0530

    Update pci stat pools based on PCI device changes

    At start up of nova-compute service, the PCI stat pools are
    populated based on information in pci_devices table in Nova
    database. The pools are updated only when new device is added
    or removed but not on any device changes like device type.

    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).

    This patch looks for device type updates in existing devices
    and updates the pools accordingly.

    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361
    (cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)

Changed in nova (Ubuntu Hirsute):
status: New → Fix Released
Changed in nova (Ubuntu Groovy):
status: New → Fix Released
Changed in nova (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in nova (Ubuntu Groovy):
importance: Undecided → Medium
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :
description: updated
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

SRU team,
debdiffs for focal, bionic, and UCA train/stein/rocky have been uploaded.

Changed in nova (Ubuntu Focal):
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Peter, or anyone else affected,

Accepted nova into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:21.1.1-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nova (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

Hello Peter, or anyone else affected,

Accepted nova into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

Verified the fix on Ubuntu focal and bionic-ussuri; the test case passes.

tags: added: verification-done-focal verification-ussuri-done
removed: verification-needed-focal verification-ussuri-needed
Revision history for this message
melanie witt (melwitt) wrote :

Fix has been committed but is not yet released. Release is proposed here:

https://review.opendev.org/c/openstack/releases/+/773347

Revision history for this message
melanie witt (melwitt) wrote :

Fix has been committed to openstack/nova but is not yet released for stable/train. Release is proposed here:

https://review.opendev.org/c/openstack/releases/+/773074

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.5.0

This issue was fixed in the openstack/nova 20.5.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 21.1.2

This issue was fixed in the openstack/nova 21.1.2 release.

Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

Hi SRU team,

Please ignore the debdiffs uploaded earlier for UCA stein/rocky/queens, since they differ from the code submitted upstream.

Attached is the updated debdiff for UCA stein.

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:21.1.1-0ubuntu2

---------------
nova (2:21.1.1-0ubuntu2) focal; urgency=medium

  * d/p/lp1892361.patch: Update pci stat pools based on PCI device changes (LP: #1892361).

 -- Chris MacNaughton <email address hidden> Mon, 18 Jan 2021 15:25:16 +0000

Changed in nova (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chris MacNaughton (chris.macnaughton) wrote :

This bug was fixed in the package nova - 2:21.1.1-0ubuntu2~cloud0
---------------

 nova (2:21.1.1-0ubuntu2~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 nova (2:21.1.1-0ubuntu2) focal; urgency=medium
 .
   * d/p/lp1892361.patch: Update pci stat pools based on PCI device changes (LP: #1892361).

Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Peter, or anyone else affected,

Accepted nova into train-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-train-needed to verification-train-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-train-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-train-needed
Hemanth Nakkina (hemanth-n) wrote :

Verified on bionic-train with the package from cloud-archive:train-proposed.
Launching an instance with an SR-IOV port is successful.

tags: added: verification-train-done
removed: verification-train-needed
Chris MacNaughton (chris.macnaughton) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chris MacNaughton (chris.macnaughton) wrote :

This bug was fixed in the package nova - 2:20.5.0-0ubuntu1~cloud0
---------------

 nova (2:20.5.0-0ubuntu1~cloud0) bionic-train; urgency=medium
 .
   * New stable point release for OpenStack Train (LP: #1915787).
   * d/p/lp1892361.patch: Removed after change landed upstream.
 .
 nova (2:20.4.1-0ubuntu1~cloud1) bionic-train; urgency=medium
 .
   * d/p/lp1892361.patch: Update pci stat pools based on PCI device changes (LP: #1892361).

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.

Hemanth Nakkina (hemanth-n) wrote :

Hi SRU Team,

Could you please pick up the debdiff for UCA Stein from the following comment:
https://bugs.launchpad.net/nova/queens/+bug/1892361/comments/32

Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Peter, or anyone else affected,

Accepted nova into stein-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:stein-proposed
  sudo apt-get update

Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-stein-needed to verification-stein-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-stein-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-stein-needed
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/c/openstack/nova/+/761824
Committed: https://opendev.org/openstack/nova/commit/1fb4cc03e315f5b4dbebc521f0d1299273c7c396
Submitter: "Zuul (22348)"
Branch: stable/rocky

commit 1fb4cc03e315f5b4dbebc521f0d1299273c7c396
Author: Hemanth Nakkina <email address hidden>
Date: Tue Sep 1 09:36:51 2020 +0530

    Update pci stat pools based on PCI device changes

    At startup of the nova-compute service, the PCI stat pools are
    populated from the pci_devices table in the Nova database. The
    pools are updated only when a device is added or removed, not
    when an existing device changes, for example its device type.

    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).

    This patch looks for device type updates in existing devices
    and updates the pools accordingly.

    Conflicts:
        nova/tests/functional/libvirt/test_pci_sriov_servers.py
        nova/tests/unit/virt/libvirt/fakelibvirt.py

    The functional test needs to skip the capabilities of the PCI
    device. This can be done by getting the capability template out
    of pci_dev_template [1], which was introduced by commit
    b927748c257e705903c2aa0ffa47b19914e31ede. That commit could not
    be cleanly backported, so the functional test case was removed.

    [1] https://opendev.org/openstack/nova/src/commit/b0a451d4289dae2086b730fde6b0c7b30f3da2e8/nova/tests/unit/virt/libvirt/fakelibvirt.py#L186

    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361
    (cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)
    (cherry picked from commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b)
    (cherry picked from commit f58399cf496566e39d11f82a61e0b47900f2eafa)
    (cherry picked from commit 8378785f995dd4bec2a5a20f7bf5946b3075120d)
    (cherry picked from commit 73e631862a81e85fdf9305f3d15b201d780c8743)
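
The commit message above explains the fix in prose only. Below is a minimal toy sketch in Python (the language nova is written in) of the idea it describes: in-memory pools keyed by device attributes, with a device moved to a new pool when its type changes, e.g. from type-PCI to type-PF in this example. The class and function names, the (vendor_id, product_id, dev_type) pool key, and the PCI address and IDs are all illustrative assumptions, not nova's actual pci stats code; device removal and the database layer are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical pool key: devices sharing these attributes land in the same pool.
PoolKey = Tuple[str, str, str]  # (vendor_id, product_id, dev_type)


@dataclass
class PciDevice:
    address: str
    vendor_id: str
    product_id: str
    dev_type: str  # e.g. "type-PCI", "type-PF", "type-VF"

    def pool_key(self) -> PoolKey:
        return (self.vendor_id, self.product_id, self.dev_type)


@dataclass
class PciStats:
    """Toy stand-in for the in-memory pools (pci_tracker.stats.pools)."""
    pools: Dict[PoolKey, List[PciDevice]] = field(default_factory=dict)

    def add_device(self, dev: PciDevice) -> None:
        self.pools.setdefault(dev.pool_key(), []).append(dev)

    def remove_device(self, dev: PciDevice, key: PoolKey) -> None:
        # Drop the device from the pool stored under `key`; delete empty pools.
        remaining = [d for d in self.pools.get(key, []) if d.address != dev.address]
        if remaining:
            self.pools[key] = remaining
        else:
            self.pools.pop(key, None)

    def update_device(self, dev: PciDevice, old_type: str) -> None:
        # The step the fix adds: move a device whose type changed out of
        # the pool for its old type and into the pool for its new type.
        self.remove_device(dev, (dev.vendor_id, dev.product_id, old_type))
        self.add_device(dev)


def sync_devices(stats: PciStats, known: Dict[str, PciDevice],
                 reported: List[PciDevice]) -> None:
    """Reconcile the pools with freshly reported devices (removals omitted)."""
    for new in reported:
        old = known.get(new.address)
        if old is None:
            # New device: the pools were already updated in this case before the fix.
            known[new.address] = new
            stats.add_device(new)
        elif old.dev_type != new.dev_type:
            # Type change: mirrors the pci_devices update; before the fix,
            # the pool was left untouched at this point.
            old_type = old.dev_type
            old.dev_type = new.dev_type
            stats.update_device(old, old_type)


stats = PciStats()
known: Dict[str, PciDevice] = {}
# Example values only: one NIC first reported as a plain PCI device...
sync_devices(stats, known, [PciDevice("0000:81:00.0", "8086", "10fb", "type-PCI")])
# ...then reported as a PF after SR-IOV is configured and the service restarts.
sync_devices(stats, known, [PciDevice("0000:81:00.0", "8086", "10fb", "type-PF")])
print(sorted(stats.pools))  # prints [('8086', '10fb', 'type-PF')]
```

Without the `update_device` call, the device object would carry its new `dev_type` while still sitting under the `type-PCI` key, which is the kind of stale in-memory pool entry the commit message describes.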

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/c/openstack/nova/+/761825
Committed: https://opendev.org/openstack/nova/commit/420df86e23acf050e78b64315f1058ae9704c6d0
Submitter: "Zuul (22348)"
Branch: stable/queens

commit 420df86e23acf050e78b64315f1058ae9704c6d0
Author: Hemanth Nakkina <email address hidden>
Date: Tue Sep 1 09:36:51 2020 +0530

    Update pci stat pools based on PCI device changes

    At startup of the nova-compute service, the PCI stat pools are
    populated from the pci_devices table in the Nova database. The
    pools are updated only when a device is added or removed, not
    when an existing device changes, for example its device type.

    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).

    This patch looks for device type updates in existing devices
    and updates the pools accordingly.

    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361
    (cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)
    (cherry picked from commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b)
    (cherry picked from commit f58399cf496566e39d11f82a61e0b47900f2eafa)
    (cherry picked from commit 8378785f995dd4bec2a5a20f7bf5946b3075120d)
    (cherry picked from commit 73e631862a81e85fdf9305f3d15b201d780c8743)
    (cherry picked from commit 1fb4cc03e315f5b4dbebc521f0d1299273c7c396)

Chris MacNaughton (chris.macnaughton) wrote :

I've uploaded this fix to both rocky-staging and the bionic unapproved queue: https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=nova
