SRIOV port binding_profile attributes for OVS hardware offload are stripped on instance deletion or port detachment

Bug #2008238 reported by Stig Telfer
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========

This issue applies to systems using SRIOV with Mellanox ASAP2 SDN offloads.

An SRIOV port capable of ASAP2 SDN acceleration (OVS hardware offloads) has 'capabilities=[switchdev]' added to the port binding_profile.

After a VM has been created with an SRIOV port attached, the port can no longer be used for subsequent VM builds. An attempt to reuse the port results in an error of the form "Cannot set interface MAC/vlanid to <mac>/<vlan> for ifname ens1f0 vf 7: Operation not supported".

The underlying issue appears to be that when an SRIOV port is detached from a VM, or the VM is destroyed, the capabilities=[switchdev] property is removed from the port binding_profile. This converts the port from ASAP2 to “Legacy SRIOV” (in Mellanox-speak) and makes it unusable.

If the port binding_profile property is restored then the port can be successfully reused.
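
The restore step could be scripted along these lines. This is a minimal sketch: it only builds the `openstack port set` command line, reusing the same `--binding-profile` JSON as the port creation command in the reproduction steps. The helper name `restore_profile_cmd` is made up for illustration, and running the resulting command would probably need the same privileges as creating the port with a binding profile.

```python
import json
import shlex


def restore_profile_cmd(port, capabilities=("switchdev",)):
    """Build (not run) an `openstack port set` command that re-adds capabilities.

    `restore_profile_cmd` is a hypothetical helper, not part of any OpenStack tool.
    """
    profile = json.dumps({"capabilities": list(capabilities)})
    return ["openstack", "port", "set", "--binding-profile", profile, port]


cmd = restore_profile_cmd("sriov-port-1")
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

Building the argument list first and quoting it with `shlex.join` avoids hand-escaping the JSON for the shell.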

The property is preserved during live migration, instance resizes and rebuilds. The binding_profile property appears to be removed only on instance deletion or port detachment.

Steps to reproduce
==================

1. Create SRIOV port with ASAP2 capability:

openstack port create --project <project> --network <network> --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' sriov-port-1

2. Check the port binding_profile property:

openstack port show -c binding_profile sriov-port-1

3. Create an instance using the port:

openstack server create --flavor <flavor> --image <image> --key-name <key> --nic port-id=sriov-port-1 sriov-vm-1

4. Delete the instance:

openstack server delete sriov-vm-1

5. Check the port binding_profile property:

openstack port show -c binding_profile sriov-port-1
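
Steps 2 and 5 can be compared mechanically. The sketch below assumes the port is queried with the client's JSON formatter (`openstack port show -f json -c binding_profile <port>`) and that the result is an object whose `binding_profile` value is a dict; other client versions may render it differently, in which case the helper raises. `has_switchdev` is a hypothetical helper name, and the two sample strings are illustrative stand-ins, not captured output.

```python
import json


def has_switchdev(port_show_json):
    """Return True if the JSON output of `port show` lists the switchdev capability."""
    data = json.loads(port_show_json)
    profile = data.get("binding_profile") or {}
    if not isinstance(profile, dict):
        # Assumption: we only handle the dict form of the profile.
        raise ValueError("unexpected binding_profile format")
    return "switchdev" in profile.get("capabilities", [])


# Sample outputs standing in for steps 2 and 5 (illustrative, not captured logs).
before = '{"binding_profile": {"capabilities": ["switchdev"]}}'
after = '{"binding_profile": {}}'
print(has_switchdev(before), has_switchdev(after))
```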

Expected Result
===============

Nova sets properties in the binding_profile while the instance is in use. Alongside those properties, the "capabilities": ["switchdev"] property should be preserved, including after the instance is deleted or the port is detached.
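
The expected behaviour can be sketched as follows. This is an illustration of the idea, not Nova's implementation: the key tuple is an assumed, partial example of properties Nova writes while a port is bound, and the PCI values are made up.

```python
# Assumption: an illustrative subset of keys Nova writes while a port is bound;
# this is not Nova's actual list.
NOVA_OWNED_KEYS = ("pci_slot", "pci_vendor_info")


def unbind_profile(profile):
    """Return a copy of the binding profile with only Nova-owned keys removed."""
    return {k: v for k, v in profile.items() if k not in NOVA_OWNED_KEYS}


bound = {
    "capabilities": ["switchdev"],
    "pci_slot": "0000:03:00.7",       # made-up example value
    "pci_vendor_info": "15b3:101e",   # made-up example value
}
print(unbind_profile(bound))
```

Under this model, a property set by an administrator before boot survives the unbind, while the per-instance device details are dropped.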

Actual Result
=============

After the instance is deleted (or port detached), the binding_profile is empty.

Environment
===========

This has been observed with the following configuration:

- OpenStack Yoga
- OVN Neutron driver

Logs
====

From Nova Compute:

2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest Traceback (most recent call last):
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 165, in launch
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest return self._domain.createWithFlags(flags)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest result = proxy_call(self._autowrap, f, *args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest rv = execute(f, *args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest six.reraise(c, e, tb)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest raise value
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest rv = meth(*args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1385, in createWithFlags
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest raise libvirtError('virDomainCreateWithFlags() failed')
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest libvirt.libvirtError: Cannot set interface MAC/vlanid to fa:16:3e:43:1e:ce/2107 for ifname ens1f0 vf 7: Operation not supported
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest
2023-01-24 19:55:32.273 7 ERROR nova.virt.libvirt.driver [req-581cd9e8-11c8-44be-9ed2-a03a5f70d0f4 802a31d98b364da79be43fe6e9566d63 76f401abee7b4e80b7efd86f2f26e3ca - default default] [instance: d2091824-1f7a-4de1-8776-8f781956130a] Failed to start libvirt guest: libvirt.libvirtError: Cannot set interface MAC/vlanid to fa:16:3e:43:1e:ce/2107 for ifname ens1f0 vf 7: Operation not supported

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I agree that nova should not manipulate keys in the binding_profile that were not added by nova in the first place.
I looked through the yoga neutron code path and I don't see where the binding_profile is manipulated improperly. What I see is that nova has a list of keys to manipulate and capabilities is not one of them.
I also tried on master with a normal port: I added capabilities to the binding_profile, booted with the port, and then deleted the VM. The capabilities I added remained in the port.

So I need help. Could you reproduce the lost capability with a normal port?

Changed in nova:
status: New → Incomplete
Revision history for this message
Stig Telfer (stigtelfer) wrote :

I have tested this again on another (Yoga) system and can reproduce the issue, including with non-SRIOV ports.

As admin user:

openstack port create --project stackhpc --network external-internet --binding-profile '{"capabilities": ["switchdev"]}' sriov-port-3

As normal user:

openstack server create --key-name mykey --flavor myflavor --image Rocky9 --nic port-id=sriov-port-3 myvm

openstack server delete myvm

As admin user:

openstack port show sriov-port-3

For me, the binding_profile field is cleared when the instance is deleted.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/884439

Changed in nova:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/884439
Committed: https://opendev.org/openstack/nova/commit/cef3b5ef2cc1fe983578e4966208cf95fdea5880
Submitter: "Zuul (22348)"
Branch: master

commit cef3b5ef2cc1fe983578e4966208cf95fdea5880
Author: Alexey Stupnikov <email address hidden>
Date: Thu May 25 21:23:32 2023 +0200

    Translate VF network capabilities to port binding

    Libvirt's node device driver accumulates and reports information
    about host devices. Network capabilities reported by node device
    driver for NIC contain information about HW offloads supported
    by this NIC.

    One of possible features reported by node device driver is
    switchdev: a NIC capability to implement VFs similar to actual
    HW switch ports (also referred to as SR-IOV OVS hardware offload).
    From Neutron perspective, vnic-type should be set to "direct" and
    "switchdev" capability should be added to port binding profile to
    enable HW offload (there are also configuration steps on compute
    hosts to tune NIC config).

    This patch was written to automatically translate "switchdev" from
    VF network capabilities reported by node device driver to Neutron
    port binding profile and allow user to skip manual step that
    requires admin privileges.

    Other capabilities are also translated: they are not used right
    now, but provide visibility and can be utilized later.

    Closes-bug: #2020813
    Closes-bug: #2008238
    Change-Id: I3b17f386325b8f42c0c374f766fb21c520161a59
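
The translation described in the commit message can be pictured roughly as follows. This is a sketch of the idea only, not the code from the patch: the function name, the shape of the input (a plain list of capability names reported for a VF), and the merge policy are all assumptions.

```python
def merge_vf_capabilities(profile, vf_capabilities):
    """Return a new binding profile whose 'capabilities' include the VF's.

    Hypothetical helper: `vf_capabilities` stands in for whatever the node
    device driver reports for the VF.
    """
    merged = dict(profile)
    existing = list(profile.get("capabilities", []))
    # Preserve order and avoid duplicates.
    for cap in vf_capabilities:
        if cap not in existing:
            existing.append(cap)
    if existing:
        merged["capabilities"] = existing
    return merged


# "rx" is only a placeholder for some other reported capability.
print(merge_vf_capabilities({}, ["switchdev", "rx"]))
```

With a step like this, a port created without a manually supplied binding profile would still end up carrying `switchdev` once bound to a capable VF.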

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/nova/+/898945

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/nova/+/899225

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/899229

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/899254

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/nova/+/899225
Committed: https://opendev.org/openstack/nova/commit/7e4f45df91f33fa8b75feec95e5636db06fda443
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 7e4f45df91f33fa8b75feec95e5636db06fda443
Author: Alexey Stupnikov <email address hidden>
Date: Thu May 25 21:23:32 2023 +0200

    Translate VF network capabilities to port binding

    Libvirt's node device driver accumulates and reports information
    about host devices. Network capabilities reported by node device
    driver for NIC contain information about HW offloads supported
    by this NIC.

    One of possible features reported by node device driver is
    switchdev: a NIC capability to implement VFs similar to actual
    HW switch ports (also referred to as SR-IOV OVS hardware offload).
    From Neutron perspective, vnic-type should be set to "direct" and
    "switchdev" capability should be added to port binding profile to
    enable HW offload (there are also configuration steps on compute
    hosts to tune NIC config).

    This patch was written to automatically translate "switchdev" from
    VF network capabilities reported by node device driver to Neutron
    port binding profile and allow user to skip manual step that
    requires admin privileges.

    Other capabilities are also translated: they are not used right
    now, but provide visibility and can be utilized later.

    Closes-bug: #2020813
    Closes-bug: #2008238
    Change-Id: I3b17f386325b8f42c0c374f766fb21c520161a59
    (cherry picked from commit cef3b5ef2cc1fe983578e4966208cf95fdea5880)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/nova/+/898945
Committed: https://opendev.org/openstack/nova/commit/4fcc8c369f2c580f86dbfc6b1f812516f80262c0
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 4fcc8c369f2c580f86dbfc6b1f812516f80262c0
Author: Alexey Stupnikov <email address hidden>
Date: Thu May 25 21:23:32 2023 +0200

    Translate VF network capabilities to port binding

    Libvirt's node device driver accumulates and reports information
    about host devices. Network capabilities reported by node device
    driver for NIC contain information about HW offloads supported
    by this NIC.

    One of possible features reported by node device driver is
    switchdev: a NIC capability to implement VFs similar to actual
    HW switch ports (also referred to as SR-IOV OVS hardware offload).
    From Neutron perspective, vnic-type should be set to "direct" and
    "switchdev" capability should be added to port binding profile to
    enable HW offload (there are also configuration steps on compute
    hosts to tune NIC config).

    This patch was written to automatically translate "switchdev" from
    VF network capabilities reported by node device driver to Neutron
    port binding profile and allow user to skip manual step that
    requires admin privileges.

    Other capabilities are also translated: they are not used right
    now, but provide visibility and can be utilized later.

    Closes-bug: #2020813
    Closes-bug: #2008238
    Change-Id: I3b17f386325b8f42c0c374f766fb21c520161a59
    (cherry picked from commit cef3b5ef2cc1fe983578e4966208cf95fdea5880)
    (cherry picked from commit 7e4f45df91f33fa8b75feec95e5636db06fda443)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 28.0.1

This issue was fixed in the openstack/nova 28.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.2.0

This issue was fixed in the openstack/nova 27.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/899229
Committed: https://opendev.org/openstack/nova/commit/c36e0db95749395d5915b366fe6d36f516151c1a
Submitter: "Zuul (22348)"
Branch: stable/zed

commit c36e0db95749395d5915b366fe6d36f516151c1a
Author: Alexey Stupnikov <email address hidden>
Date: Thu May 25 21:23:32 2023 +0200

    Translate VF network capabilities to port binding

    Libvirt's node device driver accumulates and reports information
    about host devices. Network capabilities reported by node device
    driver for NIC contain information about HW offloads supported
    by this NIC.

    One of possible features reported by node device driver is
    switchdev: a NIC capability to implement VFs similar to actual
    HW switch ports (also referred to as SR-IOV OVS hardware offload).
    From Neutron perspective, vnic-type should be set to "direct" and
    "switchdev" capability should be added to port binding profile to
    enable HW offload (there are also configuration steps on compute
    hosts to tune NIC config).

    This patch was written to automatically translate "switchdev" from
    VF network capabilities reported by node device driver to Neutron
    port binding profile and allow user to skip manual step that
    requires admin privileges.

    Other capabilities are also translated: they are not used right
    now, but provide visibility and can be utilized later.

    Closes-bug: #2020813
    Closes-bug: #2008238
    Change-Id: I3b17f386325b8f42c0c374f766fb21c520161a59
    (cherry picked from commit cef3b5ef2cc1fe983578e4966208cf95fdea5880)
    (cherry picked from commit 7e4f45df91f33fa8b75feec95e5636db06fda443)
    (cherry picked from commit 4fcc8c369f2c580f86dbfc6b1f812516f80262c0)

tags: added: in-stable-zed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/905440

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.2.1

This issue was fixed in the openstack/nova 26.2.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/yoga)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/905440
Reason: stable/yoga branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/yoga if you want to further work on this patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/899254
Reason: stable/yoga branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/yoga if you want to further work on this patch.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Adding Neutron to this bug. A port extension could be needed to avoid writing to the port's port_binding register when creating a port. That would avoid the Neutron policy restriction that only allows the service role to write to this field.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Neutron no longer needs to provide any port_binding information for HW offloaded ports: https://review.opendev.org/c/openstack/neutron/+/898556

no longer affects: neutron
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 29.0.0.0rc1

This issue was fixed in the openstack/nova 29.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/905134
Committed: https://opendev.org/openstack/nova/commit/a1a07e0d2d01e95b7d3c8db62d149a4278617c93
Submitter: "Zuul (22348)"
Branch: master

commit a1a07e0d2d01e95b7d3c8db62d149a4278617c93
Author: Amit Uniyal <email address hidden>
Date: Tue Jan 9 15:40:27 2024 +0000

    Refactor vf profile for PCI device

    In general the card_serial_number will not be present on sriov
    VFs/PFs, it is only supported on very new cards.
    Also, all 3 need not to be always required for vf_profile.

    Related-Bug: #2008238
    Change-Id: I00b126635612ace51b5e3138afcb064f001f1901

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/zed)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/905138
Reason: stable/zed branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/zed if you want to further work on this patch.
