ML2 OVN - Creating an instance with hardware offloaded port is broken

Bug #1975743 reported by Itai Levy
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Invalid | Undecided | Unassigned | |
| neutron | Fix Released | Medium | Frode Nordahl | |

Bug Description

OpenStack Release: Yoga
Platform: Ubuntu focal

Creating an instance with a vnic-type ‘direct’ port and a ‘switchdev’ binding-profile fails with the following validation error:
```
2022-05-25 19:13:40.331 125269 DEBUG neutron.api.v2.base [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Request body: {'port': {'device_id': 'd46aef48-e42e-49c8-af9f-a83768747b4f', 'device_owner': 'compute:nova', 'binding:profile': {'capabilities': ['switchdev'], 'pci_vendor_info': '15b3:101e', 'pci_slot': '0000:08:03.2', 'physical_network': None, 'card_serial_number': 'MT2034X11488', 'pf_mac_address': '04:3f:72:9e:0b:a1', 'vf_num': 7}, 'binding:host_id': 'node3.maas', 'dns_name': 'vm1'}} prepare_request_body /usr/lib/python3/dist-packages/neutron/api/v2/base.py:729

2022-05-25 19:13:40.429 125269 DEBUG neutron_lib.callbacks.manager [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-1311372', 'neutron.services.ovn_l3.plugin.OVNL3RouterPlugin._port_update-8735219071964'] for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), before_update _notify_loop /usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176
2022-05-25 19:13:41.221 125269 DEBUG neutron.notifiers.nova [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Ignoring state change previous_port_status: DOWN current_port_status: DOWN port_id 0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7 record_port_status_changed /usr/lib/python3/dist-packages/neutron/notifiers/nova.py:233
2022-05-25 19:13:41.229 125269 DEBUG neutron_lib.callbacks.manager [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Publish callbacks [] for port (0f1e4e9c-68ef-4b38-a3bc-68e624bca6c7), precommit_update _notify_loop /usr/lib/python3/dist-packages/neutron_lib/callbacks/manager.py:176

2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers [req-504a0204-6f1a-46ae-8b95-dcfdf2692f91 b2a31335e63b4dd391cc3e6bf4600fe1 - - 654b9b803e6a4a68b31676c16973e3cc 654b9b803e6a4a68b31676c16973e3cc] Mechanism driver 'ovn' failed in update_port_precommit: neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 482, in _call_on_drivers
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 792, in update_port_precommit
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers ovn_utils.validate_and_get_data_from_binding_profile(port)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/common/ovn/utils.py", line 266, in validate_and_get_data_from_binding_profile
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers raise n_exc.InvalidInput(error_message=msg)
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers neutron_lib.exceptions.InvalidInput: Invalid input for operation: Invalid binding:profile. too many parameters.
2022-05-25 19:13:41.229 125269 ERROR neutron.plugins.ml2.managers
```

The issue appears to be related to this change:
https://review.opendev.org/c/openstack/neutron/+/818420

To reproduce:
https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ovn.html

1. Prepare a setup with SR-IOV adjusted for OVN HW Offload
2. Create a port with switchdev capabilities

$ openstack port create direct_overlay2 --vnic-type=direct --network gen_data --binding-profile '{"capabilities":["switchdev"]}' --security-group my_policy

3. Create an instance

$ openstack server create --key-name bastion --flavor d1.demo --image ubuntu --port direct_overlay2 vm1 --availability-zone nova:node3.maas

Frode Nordahl (fnordahl) wrote:

Itai, thank you for reporting this bug.

The Neutron OVN driver does strict validation of the binding profile. As part of adding support for SmartNIC DPUs, the validation was extended to handle both the existing hardware offload workflow (vnic-type direct + capabilities switchdev) and the new SmartNIC DPU workflow (vnic-type remote-managed).

What's happening here is that Neutron does not expect Nova to provide the 'card_serial_number', 'pf_mac_address' and 'vf_num' keys in the binding profile for the vnic-type direct + capabilities switchdev workflow, and rejects the request.

The key/value pairs appear to be added whenever a VF from a card with a serial number in the VPD is used; if the card does not have a serial number in the VPD, the key/value pairs are not provided.

This is problematic because some cards never provide this information, and others provide it only with certain firmware versions.

The Neutron validation code currently has no concept of an optional key in the binding profile, and since the information is not required for the vnic-type direct + capabilities switchdev workflow, I'm inclined to think Nova should refrain from providing it in this case.

To unblock you while we figure out how to solve this properly, you could apply this patch [0] to your neutron-api units.

0: https://pastebin.ubuntu.com/p/3dsHX4rHdT/
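
The rejection described in this comment can be sketched as follows. This is a minimal illustration, not the actual Neutron code: `EXPECTED_KEYS`, `InvalidBindingProfile`, and `strict_validate` are hypothetical names, and the set of expected keys is an assumption based on the keys that remain once the three rejected ones are set aside. Only the "too many parameters" wording comes from the log above.

```python
# Illustrative sketch only; not the actual Neutron implementation.
# EXPECTED_KEYS is an assumption made for this example.
EXPECTED_KEYS = {
    "pci_vendor_info": str,
    "pci_slot": str,
    "physical_network": (str, type(None)),
    "capabilities": list,
}


class InvalidBindingProfile(ValueError):
    """Hypothetical stand-in for neutron_lib.exceptions.InvalidInput."""


def strict_validate(profile):
    """Accept a profile only if its key set matches EXPECTED_KEYS exactly."""
    surplus = set(profile) - set(EXPECTED_KEYS)
    if surplus:
        # Any unknown key is treated as an error in the strict model.
        raise InvalidBindingProfile(
            "Invalid binding:profile. too many parameters.")
    missing = set(EXPECTED_KEYS) - set(profile)
    if missing:
        raise InvalidBindingProfile(
            "Invalid binding:profile. missing parameters: %s"
            % ", ".join(sorted(missing)))
    for key, expected_type in EXPECTED_KEYS.items():
        if not isinstance(profile[key], expected_type):
            raise InvalidBindingProfile(
                "Invalid binding:profile. %s has the wrong type." % key)
    return dict(profile)


# Binding profile taken from the request body in the log above.
profile_from_log = {
    "capabilities": ["switchdev"],
    "pci_vendor_info": "15b3:101e",
    "pci_slot": "0000:08:03.2",
    "physical_network": None,
    "card_serial_number": "MT2034X11488",
    "pf_mac_address": "04:3f:72:9e:0b:a1",
    "vf_num": 7,
}

try:
    strict_validate(profile_from_log)
    outcome = "accepted"
except InvalidBindingProfile as exc:
    outcome = str(exc)
```

With the profile from the log, the three surplus keys cause the strict check to fail before any value is inspected, which matches the behaviour reported in this bug.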

Dmitrii Shcherbakov (dmitriis) wrote:

Nova (as of the Yoga release) extends the set of PCI device parameters placed into binding:profile in _get_vf_pci_device_profile, which has no information about the VNIC type of the port:

https://github.com/openstack/nova/blob/ffb810e2ba2fdec9b2a881a88fa6d65cd32f8fa3/nova/network/neutron.py#L1596-L1605

So it just adds those parameters unconditionally for VFs.

As of commit 2234b179b5202d4d609c1df2c9c999656ca12378, which came after the initial Yoga release, we also make sure that either all of the additional PCI-related parameters are present or none of them are:

https://github.com/openstack/nova/commit/2234b179b5202d4d609c1df2c9c999656ca12378#diff-ea2a329415c3b1dac196856801202b5a31cb5cf5f588e9e5dbdc031370a63791R1578
        if all((pf_mac, vf_num, card_serial_number)):
            vf_profile.update({

So the challenge here is that it adds some variability to what Neutron validation needs to do.
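
The all-or-none behaviour described above can be sketched roughly like this. `build_vf_profile` is a hypothetical helper written for illustration, not Nova's `_get_vf_pci_device_profile`; only the `all(...)` guard and the key names are taken from the quoted commit and the log in the bug description.

```python
# Illustrative sketch only; build_vf_profile is a hypothetical helper.
def build_vf_profile(pci_vendor_info, pci_slot, pf_mac=None, vf_num=None,
                     card_serial_number=None):
    """Build a VF profile, adding the extra keys only as a complete set."""
    vf_profile = {
        "pci_vendor_info": pci_vendor_info,
        "pci_slot": pci_slot,
    }
    # Same shape of guard as in the quoted commit: all three or none.
    # all() tests truthiness, so in this sketch a vf_num of 0 would
    # count as missing.
    if all((pf_mac, vf_num, card_serial_number)):
        vf_profile.update({
            "pf_mac_address": pf_mac,
            "vf_num": vf_num,
            "card_serial_number": card_serial_number,
        })
    return vf_profile


# Card that exposes a serial number in its VPD: three extra keys appear.
with_serial = build_vf_profile(
    "15b3:101e", "0000:08:03.2",
    pf_mac="04:3f:72:9e:0b:a1", vf_num=7,
    card_serial_number="MT2034X11488")

# Card without a serial number: none of the extra keys appear.
without_serial = build_vf_profile(
    "15b3:101e", "0000:08:03.2",
    pf_mac="04:3f:72:9e:0b:a1", vf_num=7)
```

The two resulting dictionaries have different key sets for the same VNIC type, which is the variability the validation on the Neutron side has to cope with.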

From the API contract point of view, there is no specific format and the way "binding:profile" contents get interpreted is up to a networking backend:

https://docs.openstack.org/api-ref/network/v2/index.html?expanded=create-port-detail#id72
"binding:profile (Optional)
A dictionary that enables the application running on the specific host to pass and receive vif port information specific to the networking back-end. The networking API does not define a specific format of this field. The default is an empty dictionary. If you update it with null then it is treated like {} in the response."

To be honest, I would expect a backend to ignore extra parameters that it does not know about rather than raise an error. If a valid subset of keys in "binding:profile" for a particular VNIC type is present and the values are OK, I think the validation should pass.

Frode Nordahl (fnordahl)
Changed in neutron:
status: New → Confirmed
Changed in nova:
status: New → Invalid

OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/843601

Changed in neutron:
status: Confirmed → In Progress
Frode Nordahl (fnordahl)
Changed in neutron:
assignee: nobody → Frode Nordahl (fnordahl)
Changed in neutron:
importance: Undecided → Medium
tags: added: ovn

OpenStack Infra (hudson-openstack) wrote: Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/843601
Committed: https://opendev.org/openstack/neutron/commit/8a9ffcb0d4c0c44fc226810fd7f08dab9b79fcc0
Submitter: "Zuul (22348)"
Branch: master

commit 8a9ffcb0d4c0c44fc226810fd7f08dab9b79fcc0
Author: Frode Nordahl <email address hidden>
Date: Fri May 27 11:58:04 2022 +0200

    [OVN] Make binding profile validation more robust

    The main purpose of Neutron's validation of the binding profile
    is to make sure expected keys are present and that their values
    are of the expected type.

    The Nova compute component updates the binding profile as part of
    instance creation. Depending on the version of the Nova compute
    component and which hardware it interfaces with, the information
    provided by Nova in the binding profile may differ.

    Nova also has limited information at its disposal at the point in
    time it updates the port binding profile, so it would be
    non-trivial for it to provide information conditionally based on
    things like VNIC_TYPE and existing binding profile data.

    Make the Neutron binding profile validation more robust for both
    upgrade and heterogeneous hardware scenarios by accepting the
    presence of surplus keys in the binding profile. The data that
    Neutron expects will still be validated and any surplus keys will
    be pruned before further processing internally in Neutron.

    Closes-Bug: #1975743
    Change-Id: I3a91f442a1fd72f9027f10f2b1b6572cee3f8360
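
The approach described in this commit message (validate the expected keys, tolerate and prune surplus ones) might look roughly like the sketch below. It is not the merged Neutron code: `EXPECTED_KEYS` and `validate_and_prune` are hypothetical names, and the expected key set is an assumption made for this example.

```python
# Illustrative sketch only; not the code merged in the change above.
# EXPECTED_KEYS is an assumption made for this example.
EXPECTED_KEYS = {
    "pci_vendor_info": str,
    "pci_slot": str,
    "physical_network": (str, type(None)),
    "capabilities": list,
}


def validate_and_prune(profile):
    """Validate expected keys and return a copy without surplus keys."""
    missing = set(EXPECTED_KEYS) - set(profile)
    if missing:
        raise ValueError(
            "missing binding:profile keys: %s" % ", ".join(sorted(missing)))
    pruned = {}
    for key, expected_type in EXPECTED_KEYS.items():
        value = profile[key]
        # Expected keys are still type-checked.
        if not isinstance(value, expected_type):
            raise ValueError("binding:profile key %s has the wrong type" % key)
        pruned[key] = value
    # Keys not listed in EXPECTED_KEYS are dropped, not rejected.
    return pruned


pruned = validate_and_prune({
    "capabilities": ["switchdev"],
    "pci_vendor_info": "15b3:101e",
    "pci_slot": "0000:08:03.2",
    "physical_network": None,
    "card_serial_number": "MT2034X11488",
    "pf_mac_address": "04:3f:72:9e:0b:a1",
    "vf_num": 7,
})
```

Under these assumptions the profile from the bug description passes validation, and only the four expected keys are handed on for further processing.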

Changed in neutron:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/845390

OpenStack Infra (hudson-openstack) wrote: Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845390
Committed: https://opendev.org/openstack/neutron/commit/f0276fa31403cb52817c2a883b2213e38f2cc77b
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit f0276fa31403cb52817c2a883b2213e38f2cc77b
Author: Frode Nordahl <email address hidden>
Date: Fri May 27 11:58:04 2022 +0200

    [OVN] Make binding profile validation more robust

    Closes-Bug: #1975743
    Change-Id: I3a91f442a1fd72f9027f10f2b1b6572cee3f8360
    (cherry picked from commit 8a9ffcb0d4c0c44fc226810fd7f08dab9b79fcc0)

tags: added: in-stable-yoga

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/neutron 20.2.0

This issue was fixed in the openstack/neutron 20.2.0 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/neutron 21.0.0.0rc1

This issue was fixed in the openstack/neutron 21.0.0.0rc1 release candidate.
