[rfe] Off-path SmartNIC Port Binding with OVN

Bug #1932154 reported by Frode Nordahl
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

I just realized that we have submitted a spec [0] without following the Neutron RFE process [1].

I'll repeat the summary here:

Off-path SmartNIC DPUs introduce an architecture change where network
agents responsible for NIC switch configuration and representor
interface plugging run on a separate SoC that has its own CPU and memory
and runs a separate OS kernel. As a side effect, hypervisor hostnames no
longer match the SmartNIC DPU hostnames seen by ovs-vswitchd and OVN
agents, a match the existing port binding code relies on.
The goal of this specification is to introduce
changes necessary to extend the existing hardware offload code to cope
with the hostname mismatch and related design challenges while reusing
the rest of the code. To do that, PCI(e) add-in card tracking is
introduced for boards with unique serial numbers so that it can be used
to determine the correct hostname of a SmartNIC DPU which is responsible
for a particular VF. Additionally, the spec proposes passing more
information in the "binding:profile" during a port update to facilitate
representor port plugging.

We would be happy to attend the Neutron drivers meeting to discuss the details.

0: https://review.opendev.org/c/openstack/neutron-specs/+/788821
1: https://docs.openstack.org/neutron/latest/contributor/policies/blueprints.html#neutron-request-for-feature-enhancements

Nova spec: https://review.opendev.org/c/openstack/nova-specs/+/787458
Nova BP: https://blueprints.launchpad.net/nova/+spec/integration-with-off-path-network-backends

Tags: rfe-approved
tags: added: rfe
Revision history for this message
Brian Haley (brian-haley) wrote :

I won't be able to attend the drivers meeting tomorrow, but after reading the Nova spec I had a question (actually I think Sean raised the question). You mentioned ovn-controller, are there requirements on core OVN here? Thanks.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

At the core of the challenge is a requirement to support SmartNIC DPU
deployments with cards in restricted mode. In this mode we treat
the hypervisors as untrusted components which disallows direct
communication between Nova running on the host and components running
on the SmartNIC control plane CPUs.

Due to how these cards are laid out, information such as the names of
representor ports is also not available from the host side, whether in
restricted mode or not. So a component running on the SmartNIC control
plane CPU is required to discover the names of representor ports and to
perform plugging on instance creation.

My understanding of the direction of the Neutron OVN driver is that the
traditional AMQP RPC and agent topology is being phased out, replaced by
native OVN functionality.

Following this theme, we are proposing to coordinate representor port
plugging through the OVN database and have the ovn-controller, or some
other OVN service, do the plugging. There is an RFC patch up for
discussion [0][1][2] and we have a target to complete that work for
OVN 21.09 if we reach agreement.

0: https://patchwork.ozlabs<email address hidden>/
1: https://mail.openvswitch.org/pipermail/ovs-dev/2021-May/382837.html
2: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383727.html

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Let's discuss that during next drivers meeting on Friday.

tags: added: rfe-triaged
removed: rfe
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Great, thank you. For completeness there has been more movement in the discussion with the OVN team [3].

3: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385047.html

summary: - [rfe] Off-path SmartNIC Port Binding
+ [rfe] Off-path SmartNIC Port Binding with OVN
Revision history for this message
Miguel Lavalle (minsel) wrote :

This RFE was approved during today's drivers meeting. We will continue the conversation in the spec.

tags: added: rfe-approved
removed: rfe-triaged
Changed in neutron:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-specs (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-specs/+/788821
Committed: https://opendev.org/openstack/neutron-specs/commit/8ff7a77df9ec4fb3b5cb21ca42fc302b855a63fb
Submitter: "Zuul (22348)"
Branch: master

commit 8ff7a77df9ec4fb3b5cb21ca42fc302b855a63fb
Author: Dmitrii Shcherbakov <email address hidden>
Date: Thu Apr 29 21:35:04 2021 +0300

    Off-path SmartNIC Port Binding with OVN

    https://blueprints.launchpad.net/neutron/+spec/off-path-smartnic-dpu-port-binding-with-ovn

    Off-path SmartNIC DPUs introduce an architecture change where network
    agents responsible for NIC switch configuration and representor
    interface plugging run on a separate SoC with its own CPU, memory and
    that runs a separate OS kernel. The side-effect of that is that
    hypervisor hostnames no longer match SmartNIC DPU hostnames which are
    seen by ovs-vswitchd and OVN agents while the existing port binding
    code relies on that. The goal of this specification is to introduce
    changes necessary to extend the existing hardware offload code to cope
    with the hostname mismatch and related design challenges while reusing
    the rest of the code. To do that, PCI(e) add-in card tracking is
    introduced for boards with unique serial numbers so that it can be used
    to determine the correct hostname of a SmartNIC DPU which is responsible
    for a particular VF. Additionally, more information is suggested to be
    passed in the "binding:profile" during a port update to facilitate
    representor port plugging.

    WIP code: https://review.opendev.org/c/openstack/neutron/+/808961
    Nova spec: https://review.opendev.org/c/openstack/nova-specs/+/787458
    Nova BP: https://blueprints.launchpad.net/nova/+spec/integration-with-off-path-network-backends

    Needed-By: I07ef52769da72cde8867f996111b7df4a80e4d79
    Change-Id: Ic8db22d1b6570f68bd6400ecc653dc893a4b6184
    Closes-Bug: #1932154

Changed in neutron:
status: In Progress → Fix Released
Frode Nordahl (fnordahl)
Changed in neutron:
status: Fix Released → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/826099

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/826099
Committed: https://opendev.org/openstack/neutron/commit/1a7f4d5b038e396578644c74a120048f77595a47
Submitter: "Zuul (22348)"
Branch: master

commit 1a7f4d5b038e396578644c74a120048f77595a47
Author: Frode Nordahl <email address hidden>
Date: Mon Jan 24 11:09:41 2022 +0100

    [OVN] Add unit test for binding profile validation

    The ``validate_and_get_data_from_binding_profile`` helper function
    in ``neutron.common.ovn.utils`` does currently not have unit tests.

    To be able to safely modify this function in an upcoming patch we
    add unit tests separately.

    Partial-Bug: #1932154
    Change-Id: I1a5f705064f90f422fc0ca971d79135ac3ccfc9f

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/828103

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/828174

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-specs (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by "Frode Nordahl <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-specs/+/828226
Reason: Duplicate of https://review.opendev.org/c/openstack/neutron-specs/+/828173

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Frode Nordahl <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron-specs/+/828225
Reason: Duplicate of https://review.opendev.org/c/openstack/neutron-specs/+/828173

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-specs (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-specs/+/828173
Committed: https://opendev.org/openstack/neutron-specs/commit/7795bbd62ade1dbd24eeb74001ec4afeb14ac56f
Submitter: "Zuul (22348)"
Branch: master

commit 7795bbd62ade1dbd24eeb74001ec4afeb14ac56f
Author: Dmitrii Shcherbakov <email address hidden>
Date: Mon Feb 7 20:16:17 2022 +0300

    Use VNIC_REMOTE_MANAGED instead of VNIC_SMARTNIC

    After a round of reviews of Nova patches that utilized VNIC_SMARTNIC for
    the off-path backend spec

    https://review.opendev.org/q/topic:2021-09-10-off-path-net-backends-dep

    it was determined that the Ironic's usage of VNIC_SMARTNIC would be
    affected by how the decision is made whether a port is remote-managed or
    not during the resource request creation.

    https://review.opendev.org/c/openstack/nova/+/824835/13/nova/network/neutron.py#2325

    See the following log for the relevant discussion:

    https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2022-02-07.log.html#t2022-02-07T15:49:56

    Therefore, this change introduces a spec update to use a new VNIC
    type.

    Nova spec: https://review.opendev.org/c/openstack/nova-specs/+/787458
    Nova BP: https://blueprints.launchpad.net/nova/+spec/integration-with-off-path-network-backends

    Partial-Bug: #1932154
    Change-Id: I63f156c5bfb5a41e5ebf94dc8f069828569e270a

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/828174
Committed: https://opendev.org/openstack/neutron-lib/commit/18a69117202080a56a39e32b130e3813c344b9f2
Submitter: "Zuul (22348)"
Branch: master

commit 18a69117202080a56a39e32b130e3813c344b9f2
Author: Dmitrii Shcherbakov <email address hidden>
Date: Mon Feb 7 20:32:58 2022 +0300

    Add VNIC_REMOTE_MANAGED for off-path backends

    Added a new VNIC type for remote-managed ports of off-path networking
    backends.

    Related-Bug: #1932154
    Change-Id: I496db96ea40da3bee5b81bcee1edc79e1f46b541

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/829210

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Dmitrii Shcherbakov <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/829210
Reason: Somehow managed to generate another Change-Id which was not intentional. Abandoning in favor of the old change.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/818420
Committed: https://opendev.org/openstack/neutron/commit/1f970697906e91f932d3dec7fd8e5197bf81893c
Submitter: "Zuul (22348)"
Branch: master

commit 1f970697906e91f932d3dec7fd8e5197bf81893c
Author: Frode Nordahl <email address hidden>
Date: Thu Nov 18 14:31:58 2021 +0100

    [OVN] Extend port binding parameter validation

    To allow for validating multiple port binding parameter sets that
    may contain some of the same keys, as well as validating
    polymorphic values, the ``OVN_PORT_BINDING_PROFILE_PARAMS`` constant
    is extended to list a new ``OVNPortBindingProfileParamSet`` named
    tuple where you can specify the vnic_type and capability for which
    each parameter set is valid.

    Partial-Bug: #1932154
    Change-Id: I8493cba28e92b9b36bdb952c9737c0fea6fb7b75
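
The parameter set idea described in this commit message can be sketched as follows. This is a minimal illustration and not the Neutron code: ``ParamSet``, ``PARAM_SETS``, ``select_and_validate`` and the keys and types listed in each set are assumptions made up for the example. Each set names the vnic_type and capability it applies to (``None`` meaning "any"), and each key maps to a list of accepted types so that a value may be polymorphic.

```python
import collections

# Illustrative stand-in for a parameter-set named tuple (hypothetical name).
ParamSet = collections.namedtuple(
    "ParamSet", ["param_set", "vnic_type", "capability"])

# Hypothetical parameter sets; keys and accepted types are assumptions.
PARAM_SETS = [
    ParamSet({"parent_name": [str], "tag": [int]}, None, None),
    ParamSet({"pci_slot": [str], "card_serial_number": [str]},
             "remote-managed", None),
]


def select_and_validate(profile, vnic_type=None, capabilities=()):
    """Return the validated keys of the first parameter set that applies.

    A set applies when its vnic_type and capability constraints are met
    and all of its keys are present in the profile. An empty dict is
    returned when no set applies.
    """
    for ps in PARAM_SETS:
        if ps.vnic_type is not None and ps.vnic_type != vnic_type:
            continue
        if ps.capability is not None and ps.capability not in capabilities:
            continue
        if not set(ps.param_set) <= set(profile):
            continue
        for key, types in ps.param_set.items():
            # A list of types lets one key accept several value types.
            if not isinstance(profile[key], tuple(types)):
                raise ValueError(
                    "invalid type for %s: %r" % (key, profile[key]))
        return {key: profile[key] for key in ps.param_set}
    return {}
```

Keeping the vnic_type and capability next to each set lets two sets share keys without ambiguity, because only the set whose constraints match the port is checked.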

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-specs (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-specs (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-specs/+/829715
Committed: https://opendev.org/openstack/neutron-specs/commit/be44496b705f6ce8b7e43aa0fa6ddf35c7408e66
Submitter: "Zuul (22348)"
Branch: master

commit be44496b705f6ce8b7e43aa0fa6ddf35c7408e66
Author: Frode Nordahl <email address hidden>
Date: Thu Feb 17 14:30:00 2022 +0100

    smartnic-dpu: Update implementation details

    During the review of the Nova and Neutron implementations the
    wording used changed from "board serial number" to "card serial
    number".

    Partial-Bug: #1932154
    Change-Id: Ib342351cad3ff1cd46016c1fcfe05e05bf92bf2b

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808961
Committed: https://opendev.org/openstack/neutron/commit/7d64d0c116cf3f5ec2a35fb2ffe08dde61343ad8
Submitter: "Zuul (22348)"
Branch: master

commit 7d64d0c116cf3f5ec2a35fb2ffe08dde61343ad8
Author: Frode Nordahl <email address hidden>
Date: Tue Sep 14 14:33:29 2021 +0200

    [OVN] Off-path SmartNIC DPU Port Binding with OVN

    Traditionally it has been the CMS's (in OpenStack's case, Nova's)
    responsibility to create Virtual Interfaces (VIFs) as part of the
    instance life cycle, and subsequently manage plug/unplug operations
    on the Open vSwitch integration bridge.

    With the advent of SmartNIC DPUs, which are connected to multiple
    distinct CPUs, we can have a topology where the instance runs on one
    host and Open vSwitch and OVN run on a different host, the
    SmartNIC DPU control plane CPU.

    One of the main use cases for having this topology is security
    where we treat the hypervisor host as untrusted and prohibit
    direct communication between the hypervisor host and the SmartNIC
    DPU control plane host. In addition, control facilities
    such as switchdev devices are only visible from the SmartNIC DPU
    control plane CPUs.

    Adds support for binding ports of type VNIC_REMOTE_MANAGED by
    looking up chassis based on serial number that Nova provides in
    the binding_profile.

    Information required by the OVN controller to successfully look up
    and plug the representor port is provided as options on the LSP as
    defined by the representor plug provider documentation [0][1].

    0: https://docs.ovn.org/en/stable/topics/vif-plug-providers/vif-plug-providers.html
    1: https://github.com/ovn-org/ovn-vif/blob/main/Documentation/topics/vif-plug-providers/vif-plug-representor.rst
    Partial-Bug: #1932154
    Depends-On: I496db96ea40da3bee5b81bcee1edc79e1f46b541
    Depends-On: I83a128a260acdd8bf78fede566af6881b8b82a9c
    Change-Id: Icc6c2d0f7f8f5cc94997db6244175a8e8884789f
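
The lookup described in this commit message can be sketched roughly as below. This is an assumption-laden illustration, not the Neutron implementation: chassis records are plain dicts rather than OVN Southbound rows, the function names are hypothetical, and the binding profile keys (``card_serial_number``, ``pf_mac_address``, ``vf_num``) and the LSP option names are my reading of the specs and the linked plug provider documentation, so they should be checked against those sources.

```python
def find_chassis_by_serial(chassis_rows, serial):
    """Return the hostname of the one chassis advertising this serial."""
    matches = [c for c in chassis_rows
               if c.get("card_serial_number") == serial]
    if len(matches) != 1:
        # Zero or several matches both make the binding ambiguous.
        raise LookupError(
            "expected one chassis for serial %s, found %d"
            % (serial, len(matches)))
    return matches[0]["hostname"]


def build_lsp_options(binding_profile, chassis_rows):
    """Build the LSP options for a remote-managed port (sketch only)."""
    host = find_chassis_by_serial(
        chassis_rows, binding_profile["card_serial_number"])
    return {
        # The port is bound to the DPU host, not the hypervisor host.
        "requested-chassis": host,
        # Option names assumed from the representor plug provider docs.
        "vif-plug-type": "representor",
        "vif-plug:representor:pf-mac": binding_profile["pf_mac_address"],
        "vif-plug:representor:vf-num": str(binding_profile["vf_num"]),
    }
```

The point of the indirection is that the hypervisor hostname is never used to pick the chassis: the card serial number is the only key shared between what Nova sees on the host and what the DPU advertises.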

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/neutron/+/828103
Committed: https://opendev.org/openstack/neutron/commit/0c9c60d5ca53bd415fac76ef8aa5ae17f92704e7
Submitter: "Zuul (22348)"
Branch: master

commit 0c9c60d5ca53bd415fac76ef8aa5ae17f92704e7
Author: Frode Nordahl <email address hidden>
Date: Mon Feb 7 11:27:21 2022 +0100

    [OVN] Off-path SmartNIC DPU Documentation

    Closes-Bug: #1932154
    Co-Authored-By: Dmitrii Shcherbakov <email address hidden>
    Change-Id: I5b5e7957cfe8020001777fd40e038eaafb5fb894

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.0.0.0rc1

This issue was fixed in the openstack/neutron 20.0.0.0rc1 release candidate.
