DPDK containerized applications are not working with Mellanox cards

Bug #1973361 reported by Antonio Augusto Vilas Boas Teixeira
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Antonio Augusto Vilas Boas Teixeira
Milestone: (none listed)

Bug Description

Brief Description
-----------------
A DPDK containerized application fails to run when it uses VFs from a Mellanox CX5 NIC assigned to a datanetwork. The datanetwork resources in /etc/pcidp/config.json are missing the isRdma flag, which is required for this scenario.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
- Set an interface that uses a Mellanox port to the pci-sriov class

$ system host-if-modify controller-0 data0 -c pci-sriov -n sriov0 -N 8 --vf-driver=netdevice

- Make sure the interface is assigned to a datanetwork

$ system interface-datanetwork-list controller-0
+--------------+--------------------------------------+---------+------------------+
| hostname     | uuid                                 | ifname  | datanetwork_name |
+--------------+--------------------------------------+---------+------------------+
| controller-0 | bd55850c-69ef-4b1f-9a2e-c90bddc43287 | sriov0  | group0-data0     |
+--------------+--------------------------------------+---------+------------------+

- Check the /etc/pcidp/config.json file

$ sudo cat /etc/pcidp/config.json
"resourceList": [
    {
      "resourceName": "pci_sriov_net_group0_data0",
      "selectors": {
        "vendors": [
          "15b3"
        ],
        "drivers": [
          "mlx5_core"
        ],
        "devices": [
          "1018"
        ],
        "pfNames": [
          "ens802f0#0,1,2,3,4,5,6,7"
        ]
      }
    },
    ...

Expected Behavior
------------------
In the datanetwork resources the flag '"isRdma": true' should be listed.

Actual Behavior
----------------
There is no "isRdma" flag in the file, which is required so DPDK containers can work with the Mellanox NICs.

Reproducibility
---------------
Reproducible, but not on every host. On an affected host, it reproduces every time.

System Configuration
--------------------
One-node system, but it should happen in any configuration.

Branch/Pull Time/Commit
-----------------------
Master branch as of May 2nd, 2022

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
N/A

Workaround
----------
In /etc/pcidp/config.json, add "isRdma": true to the selectors of each datanetwork resource that the pod will use. Example:

"resourceList": [
    {
      "resourceName": "pci_sriov_net_group0_data0",
      "selectors": {
        "vendors": [
          "15b3"
        ],
        "drivers": [
          "mlx5_core"
        ],
        "devices": [
          "1018"
        ],
        "pfNames": [
          "ens802f0#0,1,2,3,4,5,6,7"
        ],
        "isRdma": true
      }
    },
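
The manual edit above could be scripted along these lines. This is a minimal sketch, not part of the eventual fix: the helper names add_is_rdma and patch_file are hypothetical, matching on the Mellanox vendor ID 15b3 is an assumption taken from the excerpt, and the file is assumed to be a JSON object with a top-level "resourceList".

```python
import json

MELLANOX_VENDOR_ID = "15b3"  # vendor ID shown in the config excerpt


def add_is_rdma(config):
    """Set selectors["isRdma"] = True on Mellanox resources.

    Returns the names of the resources that were changed.
    """
    changed = []
    for resource in config.get("resourceList", []):
        selectors = resource.get("selectors", {})
        is_mellanox = MELLANOX_VENDOR_ID in selectors.get("vendors", [])
        if is_mellanox and not selectors.get("isRdma"):
            selectors["isRdma"] = True
            changed.append(resource.get("resourceName"))
    return changed


def patch_file(path):
    """Load the JSON file at `path`, patch it, and write it back if needed."""
    with open(path) as f:
        config = json.load(f)
    changed = add_is_rdma(config)
    if changed:
        with open(path, "w") as f:
            json.dump(config, f, indent=2)
    return changed
```

Running patch_file("/etc/pcidp/config.json") with sufficient privileges would apply the change in place. Since the platform generates this file, the edit should be treated as a temporary workaround.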

Changed in starlingx:
assignee: nobody → Antonio Augusto Vilas Boas Teixeira (aaugusto-wndrvr)
Ghada Khalil (gkhalil)
summary: - DPDK applications are not working with Mellanox cards
+ DPDK containerized applications are not working with Mellanox cards
tags: added: stx.7.0 stx.networking
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/841799

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/841799
Committed: https://opendev.org/starlingx/config/commit/5fdbc722e84f57e70d9af946c8ec7417fc96ab6d
Submitter: "Zuul (22348)"
Branch: master

commit 5fdbc722e84f57e70d9af946c8ec7417fc96ab6d
Author: Antonio Augusto Vilas Boas Teixeira <email address hidden>
Date: Fri May 13 15:58:59 2022 -0400

    Fix Mellanox device detection

    There are some instances where the is_a_mellanox_device() method could
    fail depending on how the driver of a port is detected. The driver
    parameter of a port can be a string of various driver names, separated
    by commas. The Mellanox detection method was not able to handle this and
    would erroneously identify a device as a non Mellanox in this scenario.

    TESTS:
    - Installed on a host using a Mellanox CX5 card
    - Set the interface using a port of the CX5 to pci-sriov class
    - Confirmed that the port driver is listed as 'mlx5_core,mlx5_core' when
      running 'system host-port-show'
    - Assigned this interface to a datanetwork
    - Confirmed that the flag 'isRdma' is now correctly appended to the
      datanetwork resource in the file /etc/pcidp/config.json

    Closes-Bug: 1973361

    Signed-off-by: Antonio Augusto Vilas Boas Teixeira
    <email address hidden>
    Change-Id: Ib12d104a61c3e78803e4387804cd53a98ee8f0ec
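
The failure mode described in this commit can be illustrated with a short sketch. It is not the sysinv code: the function names and the set of Mellanox driver names are assumptions chosen for illustration. It contrasts an exact string comparison with a check that first splits the comma-separated driver string.

```python
MELLANOX_DRIVERS = {"mlx4_core", "mlx5_core"}  # assumed set, for illustration


def is_mellanox_exact(driver):
    """Naive check: misses a driver field that holds several names."""
    return driver in MELLANOX_DRIVERS


def is_mellanox_split(driver):
    """Split on commas so 'mlx5_core,mlx5_core' is still recognised."""
    names = {name.strip() for name in (driver or "").split(",")}
    return bool(names & MELLANOX_DRIVERS)
```

With the driver string 'mlx5_core,mlx5_core' from the test notes, the exact comparison reports a non-Mellanox device while the split-based check does not.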

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/starlingx/config/+/842637
Committed: https://opendev.org/starlingx/config/commit/1fe2aca0230e2702dab362fc3bec8e848f8baa20
Submitter: "Zuul (22348)"
Branch: master

commit 1fe2aca0230e2702dab362fc3bec8e848f8baa20
Author: Antonio Augusto Vilas Boas Teixeira <email address hidden>
Date: Thu May 19 17:30:11 2022 -0400

    Fix mellanox device detection on vf interfaces

    The is_a_mellanox_device() method would not correctly identify a VF type
    interface as a Mellanox as it only checked the primary interface,
    without checking the lower interfaces the primary one used.

    This change fixes this. Now VF type interfaces will be checked
    recursively until the interface tied to the actual device ports is
    located.

    TESTS:
    - Installed on a host using a Mellanox CX5 card
    - Set the interface using a port of the CX5 to pci-sriov class
    - Created a vf type interface using the previously mentioned pci-sriov
    - Created two datanetworks and assigned each interface to one
    - Confirmed that the 'isRdma' flag is now correctly appended to the
      datanetwork resources in /etc/pcidp/config.json both for the pci-sriov
      interface as well as the vf interface
    - The same steps were reproduced in an Intel X710 NIC to validate that
      the is_a_mellanox_device() method is not yielding false positives. The
      isRdma flag was not appended for this, as expected

    Closes-Bug: 1973361

    Signed-off-by: Antonio Augusto Vilas Boas Teixeira
    <email address hidden>
    Change-Id: I232d4e7fa62b8f4200328edf66f81bbb996966d0
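
The recursive lookup described in this commit might look roughly like the sketch below. The data model is hypothetical: interfaces are plain dictionaries keyed by name, with "iftype", "uses" (names of lower interfaces), and "drivers" (port driver strings) fields. The actual sysinv objects and method signature may differ.

```python
MELLANOX_DRIVERS = {"mlx4_core", "mlx5_core"}  # assumed set, for illustration


def is_mellanox_interface(name, interfaces):
    """Follow 'vf' interfaces down to the one tied to ports, then check drivers.

    `interfaces` is a hypothetical mapping:
    name -> {"iftype": str, "uses": [names], "drivers": [driver strings]}
    """
    iface = interfaces[name]
    if iface["iftype"] == "vf" and iface.get("uses"):
        # A VF interface has no ports of its own; check the interface it uses.
        return is_mellanox_interface(iface["uses"][0], interfaces)
    for driver in iface.get("drivers", []):
        # Each driver string may be a comma-separated list of names.
        if MELLANOX_DRIVERS & {d.strip() for d in driver.split(",")}:
            return True
    return False
```

A VF interface stacked on a pci-sriov interface backed by mlx5_core ports resolves to True, while the same stack on an Intel port (for example with the i40e driver) resolves to False.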

Changed in starlingx:
status: In Progress → Fix Released