Configuration for SR-IOV network device plugin has changed in upstream

Bug #1835020 reported by ChenjieXu
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Steven Webster

Bug Description

Brief Description
-----------------
The configuration format for the SR-IOV network device plugin has changed upstream, as described here:
https://github.com/intel/sriov-network-device-plugin#configurations
For now, StarlingX uses its own docker image and does not need to change the configuration. However, this will become a bug when StarlingX updates the docker image to a newer version, because the latest code does not support the old configuration format.

Severity
--------
Major

Steps to Reproduce
------------------
1. Set up StarlingX AIO Simplex
2. Assign the sriovdp label
   system host-lock controller-0
   system host-label-assign controller-0 sriovdp=enabled
   system host-unlock controller-0
   vi /etc/pcidp/config.json

Expected Behavior
------------------
The configuration file format has changed upstream as follows:
{
    "resourceList": [{
            "resourceName": "intel_sriov_netdevice",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c", "10ed"],
                "drivers": ["i40evf", "ixgbevf"]
            }
        },
        {
            "resourceName": "intel_sriov_dpdk",
            "selectors": {
                "vendors": ["8086"],
                "devices": ["154c", "10ed"],
                "drivers": ["vfio-pci"],
                "pfNames": ["enp0s0f0","enp2s2f1"]
            }
        },
        {
            "resourceName": "mlnx_sriov_rdma",
            "isRdma": true,
            "selectors": {
                "vendors": ["15b3"],
                "devices": ["1018"],
                "drivers": ["mlx5_ib"]
            }
        }
    ]
}
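
As a rough illustration of the selector-based layout shown above, the following Python sketch assembles an equivalent structure. The helper names build_resource and build_config are hypothetical and are not part of StarlingX or the device plugin; only the JSON keys come from the example above.

```python
import json


def build_resource(resource_name, vendors, devices, drivers, pf_names=None):
    """Return one selector-based resourceList entry (hypothetical helper)."""
    selectors = {
        "vendors": list(vendors),
        "devices": list(devices),
        "drivers": list(drivers),
    }
    # pfNames is only emitted when physical function names are supplied.
    if pf_names:
        selectors["pfNames"] = list(pf_names)
    return {"resourceName": resource_name, "selectors": selectors}


def build_config(resources):
    """Wrap the entries in the top-level resourceList object."""
    return {"resourceList": list(resources)}


config = build_config([
    build_resource("intel_sriov_netdevice", ["8086"],
                   ["154c", "10ed"], ["i40evf", "ixgbevf"]),
    build_resource("intel_sriov_dpdk", ["8086"],
                   ["154c", "10ed"], ["vfio-pci"],
                   pf_names=["enp0s0f0", "enp2s2f1"]),
])
print(json.dumps(config, indent=4))
```

The sketch leaves out the "isRdma" flag from the third entry of the example; it would be an extra top-level key on the entry rather than a selector.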

Actual Behavior
----------------
The configuration file in StarlingX still uses the old format:
{
  "resourceList": [
    {
      "resourceName": "pci_sriov_net_physnet1",
      "rootDevices": [
        "0000:41:00.0"
      ],
      "sriovMode": true,
      "deviceType": "vfio"
    }
  ]
}
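
One way to check which format a given config.json uses is sketched below in Python. The function names and the "legacy"/"selectors" labels are assumptions made for illustration; the key names are taken from the two examples in this report, and the plugin's own validation may differ.

```python
import json

# Keys that appear only in the old-style entries shown in this report.
LEGACY_KEYS = {"rootDevices", "sriovMode", "deviceType"}


def entry_format(entry):
    """Classify one resourceList entry as 'selectors', 'legacy' or 'unknown'."""
    if "selectors" in entry:
        return "selectors"
    if LEGACY_KEYS.intersection(entry):
        return "legacy"
    return "unknown"


def legacy_resources(config_text):
    """Return the resourceName of every old-style entry in a config string."""
    config = json.loads(config_text)
    return [
        entry.get("resourceName")
        for entry in config.get("resourceList", [])
        if entry_format(entry) == "legacy"
    ]


OLD_CONFIG = """
{
  "resourceList": [
    {
      "resourceName": "pci_sriov_net_physnet1",
      "rootDevices": ["0000:41:00.0"],
      "sriovMode": true,
      "deviceType": "vfio"
    }
  ]
}
"""
print(legacy_resources(OLD_CONFIG))
```

A non-empty result means the file still carries entries that would need to be regenerated before moving to a newer plugin image.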

Reproducibility
---------------
100%

System Configuration
--------------------
AIO Simplex

Branch/Pull Time/Commit
-----------------------
stx master as of 20190611T160613Z

Last Pass
---------
No

Timestamp/Logs
--------------
{
  "resourceList": [
    {
      "resourceName": "pci_sriov_net_physnet1",
      "rootDevices": [
        "0000:41:00.0"
      ],
      "sriovMode": true,
      "deviceType": "vfio"
    }
  ]
}

Test Activity
-------------
Developer Testing

Revision history for this message
Ghada Khalil (gkhalil) wrote :

At some point, we will discuss whether we should move to a new version of the sriov device plugin. But this is not considered stx.2.0 gating as the current image included in stx is working.

tags: added: stx.networking
Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
assignee: nobody → Steven Webster (swebster-wr)
Ghada Khalil (gkhalil)
description: updated
Ghada Khalil (gkhalil) wrote :

We'll look at picking up a new sriov device plugin for stx.3.0. LP has been tagged accordingly.

tags: added: stx.3.0
Changed in starlingx:
importance: Low → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685498

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685499

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685500

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/685498
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=dac417bd31ed36d455e94db4aabe5916367654d4
Submitter: Zuul
Branch: master

commit dac417bd31ed36d455e94db4aabe5916367654d4
Author: Steven Webster <email address hidden>
Date: Wed Sep 25 15:02:09 2019 -0500

    Uprev SR-IOV CNI and device plugin image base

    Currently, StarlingX uses a version of the SR-IOV CNI and device
    plugin container images that are based on a certain commit reference.
    This is done to ensure reliable and predictable behaviour until the
    images can be locked down on a stable release version.

    It is desirable to move to a later version of these images for
    a couple of reasons (aside from bug fixes, etc):

    1. The SR-IOV CNI image now uses an alpine base, rather than
       a Red Hat base.
    2. The SR-IOV device plugin allows a DPDK enabled pod with
       Mellanox NICs to run unprivileged.

    This commit moves the image base forward.

    Testing has been performed with netdevice and DPDK based
    pod applications with various combinations of the following
    devices:

    Mellanox MT27700 Family [ConnectX-4]
    Intel 82599ES 10-Gigabit SFI/SFP+ Network Connection
    Intel Ethernet Controller X710 for 10GbE SFP+

    Change-Id: Ia74e135b3e4b1a00465d4a8fd0b4650efdcfd2c5
    Closes-Bug: 1843963
    Closes-Bug: 1835020
    Signed-off-by: Steven Webster <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/685499
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=be63b5f7907a69dbe6677c3e716e0ba121cbeb32
Submitter: Zuul
Branch: master

commit be63b5f7907a69dbe6677c3e716e0ba121cbeb32
Author: Steven Webster <email address hidden>
Date: Mon Sep 23 14:17:16 2019 -0500

    Align SR-IOV CNI templates with master

    Currently, StarlingX uses a version of the SR-IOV CNI and device
    plugin container images that are based on a certain commit reference.
    This is done to ensure reliable and predictable behaviour until the
    images can be locked down on a stable release version.

    This commit aligns the ansible templates with the latest version
    of these plugins, as they are being updated to pull in various
    bug fixes and features.

    Change-Id: If5330f794ded901cd88aa9cf2d6e13cbfbb5f062
    Depends-on: https://review.opendev.org/685498
    Partial-Bug: 1835020
    Signed-off-by: Steven Webster <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/685500
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=919d1c98fe40e9e043cb59c9cb33125788951932
Submitter: Zuul
Branch: master

commit 919d1c98fe40e9e043cb59c9cb33125788951932
Author: Steven Webster <email address hidden>
Date: Mon Sep 23 14:27:48 2019 -0500

    Align with latest SR-IOV CNI and device plugin images

    Currently, StarlingX uses a version of the SR-IOV CNI and device
    plugin container images that are based on a certain commit reference.
    This is done to ensure reliable and predictable behaviour until the
    images can be locked down on a stable release version.

    It is desirable to move to a later version of these images for
    a couple of reasons (aside from bug fixes, etc):

    1. The SR-IOV CNI image now uses an alpine base, rather than
       a Red Hat base.
    2. The SR-IOV device plugin allows a DPDK enabled pod with
       Mellanox NICs to run unprivileged.

    This commit makes the necessary changes to handle the change
    to the SR-IOV device plugin configuration file.

    An example of the new format is as follows:

    {
        "resourceList": [{
                "resourceName": "pci_sriov_net_network1",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["154c", "10ed"],
                    "drivers": ["i40evf", "ixgbevf"],
                    "pfNames": ["ens802f0", "ens801f0"]
                }
        }]
    }

    As such, it can be seen that the vendor and device ids must be
    read, rather than the old format which just needed the PCI root
    device address(es). To achieve this, the sysinv agent must
    read and store any SR-IOV virtual function device ids. The
    vendor id can be read from the existing pvendor id field of the
    port with the addition of the -nn flag to the lspci command.

    Change-Id: Id73eaa1cf8ff39643c113d4787417f3b44b1560f
    Depends-on: https://review.opendev.org/685498
    Partial-Bug: 1835020
    Signed-off-by: Steven Webster <email address hidden>
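
The commit message above mentions reading vendor and device ids by adding the -nn flag to lspci. A minimal Python sketch of that kind of parsing follows. It is not the sysinv agent code: the function name is hypothetical, and the sample line is an illustrative lspci -nn style line rather than output captured from the system in this report.

```python
import re

# With -nn, lspci appends numeric ids in square brackets. The class code
# looks like [0200]; the vendor:device pair looks like [8086:154c].
ID_PAIR = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def parse_vendor_device(lspci_line):
    """Return (vendor_id, device_id) from one lspci -nn line, or None."""
    matches = ID_PAIR.findall(lspci_line)
    if not matches:
        return None
    # Take the last pair so bracketed text in the device name is skipped.
    vendor, device = matches[-1]
    return vendor.lower(), device.lower()


# Illustrative sample line (assumed, not taken from the bug report).
SAMPLE = ("41:02.0 Ethernet controller [0200]: Intel Corporation "
          "Ethernet Virtual Function 700 Series [8086:154c] (rev 02)")

print(parse_vendor_device(SAMPLE))
```

The returned ids correspond to the values that go into the "vendors" and "devices" selectors of the new configuration format.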
