No overlap allowed in PCI device due to change in sriov-dp version

Bug #2059960 reported by Caio Bruchert
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Caio Bruchert

Bug Description

Brief Description

We were using multiple datanetworks with a single pci-sriov PF/VF interface in STX 6.0 and were able to allocate the VFs to multiple pods. However, from STX 8.0 it has been observed that only one datanetwork can be mapped to a single SR-IOV PF/VF interface.

Successfully working in STX 6.0:

Created 3 datanetworks (va1nw, va2nw, bnw) and assigned them to a single SR-IOV PF on the node.
Datanetworks va1nw and va2nw assigned to pod A.
Datanetwork bnw assigned to pod B.
See the steps to reproduce for details.

Severity

Major

Steps to Reproduce

1) Configure the SRIOV in STX 6.0

[sysadmin@controller-0 ~(keystone_admin)]$ system host-if-show controller-1 ens1f1
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| ifname | ens1f1 |
| iftype | ethernet |
| ports | [u'ens1f1'] |
| imac | 40:a6:b7:66:dc:91 |
| imtu | 1500 |
| ifclass | pci-sriov |
| ptp_role | none |
| aemode | None |
| schedpolicy | None |
| txhashpolicy | None |
| primary_reselect | None |
| uuid | 78466b76-0f85-4cae-9ec9-2771f5b923f0 |
| ihost_uuid | 53a54452-8161-4c2a-a85c-6e3fc581eb58 |
| vlan_id | None |
| uses | [] |
| used_by | [] |
| created_at | |
| updated_at | |
| sriov_numvfs | 10 |
| sriov_vf_driver | vfio |
| max_tx_rate | None |
| accelerated | [True] |
+------------------+--------------------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ system datanetwork-list
+--------------------------------------+-----------+--------------+------+
| uuid | name | network_type | mtu |
+--------------------------------------+-----------+--------------+------+
| 13783888-a11a-412b-8309-4e2a61aad4bc | bnw | flat | 1500 |
| b1c48346-fb0d-4933-a737-29a6a2e8401d | datanet-1 | vlan | 1500 |
| f43ff8c7-35d8-45d3-bb77-6f776fc24fad | va1nw | flat | 1500 |
| de5075ae-2dbe-4115-bbe6-6c5a0f7aeda5 | va2nw | flat | 1500 |
+--------------------------------------+-----------+--------------+------+

[sysadmin@controller-0 ~(keystone_admin)]$ system interface-datanetwork-list controller-1
+--------------+--------------------------------------+--------+------------------+
| hostname | uuid | ifname | datanetwork_name |
+--------------+--------------------------------------+--------+------------------+
| controller-1 | 6f27b58c-67b5-46f9-a6e8-9c1c4019deda | ens1f1 | bnw |
| controller-1 | edd39d1f-a41f-450d-b25e-6572bfdd0ecf | ens1f1 | va2nw |
| controller-1 | f9a50a24-25f2-435f-8dee-ac57f9d832ee | ens1f1 | va1nw |
+--------------+--------------------------------------+--------+------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get nodes controller-1 -o custom-columns=:.status.allocatable | xargs -n1
map[cpu:104
ephemeral-storage:9391196145
hugepages-1Gi:0
hugepages-2Mi:0
intel.com/pci_sriov_net_bnw:10 -> ALL OK HERE
intel.com/pci_sriov_net_va1nw:10 -> ALL OK HERE
intel.com/pci_sriov_net_va2nw:10 -> ALL OK HERE
memory:253839864Ki
pods:110]

-> The sriovdp pod logs confirm that all three resource pools were created without issue.

I0304 08:14:07.729729 1 manager.go:116] Creating new ResourcePool: pci_sriov_net_va1nw
I0304 08:14:07.729734 1 manager.go:117] DeviceType: netDevice
I0304 08:14:07.905484 1 factory.go:108] device added: [pciAddr: 0000:1f:06.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905493 1 factory.go:108] device added: [pciAddr: 0000:1f:06.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905496 1 factory.go:108] device added: [pciAddr: 0000:1f:06.2, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905498 1 factory.go:108] device added: [pciAddr: 0000:1f:06.3, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905502 1 factory.go:108] device added: [pciAddr: 0000:1f:06.4, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905504 1 factory.go:108] device added: [pciAddr: 0000:1f:06.5, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905506 1 factory.go:108] device added: [pciAddr: 0000:1f:06.6, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905508 1 factory.go:108] device added: [pciAddr: 0000:1f:06.7, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905513 1 factory.go:108] device added: [pciAddr: 0000:1f:07.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905516 1 factory.go:108] device added: [pciAddr: 0000:1f:07.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.905524 1 manager.go:145] New resource server is created for pci_sriov_net_va1nw ResourcePool
I0304 08:14:07.905527 1 manager.go:115]
I0304 08:14:07.905529 1 manager.go:116] Creating new ResourcePool: pci_sriov_net_bnw
I0304 08:14:07.905531 1 manager.go:117] DeviceType: netDevice
I0304 08:14:07.908401 1 factory.go:108] device added: [pciAddr: 0000:1f:06.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908409 1 factory.go:108] device added: [pciAddr: 0000:1f:06.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908411 1 factory.go:108] device added: [pciAddr: 0000:1f:06.2, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908415 1 factory.go:108] device added: [pciAddr: 0000:1f:06.3, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908418 1 factory.go:108] device added: [pciAddr: 0000:1f:06.4, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908421 1 factory.go:108] device added: [pciAddr: 0000:1f:06.5, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908424 1 factory.go:108] device added: [pciAddr: 0000:1f:06.6, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908428 1 factory.go:108] device added: [pciAddr: 0000:1f:06.7, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908432 1 factory.go:108] device added: [pciAddr: 0000:1f:07.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908437 1 factory.go:108] device added: [pciAddr: 0000:1f:07.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0304 08:14:07.908444 1 manager.go:145] New resource server is created for pci_sriov_net_bnw ResourcePool
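The identical device lists in the two pools above follow from how the plugin configuration is generated: each datanetwork assigned to the interface becomes its own resource pool whose selector matches the same PF, so every pool enumerates the same 10 VFs. A minimal sketch of that expansion (field names follow the upstream sriovdp config format; the function and values are illustrative, not the actual sysinv code):

```python
# Sketch: one sriovdp resource pool per datanetwork, all selecting the same PF.
# Field names mirror the upstream sriov-network-device-plugin config format;
# the helper itself is hypothetical.
def build_sriovdp_config(pf_name, datanetworks):
    return {
        "resourceList": [
            {
                "resourceName": f"pci_sriov_net_{dn}",
                "resourcePrefix": "intel.com",
                "selectors": {
                    "pfNames": [pf_name],      # same PF for every pool
                    "drivers": ["vfio-pci"],
                },
            }
            for dn in datanetworks
        ]
    }

cfg = build_sriovdp_config("ens1f1", ["va1nw", "va2nw", "bnw"])
# Three pools, all selecting PF ens1f1 -> overlapping device sets.
```

Under v3.3.2 each pool independently claimed the matching VFs, which is why all three allocatable resources show a count of 10.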

2) Configure the SRIOV in STX 8.0

[sysadmin@controller-0 ~(keystone_admin)]$ system host-if-show controller-1 enp23s0f1
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| ifname | enp23s0f1 |
| iftype | ethernet |
| ports | ['enp23s0f1'] |
| imac | 64:9d:99:ff:e7:2d |
| imtu | 1500 |
| ifclass | pci-sriov |
| ptp_role | none |
| aemode | None |
| schedpolicy | None |
| txhashpolicy | None |
| primary_reselect | None |
| uuid | 9ee03852-77e4-4e7d-a67d-cb9879db38b0 |
| ihost_uuid | 9e29ae02-7c1c-4816-9850-102ac0370e47 |
| vlan_id | None |
| uses | [] |
| used_by | [] |
| created_at | |
| updated_at | |
| sriov_numvfs | 10 |
| sriov_vf_driver | vfio |
| max_tx_rate | None |
| accelerated | [False] |
+------------------+--------------------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ system datanetwork-list
+--------------------------------------+--------+--------------+------+
| uuid | name | network_type | mtu |
+--------------------------------------+--------+--------------+------+
| 46f062f2-6c91-4028-97c4-5e7d49a5af51 | bnw | flat | 1500 |
| 3cc0378e-5c1d-4328-8ae4-6cd16eee7ce4 | va1nw | flat | 1500 |
| e9e9102a-62f1-408c-a7ce-7d7808a48fca | va2nw | flat | 1500 |
+--------------------------------------+--------+--------------+------+

[sysadmin@controller-0 ~(keystone_admin)]$ system interface-datanetwork-list controller-1
+--------------+--------------------------------------+-----------+------------------+
| hostname | uuid | ifname | datanetwork_name |
+--------------+--------------------------------------+-----------+------------------+
| controller-1 | 95aa5d31-44c4-4b6c-9c70-0b03600dcc1f | enp23s0f1 | va1nw |
| controller-1 | 9aa833e6-c1d2-4713-b1f1-ed8d2efcd3be | enp23s0f1 | va2nw |
| controller-1 | c618578e-2afb-4730-b061-2cb11df694e8 | enp23s0f1 | bnw |
+--------------+--------------------------------------+-----------+------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get nodes controller-1 -o custom-columns=:.status.allocatable | xargs -n1
map[cpu:64
ephemeral-storage:95003729354
hugepages-1Gi:0
hugepages-2Mi:0
intel.com/pci_sriov_net_va1nw:10 -> ISSUE HERE, only 1 resource pool created
memory:169700988Ki
pods:110]

-> The sriovdp logs show that the plugin no longer allows a PCI device to be reused across resource pools. Of the 3 resource pools (pci_sriov_net_va1nw, pci_sriov_net_va2nw, pci_sriov_net_bnw) only 1 is created; the others fail with "Cannot add PCI Address [0000:17:12.1]. Already allocated."

I0304 06:57:37.385457 1 manager.go:110] Creating new ResourcePool: pci_sriov_net_va1nw
I0304 06:57:37.385461 1 manager.go:111] DeviceType: netDevice
I0304 06:57:37.388770 1 factory.go:106] device added: [pciAddr: 0000:17:11.0, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388779 1 factory.go:106] device added: [pciAddr: 0000:17:11.1, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388783 1 factory.go:106] device added: [pciAddr: 0000:17:11.2, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388786 1 factory.go:106] device added: [pciAddr: 0000:17:11.3, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388789 1 factory.go:106] device added: [pciAddr: 0000:17:11.4, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388792 1 factory.go:106] device added: [pciAddr: 0000:17:11.5, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388797 1 factory.go:106] device added: [pciAddr: 0000:17:11.6, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388801 1 factory.go:106] device added: [pciAddr: 0000:17:11.7, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388806 1 factory.go:106] device added: [pciAddr: 0000:17:12.0, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388810 1 factory.go:106] device added: [pciAddr: 0000:17:12.1, vendor: 8086, device: 1889, driver: vfio-pci]
I0304 06:57:37.388821 1 manager.go:139] New resource server is created for pci_sriov_net_va1nw ResourcePool
W0304 06:57:37.391894 1 manager.go:152] Cannot add PCI Address [0000:17:11.0]. Already allocated.
W0304 06:57:37.391978 1 manager.go:152] Cannot add PCI Address [0000:17:11.1]. Already allocated.
W0304 06:57:37.391983 1 manager.go:152] Cannot add PCI Address [0000:17:11.2]. Already allocated.
W0304 06:57:37.391985 1 manager.go:152] Cannot add PCI Address [0000:17:11.3]. Already allocated.
W0304 06:57:37.391987 1 manager.go:152] Cannot add PCI Address [0000:17:11.4]. Already allocated.
W0304 06:57:37.391989 1 manager.go:152] Cannot add PCI Address [0000:17:11.5]. Already allocated.
W0304 06:57:37.391991 1 manager.go:152] Cannot add PCI Address [0000:17:11.6]. Already allocated.
W0304 06:57:37.391994 1 manager.go:152] Cannot add PCI Address [0000:17:11.7]. Already allocated.
W0304 06:57:37.391997 1 manager.go:152] Cannot add PCI Address [0000:17:12.0]. Already allocated.
W0304 06:57:37.392000 1 manager.go:152] Cannot add PCI Address [0000:17:12.1]. Already allocated.
I0304 06:57:37.392003 1 manager.go:125] no devices in device pool, skipping creating resource server for pci_sriov_net_va2nw
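The behavioral difference between the two releases can be illustrated with a small sketch of device-pool construction: in v3.3.2 each pool simply takes every device its selector matches, while v3.5.1 keeps a global set of already-allocated PCI addresses and skips duplicates, leaving later pools empty. This mirrors the logs above but is a simplification, not the actual plugin Go code:

```python
def build_pools(pool_names, vf_addresses, dedup):
    """Give each pool the VFs its selector matches; with dedup=True
    (the v3.5.1 behavior) a VF may only belong to one pool."""
    allocated = set()
    pools = {}
    for name in pool_names:
        devices = []
        for addr in vf_addresses:
            if dedup and addr in allocated:
                continue  # "Cannot add PCI Address [...]. Already allocated."
            devices.append(addr)
            allocated.add(addr)
        if devices:
            pools[name] = devices
        # else: "no devices in device pool, skipping creating resource server"
    return pools

vfs = [f"0000:17:11.{i}" for i in range(8)] + ["0000:17:12.0", "0000:17:12.1"]
names = ["pci_sriov_net_va1nw", "pci_sriov_net_va2nw", "pci_sriov_net_bnw"]
old = build_pools(names, vfs, dedup=False)  # 3 pools of 10 devices each
new = build_pools(names, vfs, dedup=True)   # only the first pool survives
```

With dedup enabled, the first pool in config order claims all 10 VFs and every later pool is skipped, which is exactly why only pci_sriov_net_va1nw appears in the node's allocatable resources.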

Expected Behavior

Further analysis shows that the behavior change comes from the new sriovdp image:

STX 8.0-> ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin v3.5.1

STX 6.0 -> ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin v3.3.2

-> Since the change is introduced in STX 8.0, the user should not be allowed to map multiple datanetworks to a single SR-IOV PF/VF interface, and the "system interface-datanetwork-assign" command must throw an error when a second datanetwork is assigned. This will avoid confusion for the customer.
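The requested behavior is a semantic check at assignment time. A minimal sketch of that check (function and field names hypothetical; the real change lives in the sysinv semantic checks, not in this form):

```python
# Hypothetical sketch of the requested semantic check: reject a second
# datanetwork on a pci-sriov class interface instead of failing later
# in the device plugin.
def check_datanetwork_assignment(interface, existing_datanetworks, new_datanetwork):
    if interface["ifclass"] == "pci-sriov" and existing_datanetworks:
        raise ValueError(
            f"Interface {interface['ifname']} is of class pci-sriov and "
            f"already has a datanetwork assigned; assign {new_datanetwork} "
            "to a separate VF interface instead."
        )
    return new_datanetwork

iface = {"ifname": "sriov0", "ifclass": "pci-sriov"}
check_datanetwork_assignment(iface, [], "datanetwork1")  # first assignment: ok
# check_datanetwork_assignment(iface, ["datanetwork1"], "datanetwork2")  # raises
```

Rejecting the second assignment at the API keeps the sysinv model consistent with what the v3.5.1 plugin can actually serve, rather than silently producing empty resource pools.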

Actual Behavior

We were allowed to assign multiple datanetworks to a single SR-IOV PF/VF interface, as was supported in STX 6.0; because of the new change in the platform, the feature no longer works in STX 8.0.

Reproducibility

100%

System Configuration

Workaround

No workaround.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/914833

Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/914833
Committed: https://opendev.org/starlingx/config/commit/f6158f5b02124871c74f23c12228b533db7132af
Submitter: "Zuul (22348)"
Branch: master

commit f6158f5b02124871c74f23c12228b533db7132af
Author: Caio Bruchert <email address hidden>
Date: Mon Apr 1 16:59:08 2024 -0300

    Prevent multiple datanetworks to same interface

    Since sriov-network-device-plugin upgrade to v3.5.1, assigning multiple
    datanetworks to the same interface is not possible anymore.

    This change restricts the system interface-datanetwork-assign command to
    prevent that from happening.

    Test Plan:
    PASS: assign datanetwork1 to sriov0 interface: ok
    PASS: assign datanetwork2 to same sriov0 interface: fails
    PASS: create new vf0 interface on top of sriov0: ok
    PASS: assign datanetwork1 to vf0: ok
    PASS: assign datanetwork2 to vf0: fails
    PASS: create new vf1 interface on top of sriov0: ok
    PASS: assign datanetwork2 to vf1: ok
    PASS: assign datanetwork1 to vf1: fails

    Closes-Bug: 2059960

    Change-Id: If3ab95594917089f01475f9595c9059edeae85f5
    Signed-off-by: Caio Bruchert <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.networking
Changed in starlingx:
assignee: nobody → Caio Bruchert (cbrucher)