ACC100: Device information missing on "system host-device-show" when configured by sriov-fec-operator

Bug #1996106 reported by Lucas Wizer da Silva
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Balendu Mouli Burla

Bug Description

Brief Description
-----------------
After configuring ACC100 using sriov-fec-operator and performing a lock/unlock, the device information in "system host-device-show" still shows the default values. The same test with an N3000 device worked.

Severity
--------
<Minor: System/Feature is usable with minor issue>

Steps to Reproduce
------------------
Configure ACC100 using sriov-fec-operator
Lock/unlock controller
system host-device-show controller-0 <pci name or address>

Expected Behavior
------------------
Device information appears in "system host-device-show" after lock/unlock

Actual Behavior
----------------
Device information does not appear in "system host-device-show" after lock/unlock

Reproducibility
---------------
Reproducible

System Configuration
--------------------
One node system

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2022-11-07_19-44-30"

Last Pass
---------
First time seeing this issue.

Timestamp/Logs
--------------
ACC100
[sysadmin@controller-0 ~(keystone_admin)]$ system host-device-show controller-0 0000:c3:00.0
+-----------------------+-----------------------------------------------------------------------------+
| Property | Value |
+-----------------------+-----------------------------------------------------------------------------+
| name | pci_0000_c3_00_0 |
| address | 0000:c3:00.0 |
| class id | 120001 |
| vendor id | 8086 |
| device id | 0d5c |
| class name | Processing accelerators |
| vendor name | Intel Corporation |
| device name | Device 0d5c |
| numa_node | 0 |
| enabled | True |
| sriov_totalvfs | 16 |
| sriov_numvfs | 0 |
| sriov_vfs_pci_address | |
| sriov_vf_pdevice_id | None |
| extra_info | {'expected_numvfs': 0, 'expected_driver': None, 'expected_vf_driver': None} |
| created_at | 2022-11-08T13:04:23.513342+00:00 |
| updated_at | 2022-11-08T17:13:57.829506+00:00 |
| root_key | None |
| revoked_key_ids | None |
| boot_page | None |
| bitstream_id | None |
| bmc_build_version | None |
| bmc_fw_version | None |
| retimer_a_version | None |
| retimer_b_version | None |
| driver | None |
| sriov_vf_driver | None |
+-----------------------+-----------------------------------------------------------------------------+
----------------------------------------
N3000
[sysadmin@controller-0 ~(keystone_admin)]$ system host-device-show controller-0 0000:b4:00.0
+-----------------------+-----------------------------------------------------------------------------+
| Property | Value |
+-----------------------+-----------------------------------------------------------------------------+
| name | pci_0000_b4_00_0 |
| address | 0000:b4:00.0 |
| class id | 120000 |
| vendor id | 8086 |
| device id | 0d8f |
| class name | Processing accelerators |
| vendor name | Intel Corporation |
| device name | Device 0d8f |
| numa_node | 1 |
| enabled | True |
| sriov_totalvfs | 8 |
| sriov_numvfs | 2 |
| sriov_vfs_pci_address | 0000:b4:00.1,0000:b4:00.2 |
| sriov_vf_pdevice_id | 0d90 |
| extra_info | {'expected_numvfs': 0, 'expected_driver': None, 'expected_vf_driver': None} |
| created_at | 2022-11-02T21:36:05.583958+00:00 |
| updated_at | 2022-11-09T13:16:40.539023+00:00 |
| root_key | None |
| revoked_key_ids | None |
| boot_page | None |
| bitstream_id | None |
| bmc_build_version | None |
| bmc_fw_version | None |
| retimer_a_version | None |
| retimer_b_version | None |
| driver | igb_uio |
| sriov_vf_driver | vfio-pci |
+-----------------------+-----------------------------------------------------------------------------+
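
To compare the two outputs above without reading them row by row, the table can be parsed into a dictionary. The following is only a minimal sketch: the helper names parse_device_table and looks_unconfigured are hypothetical (they are not part of any StarlingX tooling), and the "unconfigured" check simply encodes the three fields that differ between the ACC100 and N3000 outputs in this report.

```python
# Sample rows copied from the ACC100 output in this report.
ACC100_SAMPLE = """\
| Property | Value |
| name | pci_0000_c3_00_0 |
| device id | 0d5c |
| sriov_totalvfs | 16 |
| sriov_numvfs | 0 |
| sriov_vfs_pci_address | |
| driver | None |
| sriov_vf_driver | None |
"""


def parse_device_table(text):
    """Parse `system host-device-show` table output into {property: value}.

    Empty cells and the literal string "None" are both mapped to Python None.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders (+----+), prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything malformed
        key, value = cells
        props[key] = None if value in ("", "None") else value
    return props


def looks_unconfigured(props):
    """Hypothetical check: no PF driver, no VF driver, and zero VFs reported."""
    return (
        props.get("driver") is None
        and props.get("sriov_vf_driver") is None
        and props.get("sriov_numvfs") == "0"
    )


if __name__ == "__main__":
    acc100 = parse_device_table(ACC100_SAMPLE)
    print(acc100["name"], "unconfigured:", looks_unconfigured(acc100))
```

Feeding the N3000 output through the same helper would report it as configured, since its driver, sriov_vf_driver, and sriov_numvfs fields are populated.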

Test Activity
-------------
Feature Testing

Workaround
----------
Get device configuration info using "kubectl get sriovfecnodeconfigs.sriovfec.intel.com -n sriov-fec-system controller-0 -o yaml"
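
The workaround can also be scripted. The sketch below runs the same kubectl command but requests JSON output (so only the Python standard library is needed) and summarizes the accelerator inventory. The field names used here (status.inventory.sriovAccelerators, pciAddress, deviceID, driver, virtualFunctions) are an assumption about the SriovFecNodeConfig layout and are not shown anywhere in this report; verify them against the YAML printed on your system. The function names are hypothetical.

```python
import json
import subprocess


def fetch_node_config(node="controller-0", namespace="sriov-fec-system"):
    """Run the workaround's kubectl command, asking for JSON instead of YAML."""
    result = subprocess.run(
        ["kubectl", "get", "sriovfecnodeconfigs.sriovfec.intel.com",
         "-n", namespace, node, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def summarize_accelerators(node_config):
    """Return one summary dict per accelerator in the inventory.

    ASSUMPTION: the inventory lives under status.inventory.sriovAccelerators
    and each entry carries pciAddress, deviceID, driver, and virtualFunctions.
    Missing keys are tolerated and reported as None or an empty list.
    """
    inventory = node_config.get("status", {}).get("inventory", {})
    rows = []
    for acc in inventory.get("sriovAccelerators") or []:
        vfs = acc.get("virtualFunctions") or []
        rows.append({
            "pci_address": acc.get("pciAddress"),
            "device_id": acc.get("deviceID"),
            "pf_driver": acc.get("driver"),
            "num_vfs": len(vfs),
            "vf_drivers": sorted({vf.get("driver") for vf in vfs if vf.get("driver")}),
        })
    return rows

# Usage on a controller with kubectl access:
#   for row in summarize_accelerators(fetch_node_config()):
#       print(row)
```

Keeping the summary function separate from the kubectl call means it can be checked against a saved copy of the node config without cluster access.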

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Balendu Mouli Burla (balendu)
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
importance: Undecided → High
importance: High → Low
tags: added: stx.8.0 stx.networking
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Low → Medium
Nidhi Shivashankara Belur (nshivash) wrote :

The sriov-fec-operator method of configuring ACC100 or N3000 does not populate device status info in the "system host-device-show" attributes.
The recommended method of applying the configuration is to create a "sriovfecclusterconfig" custom resource; once it is applied, the inventory section of the "sriovfecnodeconfig" is updated.

The command below is the only recommended method to check device configuration status and info.
kubectl get sriovfecnodeconfigs.sriovfec.intel.com -n sriov-fec-system controller-0 -o yaml

Please refer to the document below.
https://docs.starlingx.io/node_management/kubernetes/hardware_acceleration_devices/configure-sriov-fec-operator-to-enable-hw-accelerators-for-hosted-vran-containarized-workloads.html

Lucas Wizer da Silva (lwizerda) wrote :

I configured both ACC100 and N3000 using sriov-fec-operator method following this document https://docs.starlingx.io/node_management/kubernetes/hardware_acceleration_devices/configure-sriov-fec-operator-to-enable-hw-accelerators-for-hosted-vran-containarized-workloads.html

After lock/unlock of the controller, ACC100 behaved as you described: no status info appears in "system host-device-show", and I can check the status and info using "kubectl get sriovfecnodeconfigs.sriovfec.intel.com -n sriov-fec-system controller-0 -o yaml".

I opened this Launchpad bug because, for N3000, the status info does appear in "system host-device-show" after lock/unlock.

So the wrong behavior is that device status info is populated for N3000. I think https://bugs.launchpad.net/starlingx/+bug/1996109 can be used to fix it.

Ghada Khalil (gkhalil) wrote :

@Nidhi, based on your comment above, should this LP be marked as Invalid?

Nidhi Shivashankara Belur (nshivash) wrote :

@Ghada Yes, this LP is invalid. I did not observe the described behavior in our testing. Configuring any of the three devices (ACC100, N3000, ACC200) does not populate the attributes of "system host-device-show". I verified this on my setup today. The reported behavior must be due to an old configuration made with "system host-device-modify", which must be removed before using sriov-fec-operator.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Invalid