Worker node cannot be unlocked on the first attempt when a pci-sriov class interface is configured

Bug #1847573 reported by Litao Gao
This bug affects 1 person
Affects    Status     Importance  Assigned to  Milestone
StarlingX  Won't Fix  Low         Unassigned

Bug Description

Brief Description
-----------------
The system always rejects the first unlock request for a node with the worker subfunction when a pci-sriov class interface is configured. The error message is "Expecting number of interface sriov_numvfs=xx. Please wait a few minutes for inventory update and retry host-unlock."

Even after waiting a long time, the error still appears and the first unlock operation fails.
Rerunning the unlock operation succeeds, and the node is unlocked.

Severity
--------
<Minor: System/Feature is usable with minor issue>
Unlocking the node a second time succeeds.

Steps to Reproduce
------------------
1. Provision the first controller node with an SR-IOV capable NIC, such as the Intel Ethernet Controller 10-Gigabit X540-AT2.
2. Configure a pci-sriov class interface:

    system datanetwork-add physnet1 flat
    system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 12 --vf-driver=vfio controller-0 ens785f1
    system interface-datanetwork-assign controller-0 sriov1 physnet1

    ## check the VFs have been enabled in controller-0, by running "ip link show ens785f1"
    ## and then request the unlock operation
    system host-unlock controller-0
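
The VF check in the step above can also be done against sysfs rather than by reading the "ip link show" output. The following is a minimal sketch, assuming the standard Linux attribute /sys/class/net/<iface>/device/sriov_numvfs. The helper name check_numvfs and its optional sysfs-root argument are illustrative and not part of StarlingX. It only confirms what the kernel reports; the unlock check quoted in the error message depends on the inventory update, which may lag behind the kernel state.

```shell
#!/bin/sh
# check_numvfs IFACE EXPECTED [SYSFS_ROOT]
# Compare the kernel-reported VF count for IFACE with EXPECTED.
# Returns 0 on match, 1 on mismatch, 2 if the attribute cannot be read.
# SYSFS_ROOT defaults to /sys; it is a parameter only to ease testing
# (hypothetical helper, not a StarlingX command).
check_numvfs() {
    iface=$1
    expected=$2
    root=${3:-/sys}
    attr="$root/class/net/$iface/device/sriov_numvfs"

    if [ ! -r "$attr" ]; then
        echo "cannot read $attr" >&2
        return 2
    fi

    actual=$(cat "$attr")
    if [ "$actual" = "$expected" ]; then
        echo "$iface: sriov_numvfs=$actual (as expected)"
        return 0
    fi

    echo "$iface: sriov_numvfs=$actual, expected $expected" >&2
    return 1
}
```

With the values from the steps above, the call would be "check_numvfs ens785f1 12", run on controller-0 before "system host-unlock controller-0".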

Expected Behavior
------------------
The unlock operation should be successful at the 1st attempt.

Actual Behavior
----------------
The system always rejects the first unlock request for a node with the worker subfunction when a pci-sriov class interface is configured. The error message is "Expecting number of interface sriov_numvfs=xx. Please wait a few minutes for inventory update and retry host-unlock."

Reproducibility
---------------
The issue is 100% reproducible.

System Configuration
--------------------
Two-node (duplex) system. The issue occurs on both nodes in the duplex and may also occur in other system configurations.

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-10-01_20-00-00"

Last Pass
---------
N/A

Timestamp/Logs
--------------
sysinv.log

Test Activity
-------------
Evaluation

Litao Gao (gaolitao) wrote :
Ghada Khalil (gkhalil)
tags: removed: pci-sriov unlock
Ghada Khalil (gkhalil) wrote :

Is this specific to this particular NIC? Or have you seen this issue on other NICs as well?

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil)
tags: added: stx.networking
Ghada Khalil (gkhalil) wrote :

Marking as low priority / not gating, given that a retry of the unlock command passes.

Changed in starlingx:
importance: Undecided → Low
status: Incomplete → Triaged
tags: added: stx.config
tags: added: stx.helpwanted
John Kung (john-kung) wrote :

The SR-IOV configuration requires a runtime config to complete, and the reported number of VFs to match the expected number, before host-unlock is allowed. For manual operation, it may be necessary to rerun host-unlock.

The orchestrators (VIM, DC) have a built-in retry of the host-unlock operation.
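
For manual operation, the rerun described above can be scripted. The following is a minimal sketch of a generic retry wrapper, not the logic used by VIM or DC. The function name retry, the attempt count, and the delay are illustrative, and it assumes that "system host-unlock" exits with a non-zero status when the unlock is rejected, which this report does not confirm.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper; assumes COMMAND signals failure via exit status.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1

    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

An example invocation would be "retry 3 60 system host-unlock controller-0", where 3 attempts and a 60-second delay are arbitrary choices.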

Ramaswamy Subramanian (rsubrama) wrote :

No progress on this bug for more than 2 years. Candidate for closure.

If there is no update, this issue is targeted to be closed as 'Won't Fix' in 2 weeks.

Ramaswamy Subramanian (rsubrama) wrote :

Changing the status to 'Won't Fix' as there is no activity.

Changed in starlingx:
status: Triaged → Won't Fix
This report contains Public information  
Everyone can see this information.