Worker node cannot be unlocked on the first attempt when a pci-sriov class interface is configured

Bug #1847573 reported by Litao Gao on 2019-10-10
Affects: StarlingX
Importance: Low
Assigned to: Unassigned

Bug Description

Brief Description
-----------------
The system always rejects the first unlock request for a node with the worker subfunction when an interface of class pci-sriov is configured on it. The request fails with the error message "Expecting number of interface sriov_numvfs=xx. Please wait a few minutes for inventory update and retry host-unlock."

Even after waiting a long time, the first unlock attempt still fails with this error.
However, re-running the unlock command succeeds, and the node is unlocked.
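Because a second unlock request succeeds, the workaround can be scripted. The following is a minimal sketch, not a fix: the helper name unlock_with_retry, the default of 2 attempts and the 30-second delay are hypothetical choices, and only the `system host-unlock` command itself comes from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarlingX): retry "system host-unlock"
# because the first attempt is reported to fail with the sriov_numvfs error
# while a retry succeeds.
# Usage: unlock_with_retry HOST [ATTEMPTS] [DELAY_SECONDS]
unlock_with_retry() {
    host="$1"
    attempts="${2:-2}"   # assumed default: one retry after the first failure
    delay="${3:-30}"     # assumed pause between attempts, in seconds
    i=1
    while [ "$i" -le "$attempts" ]; do
        if system host-unlock "$host"; then
            return 0     # unlock request accepted
        fi
        echo "host-unlock $host failed (attempt $i/$attempts)" >&2
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1             # every attempt was rejected
}
```

For example, `unlock_with_retry controller-0` issues at most two unlock requests. This only masks the symptom; the first rejection still occurs.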

Severity
--------
<Minor: System/Feature is usable with minor issue>
The node can be unlocked on the second attempt.

Steps to Reproduce
------------------
1. Provision the first controller node (controller-0) with an SR-IOV-capable NIC, e.g. Intel Ethernet Controller 10-Gigabit X540-AT2.
2. Configure an interface of class pci-sriov:

    system datanetwork-add physnet1 flat
    system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 12 --vf-driver=vfio controller-0 ens785f1
    system interface-datanetwork-assign controller-0 sriov1 physnet1

    ## Check that the VFs have been enabled on controller-0 by running "ip link show ens785f1",
    ## then request the unlock operation
    system host-unlock controller-0
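In the steps above, the VFs are checked with "ip link show ens785f1" before unlocking. The enabled VF count can also be read directly from the kernel. This is a minimal sketch, assuming the value of interest is the standard Linux sysfs attribute /sys/class/net/<ifname>/device/sriov_numvfs; the helper name check_vfs and its optional sysfs-root argument are hypothetical, and this is not the check that sysinv performs internally.

```shell
#!/bin/sh
# Hypothetical helper: compare the kernel's enabled VF count for an
# interface with the count configured via "system host-if-modify -N".
# Usage: check_vfs IFACE EXPECTED [SYSFS_NET_DIR]
# Returns 0 on match, 1 on mismatch, 2 if the attribute is missing.
check_vfs() {
    iface="$1"
    expected="$2"
    net_dir="${3:-/sys/class/net}"   # overridable so the check can be tested
    numvfs_file="$net_dir/$iface/device/sriov_numvfs"
    if [ ! -r "$numvfs_file" ]; then
        echo "$iface: no readable sriov_numvfs (not an SR-IOV PF?)" >&2
        return 2
    fi
    actual=$(cat "$numvfs_file")
    if [ "$actual" -eq "$expected" ]; then
        echo "$iface: $actual VFs enabled (expected $expected)"
        return 0
    fi
    echo "$iface: $actual VFs enabled, expected $expected" >&2
    return 1
}
```

For the configuration in this report, `check_vfs ens785f1 12` would be run on controller-0 before `system host-unlock controller-0`.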

Expected Behavior
------------------
The unlock operation should succeed on the first attempt.

Actual Behavior
----------------
The first unlock request always fails with the error message "Expecting number of interface sriov_numvfs=xx. Please wait a few minutes for inventory update and retry host-unlock."

Reproducibility
---------------
The issue is 100% reproducible.

System Configuration
--------------------
Two-node (duplex) system. The issue occurs on both nodes of the duplex and may also occur in other system configurations.

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-10-01_20-00-00"

Last Pass
---------
N/A

Timestamp/Logs
--------------
sysinv.log

Test Activity
-------------
Evaluation

Ghada Khalil (gkhalil) on 2019-10-10
tags: removed: pci-sriov unlock
Ghada Khalil (gkhalil) wrote :

Is this specific to this particular NIC? Or have you seen this issue on other NICs as well?

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) on 2019-10-18
tags: added: stx.networking
Ghada Khalil (gkhalil) wrote :

Marking as low priority / not gating given a retry of the unlock cmd passes

Changed in starlingx:
importance: Undecided → Low
status: Incomplete → Triaged
tags: added: stx.config
tags: added: stx.helpwanted