SRIOV VF driver definition is missing after host lock/unlock

Bug #1901968 reported by Ghada Khalil
This bug affects 3 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Steven Webster

Bug Description

Brief Description
-----------------
The SR-IOV VF driver disappeared from the device definition, leaving 0 FEC devices allocatable in Kubernetes.

1> Created 8 VFs on the FEC device with the igb_uio driver.
2> Displayed the device info for the FEC device (0d8f):
   driver igb_uio
   sriov_vf_driver igb_uio
3> Checked the FEC device resource from Kubernetes:
   Allocatable:
     intel.com/intel_fpga_fec: 8
4> Checked the drivers on the host (lspci):
   b6:01.0 Processing accelerators: Intel Corporation Device 0d90 (rev 01)
     Subsystem: Intel Corporation Device e001
     Kernel driver in use: igb_uio
5> After waiting for some time, repeated step <2>; sriov_vf_driver is now None:
   driver igb_uio
   sriov_vf_driver None
6> Locked and unlocked the host via the system CLI.
7> Checked the FEC device resource from Kubernetes again:
   Allocatable:
     intel.com/intel_fpga_fec: 0
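Step 4 checks the kernel driver binding with lspci. The same information can be read directly from sysfs, which makes it easier to compare the kernel's view with the sriov_vf_driver value recorded in the device inventory. The sketch below is only an illustration and is not part of the StarlingX tooling. The helper names are hypothetical. It assumes the standard Linux sysfs layout: each PCI device under /sys/bus/pci/devices has a `driver` symlink when a driver is bound, a physical function exposes `sriov_numvfs`, and its `virtfn*` symlinks point at its VFs.

```python
import os
from pathlib import Path

SYSFS_PCI = "/sys/bus/pci/devices"


def bound_driver(pci_addr, sysfs_root=SYSFS_PCI):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = Path(sysfs_root) / pci_addr / "driver"
    if not link.exists():
        # No 'driver' symlink: the device is not bound to any driver.
        return None
    # The symlink target is the driver's directory, e.g. .../drivers/igb_uio.
    return Path(os.path.realpath(link)).name


def num_vfs(pf_addr, sysfs_root=SYSFS_PCI):
    """Return the number of VFs currently enabled on a physical function."""
    return int((Path(sysfs_root) / pf_addr / "sriov_numvfs").read_text())


def vf_drivers(pf_addr, sysfs_root=SYSFS_PCI):
    """Map each VF address of a physical function to its bound driver (or None)."""
    pf = Path(sysfs_root) / pf_addr
    result = {}
    for virtfn in pf.glob("virtfn*"):
        # Each virtfnN symlink resolves to the VF's device directory,
        # whose name is the VF's PCI address.
        vf_addr = Path(os.path.realpath(virtfn)).name
        result[vf_addr] = bound_driver(vf_addr, sysfs_root)
    return result
```

For example, `bound_driver("0000:b6:01.0")` would report the driver for the VF listed in step 4, assuming PCI domain 0000. If the kernel still reports igb_uio while the inventory shows sriov_vf_driver None, only the recorded inventory is stale.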

Severity
--------
Medium: the impact is high when the issue occurs, but the issue is very intermittent.

Steps to Reproduce
------------------
See above

Expected Behavior
------------------
SR-IOV is configured properly after unlock.

Actual Behavior
----------------
The SR-IOV VF driver definition is missing after unlock.

Reproducibility
---------------
Intermittent; frequency is unknown

System Configuration
--------------------
Seen on AIO-SX

Branch/Pull Time/Commit
-----------------------
stx.4.0

Last Pass
---------
N/A - Issue is intermittent

Timestamp/Logs
--------------

Test Activity
-------------
System Testing

Workaround
----------
Repeat the host lock/unlock.

Ghada Khalil (gkhalil) wrote:

stx.5.0 / medium priority, since the issue is intermittent and a workaround exists.

Changed in starlingx:
assignee: nobody → Steven Webster (swebster-wr)
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0 stx.networking
description: updated
OpenStack Infra (hudson-openstack) wrote: Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/761176

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to config (master)

Reviewed: https://review.opendev.org/761176
Committed: https://opendev.org/starlingx/config/commit/86aea65f8ea8ad37be1e67f8d305ab10a9142fd3
Submitter: Zuul
Branch: master

commit 86aea65f8ea8ad37be1e67f8d305ab10a9142fd3
Author: Steven Webster <email address hidden>
Date: Tue Nov 3 09:19:20 2020 -0500

    Report port and device inventory after the worker manifest

    Normally, the SR-IOV configuration of a network or device is retained
    across reboots. In most cases, this is sufficient for the
    sysinv-agent to report port and device inventory at any time after it
    is started.

    A problem can occur, however, in the case of an N3000 FPGA device.
    This device requires a reset on every reboot, which clears the SR-IOV
    configuration until the puppet worker manifest has completed the
    (re)configuration. In this case, there is a small chance that the
    sysinv-agent audit (every 60 seconds) will run in between the reset
    and the driver configuration. Since the agent will only actually
    report the port and device inventory once, a problem can occur
    after a second host lock-unlock, as the SR-IOV configuration data is
    not accurately reflected in the db.

    This commit follows a similar method used by the sysinv agent's
    hugepage reporting, in that the port and device configuration are
    only reported after the worker manifest has completed.

    Change-Id: Id4af97e0175482d561745a5cb34650dc6b5653b8
    Closes-Bug: 1901968
    Signed-off-by: Steven Webster <email address hidden>
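The commit gates the one-time port and device report on completion of the worker manifest. It says this follows the approach already used for hugepage reporting. The sketch below illustrates that gating idea in Python. It is not the actual sysinv-agent code. The class name, the flag-file path, and the report callback are all hypothetical. The sketch assumes the worker manifest signals completion by creating a flag file.

```python
import os


class InventoryReporter:
    """Hypothetical sketch: report port/device inventory once, but only
    after the worker manifest has finished (re)configuring SR-IOV."""

    def __init__(self, manifest_done_flag, report_fn):
        # Path of a file assumed to be created when the worker manifest
        # completes (hypothetical; not the real sysinv flag name).
        self.manifest_done_flag = manifest_done_flag
        self.report_fn = report_fn
        self.reported = False

    def audit(self):
        """Run on each periodic audit (every 60 seconds in the agent).

        Returns True only on the audit that actually sends the report.
        """
        if self.reported:
            # Inventory is reported only once per agent run.
            return False
        if not os.path.exists(self.manifest_done_flag):
            # Manifest still running: an N3000 reset may have cleared the
            # SR-IOV configuration, so reporting now could record stale data.
            return False
        self.report_fn()
        self.reported = True
        return True
```

Without the flag check, an audit that fires between the device reset and the driver configuration would send the only report with the SR-IOV VF driver missing. This is the race the commit message describes.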

Changed in starlingx:
status: In Progress → Fix Released