ACC100 device unavailable to Kubernetes after lock/unlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | New | Undecided | Steven Webster |
Bug Description
Brief Description
-----------------
After a restore operation, a system ran for a few weeks and was then locked/unlocked. After the unlock, pods using an ACC100 FEC device could not obtain an SR-IOV VF from the FEC device.
Note that this is not easily reproducible.
Severity
--------
Minor: System/Feature is usable with minor issue
Steps to Reproduce
------------------
Note that this is not easily reproducible, but the following steps were done on an affected system.
- Perform a backup/restore on an AIO-SX system
- After the restore, the pci_device inventory for the ACC100 device has its sriov_numvfs and sriov_vf_driver entries cleared.
- Perform a lock/unlock (this will probably require a second unlock attempt)
- After the system comes up, any pods that were previously using an SR-IOV VF of an ACC100 FEC device fail to start.
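As a rough aid for recognizing the inventory state described in the second step, the check can be sketched in Python. This is a hypothetical helper, not part of StarlingX: it assumes the pci_device fields have already been copied into a plain dict (for example, transcribed from the host device inventory), and it treats a missing, empty, `None` or zero value as "cleared". The healthy values in the usage lines are illustrative only.

```python
from typing import Any, List, Mapping

# Fields named in this report whose clearing precedes the failure.
SRIOV_FIELDS = ("sriov_numvfs", "sriov_vf_driver")

# Values treated as "cleared" (an assumption about how an unset field appears).
CLEARED_VALUES = (None, "", "None", 0, "0")


def cleared_sriov_fields(device: Mapping[str, Any]) -> List[str]:
    """Return the SR-IOV fields of a (hypothetical) pci_device dict that look cleared."""
    cleared = []
    for field in SRIOV_FIELDS:
        # A field that is absent is reported as cleared as well.
        if device.get(field) in CLEARED_VALUES:
            cleared.append(field)
    return cleared


if __name__ == "__main__":
    # Illustrative records only; not taken from an affected system.
    restored = {"sriov_numvfs": 0, "sriov_vf_driver": None}
    healthy = {"sriov_numvfs": 8, "sriov_vf_driver": "vfio"}
    print(cleared_sriov_fields(restored))  # ['sriov_numvfs', 'sriov_vf_driver']
    print(cleared_sriov_fields(healthy))   # []
```

If both fields are reported as cleared before a lock/unlock, the system matches the precondition in the steps above.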
Expected Behavior
------------------
Pods that were using an SR-IOV VF from an ACC100 device can start.
Actual Behavior
----------------
Pods that were using an SR-IOV VF from an ACC100 device can't start.
Reproducibility
---------------
Seen once. Based on the logs, I have reproduced the scenario by modifying the database directly to force the issue.
System Configuration
--------------------
AIO-SX
Branch/Pull Time/Commit
-----------------------
stx-8.0
Test Activity
-------------
Field operation
Workaround
----------
If this scenario is encountered, an extra lock/unlock would likely resolve the issue.