ACC100 device unavailable to Kubernetes after lock/unlock

Bug #2045148 reported by Steven Webster
Affects: StarlingX
Status: New
Importance: Undecided
Assigned to: Steven Webster
Milestone: none

Bug Description

Brief Description
-----------------
A system was locked and unlocked after running for a few weeks following a restore operation. After the unlock, pods using an ACC100 FEC device could not obtain an SR-IOV VF from the FEC device.

Note that this is not easily reproducible.

Severity
--------
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
Note that this is not easily reproducible, but the following steps were performed on an affected system.

- Perform a backup/restore on an AIO-SX system.
- Observe that the pci_device inventory entry for the ACC100 device has its sriov_numvfs and sriov_vf_driver fields cleared.
- Perform a lock/unlock (this will probably require a second unlock attempt).
- After the system comes up, any pods that were previously using an SR-IOV VF of an ACC100 FEC device fail to start.
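To check from the host whether VFs actually exist on the ACC100 after the system comes up, the standard Linux sysfs SR-IOV attributes of the physical function can be inspected. The following is a minimal Python sketch, not part of StarlingX. It assumes the device is exposed under /sys/bus/pci/devices, and the PCI address used in the usage line is hypothetical. It reads sriov_numvfs and sriov_totalvfs for the PF and reports which kernel driver, if any, each VF (virtfn* link) is bound to. These values can then be compared with the sriov_numvfs and sriov_vf_driver settings expected in the pci_device inventory.

```python
from pathlib import Path

# Standard Linux sysfs location for PCI devices.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_int(path: Path) -> int:
    """Read a sysfs attribute that holds a single integer."""
    return int(path.read_text().strip())


def vf_report(pf_addr: str, sysfs_root: Path = SYSFS_PCI) -> dict:
    """Summarize SR-IOV state for a physical function (PF).

    pf_addr is the PF's PCI address, e.g. "0000:1d:00.0" (hypothetical).
    Returns the enabled and maximum VF counts plus the driver bound to each VF.
    """
    pf = sysfs_root / pf_addr
    report = {
        "sriov_numvfs": read_int(pf / "sriov_numvfs"),      # VFs currently enabled
        "sriov_totalvfs": read_int(pf / "sriov_totalvfs"),  # VFs the device supports
        "vf_drivers": {},
    }
    # Each virtfnN entry is a symlink from the PF to one of its VF devices.
    for link in sorted(pf.glob("virtfn*")):
        vf = link.resolve()
        driver = vf / "driver"
        # The "driver" symlink exists only if a driver is bound to the VF.
        report["vf_drivers"][vf.name] = driver.resolve().name if driver.exists() else None
    return report


if __name__ == "__main__":
    # Hypothetical ACC100 PF address; substitute the real one for the host.
    print(vf_report("0000:1d:00.0"))
```

If the report shows sriov_numvfs of 0, or VFs with no bound driver, no usable VFs are available on the host for the device plugin to advertise to pods.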

Expected Behavior
------------------
Pods that were using an SR-IOV VF from an ACC100 device can start.

Actual Behavior
----------------
Pods that were using an SR-IOV VF from an ACC100 device cannot start.

Reproducibility
---------------
Seen once. Based on the logs, I reproduced the scenario by modifying the database directly to force the issue.

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
stx-8.0

Test Activity
-------------
Field operation

Workaround
----------
If this scenario is encountered, an extra lock/unlock would likely resolve the issue.
