Removing FEC device settings causes multiple reboots and the node goes to degraded state

Bug #1967887 reported by Steven Webster
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Steven Webster
Milestone: (none)

Bug Description

Brief Description
-----------------
Removing Intel FPGA settings causes multiple reboots and then the node goes to a degraded state

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
This has been reproduced on an Intel N3000 FPGA, but should apply to any device that needs its PCI PF device bound to an appropriate driver before VFs can be provisioned:

Provision device:
system host-device-modify controller-0 pci_0000_b4_00_0 --driver igb_uio --vf-driver igb_uio -N 4

Try to de-provision device:
system host-device-modify controller-0 pci_0000_b4_00_0 --driver none --vf-driver none -N 0

system host-unlock controller-0

Expected Behavior
------------------
The FEC device should be able to be de-provisioned. That is, it should be possible to reset the PF driver, VF driver and number of VFs to None, None and 0.

Actual Behavior
----------------
Node goes degraded after de-provision.

Reproducibility
---------------
100%

System Configuration
--------------------
Seen on AIO-SX, but should be present in all system types.

Branch/Pull Time/Commit
-----------------------
master.

Last Pass
---------
N/A

Test Activity
-------------
Testing.

Workaround
----------
There is currently no workaround available.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/836662

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/836662
Committed: https://opendev.org/starlingx/config/commit/8cac0266c79a89fcac3b68d2d555205a80301add
Submitter: "Zuul (22348)"
Branch: master

commit 8cac0266c79a89fcac3b68d2d555205a80301add
Author: Steven Webster <email address hidden>
Date: Tue Apr 5 10:15:26 2022 -0400

    Allow de-provisioning of FEC device

    This commit addresses an issue that has been seen in the
    'de-provisioning' of an FEC PCI device.

    Typically, an FEC FPGA device must bind its PF PCI device
    driver before it can be enabled for SR-IOV.

    The typical procedure for this looks something like:

    system host-device-modify controller-0 pci_0000_b4_00_0 \
        --driver igb_uio --vf-driver igb_uio -N 4

    To actually enable the 4 VFs on the device, puppet must
    first bind the PF driver to igb_uio.

    To 'de-provision' this device, a user may think they could do
    something like:

    system host-device-modify controller-0 pci_0000_b4_00_0 \
        --driver none --vf-driver none -N 0

    The problem here is that the PF driver would be set to none
    before the number of VFs is reset to 0. After the PF driver
    is set to none, the VFs will still exist, and the user will
    not be able to configure the number of VFs back to 0.

    The fix for this is to add some semantic checks to guide
    the user to remove the VF driver and number of VFs before
    they are able to remove the PF driver.

    Because the de-provisioning of the vf-driver and number
    of VFs will kick off a runtime manifest application, there
    is also a check that will notify the user if they try to
    remove the PF driver before the runtime manifest has
    completed and the updated inventory has been reported to
    the conductor.

    This commit also adds some semantic checks which ensure
    that for an FEC device, the VFs are not to be configured
    before a valid PF driver has been configured.

    Testing:

    PASS: Ensure a user can still provision the device as usual
    PASS: Ensure the user is notified that they must remove the
          VF parameters first if they try to remove all parameters
          in one command.
    PASS: Ensure the user is notified they must wait if they try
          to reset the PF driver before the VF parameters have
          been reset and inventory reported to the conductor.
    PASS: Ensure the user is able to reset the PF driver to
          none after the VF parameters have been reset.
    PASS: Ensure the user cannot set a PF driver to an invalid
          value.
    PASS: Ensure the user cannot set a VF driver to an invalid
          value.
    PASS: Ensure that if a PF driver has not been set that the number
          of VFs are not able to be set.
    PASS: Ensure that if a PF driver has not been set that the number
          of VFs and the VF driver are not able to be set.
    PASS: Ensure that if a PF driver has been set and a user tries
          to set a VF driver but not the number of VFs, it is not
          allowed.

    Closes-Bug: #1967887

    Change-Id: Ibe1ae7ba1a8268946b44b1ffb79d...
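
The ordering rules described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the code from the merged change: the FecDevice record, its field names, the check_modify function and the error messages are all hypothetical, and the requested values are treated as the desired end state of the device.

```python
from dataclasses import dataclass
from typing import Optional


class SemanticError(ValueError):
    """Raised when a requested device change violates an ordering rule."""


@dataclass
class FecDevice:
    # Hypothetical stand-in for the inventory record of an FEC PCI device.
    driver: Optional[str] = None     # PF driver currently configured
    vf_driver: Optional[str] = None  # VF driver currently configured
    sriov_numvfs: int = 0            # number of VFs currently configured
    reported_numvfs: int = 0         # number of VFs last reported by inventory


def check_modify(dev: FecDevice, driver: Optional[str],
                 vf_driver: Optional[str], numvfs: int) -> None:
    """Validate a requested end state (driver, vf_driver, numvfs) for dev."""
    if driver is None:
        # VFs cannot be configured without a PF driver.
        if vf_driver is not None or numvfs > 0:
            raise SemanticError(
                "A PF driver must be configured before the VFs.")
        if dev.driver is not None:
            # The request removes the PF driver: VF settings must go first.
            if dev.vf_driver is not None or dev.sriov_numvfs > 0:
                raise SemanticError(
                    "Remove the VF driver and number of VFs before "
                    "removing the PF driver.")
            # VF settings were reset, but inventory has not caught up yet.
            if dev.reported_numvfs > 0:
                raise SemanticError(
                    "VF removal is still in progress; wait for the updated "
                    "inventory before removing the PF driver.")
    elif vf_driver is not None and numvfs == 0:
        # A VF driver without a VF count is not a usable configuration.
        raise SemanticError("A VF driver requires a number of VFs.")
```

Under these assumptions, the single-command de-provision from the bug description is rejected, while resetting the VF parameters first, waiting for inventory to report zero VFs, and only then resetting the PF driver passes each check.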


Changed in starlingx:
status: In Progress → Fix Released