FEC sriov-vf-driver lost after unexpected reboot

Bug #1966471 reported by Douglas Henrique Koerich
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Douglas Henrique Koerich

Bug Description

Brief Description
-----------------
The SRIOV VF driver configured for the FEC device disappeared from the host after the node rebooted unexpectedly.

Severity
--------
Major.

Steps to Reproduce
------------------
The behavior is similar to manually resetting the VF driver to "null" in the database.

Expected Behavior
------------------
The SRIOV VF driver persists in the system after FEC configuration.

Actual Behavior
----------------
The SRIOV VF driver is reset to "none" after an unexpected reboot.

Reproducibility
---------------
Seen once.

System Configuration
--------------------
Seen in DC subcloud, but could happen in any configuration.

Branch/Pull Time/Commit
-----------------------
r/stx4.0.

Last Pass
---------
This is a new test scenario.

Timestamp/Logs
--------------
After reboot, sysinv got this inventory report:
sysinv 2022-02-16 19:59:30.432 103738 INFO sysinv.conductor.manager [-] attr: {'sriov_numvfs': u'1\n', 'driver': u'igb_uio', 'sriov_vf_driver': None, 'sriov_vf_pdevice_id': u'0d5d', 'psvendor': u'Intel Corporation', 'extra_info': u"{'expected_numvfs': 1}", 'pdevice_id': u'0d5c', 'pclass': u'Processing accelerators', 'psdevice': u'Device 0000', 'sriov_vfs_pci_address': u'0000:4c:00.0', 'pvendor': u'Intel Corporation', 'pvendor_id': u'8086', 'pclass_id': u'120001', 'sriov_totalvfs': u'16\n'}
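
The symptom can be read directly from that report. The following is a minimal sketch, not sysinv code, that parses a shortened copy of the logged 'attr' payload and checks the two fields of interest. It assumes the payload is a Python literal (as the log formatting suggests) and that 'extra_info' is itself a string holding a dictionary literal.

```python
import ast

# Shortened copy of the 'attr' payload from the log line above.
# Raw strings keep the logged "\n" escapes exactly as they were printed.
raw_attr = (
    r"{'sriov_numvfs': u'1\n', 'driver': u'igb_uio', "
    r"'sriov_vf_driver': None, "
    r"""'extra_info': u"{'expected_numvfs': 1}", """
    r"'sriov_totalvfs': u'16\n'}"
)

# literal_eval only accepts Python literals, so it is safe on log text.
attr = ast.literal_eval(raw_attr)

# 'extra_info' is a string that holds another dictionary literal.
extra = ast.literal_eval(attr['extra_info'])

# Numeric values arrive as strings with a trailing newline.
actual_numvfs = int(attr['sriov_numvfs'].strip())

print("VF driver reported:", attr['sriov_vf_driver'])
print("Expected VFs:", extra['expected_numvfs'], "actual VFs:", actual_numvfs)
```

In this report the number of VFs still matches the configured value, while the VF driver is reported as None, which is the loss described in this bug.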

Test Activity
-------------
Other - normal operation.

Workaround
----------
Reconfigure the VF driver on the affected FEC device.

Changed in starlingx:
status: New → In Progress
assignee: nobody → Douglas Henrique Koerich (dkoerich-wr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/835304

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/835304
Committed: https://opendev.org/starlingx/config/commit/a7b47f6197597a69c22c052f2c74b0a5fa1e541e
Submitter: "Zuul (22348)"
Branch: master

commit a7b47f6197597a69c22c052f2c74b0a5fa1e541e
Author: Douglas Henrique Koerich <email address hidden>
Date: Fri Mar 25 14:14:41 2022 -0300

    Prevent overwrite of sensible FEC configuration

    Despite these earlier changes:
    https://review.opendev.org/c/starlingx/config/+/733724,
    https://review.opendev.org/c/starlingx/config/+/761176 and
    https://review.opendev.org/c/starlingx/config/+/801055
    which were intended to prevent the values stored in the database for
    FEC device configuration from being overwritten by the actual
    (sometimes empty or temporary) values coming from the PCI inventory
    report, it has still been observed that an unexpected reboot can
    cause such sensitive stored data to be lost.

    This change applies to the FEC driver and the VF driver the same
    approach already used for the configured versus actual number of VFs,
    which was introduced by the changes:
    https://review.opendev.org/c/starlingx/config/+/791531,
    https://review.opendev.org/c/starlingx/config/+/795850 and
    https://review.opendev.org/c/starlingx/config/+/808756

    From now on, the configured FEC driver and VF driver will be available
    as "extra information" (in the 'extra_info' field, together with
    configured number of VFs) that can be compared with the actual settings
    collected from PCI inventory for the host.

    Test Plan:

    PASS: Fresh install of r/stx7.0 with subsequent FEC configuration;
    PASS: Backup & restore of r/stx7.0 after FEC configuration;
    PASS: Patch of r/stx5.0 with previous FEC configuration;
    PASS: Upgrade from r/stx5.0 with previous FEC configuration to patched
          r/stx6.0.
    PASS: Upgrade from patched r/stx5.0 with previous FEC configuration to
          patched r/stx6.0.

    Closes-Bug: #1966471
    Signed-off-by: Douglas Henrique Koerich <email address hidden>
    Change-Id: I5e2e6f55856d5eef0a02117a10e36dc9c5aa9dda
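
The comparison described in the commit message can be illustrated as follows. This is a minimal sketch under stated assumptions, not the merged code: only the 'expected_numvfs' key is confirmed by the log in this report, while the 'expected_driver' and 'expected_vf_driver' keys, the function names, and the 'vfio' example value are hypothetical choices made for illustration.

```python
import ast

# Maps a reported inventory field to the key holding its configured value
# inside 'extra_info'. Only 'expected_numvfs' appears in the log of this
# report; the two driver keys are hypothetical names for this sketch.
EXPECTED_KEYS = {
    'sriov_numvfs': 'expected_numvfs',
    'driver': 'expected_driver',
    'sriov_vf_driver': 'expected_vf_driver',
}


def _normalise(value):
    """Strip whitespace and convert digit-only strings to int."""
    if isinstance(value, str):
        value = value.strip()
        if value.isdigit():
            return int(value)
    return value


def find_mismatches(extra_info, reported):
    """Return {field: (configured, actual)} for every field that differs.

    extra_info: string holding a dictionary literal of configured values.
    reported:   dictionary of values from the PCI inventory report.
    """
    expected = ast.literal_eval(extra_info) if extra_info else {}
    mismatches = {}
    for field, key in EXPECTED_KEYS.items():
        if key not in expected:
            continue  # nothing configured for this field, nothing to compare
        actual = _normalise(reported.get(field))
        if actual != expected[key]:
            mismatches[field] = (expected[key], actual)
    return mismatches


# Example values: a report that lost the VF driver after a reboot.
stored_extra_info = ("{'expected_numvfs': 1, 'expected_driver': 'igb_uio', "
                     "'expected_vf_driver': 'vfio'}")
report = {'sriov_numvfs': '1\n', 'driver': 'igb_uio', 'sriov_vf_driver': None}

mismatches = find_mismatches(stored_extra_info, report)
print(mismatches)
```

Keeping the configured values separate from the reported ones means an empty or temporary value in an inventory report can be detected as a mismatch instead of silently replacing the stored configuration.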

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.networking