Comment 4 for bug 2053149

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/909476
Committed: https://opendev.org/starlingx/config/commit/9c3bf050cd57916325a3e7218a4816ba575b63e4
Submitter: "Zuul (22348)"
Branch: master

commit 9c3bf050cd57916325a3e7218a4816ba575b63e4
Author: Tara Subedi <email address hidden>
Date: Thu Feb 8 14:51:50 2024 -0500

    Report port and device inventory after the worker manifest

    The SR-IOV configuration of a device is not retained across reboots;
    it is restored only when the puppet manifest's bind/enable step
    completes. The sysinv-agent should therefore not report device
    inventory as soon as it starts; it should wait until the puppet worker
    manifest completes. However, during bootstrap (fresh install),
    restore, and network boot, and on subsequent reboots of hosts with
    non-worker roles (controller, storage), the sysinv-agent can report at
    any time after it starts.
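
    The reporting policy above can be sketched as a small decision
    function. This is a minimal illustration, not the sysinv-agent code:
    the function name and its boolean inputs are hypothetical. The phases
    in which reporting is allowed at any time (bootstrap, restore,
    network boot) are modelled, as an assumption, by
    worker_manifest_expected=False, and how the agent determines that
    value is left out.

```python
def should_report_inventory(has_worker_subfunction: bool,
                            worker_manifest_expected: bool,
                            worker_manifest_complete: bool) -> bool:
    """Hypothetical gate for reporting port/device inventory.

    has_worker_subfunction: host is configured with the worker subfunction
        (a worker, or an AIO controller+worker).
    worker_manifest_expected: the puppet worker manifest is expected to run
        on this boot (assumed to be determined elsewhere).
    worker_manifest_complete: the worker manifest has finished on this boot.
    """
    # Non-worker roles (controller, storage) never wait for the worker manifest.
    if not has_worker_subfunction:
        return True
    # Worker hosts on which the manifest will not run on this boot
    # (assumed: bootstrap, restore, network boot) may report immediately.
    if not worker_manifest_expected:
        return True
    # Otherwise wait until bind/enable has restored the SR-IOV configuration.
    return worker_manifest_complete
```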

    Upon reboot, the SR-IOV configuration of the ACC100 (sriov_numvfs=0)
    is updated to the intended configuration by the puppet worker
    manifest. There is a small chance that the sysinv-agent audit (which
    runs every 60 seconds) executes before the driver is configured.
    Because the agent reports the port and device inventory only once,
    the SR-IOV configuration data is then not accurately reflected in the
    db, and additional lock/unlock cycles are required to force a
    correction.
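
    The sriov_numvfs value mentioned above is the standard Linux PCI
    sysfs attribute /sys/bus/pci/devices/<address>/sriov_numvfs, which
    reads 0 after boot until something writes the intended VF count. A
    minimal sketch of reading it follows; the helper name and the
    sysfs_root parameter (added only so the function can be exercised
    against a fake tree) are illustrative, and the PCI address used
    below is a placeholder.

```python
from pathlib import Path


def read_sriov_numvfs(pci_address: str, sysfs_root: str = "/sys") -> int:
    """Return the number of currently enabled VFs for a PCI device.

    Reads the kernel's sriov_numvfs attribute. After a reboot this is 0
    until the intended value is written (here, by the puppet worker
    manifest), so an inventory snapshot taken earlier records 0 VFs.
    """
    path = Path(sysfs_root, "bus", "pci", "devices", pci_address, "sriov_numvfs")
    return int(path.read_text().strip())
```

    On a host with the device, read_sriov_numvfs("0000:85:00.0") would
    return 0 before the worker manifest runs and 1 afterwards for the
    N:1 configuration used in the test plan.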

    After a fresh install, restore, or network boot, and after a reboot,
    the /etc/platform/.initial_worker_config_complete and
    /var/run/.worker_config_complete files did not exist until the puppet
    worker manifest completed. The sysinv-agent audit happened to read the
    device inventory before the driver was configured (i.e. before the
    worker manifest completed), so the inventory was not accurately
    reflected in the db.
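
    The two flag files named above can act as completion signals. The
    sketch below only checks for their presence; the function names are
    hypothetical, and treating the file under /var/run (normally a tmpfs)
    as the per-boot signal and the one under /etc/platform as a
    persistent "configured at least once" marker is an interpretation of
    the paths, not something this commit states.

```python
import os

# Flag file paths quoted in the commit message.
INITIAL_WORKER_CONFIG_COMPLETE = "/etc/platform/.initial_worker_config_complete"
WORKER_CONFIG_COMPLETE = "/var/run/.worker_config_complete"


def worker_manifest_complete(flag_path: str = WORKER_CONFIG_COMPLETE) -> bool:
    """True once the worker manifest has signalled completion on this boot.

    Assumption: /var/run is cleared on reboot, so this flag is absent
    until the manifest finishes again.
    """
    return os.path.exists(flag_path)


def worker_initially_configured(
        flag_path: str = INITIAL_WORKER_CONFIG_COMPLETE) -> bool:
    """True if the initial worker configuration has completed
    (assumed to persist across reboots)."""
    return os.path.exists(flag_path)
```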

    This commit ensures that port and device configuration is reported
    only after the worker manifest has completed when the host is being
    configured with the worker subfunction.
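
    Combined with the report-once behaviour described earlier, the fix
    means each 60-second audit pass skips the inventory report until the
    gate opens, then reports once. A minimal sketch, assuming a
    hypothetical InventoryAuditor class with injected callables for the
    gate, the inventory collection, and the report to sysinv; it is not
    the sysinv-agent implementation.

```python
from typing import Callable, Dict


class InventoryAuditor:
    """Hypothetical periodic audit that reports device inventory once.

    gate: returns True when reporting is allowed (e.g. the worker manifest
        has completed on a worker host).
    collect: returns the current inventory, e.g. {pci_address: num_vfs}.
    report: sends the inventory to sysinv (hypothetical callback).
    """

    def __init__(self, gate: Callable[[], bool],
                 collect: Callable[[], Dict[str, int]],
                 report: Callable[[Dict[str, int]], None]) -> None:
        self._gate = gate
        self._collect = collect
        self._report = report
        self.reported = False

    def audit(self) -> None:
        """One audit pass; intended to be called periodically (every 60 s)."""
        if self.reported:
            return  # the inventory is reported only once per agent run
        if not self._gate():
            return  # too early: SR-IOV config may not be restored yet
        self._report(self._collect())
        self.reported = True


# Usage: simulate the manifest finishing between two audit passes.
state = {"manifest_done": False, "numvfs": 0}
sent = []
auditor = InventoryAuditor(
    gate=lambda: state["manifest_done"],
    collect=lambda: {"0000:85:00.0": state["numvfs"]},
    report=sent.append,
)
auditor.audit()  # manifest still running: nothing is sent
state.update(manifest_done=True, numvfs=1)
auditor.audit()  # gate open: reports {"0000:85:00.0": 1}
auditor.audit()  # already reported: no-op
```

    Without the gate, the first pass would have recorded num_vfs = 0 and
    the later, correct value would never be reported.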

    TEST PLAN:
       PASS: Fresh-install an AIO node that has an ACC100 device and
             check host-device-list/show (before config/unlock) for the
             ACC100 device config: driver:None, vf-driver:None, N:0.

       PASS: After the above, update the config (ACC100 device config:
             driver:igb_uio, vf-driver:igb_uio, N:1), assign the host
             label sriovdp=enabled with host-label-assign, and unlock.
             On subsequent reboots, validate the device config
             (driver:igb_uio, vf-driver:igb_uio, N:1) and the content of
             /etc/pcidp/config.json.

       PASS: Restore the node from backup (ACC100 device config:
             driver:igb_uio, vf-driver:igb_uio, N:1, and host label
             sriovdp=enabled assigned with host-label-assign). Once the
             node comes back up, check host-device-list/show for an
             after-boot update time and num_vfs = 1. Also validate the
             content of /etc/pcidp/config.json.

       PASS: In an AIO-DX setup, after a network boot, ports and devices
             can be listed and the second worker node can be unlocked.

    Closes-Bug: 2053149
    Change-Id: I69d483041bd75ea0abbd68cedccfbc5f10062c75
    Signed-off-by: Tara Nath Subedi <email address hidden>