Report port and device inventory after the worker manifest
The SR-IOV configuration of a device is not retained across reboots; it
is restored only when the puppet manifest bind/enable step completes.
The sysinv-agent should therefore not report device inventory as soon as
it starts, but should wait until the puppet worker manifest completes.
During bootstrap (fresh install), restore, and network-boot, and on
subsequent reboots of non-worker roles (controller, storage), the
sysinv-agent can still report at any time after it starts.
Upon reboot, the SR-IOV configuration of the ACC100 is reset
(sriov_numvfs=0) and is then updated to the intended configuration by
the puppet worker manifest. There is a small chance that the
sysinv-agent audit (every 60 seconds) runs before the driver
configuration. Since the agent reports the port and device inventory
only once, the SR-IOV configuration data is then not accurately
reflected in the db, and additional lock/unlock(s) are required to
force a correction.
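
To illustrate the stale value described above, here is a minimal
sketch of reading the VF count from the kernel's standard sysfs
attribute. It is not the sysinv-agent code: the function name
read_sriov_numvfs and the sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def read_sriov_numvfs(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the current VF count for a PCI device, or None if unavailable.

    Hypothetical helper for illustration; sysfs_root is a parameter only
    so the function can be exercised against a temporary directory.
    """
    path = Path(sysfs_root) / pci_addr / "sriov_numvfs"
    try:
        # The kernel exposes the value as a decimal string with a newline.
        return int(path.read_text().strip())
    except FileNotFoundError:
        # Device absent, or it does not expose SR-IOV.
        return None
```

An audit that calls such a helper before the worker manifest has
configured the driver would read 0 and, because inventory is reported
only once, that value would remain in the db.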
After fresh-install/restore/network-boot and reboot, the
/etc/platform/.initial_worker_config_complete and
/var/run/.worker_config_complete files do not exist until the puppet
worker manifest completes. The sysinv-agent audit could read the device
inventory before the driver configuration (i.e. before the worker
manifest completed), so the inventory was not accurately reflected in
the db.
This commit ensures that port and device configuration are reported
only after the worker manifest has completed, when the host is being
configured with the worker subfunction.
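
The gating rule can be sketched as follows. This is a simplified
illustration under stated assumptions, not the code in this change:
the function name can_report_inventory and its configuring_worker
argument are hypothetical, and how the agent decides that a host is
being configured as a worker is left out. Only the flag path is taken
from the text above.

```python
import os

# Volatile flag named in the commit message; created once the puppet
# worker manifest completes.
WORKER_CONFIG_COMPLETE = "/var/run/.worker_config_complete"


def can_report_inventory(configuring_worker, flag_path=WORKER_CONFIG_COMPLETE):
    """Decide whether port/device inventory may be reported now.

    configuring_worker is a hypothetical boolean: True when the host is
    being configured with the worker subfunction.
    """
    if not configuring_worker:
        # Bootstrap, restore, network-boot, or a non-worker role:
        # report as soon as the agent is running.
        return True
    # Worker path: hold the report until the manifest has finished.
    return os.path.exists(flag_path)
```

With a rule of this shape, an audit cycle that runs too early simply
skips the report and tries again on the next cycle.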
TEST PLAN:
PASS: Fresh install an AIO node that has an ACC100 device; check
host-device-list/show (before config/unlock) to see
ACC100 device config:: driver:None, vf-driver:None, N:0.
PASS: After the above, update the config (ACC100 device config::
driver:igb_uio, vf-driver:igb_uio, N:1), use host-label-assign to set
sriovdp=enabled, and unlock. On subsequent reboots, validate the device
config (driver:igb_uio, vf-driver:igb_uio, N:1) and validate the
content of /etc/pcidp/config.json.
PASS: Restore a node from backup (ACC100 device config::
driver:igb_uio, vf-driver:igb_uio, N:1, with host-label-assign
sriovdp=enabled). Once the node comes back up, check
host-device-list/show for the after-boot update time and num_vfs = 1.
Also validate the content of /etc/pcidp/config.json.
PASS: In an AIO-DX setup, ports and devices can be listed and the
second worker node can be unlocked after the network-boot.
Reviewed: https://review.opendev.org/c/starlingx/config/+/909476
Committed: https://opendev.org/starlingx/config/commit/9c3bf050cd57916325a3e7218a4816ba575b63e4
Submitter: "Zuul (22348)"
Branch: master
commit 9c3bf050cd57916325a3e7218a4816ba575b63e4
Author: Tara Subedi <email address hidden>
Date: Thu Feb 8 14:51:50 2024 -0500
Closes-Bug: 2053149
Change-Id: I69d483041bd75ea0abbd68cedccfbc5f10062c75
Signed-off-by: Tara Nath Subedi <email address hidden>