Report port and device inventory after the worker manifest
This is an incremental fix for bug 2053149.
On network boot (first boot) of a worker node, the agent manager is
supposed to report ports/devices without waiting for the worker
manifest, because the manifest never runs on first boot. Without this
report, after a system restore the compute node cannot be unlocked
because of the SR-IOV config update.
Kickstart records first boot by creating "/etc/platform/.first_boot".
The agent manager deletes this file. If the agent manager crashes, it
starts again. This time it does not see the .first_boot file, so it
does not know that this is still the first boot and does not report
inventory for the worker node.
This commit fixes the issue by creating the volatile file
"/var/run/.first_boot" before deleting "/etc/platform/.first_boot". The
agent relies on both files to determine whether this is the first boot.
The same logic holds across multiple crashes/restarts of the agent
manager.
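The two-file check described above could look roughly like the following. This is a minimal sketch, not the actual sysinv-agent code: the helper names `is_first_boot` and `consume_first_boot_flag` are hypothetical, and only the two file paths come from the commit message.

```python
import os

# Paths taken from the commit message; everything else here is illustrative.
PERSISTENT_FLAG = "/etc/platform/.first_boot"  # written by kickstart
VOLATILE_FLAG = "/var/run/.first_boot"         # volatile file created by the agent


def is_first_boot(persistent=PERSISTENT_FLAG, volatile=VOLATILE_FLAG):
    """Return True if either flag file exists (hypothetical helper)."""
    return os.path.exists(persistent) or os.path.exists(volatile)


def consume_first_boot_flag(persistent=PERSISTENT_FLAG, volatile=VOLATILE_FLAG):
    """Create the volatile flag, then delete the persistent one (hypothetical helper)."""
    if os.path.exists(persistent):
        # Create the volatile marker first so that a crash between the two
        # steps still leaves at least one flag behind.
        with open(volatile, "a"):
            pass
        os.remove(persistent)
```

Under these assumptions, a restarted agent still sees the volatile flag and treats the boot as a first boot, however many times it crashes before the next reboot.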
TEST PLAN:
PASS: AIO-DX bootstrap has no issues. lock/unlock has no issues.
PASS: Network-boot a worker node; before unlocking it, restart the agent
manager (sysinv-agent) and check sysinv.log to confirm that ports are
reported.
Reviewed: https://review.opendev.org/c/starlingx/config/+/914142
Committed: https://opendev.org/starlingx/config/commit/933d3a3a73e923efc86d7ac8b8a059a598e6fbe1
Submitter: "Zuul (22348)"
Branch: master
commit 933d3a3a73e923efc86d7ac8b8a059a598e6fbe1
Author: Tara Subedi <email address hidden>
Date: Mon Mar 25 13:51:26 2024 -0400
Closes-Bug: 2053149
Change-Id: Iace5576575388a6ed3403590dbeec545c25fc0e0
Signed-off-by: Tara Nath Subedi <email address hidden>