Missed runtime config on active controller initialization leads to config out of date alarm
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | John Kung |
Bug Description
Brief Description
-----------------
During initialization on the active controller, a runtime config request can be missed because the agent is not yet ready to handle it.
This results in a missed config and a stuck 250.001 "Config out of date" alarm.
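
To illustrate the race described above, here is a minimal toy sketch in Python. It is not sysinv code: the `Agent` and `Conductor` classes, their methods, and the UUID-based config tracking are all hypothetical names chosen for illustration. It only assumes that the config request is sent without waiting for a reply, so a request that arrives before the agent is ready is silently lost.

```python
import uuid


class Agent:
    """Toy stand-in for a host agent (hypothetical, not the sysinv agent)."""

    def __init__(self):
        self.ready = False          # becomes True once startup has finished
        self.applied_config = None  # last config identifier actually applied

    def handle_config(self, config_uuid):
        if not self.ready:
            # Assumption: the request is fire-and-forget, so the sender
            # never learns that it was dropped here.
            return
        self.applied_config = config_uuid


class Conductor:
    """Toy stand-in for the component that sends runtime config requests."""

    def __init__(self, agents):
        self.agents = agents
        self.target_config = None

    def fanout_runtime_config(self):
        # Record the new target, then send it to every agent without a reply.
        self.target_config = str(uuid.uuid4())
        for agent in self.agents:
            agent.handle_config(self.target_config)

    def config_out_of_date(self, agent):
        # The alarm condition in this sketch: applied differs from target.
        return agent.applied_config != self.target_config


agent = Agent()
conductor = Conductor([agent])

conductor.fanout_runtime_config()  # sent while the agent is still starting
agent.ready = True                 # agent finishes startup too late

stale = conductor.config_out_of_date(agent)
print("config out of date:", stale)
```

In this sketch nothing ever re-sends the request, so the mismatch between target and applied config persists, which mirrors the stuck alarm reported here.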
Severity
--------
Major: System/Feature is usable but degraded
The major 250.001 config out of date alarm that is raised can be cleared by manual action.
Steps to Reproduce
------------------
Write down the steps to reproduce the issue
Expected Behavior
------------------
No config out of date alarm is raised due to a missed runtime config.
Actual Behavior
----------------
A missed runtime config on the active controller can lead to a 250.001 config out of date alarm.
Reproducibility
---------------
Intermittent
Could exceed 10% in a large sample of subcloud lock/unlock operations.
System Configuration
-------
AIO-SX (including Distributed Cloud subcloud)
Could also affect multi-node systems
Branch/Pull Time/Commit
-------
2020-11-14_20-00-08
Last Pass
---------
Intermittent issue
Timestamp/Logs
--------------
sysinv 2020-11-16 18:21:59.169 95977 INFO sysinv.
sysinv 2020-11-16 18:22:03.777 94959 INFO sysinv.agent.rpcapi [-] config_
{'classes': ['platform:
to agent
sysinv 2020-11-16 18:22:03.794 95977 INFO sysinv.
{u'certtype': u'admin-
host:[fd01:
However, the agent was still starting up at the time of the fanout RPC call, and consequently the runtime config was missed:
sysinv 2020-11-16 18:21:47.449 10551 ERROR sysinv.
Test Activity
-------------
Regression Testing
Workaround
----------
Reapply the runtime config operation, or
perform a host lock/unlock.
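
The "reapply" workaround amounts to re-sending the config until the applied config matches the target. A minimal Python sketch of that idea follows. It is not the change in the Gerrit review and does not use any sysinv API: `reapply_if_stale`, `send_config`, and `flaky_send` are hypothetical names, and `send_config` is assumed to return whatever config the agent reports as applied.

```python
def reapply_if_stale(target_config, applied_config, send_config, max_attempts=3):
    """Re-send the target config until it is applied or attempts run out.

    send_config is a hypothetical callable that sends the config and returns
    the config the agent reports as applied (None if the request was missed).
    Returns (in_sync, attempts_used).
    """
    attempts = 0
    while applied_config != target_config and attempts < max_attempts:
        applied_config = send_config(target_config)
        attempts += 1
    return applied_config == target_config, attempts


# Usage: a fake agent that misses the first request because it is not ready.
calls = {"n": 0}


def flaky_send(config):
    calls["n"] += 1
    return config if calls["n"] >= 2 else None


ok, attempts = reapply_if_stale("cfg-1", None, flaky_send)
print(ok, attempts)
```

The attempt limit is a design choice for the sketch: it keeps a permanently unreachable agent from causing an endless retry loop, at which point a host lock/unlock would be the remaining option.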
Gerrit review created: https://review.opendev.org/c/starlingx/config/+/775710