Distributed Cloud: The subcloud standby controller remains offline after install
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Joseph Richard |
Bug Description
Brief Description
-----------------
In a Distributed Cloud system, the subcloud's standby controller remains offline after the initial install. After the active controller was booted and configured successfully, the standby controller was booted from the active controller, but it remained offline after the boot completed. As a result, the subcloud could not be configured.
Severity
--------
Minor
Steps to Reproduce
------------------
1) Bring up the DC system controller.
2) Attempt to add a multi-node subcloud, booting the active controller first.
3) Bring up the standby controller by network booting it from the active controller.
4) The standby controller remains offline after the install completes.
Expected Behavior
------------------
The subcloud standby controller comes online after the initial install.
Actual Behavior
----------------
The subcloud standby controller remains offline even after a successful install.
Reproducibility
---------------
First attempt
System Configuration
--------------------
All-in-one duplex plus worker, DC system controller
Branch/Pull Time/Commit
-----------------------
2019-12-13_19-03-42
Last Pass
--------
Last successful install on DC was using 2019-12-08_20-00-00
Timestamp/Logs
--------------
Test Activity
-------------
DC install
Workaround
----------
Only issue the subcloud manage command once all nodes in the subcloud are fully configured and in the unlocked and enabled state.
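The workaround amounts to a readiness check before running the subcloud manage command (`dcmanager subcloud manage`). The following is a minimal Python sketch, assuming the subcloud's host list has already been retrieved and parsed into dicts keyed by `hostname`, `administrative` and `operational` (modelled on the `system host-list` columns). The helper names `hosts_not_ready` and `safe_to_manage` are hypothetical and not part of StarlingX.

```python
from typing import Iterable, Mapping

# Hypothetical host record: a dict using column names modelled on
# `system host-list` (parsing the CLI output is out of scope here).
HostRecord = Mapping[str, str]


def hosts_not_ready(hosts: Iterable[HostRecord]) -> list[str]:
    """Return the hostnames that are not yet unlocked and enabled."""
    return [
        h["hostname"]
        for h in hosts
        if not (h["administrative"] == "unlocked" and h["operational"] == "enabled")
    ]


def safe_to_manage(hosts: Iterable[HostRecord]) -> bool:
    """True only if at least one host is listed and every host is unlocked/enabled."""
    hosts = list(hosts)  # materialize so the iterable can be inspected more than once
    return bool(hosts) and not hosts_not_ready(hosts)


if __name__ == "__main__":
    subcloud_hosts = [
        {"hostname": "controller-0", "administrative": "unlocked", "operational": "enabled"},
        {"hostname": "controller-1", "administrative": "locked", "operational": "disabled"},
    ]
    if safe_to_manage(subcloud_hosts):
        print("OK to run the subcloud manage command")
    else:
        print("Wait for:", ", ".join(hosts_not_ready(subcloud_hosts)))
```

An empty host list is treated as "not safe", since it more likely means the inventory has not been read yet than that the subcloud is ready.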
Changed in starlingx:
assignee: nobody → Joseph Richard (josephrichard)
tags: added: stx.distcloud
tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.retestneeded
tags: removed: stx.4.0 stx.distcloud stx.retestneeded
tags: added: stx.4.0 stx.distcloud
Managing a subcloud causes the certificates to be propagated down to that subcloud. This in turn triggers a force-apply of the runtime manifest, which generates hieradata for all nodes, even those that have not yet been inventoried by sysinv-agent. For those nodes, the hieradata contains an incomplete interface configuration in which only the loopback interface is present.
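The failure mode described above can be modelled in a few lines. This is a hypothetical Python sketch, not the sysinv or Puppet code: `Host`, `inventoried` and `generate_hieradata` are illustrative names, and the real hieradata is consumed by Puppet rather than held in a Python dict. It shows how a force-apply that covers every host emits a loopback-only interface list for a host that sysinv-agent has not yet inventoried, and one possible guard (skipping such hosts), which is not necessarily what the released fix does.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    hostname: str
    inventoried: bool  # hypothetical flag: has sysinv-agent reported this host yet?
    interfaces: list[str] = field(default_factory=list)  # empty until inventoried


def interface_config(host: Host) -> list[str]:
    # Loopback is always emitted; real interfaces are only known after inventory.
    return ["lo"] + host.interfaces


def generate_hieradata(hosts: list[Host], skip_uninventoried: bool) -> dict[str, list[str]]:
    """Map hostname -> interface list, optionally skipping uninventoried hosts."""
    return {
        h.hostname: interface_config(h)
        for h in hosts
        if h.inventoried or not skip_uninventoried
    }


if __name__ == "__main__":
    hosts = [Host("controller-0", True, ["eth0", "eth1"]), Host("controller-1", False)]
    # Unguarded: controller-1 gets a loopback-only config.
    print(generate_hieradata(hosts, skip_uninventoried=False))
    # Guarded: no hieradata is written for controller-1 until it is inventoried.
    print(generate_hieradata(hosts, skip_uninventoried=True))
```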
When those previously uninventoried nodes finish installing and come up for the first time, they check whether hieradata exists for them and, because it does, apply it. This rewrites the network configuration with only the loopback interface present, so the node goes offline and loses all network connectivity.
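On the node side, the first-boot decision as described keys only on whether hieradata exists. The hypothetical Python sketch below contrasts that with a stricter check that also refuses a loopback-only interface list. The function name and the `{hostname: [interface names]}` shape are illustrative assumptions, not StarlingX code, and this is not necessarily how the released fix works.

```python
from typing import Optional

# Hypothetical simplification: hieradata as {hostname: [interface names]}.
Hieradata = dict[str, list[str]]


def should_apply_on_first_boot(hieradata: Optional[Hieradata], hostname: str) -> bool:
    """Decide whether a node should apply existing hieradata on first boot.

    The behavior in the bug report is equivalent to "hieradata exists for
    this host"; this sketch additionally rejects a loopback-only interface
    list, since applying it would drop all network connectivity.
    """
    if hieradata is None or hostname not in hieradata:
        return False  # nothing to apply yet
    # Require at least one interface other than loopback.
    return any(iface != "lo" for iface in hieradata[hostname])


if __name__ == "__main__":
    print(should_apply_on_first_boot({"controller-1": ["lo"]}, "controller-1"))
    print(should_apply_on_first_boot({"controller-1": ["lo", "eth0"]}, "controller-1"))
```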