After DOR, configuring a route causes interfaces to go down on system controller
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | High | Don Penney | |
Bug Description
Brief Description
-----------------
It was observed that the DC system controllers went into a reboot loop after a subcloud was added. Further investigation showed that the networking scripts on the controllers were empty, which resulted in all the interfaces going down and the system becoming unusable. The condition that triggers this bug involves an earlier DOR (Dead Office Recovery) on the system.
From Don Penney:
The route-add runtime manifest relies on cached networking puppet data from the previous manifest apply. If a standard controller reboots without an active controller (a simplex controller reboot or a duplex DOR), the manifests are not applied during init, so no cached networking puppet data is stored. When a route is then added, the config script runs, sees no interfaces in the puppet data, assumes they have all been deleted, and shuts them all down.
This was introduced by: https:/
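The failure mode described above can be illustrated with a minimal sketch. This is not the StarlingX config script: the function name, the dictionary representation of the cached puppet data, and the `guard_missing_cache` flag are all hypothetical, chosen only to show why an empty cache is read as "every interface was deleted" and how one possible guard would avoid that.

```python
from typing import Dict, Optional, Set


def interfaces_to_shut_down(
    cached: Optional[Dict[str, dict]],
    running: Set[str],
    guard_missing_cache: bool,
) -> Set[str]:
    """Return the running interfaces a config step would bring down.

    cached: interface configs saved by the last manifest apply, or None/{}
            when no manifest has been applied since boot (hypothetical model).
    running: names of interfaces currently up.
    guard_missing_cache: if True, an empty cache means "unknown", not "deleted".
    """
    if not cached:
        if guard_missing_cache:
            # Nothing was cached, so there is nothing to compare against.
            return set()
        cached = {}
    # Any running interface absent from the cache is treated as deleted.
    return running - set(cached)


running = {"enp0s3", "enp0s8", "lo"}

# After a DOR no manifest was applied, so the cache is empty.
print(sorted(interfaces_to_shut_down({}, running, guard_missing_cache=False)))
print(sorted(interfaces_to_shut_down({}, running, guard_missing_cache=True)))
```

Without the guard, every running interface is selected for shutdown, which matches the behavior reported here. With the guard, the empty cache is treated as missing data and nothing is touched.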
Severity
--------
Major
Steps to Reproduce
------------------
- Set up duplex DC system controllers (or just duplex controllers)
- Perform a DOR
- In DC, add a new subcloud which will add a new route on the system controller
- If testing on a non-DC system, add a route using the system CLI command instead
Expected Behavior
------------------
The system continues to be usable.
Actual Behavior
----------------
The networking scripts are removed from the system controller, resulting in it going into a reboot loop
Reproducibility
---------------
Reproducible when following the exact steps above
System Configuration
-------
Duplex controllers or DC system
Branch/Pull Time/Commit
-------
stx master. The issue also exists in stx.4.0, because the code that causes it was introduced in that release.
Last Pass
---------
This particular test was never intentionally run previously.
Timestamp/Logs
--------------
Test Activity
-------------
DC lab usage
Workaround
----------
none
Changed in starlingx: | |
assignee: | nobody → Don Penney (dpenney) |
tags: | added: stx.config stx.networking |
description: | updated |
tags: | added: stx.5.0 |
Marking for both stx.5.0 & stx.4.0 given the system is not recoverable when the issue is hit.