Subclouds out-of-sync indefinitely due to dc-cert_sync_status=unknown after mgmt network connectivity interruption
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Low | Jessica Castelino |
Bug Description
Brief Description
-----------------
In a Distributed Cloud (DC) environment, when management network connectivity is disrupted for some time, cert-mon's audit green threads hang. The online audit requests sent by dcmanager are queued in cert-mon, but the audit never runs. As a result, all subclouds stay out-of-sync indefinitely because dc-cert_sync_status remains unknown.
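The failure mode above can be illustrated with a small sketch. It is not the cert-mon code and not the proposed fix: it uses Python's asyncio as a stand-in for green threads, and every name in it (audit_subcloud, run_audits, AUDIT_TIMEOUT_S, the subcloud names) is hypothetical. It assumes the hang comes from a request that never returns, and shows how bounding each audit with a timeout lets the queue keep draining instead of stalling behind one unreachable subcloud.

```python
import asyncio

# Hypothetical per-subcloud deadline; a real value would be much larger.
AUDIT_TIMEOUT_S = 0.05


async def audit_subcloud(name, reachable):
    """Stand-in for a certificate audit of one subcloud.

    If the subcloud is unreachable, wait on an event that is never set,
    which simulates a network call made without any timeout.
    """
    if not reachable[name]:
        await asyncio.Event().wait()
    return "in-sync"


async def run_audits(queue, reachable):
    """Drain the audit queue, bounding every audit with a timeout."""
    results = {}
    while queue:
        name = queue.pop(0)
        try:
            results[name] = await asyncio.wait_for(
                audit_subcloud(name, reachable), timeout=AUDIT_TIMEOUT_S
            )
        except asyncio.TimeoutError:
            # Give up on this subcloud for now and move on to the next one.
            results[name] = "unknown"
    return results


def demo():
    # subcloud1 simulates a broken management link; subcloud2 is healthy.
    reachable = {"subcloud1": False, "subcloud2": True}
    return asyncio.run(run_audits(["subcloud1", "subcloud2"], reachable))
```

Without the asyncio.wait_for wrapper, the call for subcloud1 would never return and subcloud2 would never be audited, which mirrors the queued-but-never-processed behavior described in this report.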
Severity
--------
Minor
Steps to Reproduce
------------------
Reproduce by breaking management IP connectivity between the system controller and the subclouds.
Expected Behavior
------------------
cert-mon should run the audit for the queued requests.
Actual Behavior
----------------
cert-mon green threads hang and the audit doesn't take place.
Reproducibility
---------------
Intermittent
System Configuration
-------
DC
Branch/Pull Time/Commit
-------
Branch and the time when code was pulled or git commit or cengn load info
Last Pass
---------
N/A
Test Activity
-------------
Developer Testing
Workaround
----------
Swact the controllers or restart the cert-mon service.
Changed in starlingx:
assignee: nobody → Jessica Castelino (jcasteli)
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/790243