The confirm_stop function of some OCF scripts has a flaw
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Li Zhu | | |
Bug Description
Brief Description
-----------------
A bug in DC OCF scripts could lead to orphaned processes of the affected services following service restart or controller swact. The orphan processes would eventually exit. This bug was introduced when switching from CentOS (python2) to Debian (python3).
Severity
--------
Minor
Steps to Reproduce
------------------
- Deploy a DC system with many subclouds
- Soak the system for a few hours
- Restart an audit service such as dcmanager-
Expected Behavior
------------------
The dcmanager-
Actual Behavior
----------------
Sometimes the old worker processes are not stopped or killed during the restart and become orphans. These processes linger, auditing the same subclouds as the new worker processes until their queues are empty, and then exit.
Reproducibility
---------------
Infrequently
System Configuration
-------
Distributed Cloud system
Branch/Pull Time/Commit
-------
Apr. 29th, 2024 master load
Last Pass
---------
StarlingX 7.0
Timestamp/Logs
--------------
N/A
Test Activity
-------------
Other - issue discovered by chance
Workaround
----------
Manually kill the orphan processes.
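One way to locate candidates for the manual kill is sketched below. It is only an illustration, not part of the fix: it assumes that orphaned workers have been re-parented to PID 1, and the helper name `find_orphans` and the pattern `my-dc-service` are hypothetical placeholders. A legitimately daemonized process may also have PPID 1, so review the list before killing anything.

```shell
#!/bin/sh
# Hypothetical helper (not from the OCF scripts): print the PIDs of processes
# whose parent is PID 1 and whose command line contains the given pattern.
# Input is read from stdin as "PID PPID ARGS..." lines, e.g. from ps.
find_orphans() {
    pattern="$1"
    awk -v pat="$pattern" '{
        # Rebuild the command line from field 3 onward so the match
        # is made against the arguments only, not the PID columns.
        args = ""
        for (i = 3; i <= NF; i++) args = args " " $i
        if ($2 == 1 && index(args, pat) > 0) print $1
    }'
}

# Usage (the pattern is a placeholder; substitute the affected service name):
#   ps -eo pid=,ppid=,args= | find_orphans "my-dc-service"
# After reviewing the list, the PIDs can be passed to kill, for example:
#   ps -eo pid=,ppid=,args= | find_orphans "my-dc-service" | xargs -r kill
```

The `-r` option to `xargs` is a GNU extension that skips running `kill` when no PIDs are found.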
summary: | Flaw in the confirm stop of DC OCF scripts → Flaw in some OCF scripts can lead to orphan processes |
summary: | Flaw in some OCF scripts can lead to orphan processes → The confirm_stop function of some OCF scripts has a flaw |
Changed in starlingx: | |
importance: | Undecided → Medium |
Changed in starlingx: | |
assignee: | nobody → Tee Ngo (teewrs) |
assignee: | Tee Ngo (teewrs) → Li Zhu (lzhu1) |
tags: | added: stx.10.0 stx.distcloud |
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/917825