2830 patch-alarm-manager processes on active controller
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Don Penney | | |
Bug Description
Brief Description
-----------------
Due to networking issues, PV1 spent many hours with neither controller able to become active. controller-1 kept attempting to go active but would then revert to standby. It appears that during this period patch-alarm-manager processes kept being created but were never cleaned up on failure, leaving 2830 processes running.
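To illustrate the kind of leak described here, the following is a minimal sketch of starting a helper process and stopping it again if the activation step fails. It is not the actual StarlingX fix or code: `start_with_cleanup`, `stop`, `HELPER_CMD`, and the `activate` callback are all hypothetical names, and the helper command is a stand-in that just sleeps.

```python
import subprocess
import sys

# Stand-in for a long-running helper; NOT the real patch-alarm-manager command.
HELPER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def stop(proc, timeout=5.0):
    """Terminate a child process and reap it, escalating to kill if needed."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def start_with_cleanup(cmd, activate):
    """Start `cmd`, then run the hypothetical `activate(proc)` step.

    If activation raises, the child is stopped before the error propagates,
    so repeated failed attempts do not accumulate processes.
    """
    proc = subprocess.Popen(cmd)
    try:
        activate(proc)
    except Exception:
        stop(proc)
        raise
    return proc
```

With this shape, each failed go-active attempt leaves no child behind, whereas a bare `subprocess.Popen` call with no failure handling would leave one process per attempt.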
Severity
--------
Minor
Steps to Reproduce
------------------
Not sure how to reproduce it myself.
Bin Qian described the networking problem:
Looks like it is a network issue. Msg with uuid=09cd9b6b-
Similar msg received by controller-1.
2019-04-
2019-04-
Expected Behavior
------------------
There should only be 1 patch-alarm-manager process.
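One common way to enforce a single running instance on Linux is an exclusive, non-blocking `flock` on a lock file. The sketch below shows that general technique only; it is not taken from the patch-alarm-manager source, and the function name and the lock path used in the usage note are hypothetical.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if already held.

    The caller must keep the returned file object open for as long as the
    process should be considered the single running instance.
    """
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # Record the owner's PID for debugging; the lock itself is what matters.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle
```

A daemon could call `acquire_single_instance_lock("/var/run/example-daemon.lock")` (path assumed for illustration) at startup and exit if it returns `None`. The kernel releases the lock when the holding process exits, so a crashed instance does not block the next one.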
Actual Behavior
----------------
There were 2830 patch-alarm-manager processes.
Reproducibility
---------------
Not sure how likely this scenario is.
System Configuration
-------
Multi-node (2+10)
Branch/Pull Time/Commit
-------
cengn load: 20190421T233001Z
Last Pass
---------
n/a
Timestamp/Logs
--------------
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
<snip>
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
| 2019-04-
<snip>
date; ps -ef | grep patch-alarm-manager | grep python
Thu May 2 05:05:15 UTC 2019
root 368 1 0 Apr23 ? 00:00:13 python /usr/bin/
root 457 1 0 Apr23 ? 00:00:12 python /usr/bin/
root 468 1 0 Apr23 ? 00:00:14 python /usr/bin/
root 507 1 0 Apr23 ? 00:00:17 python /usr/bin/
root 508 1 0 Apr23 ? 00:00:16 python /usr/bin/
root 516 1 0 Apr23 ? 00:00:16 python /usr/bin/
root 527 1 0 Apr23 ? 00:00:19 python /usr/bin/
<snip>
root 147355 1 0 Apr23 ? 00:00:20 python /usr/bin/
root 147357 1 0 Apr23 ? 00:00:11 python /usr/bin/
root 147378 1 0 Apr23 ? 00:00:17 python /usr/bin/
root 147422 1 0 Apr23 ? 00:00:12 python /usr/bin/
root 147427 1 0 Apr23 ? 00:00:16 python /usr/bin/
Can provide full logs on request.
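In the listing above every process shows a parent PID of 1. A small sketch like the following can summarize captured `ps -ef` output by counting matching rows and grouping them by PPID. The function name is hypothetical, and it assumes the standard eight-column `ps -ef` layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```python
from collections import Counter


def summarize_ps(ps_output, needle):
    """Count `ps -ef` rows whose command contains `needle`, grouped by PPID.

    Returns a (total, {ppid: count}) tuple. Header rows and malformed lines
    are skipped.
    """
    by_ppid = Counter()
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        if not fields[1].isdigit() or not fields[2].isdigit():
            continue
        if needle in fields[7]:
            by_ppid[int(fields[2])] += 1
    return sum(by_ppid.values()), dict(by_ppid)
```

Feeding it the text captured from `ps -ef` with the needle `"patch-alarm-manager"` gives the total count and shows whether the processes are orphans reparented to PID 1.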
Test Activity
-------------
System Engineering
summary:
- 2830 patch-alarm-manager proceses on active controller
+ 2830 patch-alarm-manager processes on active controller
description: updated
tags: added: stx.update
Marking as stx.2.0 gating, as this is a resource leak and can prevent the system from functioning properly.