Brief Description
-----------------
Due to networking issues, PV1 spent many hours with neither controller able to become active. controller-1 kept attempting to go active but would then revert to standby. It appears that during this period patch-alarm-manager processes kept being created but were never cleaned up on failure, leaving 2830 processes running.
Severity
--------
Minor
Steps to Reproduce
------------------
Not sure how to reproduce it myself.
Bin Qian described the networking problem:
Looks like it is a network issue. A message with uuid=09cd9b6b-f098-4468-a81e-f292ce0dd20d was received by controller-0 from an unknown lab over multicast IP 239.1.1.1. tcpdump shows that the message was sent from MAC 00:1e:67:68:0b:f0.
Similar msg received by controller-1.
2019-04-23T19:39:35.000 controller-0 sm: debug time[42573.674] log<653628> INFO: sm[88764]: sm_msg.c(367): Message instance (cfaf2360-1568-4103-812e-4a6415283268) changed for node (controller-1), now=09cd9b6b-f098-4468-a81e-f292ce0dd20d.
2019-04-23T19:39:35.000 controller-0 sm: debug time[42573.674] log<653629> INFO: sm[88764]: sm_msg.c(367): Message instance (09cd9b6b-f098-4468-a81e-f292ce0dd20d) changed for node (controller-1), now=cfaf2360-1568-4103-812e-4a6415283268.
Expected Behavior
------------------
There should only be 1 patch-alarm-manager process.
Actual Behavior
----------------
There were 2830 patch-alarm-manager processes.
Reproducibility
---------------
Not sure how likely this scenario is.
System Configuration
--------------------
Multi-node (2+10)
Branch/Pull Time/Commit
-----------------------
cengn load: 20190421T233001Z
Last Pass
---------
n/a
Timestamp/Logs
--------------
| 2019-04-23T09:34:16.923 | 557854 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested
| 2019-04-23T09:34:16.930 | 557871 | service-scn | patch-alarm-manager | disabling | disabled | disable success
| 2019-04-23T09:34:21.940 | 557932 | service-scn | patch-alarm-manager | disabled | enabling | enabled-active state requested
| 2019-04-23T09:34:22.405 | 557966 | service-scn | patch-alarm-manager | enabling | enabled-active | enable success
| 2019-04-23T09:34:22.947 | 558010 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested
| 2019-04-23T09:34:22.953 | 558041 | service-scn | patch-alarm-manager | disabling | disabled | disable success
| 2019-04-23T09:34:24.443 | 558073 | service-scn | patch-alarm-manager | disabled | enabling | enabled-active state requested
| 2019-04-23T09:34:24.902 | 558077 | service-scn | patch-alarm-manager | enabling | enabled-active | enable success
<snip>
| 2019-04-23T16:14:50.394 | 965978 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested
| 2019-04-23T16:14:50.398 | 965984 | service-scn | patch-alarm-manager | disabling | disabled | disable success
| 2019-04-23T16:14:55.433 | 966005 | service-scn | patch-alarm-manager | disabled | enabling | enabled-active state requested
| 2019-04-23T16:14:55.846 | 966013 | service-scn | patch-alarm-manager | enabling | enabled-active | enable success
| 2019-04-23T16:14:58.981 | 966026 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested
| 2019-04-23T16:14:58.985 | 966028 | service-scn | patch-alarm-manager | disabling | disabled | disable success
| 2019-04-23T16:15:01.492 | 966033 | service-scn | patch-alarm-manager | disabled | enabling | enabled-active state requested
| 2019-04-23T16:15:01.877 | 966034 | service-scn | patch-alarm-manager | enabling | enabled-active | enable success
| 2019-04-23T16:15:03.999 | 966040 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested
<snip>
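To compare the number of enable/disable cycles against the number of leftover processes, the service-scn rows can be tallied per state transition. The following is a minimal sketch, assuming each row has the layout `| timestamp | log id | service-scn | service | old state | new state | reason`; the regular expression and the `count_transitions` helper are illustrative names, not part of any StarlingX tool.

```python
import re
from collections import Counter

# Assumed row layout (inferred from the service-scn rows in this report):
# | timestamp | log id | service-scn | service | old state | new state | reason
ROW = re.compile(
    r"^\|?\s*(?P<ts>\S+)\s*\|\s*(?P<log_id>\d+)\s*\|\s*service-scn\s*\|"
    r"\s*(?P<service>[^|]+?)\s*\|\s*(?P<old>[^|]+?)\s*\|\s*(?P<new>[^|]+?)\s*\|"
    r"\s*(?P<reason>[^|]*?)\s*\|?\s*$"
)


def count_transitions(lines, service="patch-alarm-manager"):
    """Count (old state, new state) pairs logged for one service."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line.strip())
        # Rows that do not fit the assumed layout (e.g. "<snip>") are skipped.
        if match and match.group("service") == service:
            counts[(match.group("old"), match.group("new"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "| 2019-04-23T09:34:16.923 | 557854 | service-scn | patch-alarm-manager | enabled-active | disabling | disable state requested",
        "| 2019-04-23T09:34:16.930 | 557871 | service-scn | patch-alarm-manager | disabling | disabled | disable success",
        "| 2019-04-23T09:34:21.940 | 557932 | service-scn | patch-alarm-manager | disabled | enabling | enabled-active state requested",
        "| 2019-04-23T09:34:22.405 | 557966 | service-scn | patch-alarm-manager | enabling | enabled-active | enable success",
    ]
    counts = count_transitions(sample)
    print("enable successes:", counts[("enabling", "enabled-active")])
    print("disable successes:", counts[("disabling", "disabled")])
```

If the count of `enabling -> enabled-active` rows over the whole outage is close to the number of leftover processes, that would support the theory that each enable starts a new process and the following disable does not stop it. That is a hypothesis to check, not a confirmed cause.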
date; ps -ef | grep patch-alarm-manager | grep python
Thu May 2 05:05:15 UTC 2019
root 368 1 0 Apr23 ? 00:00:13 python /usr/bin/patch-alarm-manager start
root 457 1 0 Apr23 ? 00:00:12 python /usr/bin/patch-alarm-manager start
root 468 1 0 Apr23 ? 00:00:14 python /usr/bin/patch-alarm-manager start
root 507 1 0 Apr23 ? 00:00:17 python /usr/bin/patch-alarm-manager start
root 508 1 0 Apr23 ? 00:00:16 python /usr/bin/patch-alarm-manager start
root 516 1 0 Apr23 ? 00:00:16 python /usr/bin/patch-alarm-manager start
root 527 1 0 Apr23 ? 00:00:19 python /usr/bin/patch-alarm-manager start
<snip>
root 147355 1 0 Apr23 ? 00:00:20 python /usr/bin/patch-alarm-manager start
root 147357 1 0 Apr23 ? 00:00:11 python /usr/bin/patch-alarm-manager start
root 147378 1 0 Apr23 ? 00:00:17 python /usr/bin/patch-alarm-manager start
root 147422 1 0 Apr23 ? 00:00:12 python /usr/bin/patch-alarm-manager start
root 147427 1 0 Apr23 ? 00:00:16 python /usr/bin/patch-alarm-manager start
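A check for this condition can be scripted against captured `ps -ef` output. The sketch below assumes the standard eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) and matches on the string `patch-alarm-manager` in the command column; `find_instances` is a hypothetical helper name.

```python
def find_instances(ps_output, needle="patch-alarm-manager"):
    """Return (pid, ppid) pairs for `ps -ef` rows whose command mentions needle."""
    found = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command stays in one piece.
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header row, "<snip>", or a malformed line
        command = fields[7]
        if needle in command and "grep" not in command:
            found.append((int(fields[1]), int(fields[2])))
    return found


if __name__ == "__main__":
    captured = (
        "root 368 1 0 Apr23 ? 00:00:13 python /usr/bin/patch-alarm-manager start\n"
        "root 457 1 0 Apr23 ? 00:00:12 python /usr/bin/patch-alarm-manager start\n"
    )
    instances = find_instances(captured)
    # PPID 1 means the process is parented by init rather than a live supervisor.
    parented_by_init = [pid for pid, ppid in instances if ppid == 1]
    if len(instances) > 1:
        print("expected 1 instance, found", len(instances))
    print("instances parented by PID 1:", len(parented_by_init))
```

Every row in the listing has PPID 1, so a PPID-based filter alone would not separate the one legitimate instance from the stale ones; comparing start times or a PID file (if the service writes one) would be needed for that.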
Test Activity
-------------
System Engineering