contrail-status shows contrail-analytics-api as initializing when one of the analytics nodes is rebooted or shutdown

Bug #1733027 reported by Sai Chakravarthy Alikapati
Affects            Status          Importance   Assigned to    Milestone
Juniper Openstack  Status tracked in Trunk
  R4.0             Won't Fix       High         Zhiqiang Cui
  R4.1             Won't Fix       High         Zhiqiang Cui
  R5.0             Won't Fix       High         Zhiqiang Cui
  Trunk            Fix Committed   High         Zhiqiang Cui

Bug Description

Analytics Nodes: 10.87.121.78, 10.87.121.81, 10.87.121.84

Shutting down or rebooting the 10.87.121.78 node, or a service on it, affects the services on the other nodes. These nodes are part of a scale setup. The affected services are contrail-topology, contrail-alarm-gen and contrail-analytics-api: contrail-topology reports a timeout, and the other two services go to the initializing state with a "connection down" message beside them.

== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:15], Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology timeout

Revision history for this message
Arvind (arvindv) wrote :

"UvePartitions:UVE-Aggregation[Partitions:15] connection down"

This message is transient and goes away after the partitions are rebalanced. The output below shows that, as I bring down the analytics container on one of the nodes, contrail-status
transitions from initializing to active:

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:01:27 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

root@5b7s1-1-vm2:~# date;docker stop analytics
Tue Nov 21 22:01:15 UTC 2017
analytics

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:01:32 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology timeout

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:01:43 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:15], Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology timeout

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:01:53 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:21], Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology initializing

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:02:02 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
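
To follow transitions like the ones above without reading each dump by eye, the output can be parsed into per-service states. The following is a minimal sketch, not part of Contrail: the function names are hypothetical, and the line format is inferred only from the output pasted in this comment, so other contrail-status versions may need a different pattern.

```python
import re

# Matches service lines such as:
#   contrail-collector active
#   contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
# The format is an assumption taken from the output pasted in this bug.
LINE_RE = re.compile(
    r"^(?P<service>contrail-[\w-]+)\s+(?P<state>\w+)(?:\s+\((?P<detail>.*)\))?\s*$"
)
SUFFIX = " connection down"


def parse_contrail_status(text):
    """Return {service: (state, detail)}; detail is None when no message is shown."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("service")] = (match.group("state"), match.group("detail"))
    return result


def down_connections(detail):
    """Split a 'X, Y connection down' message into the list of connection names."""
    if not detail:
        return []
    if detail.endswith(SUFFIX):
        detail = detail[: -len(SUFFIX)]
    return [item.strip() for item in detail.split(",")]


def only_redis_down(detail):
    """True when every down connection is a Redis-UVE one (no UvePartitions entry)."""
    names = down_connections(detail)
    return bool(names) and all(name.startswith("Redis-UVE:") for name in names)
```

Calling only_redis_down on the contrail-analytics-api detail from successive samples separates the UvePartitions message, which comes and goes, from the Redis-UVE message, which stays for as long as the node is down.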

Revision history for this message
Arvind (arvindv) wrote :

Redis-UVE:172.17.90.2:6381[None] connection down
This message is shown in contrail-status for as long as redis is down on one of the nodes.
This is by design. Once all the redis-server instances are up, the message goes away.

root@5b7s1-1-vm2:~# date;docker stop analytics
Tue Nov 21 22:01:15 UTC 2017
analytics

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:02:07 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:02:27 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology timeout

root@5b7s1-1-vm2:~# date;docker start analytics
Tue Nov 21 22:02:24 UTC 2017
analytics

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:03:07 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-api initializing (Redis-UVE:172.17.90.2:6381[None] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology timeout

root@5b7s1-4-vm2(analytics):/# date;contrail-status
Tue Nov 21 22:03:12 UTC 2017
== Contrail Analytics ==
contrail-alarm-gen active
contrail-analytics-api initializing (UvePartitions:UVE-Aggregation[Partitions:24] connection down)
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

Revision history for this message
Arvind (arvindv) wrote :

1) UvePartitions:UVE-Aggregation[Partitions:15] connection down

This is a transient message. It appears while the partitions that belonged
to the Alarmgen of the node that went down are being taken over. It is
an issue only if the message persists, which was not the case here, so we
can live with this message.

2) Redis-UVE:172.17.90.2:6381[None] connection down
This message will appear as long as the analytics node owning the redis
is down. We update the list of redis servers as long as we did not
terminate them with SIGHUP. This list is static for a reason and contrail-status
will show this message as long as that redis-server is down.

Changed in juniperopenstack:
status: New → Invalid
Revision history for this message
Sundaresan Rajangam (srajanga) wrote :

We shouldn't show contrail-analytics-api and contrail-alarm-gen as initializing if only one of the analytics nodes goes down.

Changed in juniperopenstack:
status: Invalid → New
summary: - Shutting down or Rebooting one analytic node effects the service on
- another analytic node in a scale setup
+ Shutting down or Rebooting one analytics node effects the service on
+ another analytics node
Vineet Gupta (vineetrf)
tags: added: releasenote
summary: - Shutting down or Rebooting one analytics node effects the service on
- another analytics node
+ contrail-status shows contrail-analytics-api as initializing when one of
+ the analytics nodes is rebooted or shutdown
Revision history for this message
Sundaresan Rajangam (srajanga) wrote :

Release-note: In a multi-node analytics cluster, when one of the analytics nodes is shut down or rebooted, contrail-status on the other analytics nodes shows contrail-analytics-api and contrail-alarm-gen as initializing, with the error message "Redis-UVE:<ip-address of analytics node that was shutdown/rebooted>:6381[None] connection down". This does not affect the functionality of the analytics cluster.

Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/44370
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44372
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/44370
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44372
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44483
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/44370
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/44370
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/44370
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44372
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44483
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44483
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44482
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44339
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/44482
Committed: http://github.com/Juniper/contrail-common/commit/d1e0ff0fe2832961b9f2decce02f30a5149582d2
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit d1e0ff0fe2832961b9f2decce02f30a5149582d2
Author: zcui <email address hidden>
Date: Tue Jul 3 22:38:10 2018 -0700

Need API to create/delete zookeeper node.

Solution:
Added zookeeper client code to create/delete znode.
The child znode is ephemeral (the parent being persistent) because we
want it deleted at the end of the session that created it.

expand zookeeper C++ API to support:
Create node
Delete node
Check node exist or not
Shutdown session

Use Case:
Contrail collector publishing collector ip list to zookeeper
which alarmgen/analytics_api can use.

Change-Id: I4090bbe68614c5e4fa673ba3ceb065cc4c2e6be3
Partial-bug: 1733027

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/44372
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/44483
Submitter: Zhiqiang Cui (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/44483
Committed: http://github.com/Juniper/contrail-analytics/commit/fc282fce06a322ca7b831be051afa60664470017
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit fc282fce06a322ca7b831be051afa60664470017
Author: zcui <email address hidden>
Date: Wed Jul 4 21:44:48 2018 -0700

Add contrail-collector to zookeeper

Problem description:
In a multi-node cluster, when one node goes down, the
analytics_api/alarmgen state on all other nodes is initializing instead of active.

Solution:
Currently we use redis to get both the redis state and the collector state.
We need to separate them.
The solution is to add the collector to zookeeper, have analytics_api/alarmgen
get the collector state from zookeeper, and use redis ping to monitor
the redis state.
The znodes look like:
/analytics-discovery-/Collector/{collector_ip1}
/analytics-discovery-/Collector/{collector_ip2}
......
/analytics-discovery-/Collector is PERSISTENT znode, {collector_ip1}
is EPHEMERAL znode, and znode value is
{"hostname": hostname,
"instance_id": process_id,
"ip_address": collector ip,
"module_id":"contrail-collector",
"type_name":"Analytics"}

This is the second commit, implementing the change in contrail-analytics.

Test case:
3 nodes cluster: n1, n2, n3 run contrail-analytics
(1) n1 stop redis-server:
result: all analytics-api/alarm-gen in initializing state,
n1 collector in initializing state
(2) n1 stop collector:
result: n1 analytics-api/alarm-gen in initializing state
all other nodes analytics-api/alarm-gen is active
(3) n1 start redis-server:
result: all analytics-api/alarm-gen in active state
(4) n1 start collector:
result: all analytics-api/alarm-gen in active state
(5) shutdown n1:
result: all other analytics-api/alarm-gen in active state

Pending issue 1:
When the alarmgen state changes from active to init, the
AlarmgenPartition UVE is not updated. It will be fixed separately. (Bug #1794632)
Pending issue 2:
Shutting down one server leads the kafka lib callback to bring down the
other servers' alarmgen. (Bug #1794904)

Closes-bug: 1733027

Conflicts:
 contrail-collector/main.cc
 contrail-collector/viz_collector.cc
 contrail-collector/viz_collector.h

Change-Id: I46c97553b303cdaa85ac3cac06587690a3771e44
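
The znode layout and the five test cases in the commit message above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the committed code: collector_znode and service_state are hypothetical names, the type of instance_id is a guess, and the state rule is one reading of the test results (a node's analytics-api/alarm-gen is active only when its local collector is registered and redis answers a ping for every registered collector). No zookeeper or redis client is used; liveness is passed in as plain data.

```python
import json

# Persistent parent path, copied from the commit message.
COLLECTOR_PARENT = "/analytics-discovery-/Collector"


def collector_znode(ip_address, hostname, process_id):
    """Build the (path, value) pair for a collector's ephemeral znode.

    The keys follow the commit message; storing instance_id as a string
    is an assumption of this sketch.
    """
    path = "{}/{}".format(COLLECTOR_PARENT, ip_address)
    value = json.dumps({
        "hostname": hostname,
        "instance_id": str(process_id),
        "ip_address": ip_address,
        "module_id": "contrail-collector",
        "type_name": "Analytics",
    })
    return path, value


def service_state(local_ip, registered_collectors, redis_ok):
    """Hypothetical state rule inferred from the test cases above.

    registered_collectors: set of IPs whose ephemeral znode currently exists.
    redis_ok: dict mapping IP -> True if a redis ping to that node succeeds.
    """
    # Local collector gone: only this node's services are affected (case 2).
    if local_ip not in registered_collectors:
        return "initializing"
    # A registered node whose redis does not answer affects everyone (case 1).
    if all(redis_ok.get(ip, False) for ip in registered_collectors):
        return "active"
    return "initializing"
```

Under this reading, shutting down n1 removes its ephemeral znode, so the surviving nodes stop waiting on n1's redis and stay active, which matches test case (5).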

Revision history for this message
Jeba Paulaiyan (jebap) wrote :

4.1 and 5.0 reviews were abandoned. Hence marking the scope as New

Displaying first 40 and last 40 of 127 comments.