Build 2686: Analytics services not able to connect to remote collector when local collector goes down

Bug #1528770 reported by Ankit Jain
This bug affects 3 people
Affects            | Status                  | Importance | Assigned to         | Milestone
Juniper Openstack  | Status tracked in Trunk |            |                     |
  Trunk            | Fix Committed           | Medium     | Sundaresan Rajangam |

Bug Description

Two problems were observed:

Problem 1) All analytics services on an analytics node become non-functional when the collector running on that node goes down, even though the other analytics nodes in the system are fully functional.

Here contrail-collector is down on the local node, and the analytics services are not able to connect to the remote collector.

Contrail status on the node with the local collector:
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen active
contrail-analytics-api initializing (Collector connection down)
contrail-analytics-nodemgr active
contrail-collector inactive
contrail-query-engine initializing (Collector connection down)
contrail-snmp-collector initializing (Collector connection down)
contrail-topology initializing (Collector connection down)

Contrail status on remote collector:

root@nodeg13:~# contrail-status
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery:0 active
contrail-schema active
contrail-svc-monitor active
ifmap active

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Database ==
contrail-database: active
supervisor-database: active
contrail-database-nodemgr active
kafka active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

Problem 2) ProcessConnectivity alarms are not shown for the non-functional processes on the local analytics node.
Problem 2 is a consequence of problem 1.

http://nodea21:8081/analytics/alarms

{
    "analytics-node": [
        {
            "name": "nodea21",
            "value": {
                "UVEAlarms": {
                    "alarms": [
                        {
                            "severity": 3,
                            "ack": false,
                            "timestamp": 1450853232369965,
                            "rules": [
                                {
                                    "oper": "!=",
                                    "operand1": {
                                        "name": "NodeStatus.process_info[].process_state",
                                        "json_value": "{\"process_name\": \"contrail-collector\", \"process_state\": \"PROCESS_STATE_STOPPED\", \"last_stop_time\": \"1450853231579217\", \"start_count\": 1, \"core_file_list\": [], \"last_start_time\": \"1450853098215246\", \"stop_count\": 1, \"last_exit_time\": null, \"exit_count\": 0}"
                                    },
                                    "operand2": {
                                        "json_value": "\"PROCESS_STATE_RUNNING\""
                                    }
                                }
                            ],
                            "token": "eyJ0aW1lc3RhbXAiOiAxNDUwODUzMjMyMzY5OTY1LCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNy41MyJ9",
                            "type": "ProcessStatus"
                        },
                        {
                            "severity": 4,
                            "ack": false,
                            "timestamp": 1450853244854531,
                            "rules": [
                                {
                                    "oper": "!",
                                    "operand1": {
                                        "name": "CollectorState",
                                        "json_value": "null"
                                    }
                                }
                            ],
                            "token": "eyJ0aW1lc3RhbXAiOiAxNDUwODUzMjQ0ODU0NTMxLCAiaHR0cF9wb3J0IjogNTk5NSwgImhvc3RfaXAiOiAiMTAuMjA0LjIxNy41MyJ9",
                            "type": "PartialSysinfoAnalytics"
                        }
                    ]
                }
            }
        }
    ]
}
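
For anyone reading the alarm output above by hand: the "json_value" field is JSON nested inside a string, and the "token" field appears to be base64-encoded JSON. The sketch below decodes both, using the values copied from the first alarm. The helper names decode_token and parse_json_value are made up for this illustration and are not part of any Contrail API.

```python
import base64
import json

# Values copied verbatim from the first (ProcessStatus) alarm above.
TOKEN = ("eyJ0aW1lc3RhbXAiOiAxNDUwODUzMjMyMzY5OTY1LCAiaHR0cF9wb3J0IjogNTk5NSwg"
         "Imhvc3RfaXAiOiAiMTAuMjA0LjIxNy41MyJ9")
OPERAND1_JSON_VALUE = (
    '{"process_name": "contrail-collector", '
    '"process_state": "PROCESS_STATE_STOPPED", '
    '"last_stop_time": "1450853231579217", "start_count": 1, '
    '"core_file_list": [], "last_start_time": "1450853098215246", '
    '"stop_count": 1, "last_exit_time": null, "exit_count": 0}'
)


def decode_token(token):
    """Base64-decode an alarm token and parse the JSON inside it.

    Assumption: the token is plain base64 over a JSON document, which is
    what the tokens in this report look like.
    """
    padded = token + "=" * (-len(token) % 4)  # restore padding if stripped
    return json.loads(base64.b64decode(padded).decode("utf-8"))


def parse_json_value(json_value):
    """Parse the JSON document stored as a string in an operand."""
    return json.loads(json_value)


if __name__ == "__main__":
    print(decode_token(TOKEN))
    print(parse_json_value(OPERAND1_JSON_VALUE)["process_state"])
```

The decoded process_state confirms that the only alarm raised against a process is the ProcessStatus alarm for contrail-collector itself; nothing is raised for the services stuck in "initializing".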

Ankit Jain (ankitja)
tags: added: alerts analytics
information type: Proprietary → Public
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16012
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/16014
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16014
Committed: http://github.org/Juniper/contrail-controller/commit/90a75d502717195d90bef2766ecc2c1bda54450f
Submitter: Zuul
Branch: master

commit 90a75d502717195d90bef2766ecc2c1bda54450f
Author: Sundaresan Rajangam <email address hidden>
Date: Thu Dec 24 21:01:19 2015 -0800

Let Analytics services connect to any collector based on discovery

Presently, the analytics services such as analytics-api, query-engine,
snmp-collector and topology always connect to local collector. If the
local collector service is down, then these services do not connect to
other active collector service, if present.

Side-effect of this behavior:
If the collector service is down, then the PRouterEntry UVEs and the
PRouterLinkEntry UVEs originated by the snmp-collector and the topology
service respectively would be missing from the /analytics/uves/prouter
output.

Fix:
analytics-api, query-engine, snmp-collector and topology should get the
collector list from the discovery service instead of connecting to the
local collector service.

Change-Id: I62074e8516f4e7eb88ac1d1c727067e87578794f
Partial-Bug: #1528770
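
The behavior this commit aims for (connect to any live collector rather than only the local one) can be sketched roughly as below. This is an illustration only, not the code from the commit: connect_to_any_collector is a hypothetical helper, the collector list is assumed to have been obtained from the discovery service already, and port 8086 in the usage example is an assumption.

```python
import socket


def connect_to_any_collector(collectors, connect=socket.create_connection,
                             timeout=2.0):
    """Return (address, connection) for the first reachable collector.

    collectors: iterable of (host, port) tuples, for example the list
    handed out by a discovery lookup (the lookup itself is not shown).
    connect: injectable so the function can be exercised without a network.
    """
    for address in collectors:
        try:
            return address, connect(address, timeout)
        except OSError:
            # This collector is down or unreachable; try the next one.
            continue
    raise ConnectionError("no collector reachable")


if __name__ == "__main__":
    # Hypothetical addresses modelled on the two nodes in this report.
    candidates = [("nodea21", 8086), ("nodeg13", 8086)]
    try:
        addr, conn = connect_to_any_collector(candidates)
        print("connected to", addr)
        conn.close()
    except ConnectionError as exc:
        print(exc)
```

With the local collector on nodea21 down, a loop of this kind would fall through to nodeg13 instead of leaving the service in "initializing (Collector connection down)".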

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16099
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/16239
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16239
Committed: http://github.org/Juniper/contrail-controller/commit/57874e3463e40be4501afce4c8d8349f39bc9c3a
Submitter: Zuul
Branch: master

commit 57874e3463e40be4501afce4c8d8349f39bc9c3a
Author: Sundaresan Rajangam <email address hidden>
Date: Tue Jan 12 08:46:46 2016 -0800

Set expected_connections in query-engine based on collector conf

Handle collector connection in query-engine as below:
- If collector list is configured, then query-engine should connect to
one of the collectors in the list irrespective of whether discovery
service is configured or not.
- If collector list is not configured and discovery server is
configured, then query-engine should subscribe to collector service and
connect to one of the collectors provided by the discovery server.
- If both collector list and discovery server are not configured, then
connect to the local collector.

expected_connections in ConnectionStateManager::Init should be set
appropriately based on the collector and discovery server config.

Change-Id: Idee0fa9c3fba76ce106e2f3152b033af4b057d7d
Partial-Bug: #1528770
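
The three cases in the commit message above reduce to a simple precedence order. The sketch below restates that order in Python for readability. It is not the query-engine code: the function name collector_source, its return shape, and the default local address are assumptions made for this example.

```python
def collector_source(collector_list, discovery_server,
                     local_collector="127.0.0.1:8086"):
    """Decide where the collector address should come from.

    Precedence, as described in the commit message:
      1. a configured collector list wins, with or without discovery;
      2. otherwise subscribe to the collector service via discovery;
      3. otherwise fall back to the local collector.
    local_collector is a hypothetical default, not taken from the source.
    """
    if collector_list:
        return ("configured", list(collector_list))
    if discovery_server:
        return ("discovery", discovery_server)
    return ("local", [local_collector])


if __name__ == "__main__":
    print(collector_source(["10.0.0.5:8086"], "10.0.0.1"))
    print(collector_source([], "10.0.0.1"))
    print(collector_source([], None))
```

Note that case 1 means a leftover collector list in the conf file keeps a service pinned to that list even when discovery is configured, which is why the later provisioning changes remove the list.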

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16525
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/16570
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16525
Committed: http://github.org/Juniper/contrail-provisioning/commit/72656ac91cecff44470a283888900219cf8dff64
Submitter: Zuul
Branch: master

commit 72656ac91cecff44470a283888900219cf8dff64
Author: Sundaresan Rajangam <email address hidden>
Date: Tue Jan 26 12:46:05 2016 -0800

Don't provision analytics services to connect to local Collector

o Presently, contrail-query-engine and contrail-analytics-api services
are provisioned to connect to local collector. Comment the collector
list from contrail-query-engine.conf and contrail-analytics-api.conf
so that they can connect to the collector provided by the discovery
service.

o Provision discovery ip address in contrail-query-engine.conf and
contrail-topology.conf

Change-Id: I902193178594b8ecb8c52dc53ccfea931b43a928
Partial-Bug: #1528770

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/16570
Committed: http://github.org/Juniper/contrail-puppet/commit/01f67916db949d11588bb3a52c9f43c499c15bf6
Submitter: Zuul
Branch: master

commit 01f67916db949d11588bb3a52c9f43c499c15bf6
Author: Sundaresan Rajangam <email address hidden>
Date: Wed Jan 27 15:17:54 2016 -0800

Don't provision analytics services to connect to local Collector

o Provision discovery ip address in contrail-query-engine.conf and
contrail-topology.conf

o Presently, contrail-query-engine and contrail-analytics-api services
are provisioned to connect to local collector. Don't provision the
collector list for analytics services so that they subscribe for the
collector service with the discovery server.

Change-Id: Ifa5909404d068c781d9de3b22618cb3bc141f4d6
Partial-Bug: #1528770

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/16941
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/16946
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/16946
Committed: http://github.org/Juniper/contrail-puppet/commit/f983e9bad477c9dbec7c36b4f66471403a0f506b
Submitter: Zuul
Branch: master

commit f983e9bad477c9dbec7c36b4f66471403a0f506b
Author: Sundaresan Rajangam <email address hidden>
Date: Fri Feb 5 22:25:26 2016 -0800

Handle removal of collector list from conf on upgrade

o Changes to handle the removal of collector list from
contrail-analytics-api.conf and contrail-query-engine.conf on upgrade.

Change-Id: I51407bfbb20a340edae05ac0eec653d010c015b0
Partial-Bug: #1528770

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/16941
Committed: http://github.org/Juniper/contrail-provisioning/commit/a94112fb06d7dade03dd9916307ace4d87bcb213
Submitter: Zuul
Branch: master

commit a94112fb06d7dade03dd9916307ace4d87bcb213
Author: Sundaresan Rajangam <email address hidden>
Date: Fri Feb 5 18:23:45 2016 -0800

Remove the collector config from analytics services on upgrade to
release >=3.0

o From release 3.0, analytics services no longer connect to the
local collector. All analytics services other than collector
would subscribe for the collector service with discovery server.

o Provision discovery ip address in contrail-query-engine.conf and
contrail-topology.conf

o Presently, contrail-query-engine and contrail-analytics-api services
are provisioned to connect to local collector. Remove the collector list
on upgrade to release >= 3.0

Change-Id: I3acd6002872a6f298746a17c079a2154e24d48ad
Partial-Bug: #1528770
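
The upgrade step described above (dropping the collector list from a service's conf file) could look roughly like the following. This is only a sketch and not the provisioning code from the commit: remove_collector_list is a hypothetical helper, and the section name "DEFAULTS" and option name "collectors" are assumptions that should be checked against the actual conf files.

```python
import configparser
import io


def remove_collector_list(conf_text, section="DEFAULTS", option="collectors"):
    """Return conf_text with the collector list option removed.

    section and option are assumed names, not confirmed by the source.
    Caveat: configparser does not preserve comments when rewriting.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if parser.has_option(section, option):
        parser.remove_option(section, option)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[DEFAULTS]\ncollectors = 127.0.0.1:8086\nhost_ip = 10.0.0.1\n"
    print(remove_collector_list(sample))
```

Once the option is gone and a discovery server is provisioned, the service would take the discovery path instead of staying pinned to the local collector.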
