vCenter: contrail-topology failed on one of the controllers

Bug #1549559 reported by Sarath
Affects               Status                     Importance  Assigned to  Milestone
Juniper Openstack     Status tracked in Trunk
  R3.0                Fix Released               Critical    ted ghose
  Trunk               Fix Released               Critical    ted ghose

Bug Description

This is a vCenter non-HA topology with 3 controllers.
When the controllers were rebooted, I saw this issue on only one of them: the contrail-topology service never recovered and stayed in the failed state.
Please find below the contrail-topology logs from around the time of the event:

root@oblocknode04:~#
root@oblocknode04:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology failed

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager failed
contrail-discovery:0 active
contrail-schema backup
contrail-svc-monitor failed
ifmap active

== Contrail Database ==
contrail-database: active
supervisor-database: active
contrail-database-nodemgr active
kafka active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

root@oblocknode04:~# ssh root@172.16.80.105

>> topology log

root@oblocknode04:/var/log/contrail# tail -f contrail-topology.log
02/24/2016 05:07:04 PM [contrail-topology]: Cannot write http_port 5921 to /tmp/contrail-topology.1900.http_port
02/24/2016 05:07:04 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Initializing description = Subscribe >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Discovery:Collector, Collector connection down >>, ] >>
02/24/2016 05:07:04 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Collector connection down >>, ] >>
02/24/2016 05:07:04 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, << type = Collector name = server_addrs = [ 172.16.80.4:8086, ] status = Initializing description = Idle to Connect on EvCollectorChange >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Collector connection down >>, ] >>
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Initializing description = Subscribe >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, ] description = Discovery:Collector, Collector connection down >>, ] >>
02/24/2016 05:07:07 PM [contrail-topology]: Starting Introspect on HTTP Port 5921
02/24/2016 05:07:07 PM [contrail-topology]: Cannot write http_port 5921 to /tmp/contrail-topology.1900.http_port
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Initializing description = Subscribe >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Discovery:Collector, Collector connection down >>, ] >>
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Collector connection down >>, ] >>
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = oblocknode04 process_status = [ << module_id = contrail-topology instance_id = 0 state = Non-Functional connection_infos = [ << type = Discovery name = Collector server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, << type = Collector name = server_addrs = [ 172.16.80.4:8086, ] status = Initializing description = Idle to Connect on EvCollectorChange >>, << type = Discovery name = ApiServer server_addrs = [ 172.16.80.2:5998, ] status = Up description = Subscribe Response >>, ] description = Collector connection down >>, ] >>

>>topology stdout logs

127.0.0.1 - - [24/Feb/2016 16:19:50] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:32:06] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:33:01] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:33:02] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:33:02] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:42:53] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:51:51] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:51:52] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
127.0.0.1 - - [24/Feb/2016 16:51:53] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 2512
No handlers could be found for logger "kazoo.client"
02/24/2016 05:05:42 PM [contrail-topology]: SANDESH: CONNECT TO COLLECTOR: True
02/24/2016 05:05:42 PM [contrail-topology]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/bin/contrail-topology", line 9, in <module>
    load_entry_point('contrail-topology==0.1.0', 'console_scripts', 'contrail-topology')()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 17, in main
    controller = setup_controller(args or ' '.join(sys.argv[1:]))
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 14, in setup_controller
    return Controller(config)
  File "/usr/lib/python2.7/dist-packages/contrail_topology/controller.py", line 25, in __init__
    self._vnc = self._config.vnc_api()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/config.py", line 260, in vnc_api
    raise e
SystemError: Cant connect to API server
02/24/2016 05:07:00 PM [contrail-topology]: SANDESH: CONNECT TO COLLECTOR: True
02/24/2016 05:07:00 PM [contrail-topology]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/bin/contrail-topology", line 9, in <module>
    load_entry_point('contrail-topology==0.1.0', 'console_scripts', 'contrail-topology')()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 17, in main
    controller = setup_controller(args or ' '.join(sys.argv[1:]))
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 14, in setup_controller
    return Controller(config)
  File "/usr/lib/python2.7/dist-packages/contrail_topology/controller.py", line 25, in __init__
    self._vnc = self._config.vnc_api()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/config.py", line 260, in vnc_api
    raise e
SystemError: Cant connect to API server
02/24/2016 05:07:01 PM [contrail-topology]: SANDESH: CONNECT TO COLLECTOR: True
02/24/2016 05:07:01 PM [contrail-topology]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/bin/contrail-topology", line 9, in <module>
    load_entry_point('contrail-topology==0.1.0', 'console_scripts', 'contrail-topology')()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 17, in main
    controller = setup_controller(args or ' '.join(sys.argv[1:]))
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 14, in setup_controller
    return Controller(config)
  File "/usr/lib/python2.7/dist-packages/contrail_topology/controller.py", line 25, in __init__
    self._vnc = self._config.vnc_api()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/config.py", line 260, in vnc_api
    raise e
SystemError: Cant connect to API server
02/24/2016 05:07:04 PM [contrail-topology]: SANDESH: CONNECT TO COLLECTOR: True
02/24/2016 05:07:04 PM [contrail-topology]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/bin/contrail-topology", line 9, in <module>
    load_entry_point('contrail-topology==0.1.0', 'console_scripts', 'contrail-topology')()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 17, in main
    controller = setup_controller(args or ' '.join(sys.argv[1:]))
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 14, in setup_controller
    return Controller(config)
  File "/usr/lib/python2.7/dist-packages/contrail_topology/controller.py", line 25, in __init__
    self._vnc = self._config.vnc_api()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/config.py", line 260, in vnc_api
    raise e
SystemError: Cant connect to API server
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: CONNECT TO COLLECTOR: True
02/24/2016 05:07:07 PM [contrail-topology]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Traceback (most recent call last):
  File "/usr/bin/contrail-topology", line 9, in <module>
    load_entry_point('contrail-topology==0.1.0', 'console_scripts', 'contrail-topology')()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 17, in main
    controller = setup_controller(args or ' '.join(sys.argv[1:]))
  File "/usr/lib/python2.7/dist-packages/contrail_topology/main.py", line 14, in setup_controller
    return Controller(config)
  File "/usr/lib/python2.7/dist-packages/contrail_topology/controller.py", line 25, in __init__
    self._vnc = self._config.vnc_api()
  File "/usr/lib/python2.7/dist-packages/contrail_topology/config.py", line 260, in vnc_api
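
Each start attempt in the stdout log above prints `SANDESH: CONNECT TO COLLECTOR: True` and, in every complete run shown, ends with the same `SystemError`, so the process exited and was restarted several times within a few seconds. When triaging longer logs (for example the controller tarballs attached below), a small helper like the following can pull out the start time of each failed run. It is a hypothetical sketch, not part of Contrail, and the line formats it matches are assumed from this excerpt only.

```python
import re
import sys

# Matches the "MM/DD/YYYY HH:MM:SS AM|PM [contrail-topology]: SANDESH:
# CONNECT TO COLLECTOR" line that each start attempt prints in this excerpt.
START_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) \[contrail-topology\]: "
    r"SANDESH: CONNECT TO COLLECTOR")

# Final line of the traceback printed when a start attempt crashes.
FAILURE = "SystemError: Cant connect to API server"


def failed_starts(lines):
    """Return the start timestamp of every run that ended in FAILURE.

    The most recent start timestamp is attributed to the next FAILURE
    line; a run whose output is cut off before FAILURE is not counted.
    """
    current_start = None
    failures = []
    for line in lines:
        match = START_RE.match(line)
        if match:
            current_start = match.group(1)
        elif line.strip() == FAILURE and current_start is not None:
            failures.append(current_start)
            current_start = None
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_starts.py <path to contrail-topology stdout log>
    with open(sys.argv[1]) as log:
        for timestamp in failed_starts(log):
            print(timestamp)
```

On the excerpt above it would report the runs starting at 05:05:42, 05:07:00, 05:07:01 and 05:07:04 PM; the 05:07:07 run is truncated before its final line, so it is not counted.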

root@oblocknode04:~#
root@oblocknode04:~# contrail-version
Package                                Version                         Build-ID | Repo | Package Name
-------------------------------------- ------------------------------- ----------------------------------
contrail-analytics                     3.0.0.0-2717                    2717
contrail-config                        3.0.0.0-2717                    2717
contrail-control                       3.0.0.0-2717                    2717
contrail-dns                           3.0.0.0-2717                    2717
contrail-docs                          3.0.0.0-2717                    2717
contrail-f5                            3.0.0.0-2717                    2717
contrail-fabric-utils                  3.0.0.0-2717                    2717
contrail-install-packages              3.0.0.0-2717~vcenter            2717
contrail-install-vcenter-plugin        3.0.0.0-02242016                2717
contrail-lib                           3.0.0.0-2717                    2717
contrail-nodemgr                       3.0.0.0-2717                    2717
contrail-openstack-analytics           3.0.0.0-2717                    2717
contrail-openstack-control             3.0.0.0-2717                    2717
contrail-openstack-database            3.0.0.0-2717                    2717
contrail-setup                         3.0.0.0-2717                    2717
contrail-utils                         3.0.0.0-2717                    2717
contrail-vmware-config                 3.0.0.0-2717                    2717
ifmap-python-client                    0.1-2                           2717
ifmap-server                           0.3.2-1contrail2                2717
python-contrail                        3.0.0.0-2717                    2717
root@oblocknode04:~#

Sarath (nsarath) wrote :

-bash-4.1$
-bash-4.1$ pwd
/cs-shared/bugs/1549559
-bash-4.1$
-bash-4.1$ ls -l
total 2345500
-rwxrwxrwx 1 nsarath test 860743680 Feb 24 17:43 Ctrl-A-log.tar*
-rwxrwxrwx 1 nsarath test 717701120 Feb 24 17:43 Ctrl-B-log.tar*
-rwxrwxrwx 1 nsarath test 681738240 Feb 24 17:43 Ctrl-C-log.tar*
-rwxrwxrwx 1 nsarath test 17868800 Feb 24 17:43 Vrtr-0-log.tar*
-rwxrwxrwx 1 nsarath test 26480640 Feb 24 17:43 Vrtr-1-log.tar*
-rwxrwxrwx 1 nsarath test 18012160 Feb 24 17:43 Vrtr-3-log.tar*
-rwxrwxrwx 1 nsarath test 17387520 Feb 24 17:43 Vrtr-4-log.tar*
-rwxrwxrwx 1 nsarath test 17500160 Feb 24 17:43 Vrtr-5-log.tar*
-rwxrwxrwx 1 nsarath test 17469440 Feb 24 17:43 Vrtr-7-log.tar*
-rwxrwxrwx 1 nsarath test 17408000 Feb 24 17:43 Vrtr-8-log.tar*
-bash-4.1$

Raj Reddy (rajreddy)
Changed in juniperopenstack:
assignee: Raj Reddy (rajreddy) → ted ghose (rintu)
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/17852
Submitter: ted ghose (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/17853
Submitter: ted ghose (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/17853
Committed: http://github.org/Juniper/contrail-controller/commit/7fefd118772578a49b6bfc77b537be699bc752a2
Submitter: Zuul
Branch: R3.0

commit 7fefd118772578a49b6bfc77b537be699bc752a2
Author: Ted Ghose <email address hidden>
Date: Thu Feb 25 12:22:13 2016 -0800

contrail-topology failed on no api-server

Discovery was yet to supply the api server list during boot up, on
a multinode setup, causing topology to raise an unhandled exception

Change-Id: Ib841f0867b3073b25acf9dd9f5cf40e84e2e9f5f
Closes-Bug: 1549559
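
The commit message attributes the crash to contrail-topology trying to reach the API server before discovery had supplied any, and letting the resulting exception escape. A minimal sketch of the general remedy, assuming the connection step can simply be retried, is a bounded retry loop with exponential backoff. `connect_with_retry` and `ApiServerUnavailable` are hypothetical names used only for illustration; this is not the code from the merged change.

```python
import time


class ApiServerUnavailable(Exception):
    """Hypothetical error raised once every connection attempt has failed."""


def connect_with_retry(connect, max_attempts=10, initial_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    `connect` is a zero-argument callable standing in (hypothetically) for
    the step that builds the VNC API client; it is expected to raise while
    discovery has not yet published an API server.
    """
    delay = initial_delay
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as err:  # treat every failure as transient
            last_error = err
            if attempt == max_attempts:
                break
            sleep(delay)  # wait before the next attempt
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise ApiServerUnavailable(
        "no API server after %d attempts: %s" % (max_attempts, last_error))


if __name__ == "__main__":
    calls = []

    def flaky_connect():
        # Fails twice, then succeeds, mimicking discovery catching up.
        calls.append(1)
        if len(calls) < 3:
            raise SystemError("Cant connect to API server")
        return "vnc-client"

    waits = []
    client = connect_with_retry(flaky_connect, sleep=waits.append)
    print("%s after %d calls, waits %s" % (client, len(calls), waits))
    # prints: vnc-client after 3 calls, waits [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; in a daemon it would normally stay `time.sleep`. Giving up after `max_attempts` rather than retrying forever is a design choice: it hands recovery back to the process supervisor, which only works if that supervisor keeps restarting the service.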

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/17852
Committed: http://github.org/Juniper/contrail-controller/commit/0b83d1ee9d808b7168ce7f4827817d18c8a06b5b
Submitter: Zuul
Branch: master

commit 0b83d1ee9d808b7168ce7f4827817d18c8a06b5b
Author: Ted Ghose <email address hidden>
Date: Thu Feb 25 12:22:13 2016 -0800

contrail-topology failed on no api-server

Discovery was yet to supply the api server list during boot up, on
a multinode setup, causing topology to raise an unhandled exception

Change-Id: I76fb7683fdcccccd58a8850cb1d317b027d91db1
Closes-Bug: 1549559

Sarath (nsarath) wrote :

Verified on build #2723.
