K8S: Analytics services fail when 3 node HA setup is brought up using single yaml

Bug #1735874 reported by Sachchidanand Vaidya
This bug affects 1 person
Affects              Status                   Importance  Assigned to          Milestone
Juniper Openstack    Status tracked in Trunk
  R4.1               Fix Committed            High        Sundaresan Rajangam
  Trunk              Fix Committed            High        Sundaresan Rajangam

Bug Description

3-node HA setup with Kubernetes, brought up and provisioned using a single YAML file.

(Ubuntu 16.04.2 LTS )

== Contrail Analytics ==
contrail-collector: active
contrail-analytics-api: active
contrail-query-engine: active
contrail-alarm-gen: initializing (Database:Cassandra[] connection down)
contrail-snmp-collector: initializing (Database:Cassandra[] connection down)
contrail-topology: initializing (Database:Cassandra[] connection down)

"contrail-alarm-gen", "contrail-snmp-collector" & "contrail-topology" services remain in initializing state on all 3 analytics nodes.

The contrail-alarm-gen log shows the following exception:

Dec 01 13:50:59 testbed-1-vm1 systemd[1]: Stopped "Contrail Alarm-gen".
Dec 01 13:50:59 testbed-1-vm1 systemd[1]: Started "Contrail Alarm-gen".
Dec 01 13:51:00 testbed-1-vm1 contrail-alarm-gen[3017]: 12/01/2017 01:51:00 PM [contrail-alarm-gen]: SANDESH: CONNECT TO COLLECTOR: True
Dec 01 13:51:00 testbed-1-vm1 contrail-alarm-gen[3017]: 12/01/2017 01:51:00 PM [contrail-alarm-gen]: Failed to import package "sandesh"
Dec 01 13:51:00 testbed-1-vm1 contrail-alarm-gen[3017]: 12/01/2017 01:51:00 PM [contrail-alarm-gen]: Failed to import package "sandesh"
Dec 01 13:51:00 testbed-1-vm1 contrail-alarm-gen[3017]: 12/01/2017 01:51:00 PM [contrail-alarm-gen]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: Traceback (most recent call last):
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 534, in run
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: result = self._run(*self.args, **self.kwargs)
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/opserver/config_handler.py", line 45, in start
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: credential=cassandra_credential)
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_object_db.py", line 20, in __init__
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: obj_cache_exclude_types)
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_cassandra.py", line 148, in __init__
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: self._cassandra_init(server_list)
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_cassandra.py", line 553, in _cassandra_init
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: self._cassandra_init_conn_pools()
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_cassandra.py", line 638, in _cassandra_init_conn_pools
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: **cf_kwargs)
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/pycassa/columnfamily.py", line 284, in __init__
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: self.load_schema()
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: File "/usr/lib/python2.7/dist-packages/pycassa/columnfamily.py", line 312, in load_schema
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: raise nfe
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: NotFoundException: NotFoundException(_message=None, why='Column family obj_shared_table not found.')
Dec 01 13:55:07 testbed-1-vm1 contrail-alarm-gen[3017]: <Greenlet at 0x7f433602fb90: <bound method AlarmGenConfigHandler.start of <opserver.alarmgen_config_handler.AlarmGenCon
Dec 01 14:24:43 testbed-1-vm1 contrail-alarm-gen[3017]: [Introspect:5995]127.0.0.1 - - [2017-12-01 14:24:43] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4835 0.002
Dec 01 14:24:47 testbed-1-vm1 contrail-alarm-gen[3017]: [Introspect:5995]127.0.0.1 - - [2017-12-01 14:24:47] "GET /Snh_SandeshUVECacheReq?x=NodeStatus HTTP/1.1" 200 4835 0.002

-------

There seems to be a race condition between the creation of tables in Cassandra and alarm-gen reading them.
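
To illustrate the suspected race, here is a minimal sketch. It assumes the reader only checks that the keyspace exists before opening the table, so it can start in the window between keyspace creation and table creation. FakeCassandra, reader_connect and the keyspace name "example_keyspace" are hypothetical stand-ins for illustration, not Contrail or pycassa code; only the table name obj_shared_table comes from the traceback above.

```python
class NotFoundException(Exception):
    """Stand-in for the NotFoundException seen in the traceback."""


class FakeCassandra(object):
    """Hypothetical in-memory model: each keyspace maps to a set of tables."""

    def __init__(self):
        self.keyspaces = {}

    def create_keyspace(self, name):
        self.keyspaces.setdefault(name, set())

    def create_column_family(self, keyspace, cf_name):
        self.keyspaces[keyspace].add(cf_name)

    def open_column_family(self, keyspace, cf_name):
        # Mirrors the failure mode in the log: keyspace present, table absent.
        if cf_name not in self.keyspaces[keyspace]:
            raise NotFoundException("Column family %s not found." % cf_name)
        return (keyspace, cf_name)


def reader_connect(db, keyspace, cf_name):
    """Reader that checks only for the keyspace before opening the table."""
    if keyspace not in db.keyspaces:
        return "keyspace missing, retry later"
    try:
        db.open_column_family(keyspace, cf_name)
    except NotFoundException as exc:
        return "failed: %s" % exc
    return "connected"


db = FakeCassandra()

# 1. Nothing exists yet: the keyspace check protects the reader.
before_keyspace = reader_connect(db, "example_keyspace", "obj_shared_table")

# 2. The writer has created the keyspace but not yet the table.
db.create_keyspace("example_keyspace")
in_window = reader_connect(db, "example_keyspace", "obj_shared_table")

# 3. The writer has finished creating the table.
db.create_column_family("example_keyspace", "obj_shared_table")
after_tables = reader_connect(db, "example_keyspace", "obj_shared_table")

print(before_keyspace)
print(in_window)
print(after_tables)
```

Only the second call fails, which would match the "Column family obj_shared_table not found." error if the service happened to start inside that window.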

Tags: releasenote
information type: Proprietary → Public
Pulkit Tandon (pulkitt) wrote :

The workaround is to restart all the services.

tags: added: releasenote
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/38322
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38322
Committed: http://github.com/Juniper/contrail-controller/commit/64ca93f66c4ec4db7cf8976e830202ac7ef4e623
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 64ca93f66c4ec4db7cf8976e830202ac7ef4e623
Author: Sundaresan Rajangam <email address hidden>
Date: Wed Dec 13 16:44:32 2017 -0800

Handle exception thrown by VncObjectDBClient

VncObjectDBClient would throw exception if vnc_cassandra tries to read
the config before the table is created in cassandra. vnc_cassandra
checks only if the keyspace is created. Catch the exception
and call exit.

Change-Id: I2dfb4acc15ae19bc6b7e8b7d7bfa5451130e007a
Closes-Bug: #1735874
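
The "catch the exception and call exit" approach described in this commit message might look roughly like the sketch below. This is not the merged change: connect_config_db and start_config_handler are hypothetical names, the exception class is a local stand-in, and the sketch assumes a process supervisor restarts the service after it exits so that a later start finds the table in place.

```python
import sys


class NotFoundException(Exception):
    """Stand-in for the NotFoundException seen in the traceback."""


def connect_config_db(table_exists):
    """Hypothetical stand-in for constructing the config DB client."""
    if not table_exists:
        raise NotFoundException("Column family obj_shared_table not found.")
    return "db-client"


def start_config_handler(table_exists, exit_func=sys.exit):
    """Try to connect; on any failure, log the error and exit the process.

    exit_func is injectable only so the behaviour can be exercised in tests.
    """
    try:
        return connect_config_db(table_exists)
    except Exception as exc:
        sys.stderr.write("config DB init failed: %s\n" % exc)
        # Exit with a non-zero status instead of staying up half-initialized.
        exit_func(1)


if __name__ == "__main__":
    client = start_config_handler(table_exists=True)
    print(client)
```

Exiting rather than retrying in place keeps the handler simple, at the cost of relying on the supervisor's restart policy (an assumption here) to bring the service back.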

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/38505
Submitter: Sundaresan Rajangam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38505
Committed: http://github.com/Juniper/contrail-analytics/commit/42ce5c19065975ae1f28675b96a41f8bcfe0b545
Submitter: Zuul (<email address hidden>)
Branch: master

commit 42ce5c19065975ae1f28675b96a41f8bcfe0b545
Author: Sundaresan Rajangam <email address hidden>
Date: Wed Dec 20 21:28:03 2017 -0800

Handle exception thrown by VncObjectDBClient

VncObjectDBClient would throw exception if vnc_cassandra tries to read
the config before the table is created in cassandra. vnc_cassandra
checks only if the keyspace is created. Catch the exception
and call exit.

Change-Id: I345ca249871ef815229d5b3290c00dbdce6cbeb3
Closes-Bug: #1735874
