R5.0-micro-services provision - config api container keeps restarting because of zk connections.

Bug #1756829 reported by Ritam Gangopadhyay
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)
Series: Trunk
Status: Invalid
Importance: Critical
Assigned to: Ritam Gangopadhyay
Milestone: none listed

Bug Description

Setup:-

On nodec7 -- /root/contrail-ansible-deployer/config/instances.yaml

The config API container keeps restarting because of unavailable ZooKeeper (zk) connections on port 2181, and it reports the collector connection as down:-

03/19/2018 09:09:08 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodec7:Config:contrail-api:0 client_info = << status = Idle successful_connections = 0 pid = 1 http_port = 8084 start_time = 1521450548867617 collector_name = collector_ip = collector_list = [ 192.168.192.6:8086, 192.168.192.5:8086, 192.168.192.7:8086, ] >> sm_queue_count = 2 max_sm_queue_count = 2 >>
03/19/2018 09:09:08 AM [contrail-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec7 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 192.168.192.6:2181, 192.168.192.5:2181, 192.168.192.7:2181, ] status = Up description = >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, ] description = Collector connection down >>, ] >>
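The NodeStatusUVE line above is dense, so a small helper can pull out the per-connection status. This is only a reading aid and not part of Contrail: the regular expression, the function name, and the shortened sample string are assumptions made for illustration. Applied to the line quoted above, it reports the Zookeeper entry as Up and the Collector entry as Down.

```python
import re

# Shortened copy of the connection_infos portion of the NodeStatusUVE line above.
SAMPLE = (
    "connection_infos = [ << type = Zookeeper name = Zookeeper "
    "server_addrs = [ 192.168.192.6:2181, 192.168.192.5:2181, 192.168.192.7:2181, ] "
    "status = Up description = >>, << type = Collector name = "
    "server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, ]"
)

# Hypothetical pattern: capture the connection type and the first status that follows it.
CONN_RE = re.compile(r"type = (\w+) name = .*?status = (\w+)")


def connection_statuses(line):
    """Return a list of (connection type, status) pairs found in a log line."""
    return CONN_RE.findall(line)


if __name__ == "__main__":
    for conn_type, status in connection_statuses(SAMPLE):
        print("%s: %s" % (conn_type, status))
```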

The docker log for config api produces this traceback:-

A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.

 /usr/bin/contrail-api in <module>()
    6
    7 if __name__ == '__main__':
    8 sys.exit(
    9 load_entry_point('contrail-api-server==0.1dev', 'console_scripts', 'contrail-api')()
   10 )
load_entry_point = <function load_entry_point>

 /usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py in server_main(args_str=None)
 4323 vnc_cgitb.enable(format='text')
 4324
 4325 main(args_str, VncApiServer(args_str))
 4326 #server_main
 4327
global main = <function main>
args_str = None
global VncApiServer = <class 'vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer'>

 /usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py in __init__(self=<vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>, args_str='--conf_file /etc/contrail/contrail-api.conf --co...ontrail/contrail-keystone-auth.conf --worker_id 0')
 1701 else:
 1702 self._db_connect(self._args.reset_config)
 1703 self._db_init_entries()
 1704
 1705 # ZK quota counter initialization
self = <vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>
self._db_init_entries = <bound method VncApiServer._db_init_entries of <...i_server.vnc_cfg_api_server.VncApiServer object>>

 /usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py in _db_init_entries(self=<vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>)
 3014 # create singleton defaults if they don't exist already in db
 3015 gsc = self.create_singleton_entry(GlobalSystemConfig(
 3016 autonomous_system=64512, config_version=CONFIG_VERSION))
 3017 gvc = self.create_singleton_entry(GlobalVrouterConfig(
 3018 parent_obj=gsc))
autonomous_system undefined
config_version undefined
global CONFIG_VERSION = '1.0'

 /usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py in create_singleton_entry(self=<vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>, singleton_obj=<vnc_cfg_api_server.gen.resource_common.GlobalSystemConfig object>, user_visible=True)
 3291 # for singleton START
 3292 try:
 3293 cass_uuid = self._db_conn._object_db.fq_name_to_uuid(obj_type, fq_name)
 3294 try:
 3295 zk_uuid = self._db_conn.fq_name_to_uuid(obj_type, fq_name)
cass_uuid undefined
self = <vnc_cfg_api_server.vnc_cfg_api_server.VncApiServer object>
self._db_conn = <vnc_cfg_api_server.vnc_db.VncDbClient object>
self._db_conn._object_db = <vnc_cfg_api_server.vnc_db.VncServerCassandraClient object>
self._db_conn._object_db.fq_name_to_uuid = <function wrapper>
obj_type = 'global_system_config'
fq_name = [u'default-global-system-config']

 /usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py in wrapper(*args=('global_system_config', [u'default-global-system-config']), **kwargs={})
  512
  513 self.start_time = datetime.datetime.now()
  514 return func(*args, **kwargs)
  515 except (AllServersUnavailable, MaximumRetryException) as e:
  516 if self._conn_state != ConnectionStatus.DOWN:
func = <bound method VncServerCassandraClient.fq_name_t...i_server.vnc_db.VncServerCassandraClient object>>
args = ('global_system_config', [u'default-global-system-config'])
kwargs = {}

 /usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py in fq_name_to_uuid(self=<vnc_cfg_api_server.vnc_db.VncServerCassandraClient object>, obj_type='global_system_config', fq_name=[u'default-global-system-config'])
 1424 raise NoIdError('%s %s' % (obj_type, fq_name_str))
 1425 if len(col_infos) > 1:
 1426 raise VncError('Multi match %s for %s' % (fq_name_str, obj_type))
 1427 fq_name_uuid = col_infos.popitem()[0].split(':')
 1428 if obj_type != 'route_target' and fq_name_uuid[:-1] != fq_name:
global VncError = <class 'vnc_api.exceptions.VncError'>
fq_name_str = u'default-global-system-config'
obj_type = 'global_system_config'
<class 'vnc_api.exceptions.VncError'>: Multi match default-global-system-config for global_system_config
    __class__ = <class 'vnc_api.exceptions.VncError'>
    __delattr__ = <method-wrapper '__delattr__' of VncError object>
    __dict__ = {}
    __doc__ = None
    __format__ = <built-in method __format__ of VncError object>
    __getattribute__ = <method-wrapper '__getattribute__' of VncError object>
    __getitem__ = <method-wrapper '__getitem__' of VncError object>
    __getslice__ = <method-wrapper '__getslice__' of VncError object>
    __hash__ = <method-wrapper '__hash__' of VncError object>
    __init__ = <method-wrapper '__init__' of VncError object>
    __module__ = 'vnc_api.exceptions'
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of VncError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of VncError object>
    __repr__ = <method-wrapper '__repr__' of VncError object>
    __setattr__ = <method-wrapper '__setattr__' of VncError object>
    __setstate__ = <built-in method __setstate__ of VncError object>
    __sizeof__ = <built-in method __sizeof__ of VncError object>
    __str__ = <method-wrapper '__str__' of VncError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __unicode__ = <built-in method __unicode__ of VncError object>
    __weakref__ = None
    args = (u'Multi match default-global-system-config for global_system_config',)
    message = u'Multi match default-global-system-config for global_system_config'

The above is a description of an error in a Python program. Here is
the original traceback:

Traceback (most recent call last):
  File "/usr/bin/contrail-api", line 9, in <module>
    load_entry_point('contrail-api-server==0.1dev', 'console_scripts', 'contrail-api')()
  File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 4325, in server_main
    main(args_str, VncApiServer(args_str))
  File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 1703, in __init__
    self._db_init_entries()
  File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 3016, in _db_init_entries
    autonomous_system=64512, config_version=CONFIG_VERSION))
  File "/usr/lib/python2.7/site-packages/vnc_cfg_api_server/vnc_cfg_api_server.py", line 3293, in create_singleton_entry
    cass_uuid = self._db_conn._object_db.fq_name_to_uuid(obj_type, fq_name)
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 514, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 1426, in fq_name_to_uuid
    raise VncError('Multi match %s for %s' % (fq_name_str, obj_type))
VncError: Multi match default-global-system-config for global_system_config

Changed in juniperopenstack:
assignee: Abhay Gupte (abhay) → Abhay Joshi (abhayj)
tags: added: config provisioning
Jeba Paulaiyan (jebap)
tags: added: sanityblocker
Ramprakash R (ramprakash) wrote :

Re-assigning to config team to take a look. Configs look Ok to me.

Édouard Thuleau (ethuleau) wrote :

When the API server initializes, it creates some global default resources if they do not already exist, such as the global system config named 'default-global-system-config'. To do so, it checks whether the resource was already initialized by fetching the fq_name/uuid mapping from the Cassandra 'obj_fq_name_table' table of the 'config_db_uuid' keyspace. Here, though, it seems you have a duplicate entry in that table for the key 'global_system_config', with columns starting with 'default-global-system-config:'.

You probably hit a concurrency issue when the cluster was bootstrapped, where several API servers initialized the global default resources at the same time. Have you hit this issue several times?
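To make the failure mode concrete, here is a minimal sketch of the lookup, loosely modelled on the fq_name_to_uuid lines visible in the traceback. It is not the real cfgm_common code: a plain dict stands in for the 'global_system_config' row of 'obj_fq_name_table', the column-name layout 'fq_name:uuid' is taken from the comment above, and the exception classes and UUID values are hypothetical stand-ins.

```python
class NoIdError(Exception):
    """Stand-in for the real NoIdError; raised when no mapping exists."""


class VncError(Exception):
    """Stand-in for vnc_api.exceptions.VncError."""


def fq_name_to_uuid(row_columns, obj_type, fq_name):
    """Look up the UUID for fq_name in one row of the fq_name table.

    row_columns is a dict whose keys are column names of the assumed form
    '<fq_name joined by ":">:<uuid>'.
    """
    fq_name_str = ':'.join(fq_name)
    prefix = fq_name_str + ':'
    col_infos = {name: value for name, value in row_columns.items()
                 if name.startswith(prefix)}
    if not col_infos:
        raise NoIdError('%s %s' % (obj_type, fq_name_str))
    if len(col_infos) > 1:
        # Two bootstrapping API servers each wrote their own column.
        raise VncError('Multi match %s for %s' % (fq_name_str, obj_type))
    column_name = next(iter(col_infos))
    return column_name.split(':')[-1]


if __name__ == "__main__":
    row = {'default-global-system-config:uuid-a': None}
    print(fq_name_to_uuid(row, 'global_system_config',
                          ['default-global-system-config']))
    row['default-global-system-config:uuid-b'] = None  # duplicate entry
    try:
        fq_name_to_uuid(row, 'global_system_config',
                        ['default-global-system-config'])
    except VncError as exc:
        print(exc)
```

With a single column the lookup returns its UUID; once a second column with the same fq_name prefix exists, every subsequent start-up raises the "Multi match" error, which would explain a container that restarts in a loop.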

@Sachin: I can propose a patch that performs the DB init only from the master API server (worker_id = 0) and keeps the slave API servers in a waiting state until some resource or event appears (what that trigger should be remains to be determined).
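A rough sketch of that idea follows, under heavy assumptions: threads stand in for separate API server processes, a threading.Event stands in for the still-undetermined signal the slaves would wait on, and a list stands in for the fq_name table row. None of these names come from the Contrail code base; a real patch would need a signal that works across processes and hosts.

```python
import threading
import uuid


def run_workers(num_workers, timeout=5.0):
    """Simulate master-only DB init; return (table, per-worker view of table)."""
    table = []                       # stand-in for the fq_name table row
    db_ready = threading.Event()     # hypothetical "DB init done" signal
    seen = {}

    def worker(worker_id):
        if worker_id == 0:
            # Only the master creates the singleton default resource.
            table.append('default-global-system-config:%s' % uuid.uuid4())
            db_ready.set()
        elif not db_ready.wait(timeout):
            # Slave gave up waiting; it records nothing.
            return
        seen[worker_id] = list(table)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table, seen


if __name__ == "__main__":
    table, seen = run_workers(3)
    print(len(table), sorted(seen))
```

Because only worker 0 writes, the table ends up with exactly one entry however many workers start at once, and each slave sees that entry once the signal fires.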

Sachin Bansal (sbansal) wrote :

Is this problem reproducible? We would like access to the setup in problem state or at least all zookeeper logs (server and client side).

Ritam Gangopadhyay (ritam) wrote :

Not seeing it with build 5.0.0-34. Will close if not seen in further builds.

tags: removed: sanityblocker