OpenStack HA: mainline3020: contrail-analytics processes get stuck at initializing (Collector connection down)
1) Provisioned a cluster with OpenStack HA as shown below:
server-manager-client display server --select id,cluster_id,roles,ip_address
+---------+----------------+----------------+----------------------------------------------------------------------------+
| id      | cluster_id     | ip_address     | roles                                                                      |
+---------+----------------+----------------+----------------------------------------------------------------------------+
| nodec28 | cluster5sanity | 10.204.217.13  | [u'config', u'control', u'collector', u'database', u'webui', u'openstack'] |
| nodeg37 | cluster5sanity | 10.204.217.77  | [u'config', u'control', u'collector', u'database', u'webui', u'openstack'] |
| nodec10 | cluster5sanity | 10.204.217.176 | [u'config', u'control', u'collector', u'database', u'webui', u'openstack'] |
| nodei17 | cluster5sanity | 10.204.217.129 | [u'compute']                                                               |
| nodei19 | cluster5sanity | 10.204.217.131 | [u'compute']                                                               |
| nodei20 | cluster5sanity | 10.204.217.132 | [u'compute']                                                               |
+---------+----------------+----------------+----------------------------------------------------------------------------+
root@nodej3:~#
2) After provisioning completes, the contrail-analytics processes get stuck at initializing (Collector connection down).
1) The contrail-analytics processes on nodec10 get stuck in the initializing state (the contrail-analytics logs, which complain about the UVE port, are attached).
nodec10 (contrail-status output):
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr initializing (Collector connection down)
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 failed
contrail-analytics-api initializing (Collector connection down)
contrail-analytics-nodemgr initializing (Collector connection down)
contrail-collector active
contrail-query-engine active
contrail-snmp-collector initializing (Collector connection down)
contrail-topology initializing (Collector connection down)
root@nodeg37:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr initializing (Collector connection down)
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
root@nodec28:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr initializing (Collector connection down)
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
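For anyone comparing the three controllers, a minimal sketch of how the contrail-status output above could be filtered down to the services that are not active. The function name `non_active_services` is hypothetical, and the parsing assumes the plain "name state (detail)" layout seen in the paste; it is not part of any Contrail tooling.

```python
# Sketch only: assumes contrail-status prints "<service> <state> (<detail>)" per line,
# as in the output pasted above. non_active_services is a hypothetical helper name.

def non_active_services(status_text):
    """Return (service, state, detail) for every service not reported as active."""
    problems = []
    for line in status_text.splitlines():
        parts = line.split(None, 2)
        # Skip blank lines, section headers ("== Contrail Control ==") and shell prompts.
        if len(parts) < 2 or parts[0].startswith("==") or "@" in parts[0]:
            continue
        name, state = parts[0].rstrip(":"), parts[1]
        detail = parts[2].strip("()") if len(parts) > 2 else ""
        if state != "active":
            problems.append((name, state, detail))
    return problems


# Analytics section of the nodec10 output, copied from this report.
NODEC10_ANALYTICS = """\
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 failed
contrail-analytics-api initializing (Collector connection down)
contrail-analytics-nodemgr initializing (Collector connection down)
contrail-collector active
contrail-query-engine active
contrail-snmp-collector initializing (Collector connection down)
contrail-topology initializing (Collector connection down)
"""

for service, state, detail in non_active_services(NODEC10_ANALYTICS):
    print(service, state, detail)
```

On the nodec10 paste this singles out contrail-alarm-gen:0 plus the four processes waiting on the collector connection, while contrail-collector itself reports active.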
root@nodec10:~# vi /var/log/contrail/contrail-analytics-api.log
01/03/2017 12:36:38 AM [contrail-analytics-api]: Initializing UVE Cache
01/03/2017 12:36:38 AM [contrail-analytics-api]: updated redis_uve_list {('127.0.0.1', 6379, 0): None}
01/03/2017 12:36:38 AM [contrail-analytics-api]: Cannot write http_port 8090 to /tmp/contrail-analytics-api.2282.http_port
01/03/2017 12:36:38 AM [contrail-analytics-api]: Starting Introspect on HTTP Port 8090
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Initializing >>, ] description = UvePartitions:UVE-Aggregation[None] connection down >>, ] >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Initializing >>, << type = Redis-UVE name = 127.0.0.1:6379 server_addrs = [ ] status = Initializing >>, ] description = UvePartitions:UVE-Aggregation[None], Redis-UVE:127.0.0.1:6379[None] connection down >>, ] >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Initializing >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Redis-UVE name = 127.0.0.1:6379 server_addrs = [ ] status = Initializing >>, ] description = UvePartitions:UVE-Aggregation[None], Collector, Redis-UVE:127.0.0.1:6379[None] connection down >>, ] >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodec10:Analytics:contrail-analytics-api:0 client_info = << status = Idle successful_connections = 0 pid = 2423 http_port = 8090 start_time = 1483432598225932 collector_name = collector_ip = >> sm_queue_count = 3 max_sm_queue_count = 3 >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Initializing >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Redis-UVE name = 127.0.0.1:6379 server_addrs = [ ] status = Up >>, ] description = UvePartitions:UVE-Aggregation[None], Collector connection down >>, ] >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Up description = Partitions:30 >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Redis-UVE name = 127.0.0.1:6379 server_addrs = [ ] status = Up >>, ] description = Collector connection down >>, ] >>
01/03/2017 12:36:38 AM [contrail-analytics-api]: Exception: get_cql_session Failure ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9160)]. Last error: Connection refused")})
01/03/2017 12:36:38 AM [contrail-analytics-api]: Starting UveStreamer
01/03/2017 12:36:38 AM [contrail-analytics-api]: Exception: [Errno 98] Address already in use
01/03/2017 12:36:38 AM [contrail-analytics-api]: stopping everythin
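The "[Errno 98] Address already in use" exception above means some listening socket could not be bound, but the log does not say which port. A minimal sketch, assuming Python 3, for checking whether a given TCP port is already taken on the node; `port_in_use` is a hypothetical helper and the port to probe has to be supplied by whoever is debugging.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE (errno 98 on Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR is deliberately not set, so this probe is a strict check;
        # a service that does set it may still be able to bind.
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. permission errors on privileged ports are not masked
    else:
        return False
    finally:
        sock.close()
```

Calling `port_in_use(8090)` on nodec10, for example, would probe the introspect port mentioned in the log; whether that is the port that actually failed is not confirmed by this report.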
Discovery server: 192.168.100.10
Discovery port: 5998
Collector address: []
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: CONNECT TO COLLECTOR: True
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: Cannot write http_port 8104 to /tmp/contrail-analytics-nodemgr.4985.http_port
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: Starting Introspect on HTTP Port 8104
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: Processing event[EvStart] in state[none]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: Sandesh Client: Event[EvStart] => State[none] -> State[Idle]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_INFO]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: Logging: FILE: [None] -> [<stdout>]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: Logging: SYSLOG: [None] -> [LOG_LOCAL0]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: Trace: PRINT: [None] -> [False]
01/03/2017 12:37:01 AM [contrail-analytics-nodemgr]: SANDESH: Flow Logging: [None] -> [False]
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: send_nodemgr_process_status_base: Sending UVE:NodeStatusUVE(_context='', _scope='', _category='', _send_queue_enabled=True, _seqnum=0, _versionsig=2524127670, _source='nodec10', _instance_id='0', _client=None, _type=6, _hints=1, _http_server=None, _logger=None, _more=False, _node_type='Analytics', data=NodeStatus(status=None, name='nodec10', installed_package_version=None, deleted=None, disk_usage_info=None, build_info=None, running_package_version=None, process_mem_cpu_usage=None, system_cpu_info=None, system_mem_usage=None, process_status=[ProcessStatus(instance_id='0', module_id='contrail-analytics-nodemgr', state='Functional', description='', connection_infos=None)], all_core_file_list=None, system_cpu_usage=None, _table='ObjectCollectorInfo', process_info=None, description=None), _module='contrail-analytics-nodemgr', _level=2147483647, _timestamp=1483432622237417, _client_context='', _connect_to_collector=True, _role=0)
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Processing event[EvSandeshUVESend] in state[Idle]
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 build_info = {"build-info" : [{"build-version" : "4.0.0.0", "build-time" : "2016-12-23 11:45:53.527374", "build-user" : "contrail-builder", "build-hostname" : "ubuntu", "build-id" : "4.0.0.0-3020", "build-number" : "3020"}]} system_cpu_info = << num_socket = 1 num_cpu = 4 num_core_per_socket = 4 num_thread_per_core = 1 >> running_package_version = 4.0.0.0-3020 installed_package_version = 4.0.0.0-3020 >>
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Discarding event[EvSandeshUVESend] in state[Idle]
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Processing event[EvSandeshUVESend] in state[Idle]
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 process_status = [ << module_id = contrail-analytics-nodemgr instance_id = 0 state = Functional description = >>, ] >>
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Discarding event[EvSandeshUVESend] in state[Idle]
wokeup and found a line
wokeup and found a line
wokeup and found a line
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Received discovery update [{u'partcount': u'{ "1":[0,1], "2":[1,3], "3":[4,8], "4":[12,8], "5":[20,10]}', u'@publisher-id': u'nodec10', u'pid': u'2419', u'ip-address': u'192.168.100.13', u'redis-gen': u'1', u'port': u'8086'}] for collector service
01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Processing event[EvSandeshUVESend] in state[Idle]
"/var/log/contrail/contrail-analytics-nodemgr-stderr.log" 462 lines, 68989
Alarm gen failure
------------------
01/03/2017 05:09:49 AM [contrail-alarm-gen]: Agg unexpected key ObjectConfigNode:nodeg37 from inst:part 0:0
01/03/2017 05:09:49 AM [contrail-alarm-gen]: Agg unexpected rows [OutputRow(key='ObjectConfigNode:nodeg37', typ='NodeStatus', val=None)]
01/03/2017 05:09:49 AM [contrail-alarm-gen]: AlarmGen stopping everything
01/03/2017 05:09:49 AM [contrail-alarm-gen]: Stopped http server
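The analytics-api and nodemgr logs earlier in this report are dominated by "[DROP: WrongClientSMState]" lines, logged while the Sandesh client is still in state Idle. A minimal sketch for tallying such drops per module when going through the full log files; the regular expression is an assumption derived from the line format pasted here, `count_drops` is a hypothetical helper, and the sample lines are abbreviated copies of lines from this report.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> AM [<module>]: SANDESH: [DROP: <reason>] <MessageType>: ..."
DROP_RE = re.compile(
    r"\[(?P<module>[^\]]+)\]: SANDESH: \[DROP: (?P<reason>[^\]]+)\] (?P<msg>\w+):"
)


def count_drops(log_lines):
    """Count dropped Sandesh messages keyed by (module, reason, message type)."""
    counts = Counter()
    for line in log_lines:
        match = DROP_RE.search(line)
        if match:
            counts[(match.group("module"), match.group("reason"), match.group("msg"))] += 1
    return counts


# Abbreviated sample lines taken from the logs in this report.
SAMPLE_LOG = [
    "01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec10 >>",
    "01/03/2017 12:37:02 AM [contrail-analytics-nodemgr]: Discarding event[EvSandeshUVESend] in state[Idle]",
    "01/03/2017 12:36:38 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodec10:Analytics:contrail-analytics-api:0 >>",
]

for key, count in sorted(count_drops(SAMPLE_LOG).items()):
    print(key, count)
```

To run it against a real file, pass an open file object instead of the sample list, e.g. `count_drops(open(path))` with `path` pointing at one of the logs under /var/log/contrail.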
Notes:
------
1) The issue is not seen in a single-node setup.
2) Workaround: running the following commands recovers the collector to the active state:
service contrail-collector restart
service supervisor-analytics restart
3) The issue is seen with clusters provisioned using Fab/SM.
Sundar, are you still seeing this issue? If not, we can close it.