Config services svc-monitor, device-manager and schema are not coming up in 5.1.0

Bug #1785051 reported by musharani
This bug affects 3 people
Affects                    Status        Importance  Assigned to     Milestone
Juniper Openstack          Status tracked in Trunk
  R5.0                     Fix Released  High        Michael Henkel
  Trunk                    Fix Released  Critical    Michael Henkel

Bug Description

The setup was freshly brought up with build 5.1.0-215 (ocata). In this build, config services such as svc-monitor, device-manager and schema do not become active when checked with contrail-status.

The setup is still available. It is a multi-node, multi-interface setup.

nodec7 (root/c0ntrail123)

instances:
  nodec57:
      ip: 10.204.216.153
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          openstack: null
          webui: null
  nodec7:
      ip: 10.204.216.64
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          openstack: null
          webui: null
  nodec8:
      ip: 10.204.216.65
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          openstack: null
          webui: null

  nodei1:
      ip: 10.204.216.150
      provider: bms
      roles:
          openstack_compute: null
          vrouter:
              PHYSICAL_INTERFACE: eno2
  nodei2:
      ip: 10.204.217.114
      provider: bms
      roles:
          openstack_compute: null
          vrouter:
              PHYSICAL_INTERFACE: eno2
  nodei3:
      ip: 10.204.217.115
      provider: bms
      roles:
          openstack_compute: null
          vrouter:
              PHYSICAL_INTERFACE: eno2

The same error is seen in the device-mgr and schema logs. Complete logs for both services are attached.

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail database ==
kafka: active
nodemgr: active
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: active
topology: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
HTTPSConnectionPool(host='nodec7', port=8088): Max retries exceeded with url: /Snh_SandeshUVECacheReq?x=NodeStatus (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fdcf43afc50>: Failed to establish a new connection: [Errno 111] Connection refused',))
svc-monitor: initializing
nodemgr: active
HTTPSConnectionPool(host='nodec7', port=8096): Max retries exceeded with url: /Snh_SandeshUVECacheReq?x=NodeStatus (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fdcf43afdd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
device-manager: initializing
api: active
HTTPSConnectionPool(host='nodec7', port=8087): Max retries exceeded with url: /Snh_SandeshUVECacheReq?x=NodeStatus (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fdcf43aff10>: Failed to establish a new connection: [Errno 111] Connection refused',))
schema: initializing
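
The three "Connection refused" errors above are reported for ports 8088, 8096 and 8087 on nodec7, each printed just before the service it belongs to. A minimal sketch for checking by hand whether anything is listening on those ports, assuming a plain TCP connect is enough for a first look. The helper names `port_open` and `probe` are illustrative and not part of contrail-status, and the port-to-service mapping is inferred from the order of the output above.

```python
import socket

# Ports taken from the contrail-status output above; the service names are
# an assumption inferred from the order in which the errors were printed.
INTROSPECT_PORTS = {
    "svc-monitor": 8088,
    "device-manager": 8096,
    "schema": 8087,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def probe(host, ports=INTROSPECT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `probe("nodec7")` from a machine that can reach the node returns a dict of service name to True/False. A False here only says that no listener accepted the connection; it does not say why the service has not started.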

device-mgr log:
===============
    dm_logger, args)
  File "/usr/lib/python2.7/site-packages/cfgm_common/zkclient.py", line 505, in master_election
    self._election.run(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/kazoo/recipe/election.py", line 54, in run
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/device_manager/device_manager.py", line 746, in run_device_manager
    DeviceManager(dm_logger, args)
  File "/usr/lib/python2.7/site-packages/device_manager/device_manager.py", line 306, in __init__
    self._object_db = DMCassandraDB.get_instance(self, _zookeeper_client)
  File "/usr/lib/python2.7/site-packages/device_manager/db.py", line 1953, in get_instance
    cls.dm_object_db_instance = DMCassandraDB(manager, zkclient)
  File "/usr/lib/python2.7/site-packages/device_manager/db.py", line 1984, in __init__
    ca_certs=self._args.cassandra_ca_certs)
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_object_db.py", line 34, in __init__
    ca_certs,
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 154, in __init__
    self._cassandra_init(server_list)
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 558, in _cassandra_init
    self._cassandra_ensure_keyspace(keyspace, cf_dict)
  File "/usr/lib/python2.7/site-packages/cfgm_common/vnc_cassandra.py", line 628, in _cassandra_ensure_keyspace
    **create_cf_kwargs)
  File "/usr/lib/python2.7/site-packages/pycassa/system_manager.py", line 300, in alter_column_family
    self._system_update_column_family(cfdef)
  File "/usr/lib/python2.7/site-packages/pycassa/system_manager.py", line 280, in _system_update_column_family
    return self._schema_update(self._conn.system_update_column_family, cfdef)
  File "/usr/lib/python2.7/site-packages/pycassa/system_manager.py", line 459, in _schema_update
    schema_version = schema_func(*args)
  File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1754, in system_update_column_family
    return self.recv_system_update_column_family()
  File "/usr/lib/python2.7/site-packages/pycassa/cassandra/Cassandra.py", line 1765, in recv_system_update_column_family
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
  File "/usr/lib64/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
  File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 271, in read
    self.readFrame()
  File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 275, in readFrame
    buff = self.__trans.readAll(4)
  File "/usr/lib64/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/lib64/python2.7/site-packages/thrift/transport/TSocket.py", line 103, in read
    buff = self.handle.recv(sz)
  File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 280, in recv
    self._wait(self._read_event)
  File "/usr/lib64/python2.7/site-packages/gevent/_socket2.py", line 179, in _wait
    self.hub.wait(watcher)
  File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 630, in wait
    result = waiter.get()
  File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 878, in get
    return self.hub.switch()
  File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 609, in switch
    return greenlet.switch(self)
timeout: timed out
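
The traceback above is long, but the chain it describes is short. A small sketch for condensing such a traceback into one step per package, assuming the standard CPython `File "...", line N, in func` frame format and paths that contain `site-packages/`. The helper names are illustrative.

```python
import re

FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def frames(traceback_text):
    """Return (path, line, func) for each frame line found in the text."""
    return [(m.group("path"), int(m.group("line")), m.group("func"))
            for m in FRAME_RE.finditer(traceback_text)]


def package_of(path):
    """Return the top-level directory under site-packages, else the path."""
    marker = "site-packages/"
    idx = path.find(marker)
    if idx < 0:
        return path
    return path[idx + len(marker):].split("/", 1)[0]


def condensed(traceback_text):
    """Collapse consecutive frames from the same package into one step,
    keeping the last function seen in that package."""
    steps = []
    for path, _line, func in frames(traceback_text):
        pkg = package_of(path)
        if steps and steps[-1][0] == pkg:
            steps[-1] = (pkg, func)
        else:
            steps.append((pkg, func))
    return steps
```

Applied to the device-mgr traceback, the condensed chain ends in thrift and gevent frames: a socket read that timed out while waiting for the reply to the column family update issued through pycassa.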

musharani (musharani) wrote :
Changed in juniperopenstack:
milestone: none → r5.1.0
musharani (musharani) wrote :
vimal (vappachan) wrote :

This issue is also seen in 5.0 build 173.

[root@nodem14 ~]# contrail-status
Pod              Service         Original Name                          State    Status
analytics        alarm-gen       contrail-analytics-alarm-gen           running  Up 24 hours
analytics        api             contrail-analytics-api                 running  Up 2 days
analytics        collector       contrail-analytics-collector           running  Up 2 days
analytics        nodemgr         contrail-nodemgr                       running  Up 2 days
analytics        query-engine    contrail-analytics-query-engine        running  Up 2 days
analytics        snmp-collector  contrail-analytics-snmp-collector      running  Up 24 hours
analytics        topology        contrail-analytics-topology            running  Up 2 days
config           api             contrail-controller-config-api         running  Up 2 days
config           device-manager  contrail-controller-config-devicemgr   running  Up 24 hours
config           nodemgr         contrail-nodemgr                       running  Up 2 days
config           schema          contrail-controller-config-schema      running  Up 24 hours
config           svc-monitor     contrail-controller-config-svcmonitor  running  Up 24 hours
config-database  cassandra       contrail-external-cassandra            running  Up 2 days
config-database  nodemgr         contrail-nodemgr                       running  Up 2 days
config-database  rabbitmq        contrail-external-rabbitmq             running  Up 2 days
config-database  zookeeper       contrail-external-zookeeper            running  Up 2 days
control          control         contrail-controller-control-control    running  Up 2 days
control          dns             contrail-controller-control-dns        running  Up 2 days
control          named           contrail-controller-control-named      running  Up 2 days
control          nodemgr         contrail-nodemgr                       running  Up 2 days
database         cassandra       contrail-external-cassandra            running  Up 2 days
database         kafka           contrail-external-kafka                running  Up 2 days
database         nodemgr         contrail-nodemgr                       running  Up 2 days
database         zookeeper       contrail-external-zookeeper            running  Up 2 days
webui            job             contrail-controller-webui-job          running  Up 2 days
webui            web             contrail-controller-webui-web          running  Up 2 days

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail database ==
kafka: active
nodemgr: active
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: initializing (Redis-UVE:10.204.216.96:6379[None] connection down)
alarm-gen: initializing (Redis-UVE:10.204.216.96:6379[None], Zookeeper:AlarmGenerator[] connection down)
nodemgr: active
collector: active
topology: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
HTTPSConnectionPool(host='nodem...


Abhay Joshi (abhayj) wrote :

The bug seems to have been reported against 5.1.0. Why was a 5.0.1 instance blindly created?

Nagendra Prasath (npchandran) wrote :

npchandran@daily-ub1604-1:~$ ssh 10.204.216.65 -l root
ssh: connect to host 10.204.216.65 port 22: Connection refused
npchandran@daily-ub1604-1:~$
npchandran@daily-ub1604-1:~$ ssh 10.204.216.153 -l root
ssh: connect to host 10.204.216.153 port 22: Connection refused
npchandran@daily-ub1604-1:~$ ssh 10.204.216.64 -l root (hung)

Could you get the console info, please?

Sudheendra Rao (sudheendra-k) wrote :

Abhay, the problem is also seen in R5.0; see audit-trail #3.

Jeba Paulaiyan (jebap)
tags: added: contrail-networking
Sandip Dey (sandipd) wrote :

Some of the services are not coming up after provisioning, for example schema.

The schema log says this:

08/07/2018 03:51:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 process_status = [ << module_id = contrail-schema instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ 10.204.216.61:8086, ] status = Initializing description = Established to Connect on EvTcpClose >>, << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.61:2181, 10.204.216.62:2181, 10.204.216.63:2181, ] status = Up description = >>, << type = Database name = RabbitMQ server_addrs = [ 10.204.216.61:5673, 10.204.216.62:5673, 10.204.216.63:5673, ] status = Up description = >>, ] description = Collector connection down >>, ] >>

But the collector is up.

Tried restarting schema and collector. It did not help.
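
The NodeStatusUVE line quoted above lists one entry per connection, and only some of them are Up. A rough sketch for pulling out the per-connection status, assuming the `<< type = ... status = ... >>` layout seen in that single log line; this is inferred from the log text, not taken from a Sandesh format description, and the function names are illustrative.

```python
import re

# Pattern inferred from the log line quoted in this report (an assumption).
CONN_RE = re.compile(
    r"<< type = (?P<type>\w+) name =\s*(?P<name>.*?)\s*server_addrs = "
    r"\[\s*(?P<addrs>.*?)\s*\] status = (?P<status>\w+) "
    r"description =\s*(?P<desc>.*?)\s*>>"
)


def connection_infos(log_line):
    """Return one dict per connection_infos entry found in the line."""
    infos = []
    for m in CONN_RE.finditer(log_line):
        addrs = [a.strip() for a in m.group("addrs").split(",") if a.strip()]
        infos.append({
            "type": m.group("type"),
            "name": m.group("name"),
            "addrs": addrs,
            "status": m.group("status"),
            "description": m.group("desc"),
        })
    return infos


def not_up(log_line):
    """Return (type, name, status) for every connection that is not Up."""
    return [(i["type"], i["name"], i["status"])
            for i in connection_infos(log_line) if i["status"] != "Up"]
```

For the line quoted above, `not_up` reports only the Collector connection (status Initializing, address 10.204.216.61:8086), so the schema-to-collector connection is the one to look at even though the collector process itself shows as active.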

Instance.yml : https://github.com/Juniper/contrail-tools/blob/master/yamls/vcenter/c4_clutser/multi-interface/instances.yaml

deployer at : 10.204.216.61:/root/
Build : queens-5.0-176

== Contrail config ==
svc-monitor: initializing
nodemgr: active
device-manager: active
api: active
schema: initializing
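
The status block above can be reduced to the services that are not active. A minimal sketch, assuming the `== Contrail <section> ==` and `service: state` line format shown in this report; lines that do not fit (such as the HTTPSConnectionPool errors) are skipped. The function names are illustrative.

```python
import re

SECTION_RE = re.compile(r"^== Contrail (?P<section>[\w-]+) ==$")
SERVICE_RE = re.compile(r"^(?P<service>[\w-]+): (?P<state>\w+)\b")


def parse_status(text):
    """Return {section: {service: state}} from contrail-status style text."""
    result = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            section = m.group("section")
            result[section] = {}
            continue
        m = SERVICE_RE.match(line)
        if m and section is not None:
            result[section][m.group("service")] = m.group("state")
    return result


def not_active(text):
    """Return sorted (section, service, state) for non-active services."""
    return sorted(
        (sec, svc, state)
        for sec, services in parse_status(text).items()
        for svc, state in services.items()
        if state != "active"
    )
```

Run on the block above, `not_active` returns svc-monitor and schema, both in state initializing.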

OpenContrail Admin (ci-admin-f) wrote : [Review update] R5.0

Review in progress for https://review.opencontrail.org/45565
Submitter: Michael Henkel (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/45565
Committed: http://github.com/Juniper/contrail-container-builder/commit/4cf91f958bc06910b24cba4002d2b3148aa6416b
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit 4cf91f958bc06910b24cba4002d2b3148aa6416b
Author: Michael Henkel <email address hidden>
Date: Tue Aug 14 11:50:10 2018 -0700

fixes contrail-status output

Change-Id: I75118e2b02aae55a2ea0cb9e0df97c67f3e1a440
Closes-Bug: 1785051

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/45647
Submitter: Andrey Pavlov (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/45647
Committed: http://github.com/Juniper/contrail-container-builder/commit/a404409cb8071411c02070c9564f78a2e292d2d5
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit a404409cb8071411c02070c9564f78a2e292d2d5
Author: Michael Henkel <email address hidden>
Date: Tue Aug 14 11:50:10 2018 -0700

fixes contrail-status output

Change-Id: I75118e2b02aae55a2ea0cb9e0df97c67f3e1a440
Closes-Bug: 1785051
(cherry picked from commit 4cf91f958bc06910b24cba4002d2b3148aa6416b)

Sandip Dey (sandipd) wrote :

This is still a problem. Schema is not up on any of the 3 nodes.

Schema logs nodec4:
====================
09/11/2018 06:40:58 PM [contrail-schema] [WARNING]: Initializing RabbitMQ connection, urls ['pyamqp://guest:guest@10.204.216.61:5673//', 'pyamqp://guest:guest@10.204.216.62:5673//', 'pyamqp://guest:guest@10.204.216.63:5673//']
09/11/2018 06:40:58 PM [contrail-schema] [WARNING]: RabbitMQ connection down
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 process_status = [ << module_id = contrail-schema instance_id = 0 state = Non-Functional connection_infos = [ << type = Database name = RabbitMQ server_addrs = [ 10.204.216.61:5673, 10.204.216.62:5673, 10.204.216.63:5673, ] status = Initializing description = >>, ] description = Database:RabbitMQ[] connection down >>, ] >>
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 process_status = [ << module_id = contrail-schema instance_id = 0 state = Non-Functional connection_infos = [ << type = Database name = RabbitMQ server_addrs = [ 10.204.216.61:5673, 10.204.216.62:5673, 10.204.216.63:5673, ] status = Down description = >>, ] description = Database:RabbitMQ[] connection down >>, ] >>
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 process_status = [ << module_id = contrail-schema instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Database name = RabbitMQ server_addrs = [ 10.204.216.61:5673, 10.204.216.62:5673, 10.204.216.63:5673, ] status = Down description = >>, ] description = Collector, Database:RabbitMQ[] connection down >>, ] >>
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodec4:Config:contrail-schema:0 client_info = << status = Idle successful_connections = 0 pid = 1 start_time = 1536671458632083 collector_name = collector_ip = collector_list = [ 10.204.216.62:8086, 10.204.216.63:8086, 10.204.216.61:8086, ] >> sm_queue_count = 3 max_sm_queue_count = 3 >>
09/11/2018 06:40:58 PM [contrail-schema] [WARNING]: RabbitMQ connection ESTABLISHED <Connection: amqp://guest:**@10.204.216.61:5673// at 0x7f0a32719cd0>
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 process_status = [ << module_id = contrail-schema instance_id = 0 state = Non-Functional connection_infos = [ << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Database name = RabbitMQ server_addrs = [ 10.204.216.61:5673, 10.204.216.62:5673, 10.204.216.63:5673, ] status = Up description = >>, ] description = Collector connection down >>, ] >>
09/11/2018 06:40:58 PM [contrail-schema] [ERROR]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec4 pr...

Jeba Paulaiyan (jebap)
tags: added: blocker
aswani kumar (aswanikumar90) wrote :

Verified: not seen in build 5.0 299.
