SM:mainline:3036:centos: contrail-control process gets into timeout state with error in cassandra

Bug #1662417 reported by sundarkh
This bug affects 1 person
Affects            Status     Importance  Assigned to     Milestone
Juniper Openstack  Won't Fix  High        Dheeraj Gautam
R3.2               Won't Fix  High        Dheeraj Gautam
R4.0               Won't Fix  High        Dheeraj Gautam
Trunk              Won't Fix  High        Dheeraj Gautam

Bug Description

SM:mainline:3036:Centos: contrail-control process gets into timeout state with error in cassandra

1) Install SM mitaka mainline build 3036.
2) Add a cluster with the following roles:

root@nodej5:~# server-manager-client display server --select id,cluster_id,mac_address,ip_address,roles
+---------+---------------+---------------+---------------------------------------------------+-------------------+
| id      | cluster_id    | ip_address    | roles                                             | mac_address       |
+---------+---------------+---------------+---------------------------------------------------+-------------------+
| nodec57 | cluster_multi | 10.204.221.61 | [u'compute']                                      | 00:25:90:C5:58:6E |
| nodec33 | cluster_multi | 10.204.221.59 | [u'webui', u'database', u'control', u'collector'] | 00:25:90:C4:82:28 |
| nodec35 | cluster_multi | 10.204.221.58 | [u'config', u'control', u'openstack']             | 00:25:90:C4:7A:70 |
| nodea4  | cluster_multi | 10.204.221.60 | [u'compute']                                      | 00:25:90:A5:3B:12 |
+---------+---------------+---------------+---------------------------------------------------+-------------------+
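
When comparing role placement across nodes, it can help to turn the table above into data. The following is a minimal sketch, not part of Server Manager: the helper names `parse_server_table` and `nodes_with_role` are hypothetical, and the parser assumes the pipe-delimited layout shown in this report, with roles printed as a Python list repr.

```python
import ast

# Sample copied from the report; column padding does not matter to the parser.
SAMPLE = """\
| id | cluster_id | ip_address | roles | mac_address |
| nodec57 | cluster_multi | 10.204.221.61 | [u'compute'] | 00:25:90:C5:58:6E |
| nodec33 | cluster_multi | 10.204.221.59 | [u'webui', u'database', u'control', u'collector'] | 00:25:90:C4:82:28 |
| nodec35 | cluster_multi | 10.204.221.58 | [u'config', u'control', u'openstack'] | 00:25:90:C4:7A:70 |
| nodea4 | cluster_multi | 10.204.221.60 | [u'compute'] | 00:25:90:A5:3B:12 |
"""


def parse_server_table(text):
    """Parse the pipe-delimited table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited line is the header
            continue
        row = dict(zip(header, cells))
        # Roles are printed as a list repr such as [u'compute'].
        row["roles"] = ast.literal_eval(row["roles"])
        rows.append(row)
    return rows


def nodes_with_role(rows, role):
    """Return the ids of all servers that carry the given role."""
    return [row["id"] for row in rows if role in row["roles"]]


rows = parse_server_table(SAMPLE)
control_nodes = nodes_with_role(rows, "control")
database_nodes = nodes_with_role(rows, "database")
```

For this cluster, `control_nodes` contains nodec33 and nodec35, while `database_nodes` contains only nodec33.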

3) Reimage the targets with centos72; the reimage completes successfully.

4) Issue provision; provisioning completes.
5) However, the contrail-control process on nodec33 goes into timeout state:

[root@nodec33 ~]# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control timeout
contrail-control-nodemgr active
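
For repeated checks across builds, the status output can be scanned for anything that is not active. This is a rough sketch under stated assumptions: the function name `non_active_services` is hypothetical, and the parser only understands the plain "== Section ==" and "service state" layout shown above, so a trailing traceback would need to be filtered out first.

```python
SAMPLE_STATUS = """\
== Contrail Control ==
supervisor-control: active
contrail-control timeout
contrail-control-nodemgr active
"""


def non_active_services(status_text):
    """Return (section, service, state) for each service not reported as active."""
    problems = []
    section = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("==") and line.endswith("=="):
            section = line.strip("= ")  # "== Contrail Control ==" -> "Contrail Control"
            continue
        if section is None:
            continue  # ignore the shell prompt and anything before the first section
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or unrecognised line
        service = parts[0].rstrip(":")
        state = parts[1]
        if state != "active":
            problems.append((section, service, state))
    return problems


problems = non_active_services(SAMPLE_STATUS)
```

With the nodec33 output above, the only entry returned is contrail-control in the Contrail Control section, with state timeout.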

6) Restarting contrail-control does not bring the process back to active state.

7) /var/log/contrail/contrail-control.log on nodec33 shows:

2017-02-06 Mon 19:55:54:918.111 PST nodec33 [Thread 47244092547904, Pid 31972]: DisconnectSync:controller/src/database/cassandra/cql/cql_if.cc:1877: DisconnectSync FAILED
2017-02-06 Mon 19:55:54:918.017 PST nodec33 [Thread 47244265494272, Pid 31972]: SANDESH: Send FAILED: 1486439754917900 ConfigCassSm [SYS_DEBUG]: ConfigCassInitErrorMessage: Database initialization failed controller/src/ifmap/client/config_cassandra_client.cc 339
2017-02-06 Mon 19:55:59:918.509 PST nodec33 [Thread 47244092547904, Pid 31972]: SyncFutureWait:controller/src/database/cassandra/cql/cql_if.cc:1355: SyncWait: FAILED: Error initializing session
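
Since the log reports "Error initializing session" from the CQL client, one quick way to narrow the problem down is to check whether the Cassandra CQL port is reachable from the control node at all. The sketch below is a generic TCP check, not Contrail code. The function name `cql_port_reachable` is hypothetical, and port 9042 is only Cassandra's default native transport port; the port actually configured in this deployment is not stated in the report.

```python
import socket


def cql_port_reachable(host, port=9042, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout.

    9042 is Cassandra's default native transport (CQL) port and is an
    assumption here; pass the port configured for the deployment.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


# Example call (10.204.221.59 is nodec33, the database node in this cluster):
# print(cql_port_reachable("10.204.221.59"))
```

A successful connection only shows that something is listening on the port. A session can still fail to initialize for other reasons, so a True result does not rule Cassandra out.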

8) Another issue seen in this topology: contrail-status on nodec35 ends with the following traceback

[root@nodec35 ~]# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery active
contrail-schema active
contrail-svc-monitor active
ifmap active

== Contrail Database ==
contrail-database: active

supervisor-database: inactive (disabled on boot)
Traceback (most recent call last):
  File "/usr/bin/contrail-status", line 547, in <module>
    main()
  File "/usr/bin/contrail-status", line 524, in main
    contrail_service_status('database', options)
  File "/usr/bin/contrail-status", line 442, in contrail_service_status
    check_status(svc_name, options)
  File "/usr/bin/contrail-status", line 418, in check_status
    check_svc_status(svc_name, options.debug, options.detail, options.timeout)
  File "/usr/bin/contrail-status", line 358, in check_svc_status
    raise Exception("%s does not exist! Cannot check supervisor status." % service_sock)
Exception: /var/run/supervisord_database.sock does not exist! Cannot check supervisor status.
[root@nodec35 ~]#
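
Note that nodec35 does not carry the database role in the cluster table, yet contrail-status still tries to query the database supervisor and aborts when its socket is missing. The sketch below illustrates one way such a check could report the condition instead of raising. It is not the actual contrail-status code: the function names are hypothetical, and the socket naming pattern is inferred only from the path in the traceback.

```python
import os


def supervisor_sock_path(svc_name, run_dir="/var/run"):
    """Build the supervisor socket path; pattern inferred from the traceback."""
    return os.path.join(run_dir, "supervisord_%s.sock" % svc_name)


def check_supervisor_sock(svc_name, run_dir="/var/run"):
    """Return (ok, message) rather than raising when the socket is absent."""
    sock = supervisor_sock_path(svc_name, run_dir)
    if not os.path.exists(sock):
        return False, "%s does not exist; skipping supervisor status for '%s'" % (
            sock, svc_name)
    return True, sock


ok, message = check_supervisor_sock("database")
if not ok:
    print(message)
```

Returning a status tuple lets the caller print a one-line notice for the missing supervisor and continue with the remaining sections.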

Notes:
------
1) This is seen in kilo/liberty/mitaka
2) Issue not seen in Single node setup
3) Seen on both Ubuntu and CentOS distros

sundarkh (sundar-kh)
description: updated
sundarkh (sundar-kh) wrote :

Seen with mainline build 3039 also

sundarkh (sundar-kh)
summary: - SM:mainline:3036:Centos: contrail-control process gets into timeout
- state with error in cassandra
+ SM:mainline:3036: contrail-control process gets into timeout state with
+ error in cassandra
description: updated
sundarkh (sundar-kh)
tags: added: sanity
tags: added: blocker
sundarkh (sundar-kh) wrote : Re: SM:mainline:3036: contrail-control process gets into timeout state with error in cassandra

Seen with build 3043 also

sundarkh (sundar-kh) wrote :

Seen in Ubuntu R3.2 build 28

Jeba Paulaiyan (jebap)
tags: removed: blocker sanity
tags: added: sanity
tags: removed: sanity
Dheeraj Gautam (dgautam) wrote :

No plans to fix in R3.2.

summary: - SM:mainline:3036: contrail-control process gets into timeout state with
- error in cassandra
+ SM:mainline:3036:centos: contrail-control process gets into timeout
+ state with error in cassandra
Dheeraj Gautam (dgautam) wrote :

Needs to be re-tested with R4.0 container approach.

Jeba Paulaiyan (jebap) wrote :

CentOS container testing is planned for R4.0.1.0.

Jeba Paulaiyan (jebap)
Changed in juniperopenstack:
status: Incomplete → Won't Fix