mainline: CB 61 contrail-collector fails to start

Bug #1719830 reported by Sudheendra Rao
This bug affects 1 person
Affects                                        Status         Importance  Assigned to  Milestone
Juniper Openstack (status tracked in Trunk)
  R4.1                                         Fix Committed  Critical    Megh Bhatt
  Trunk                                        Fix Committed  Critical    Megh Bhatt

Bug Description

On mainline build 61 and later (mitaka), the contrail-collector service fails to start and stays in the initializing state, as contrail-status shows:

root@nodei21:/var/log/contrail/analytics# docker exec -it analytics contrail-status
== Contrail Analytics ==
contrail-alarm-gen active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector initializing (Database:nodei21:Global connection down)
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

root@nodei21:/var/log/contrail/analytics#

The following test cases fail because of this bug:
1. AnalyticsTestSanity.test_contrail_status
2. AnalyticsBasicTestSanity.test_verify_object_logs
3. AnalyticsTestSanity3.test_verify_process_status_analytics_node

Collector log:

2017-09-27 Wed 07:00:30:842.722 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: MessageIndexTableInsert: Addition of message: VncApiLatencyStatsLog, message UUID: 4d65ab8a-0be5-423d-88c2-ae783e5a871c to table: MessageTableSource FAILED
2017-09-27 Wed 07:00:30:842.742 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: MessageIndexTableInsert: Addition of message: VncApiLatencyStatsLog, message UUID: 4d65ab8a-0be5-423d-88c2-ae783e5a871c to table: MessageTableModuleId FAILED
2017-09-27 Wed 07:00:30:842.761 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: MessageIndexTableInsert: Addition of message: VncApiLatencyStatsLog, message UUID: 4d65ab8a-0be5-423d-88c2-ae783e5a871c to table: MessageTableCategory FAILED
2017-09-27 Wed 07:00:30:842.780 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: MessageIndexTableInsert: Addition of message: VncApiLatencyStatsLog, message UUID: 4d65ab8a-0be5-423d-88c2-ae783e5a871c to table: MessageTableMessageType FAILED
2017-09-27 Wed 07:00:30:842.807 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: MessageIndexTableInsert: Addition of message: VncApiLatencyStatsLog, message UUID: 4d65ab8a-0be5-423d-88c2-ae783e5a871c to table: MessageTableTimestamp FAILED
2017-09-27 Wed 07:00:30:842.965 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: ObjectTableInsert: Addition of issu-vm6, message UUID 4d65ab8a-0be5-423d-88c2-ae783e5a871c into table ObjectConfigNode FAILED
2017-09-27 Wed 07:00:30:843.081 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag Source: into table StatsTableByStrTagV3 FAILED
2017-09-27 Wed 07:00:30:843.112 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag api_latency_stats.application: into table StatsTableByStrTagV3 FAILED
2017-09-27 Wed 07:00:30:843.137 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag api_latency_stats.identifier: into table StatsTableByStrTagV3 FAILED
2017-09-27 Wed 07:00:30:843.160 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag api_latency_stats.operation_type: into table StatsTableByStrTagV3 FAILED
2017-09-27 Wed 07:00:30:843.181 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag api_latency_stats.response_size: into table StatsTableByU64TagV3 FAILED
2017-09-27 Wed 07:00:30:843.201 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag api_latency_stats.response_time_in_usec: into table StatsTableByDblTagV3 FAILED
2017-09-27 Wed 07:00:30:843.224 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag name: into table StatsTableByStrTagV3 FAILED
2017-09-27 Wed 07:00:30:843.247 IST nodei21 [Thread 47493819664128, Pid 19103]: nodei21:Global: StatTableWrite: Addition of VncApiLatencyStatsLog, api_latency_stats tag node_name: into table StatsTableByStrTagV3 FAILED

Setup details:
DISTRO : "Ubuntu 14.04.5 LTS"
SKU : mitaka
Config Nodes : [u'nodei21', u'nodei22', u'nodei23']
Control Nodes : [u'nodei21', u'nodei22', u'nodei23']
Compute Nodes : [u'nodei24', u'nodei25', u'nodei26']
Openstack Node : [u'nodea1']
WebUI Node : [u'nodei21', u'nodei22', u'nodei23']
Analytics Nodes : [u'nodei21', u'nodei22', u'nodei23']
Database Nodes : [u'nodei21', u'nodei22', u'nodei23']
Physical Devices : [u'hooper', u"'hooper'"]

Sudheendra Rao (sudheendra-k) wrote :

The collector also dumps core because of this. The core file has been copied to /cs-shared/test_runs/1719830

The backtrace is:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-collector'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00002b98afdb0c37 in _quicksort (pbase=0x0, total_elems=<optimized out>, size=32, cmp=0x2b98afefa3b8, arg=0x93ead8) at qsort.c:125
125 qsort.c: No such file or directory.
(gdb) bt
#0 0x00002b98afdb0c37 in _quicksort (pbase=0x0, total_elems=<optimized out>, size=32, cmp=0x2b98afefa3b8, arg=0x93ead8) at qsort.c:125
#1 0x00002b98b782be00 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb) quit

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/36133
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/36134
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/36133
Committed: http://github.com/Juniper/contrail-controller/commit/2302f77d123c4588bbb8923a9278d9bf017910cf
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 2302f77d123c4588bbb8923a9278d9bf017910cf
Author: Megh Bhatt <email address hidden>
Date: Fri Sep 29 15:04:04 2017 -0700

Maintain zookeeper lock during the whole database initialization

To ensure cassandra schema consistency, zookeeper lock is
used so that only one collector node is creating schema at
any given point of time. However if the creation of schema
fails then we were releasing the zookeeper lock and retrying.
This resulted in a situation where from cassandra perspective
schema was created concurrently and caused column family ID
mismatch. So now we will only release the zookeeper lock when
the schema creation is successful.

Change-Id: I1af4ef147ec31d44bce258b5af319589e27eb64e
Closes-Bug: #1719830
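
The locking change described in this commit message can be illustrated with a minimal sketch. This is not the collector's actual code (the collector is not written in Python): a threading.Lock stands in for the zookeeper lock, and init_database, create_schema, SchemaError and max_attempts are hypothetical names chosen for illustration. The point it shows is that the lock stays held across failed attempts and their retries, so no other node can start creating schema in between. Releasing the lock after giving up is an assumption of this sketch, not something the commit message states.

```python
import threading


class SchemaError(Exception):
    """Hypothetical error raised when one schema creation attempt fails."""


def init_database(lock, create_schema, max_attempts=5):
    """Run create_schema while holding lock across every retry.

    The lock is not released between a failed attempt and the retry.
    It is released only once creation succeeds (or, in this sketch,
    once all attempts are exhausted). Returns the attempt number that
    succeeded.
    """
    with lock:
        for attempt in range(1, max_attempts + 1):
            try:
                create_schema()
                return attempt
            except SchemaError:
                # Retry while still holding the lock, so schema is never
                # created concurrently from the database's point of view.
                continue
    raise SchemaError("schema creation failed after %d attempts" % max_attempts)


if __name__ == "__main__":
    # Stand-in for the zookeeper lock (assumption for illustration).
    schema_lock = threading.Lock()
    state = {"calls": 0}

    def flaky_create_schema():
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise SchemaError("transient failure")

    print("succeeded on attempt", init_database(schema_lock, flaky_create_schema))
```

The previous behaviour described above corresponds to acquiring and releasing the lock inside the retry loop, which leaves a window between attempts in which another collector can take the lock and begin its own schema creation.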

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/36134
Committed: http://github.com/Juniper/contrail-controller/commit/24fb66226309c4768134485227ccf13eb04c20e2
Submitter: Zuul (<email address hidden>)
Branch: master

commit 24fb66226309c4768134485227ccf13eb04c20e2
Author: Megh Bhatt <email address hidden>
Date: Fri Sep 29 15:04:04 2017 -0700

Maintain zookeeper lock during the whole database initialization

To ensure cassandra schema consistency, zookeeper lock is
used so that only one collector node is creating schema at
any given point of time. However if the creation of schema
fails then we were releasing the zookeeper lock and retrying.
This resulted in a situation where from cassandra perspective
schema was created concurrently and caused column family ID
mismatch. So now we will only release the zookeeper lock when
the schema creation is successful.

Change-Id: I1af4ef147ec31d44bce258b5af319589e27eb64e
Closes-Bug: #1719830
