contrail analytics services in initializing with connection down

Bug #1721416 reported by wenqing liang
This bug affects 2 people
Affects            Status                     Importance  Assigned to
Juniper Openstack  (status tracked in Trunk)
  R4.0             Fix Committed              High        mkheni
  R4.1             Fix Committed              High        mkheni
  Trunk            Fix Committed              High        mkheni

Bug Description

CB r4.1-8 newton:

== Contrail Analytics ==
contrail-collector: initializing (Database:server6:Global connection down)
contrail-analytics-api: initializing (Redis-UVE:10.10.0.7:6381[None], Redis-UVE:10.10.0.8:6381[None] connection down)
contrail-alarm-gen: initializing (Redis-UVE:10.10.0.7:6381[None], Redis-UVE:10.10.0.8:6381[None] connection down)
contrail-snmp-collector: initializing (Database:RabbitMQ[] connection down)
contrail-topology: initializing (Database:RabbitMQ[] connection down)

2017-10-04 Wed 22:26:18:617.281 UTC server6 [Thread 140106626250496, Pid 3363]: server6:Global: MessageIndexTableInsert: Addition of message: AnalyticsApiStats, message UUID: f6301180-f3cb-4505-90b6-0ee26ebfb695 to table: MessageTableTimestamp FAILED
2017-10-04 Wed 22:26:18:618.250 UTC server6 [Thread 140106626250496, Pid 3363]: server6:Global: ObjectTableInsert: Addition of server4, message UUID f6301180-f3cb-4505-90b6-0ee26ebfb695 into table ObjectCollectorInfo FAILED

10/04/2017 10:17:55 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = server6 process_status = [ << module_id = contrail-analytics-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Redis-UVE name = 10.10.0.9:6381 server_addrs = [ 10.10.0.9:6381, ] status = Initializing >>, << type = Collector name = server_addrs = [ , ] status = Down description = none to Idle on EvStart >>, << type = Zookeeper name = OpServer server_addrs = [ 10.10.0.8:2181, 10.10.0.9:2181, 10.10.0.7:2181, ] status = Initializing description = >>, << type = Redis-UVE name = 10.10.0.7:6381 server_addrs = [ 10.10.0.7:6381, ] status = Initializing >>, << type = UvePartitions name = UVE-Aggregation server_addrs = [ ] status = Initializing >>, << type = Redis-UVE name = 10.10.0.8:6381 server_addrs = [ 10.10.0.8:6381, ] status = Initializing >>, ] description = Redis-UVE:10.10.0.9:6381[None], Collector, Zookeeper:OpServer[], Redis-UVE:10.10.0.7:6381[None], UvePartitions:UVE-Aggregation[None], Redis-UVE:10.10.0.8:6381[None] connection down >>, ] >>
10/04/2017 10:17:55 AM [contrail-analytics-api]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = server6:Analytics:contrail-analytics-api:0 client_info = << status = Idle successful_connections = 0 pid = 3329 http_port = 8090 start_time = 1507112275378607 collector_name = collector_ip = collector_list = [ 10.10.0.8:8086, 10.10.0.7:8086, 10.10.0.9:8086, ] >> sm_queue_count = 6 max_sm_queue_count = 6 >>
10/04/2017 10:17:55 AM [contrail-analytics-api]: Exception: get_cql_session Failure ('Unable to connect to any servers', {'10.10.0.7': error(111, "Tried connecting to [('10.10.0.7', 9042)]. Last error: Connection refused"), '10.10.0.9': TypeError('ref() does not take keyword arguments',), '10.10.0.8': TypeError('ref() does not take keyword arguments',)})

10/04/2017 10:18:07 AM [contrail-snmp-collector]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = server6 process_status = [ << module_id = contrail-snmp-collector instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.10.0.8:2181, 10.10.0.9:2181, 10.10.0.7:2181, ] status = Up description = >>, << type = Collector name = server_addrs = [ 10.10.0.7:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, << type = Database name = RabbitMQ server_addrs = [ 10.10.0.5:5672 10.10.0.6:5672 10.10.0.4:5672, ] status = Initializing description = >>, ] description = Collector, Database:RabbitMQ[] connection down >>, ] >>
10/04/2017 10:18:07 AM [contrail-snmp-collector]: Discarding event[EvSandeshUVESend] in state[Idle]
10/04/2017 10:18:07 AM [contrail-snmp-collector]: Processing event[EvSandeshUVESend] in state[Idle]
10/04/2017 10:18:07 AM [contrail-snmp-collector]: SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = server6:Analytics:contrail-snmp-collector:0 client_info = << status = Idle successful_connections = 0 pid = 3445 http_port = 5920 start_time = 1507112283024981 collector_name = collector_ip = 10.10.0.7:8086 collector_list = [ 10.10.0.7:8086, 10.10.0.9:8086, 10.10.0.8:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 3 >>
10/04/2017 10:18:07 AM [contrail-snmp-collector]: Discarding event[EvSandeshUVESend] in state[Idle]
10/04/2017 10:18:11 AM [contrail-snmp-collector]: Processing event[EvIdleHoldTimerExpired] in state[Idle]
10/04/2017 10:18:11 AM [contrail-snmp-collector]: Session Event: TCP Connected
10/04/2017 10:18:11 AM [contrail-snmp-collector]: Sandesh Client: Event[EvIdleHoldTimerExpired] => State[Idle] -> State[Connect]
10/04/2017 10:18:11 AM [contrail-snmp-collector]: Processing event[EvSandeshUVESend] in state[Connect]
10/04/2017 10:18:11 AM [contrail-snmp-collector]: SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = server6 process_status = [ << module_id = contrail-snmp-collector instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.10.0.8:2181, 10.10.0.9:2181, 10.10.0.7:2181, ] status = Up description = >>, << type = Collector name = server_addrs = [ 10.10.0.9:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, << type = Database name = RabbitMQ server_addrs = [ 10.10.0.5:5672 10.10.0.6:5672 10.10.0.4:5672, ] status = Initializing description = >>, ] description = Collector, Database:RabbitMQ[] connection down >>, ] >>

Logs uploaded to /cs-shared/bugs/1721416.

Rudra Rugge (rrugge)
Changed in juniperopenstack:
milestone: r4.0.2.0 → r4.0.3.0
wenqing liang (wliang)
description: updated
wenqing liang (wliang)
tags: added: blocker
tags: added: saniyblocker
removed: blocker sanity
tags: added: sanityblocker
removed: saniyblocker
Revision history for this message
Anish Mehta (amehta00) wrote :

The logs do not have enough information.
Is this reproducible?
Can I get access to a system in this state?

Ritam Gangopadhyay (ritam) wrote :

Yes, this is reproducible and seen only in case of multiple analytics containers with Redis running on them.

The Contrail HA setup below is in the error state and locked for debugging. Let me know if any other details are needed.

Topology :
DISTRO : "Ubuntu 14.04.5 LTS"
SKU : mitaka
Config Nodes : [u'nodec7', u'nodec8', u'nodec57']
Control Nodes : [u'nodec7', u'nodec8', u'nodec57']
Compute Nodes : [u'nodei1', u'nodei2', u'nodei3']
Openstack Node : [u'nodec7']
WebUI Node : [u'nodec7', u'nodec8', u'nodec57']
Analytics Nodes : [u'nodec7', u'nodec8', u'nodec57']
Database Nodes : [u'nodec7', u'nodec8', u'nodec57']
Physical Devices : [u'hooper', u"'hooper'"]
LB Nodes : [u'nodeg36']

mkheni (mkheni) wrote :

Cassandra-log on nodec8:

INFO [Native-Transport-Requests-5] 2017-10-23 11:56:59,429 MigrationManager.java:343 - Create new table: org.apache.cassandra.config.CFMetaData@28f33c04[cfId=2cab7150-b7bb-11e7-9aa8-91d310b2446f,ksName=ContrailAnalyticsCql,cfName=objectvaluetable,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@f3d4f4e, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.Int32Type),partitionColumns=[[] | [value]],partitionKeyColumns=[key, key2],clusteringColumns=[column1],keyValidator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type),columnMetadata=[key2, value, key, column1],droppedColumns={},triggers=[],indexes=[]]
DEBUG [MigrationStage:1] 2017-10-23 11:56:59,438 ColumnFamilyStore.java:899 - Enqueuing flush of keyspaces: 0.517KiB (0%) on-heap, 0.000KiB (0%) off-heap
DEBUG [MessagingService-Outgoing-/192.168.192.6-Gossip] 2017-10-23 11:56:59,441 OutboundTcpConnection.java:495 - Unable to connect to /192.168.192.6
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_111]
    at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_111]
    at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_111]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_111]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:397) [apache-cassandra-3.10.jar:3.10]
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:234) [apache-cassandra-3.10.jar:3.10]

Cassandra-log on nodec57:
INFO [Native-Transport-Requests-12] 2017-10-23 11:57:21,465 MigrationManager.java:343 - Create new table: org.apache.cassandra.config.CFMetaData@6f744761[cfId=39cdde90-b7bb-11e7-8acd-d7aa517683be,ksName=ContrailAnalyticsCql,cfName=objectvaluetable,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compr...


OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/36926
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/36926
Committed: http://github.com/Juniper/contrail-controller/commit/588c8d6752cb7b84d280587a2fb479bffca5c944
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 588c8d6752cb7b84d280587a2fb479bffca5c944
Author: Megh Bhatt <email address hidden>
Date: Fri Oct 27 11:48:03 2017 -0700

Fix concurrent analytics schema creation

Cassandra does not support concurrent schema creation from multiple
clients and hence contrail-collector uses zookeeper to allow only
one contrail-collector to create the schema. However in case of
multiple nodes, the CQL driver is given all the nodes and hence
it is possible that still the schema creation happens on different
nodes. Further it was observed that even when cassandra returns
failure on schema creation the node still creates the schema and this
causes issues on retry since the retry can happen on another node.

The fix is to create a schema session that only connects to the
first cassandra node given to contrail-collector with the assumption
that all contrail-collectors have the same configuration of cassandra.
This is achieved using the whitelist filtering provided by the CQL
driver. Once the schema is created, we will move to using the regular
session for queries.

Change-Id: I3948ffa22497226241ccbfd6708fe2d8989fa8c2
Closes-Bug: #1721416
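
A minimal sketch of the idea in the commit message above, written in Python for illustration only. It does not call any real CQL driver API; SessionPlan and plan_sessions are hypothetical names, and the only point shown is that every collector with the same configured server list pins its schema session to the same (first) node while queries keep the full list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SessionPlan:
    """Hosts a CQL session may talk to (hypothetical helper type)."""
    purpose: str
    allowed_hosts: Tuple[str, ...]


def plan_sessions(cassandra_server_list: List[str]):
    """Return (schema_plan, query_plan) for one collector.

    The schema session is restricted to the first configured node, so
    collectors that share the same configuration all send schema changes
    to the same Cassandra node. The query session keeps every node.
    """
    if not cassandra_server_list:
        raise ValueError("cassandra_server_list must not be empty")
    hosts = tuple(h.strip() for h in cassandra_server_list)
    schema_plan = SessionPlan("schema", hosts[:1])
    query_plan = SessionPlan("query", hosts)
    return schema_plan, query_plan
```

In a real client the schema plan would be handed to the driver's whitelist filter; the assumption stated in the commit (identical server lists on all collectors) is what makes "first node" a stable choice.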

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/37029
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37029
Committed: http://github.com/Juniper/contrail-controller/commit/6b0344ab7c15d8add563be4df8cae7df7ba8b75f
Submitter: Zuul (<email address hidden>)
Branch: master

commit 6b0344ab7c15d8add563be4df8cae7df7ba8b75f
Author: Megh Bhatt <email address hidden>
Date: Fri Oct 27 11:48:03 2017 -0700

Fix concurrent analytics schema creation

Cassandra does not support concurrent schema creation from multiple
clients and hence contrail-collector uses zookeeper to allow only
one contrail-collector to create the schema. However in case of
multiple nodes, the CQL driver is given all the nodes and hence
it is possible that still the schema creation happens on different
nodes. Further it was observed that even when cassandra returns
failure on schema creation the node still creates the schema and this
causes issues on retry since the retry can happen on another node.

The fix is to create a schema session that only connects to the
first cassandra node given to contrail-collector with the assumption
that all contrail-collectors have the same configuration of cassandra.
This is achieved using the whitelist filtering provided by the CQL
driver. Once the schema is created, we will move to using the regular
session for queries.

Change-Id: I3948ffa22497226241ccbfd6708fe2d8989fa8c2
Closes-Bug: #1721416
(cherry picked from commit 588c8d6752cb7b84d280587a2fb479bffca5c944)

Ritam Gangopadhyay (ritam) wrote :

Topology :
DISTRO : "Ubuntu 16.04.2 LTS"
SKU : ocata
Config Nodes : [u'nodem16', u'nodem17', u'nodem18']
Control Nodes : [u'nodem16', u'nodem17', u'nodem18']
Compute Nodes : [u'nodem19', u'nodem20']
Openstack Node : [u'nodem16', u'nodem17', u'nodem18']
WebUI Node : [u'nodem16', u'nodem17', u'nodem18']
Analytics Nodes : [u'nodem16', u'nodem17', u'nodem18']
Database Nodes : [u'nodem16', u'nodem17', u'nodem18']
Physical Devices : [u'blr-mx2', u"'blr-mx2'"]
LB Nodes : [u'nodea10']

The issue is still seen on this setup, which is available in the error state.

mkheni (mkheni) wrote :

This looks like a provisioning issue, and multiple core files were generated because of it.

In contrail-collector.conf, we can see:
[CONFIGDB]
# AMQP related configs
rabbitmq_server_list = 10.204.216.105 :5672 10.204.216.106:5672 10.204.216.107:5672

As can be seen, there is a space between 10.204.216.105 and :5672, which causes the collector to dump core.
The same malformed entry is present in contrail-control.conf, and contrail-control is dumping core as well.

Looking at /etc/contrailctl/analytics.conf, we see:
external_rabbitmq_servers = 10.204.216.105 , 10.204.216.106, 10.204.216.107
This causes the malformed rabbitmq_server_list in the daemon conf files.

The latest issue is not related to this bug. Please file a new bug and provide the combined_json used for provisioning.
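
For illustration, a small Python sketch of how a comma-separated external_rabbitmq_servers value could be turned into a rabbitmq_server_list without the stray space described in this comment. This is not the actual provisioning code; build_rabbitmq_server_list and has_malformed_entry are hypothetical helpers, and the default port 5672 is taken from the conf file quoted in this comment.

```python
def build_rabbitmq_server_list(external_rabbitmq_servers: str,
                               port: int = 5672) -> str:
    """Turn 'a , b, c' into 'a:5672 b:5672 c:5672', ignoring stray spaces."""
    hosts = [h.strip() for h in external_rabbitmq_servers.split(",")]
    hosts = [h for h in hosts if h]  # drop empty items, e.g. a trailing comma
    return " ".join(f"{h}:{port}" for h in hosts)


def has_malformed_entry(rabbitmq_server_list: str) -> bool:
    """True if any whitespace-separated token is not of the form host:port."""
    for token in rabbitmq_server_list.split():
        host, sep, port = token.rpartition(":")
        if not (sep and host and port.isdigit()):
            return True
    return False
```

Running has_malformed_entry on the list from contrail-collector.conf flags it, because the space splits the first entry into a bare host and a bare ":5672".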

Ritam Gangopadhyay (ritam) wrote :

There is already a bug for the issue you are mentioning. Should we track it there?

https://bugs.launchpad.net/juniperopenstack/+bug/1715563

mkheni (mkheni) wrote :

Yes, that is the same issue. We should track it there.

Anish Mehta (amehta00) wrote :

We should not track against this bug. Please open a new one.

Abhay Joshi (abhayj) wrote :

This problem is caused by the configuration. See the following excerpt from cluster.json:

    "global_config": {
        "external_rabbitmq_servers": "10.204.216.105 , 10.204.216.106, 10.204.216.107",
        "xmpp_auth_enable": true,
        "xmpp_dns_auth_enable": true
    },

The external_rabbitmq_servers value has a space between the first address and the comma that follows it. We need to remove it.

Ritam Gangopadhyay (ritam) wrote :

Build:- R4.1 build-42 Newton

Setup:-

nodei19 --- 10.204.217.131 --- openstack
nodec28 --- 10.204.217.13 --- controller, analytics, analyticsdb
nodec10 --- 10.204.217.176 --- controller, analytics, analyticsdb
nodec33 --- 10.204.217.168 --- controller, analytics, analyticsdb
nodeg37 --- 10.204.217.77 --- lb
nodei17 --- 10.204.217.129 --- compute
nodei20 --- 10.204.217.132 --- compute

I see the following errors in the Cassandra log:

ERROR [Native-Transport-Requests-8] 2017-11-10 09:07:04,655 QueryMessage.java:129 - Unexpected error during query
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found e0df9900-c538-11e7-a44e-879b1e3437da; expected d3ba6ca0-c538-11e7-a44e-879b1e3437da)
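
A short Python sketch for reading the mismatch error above. Both IDs in the message are version-1 (time-based) UUIDs, so they carry a timestamp. Assuming that timestamp reflects when each table ID was generated, the difference shows how far apart the two competing table creations were. The function names are hypothetical.

```python
import re
import uuid

MISMATCH_RE = re.compile(
    r"Column family ID mismatch \(found (?P<found>[0-9a-f-]{36}); "
    r"expected (?P<expected>[0-9a-f-]{36})\)"
)


def parse_cf_id_mismatch(line: str):
    """Return (found, expected) as uuid.UUID objects, or None if no match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return uuid.UUID(m.group("found")), uuid.UUID(m.group("expected"))


def creation_gap_seconds(found: uuid.UUID, expected: uuid.UUID) -> float:
    """Seconds between the timestamps embedded in two version-1 UUIDs."""
    if found.version != 1 or expected.version != 1:
        raise ValueError("only version-1 UUIDs carry a timestamp")
    # UUID.time counts 100-nanosecond intervals.
    return (found.time - expected.time) / 1e7
```

For the error above the gap is about 22 seconds, which lines up with the interval between the two "Create new table" entries for messagetabletimestamp in the Cassandra log quoted later in this report.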

##########################################

##########################################

RabbitMQ configuration is fine:-
root@nodec28:~# docker exec -it controller cat /etc/contrail/contrail-control.conf | grep rabbitmq_server_list
rabbitmq_server_list = 192.168.100.15:5672
root@nodec28:~# docker exec -it analytics cat /etc/contrail/contrail-collector.conf | grep rabbitmq_server_list
rabbitmq_server_list = 192.168.100.15:5672
root@nodec28:~#

##########################################

##########################################

root@nodec28:~# docker exec -it analytics contrail-status
== Contrail Analytics ==
contrail-collector: initializing (Database:nodec28:Global connection down)
contrail-analytics-api: initializing (Redis-UVE:192.168.100.13:6381[None], Redis-UVE:192.168.100.17:6381[None] connection down)
contrail-query-engine: active
contrail-alarm-gen: initializing (Redis-UVE:192.168.100.13:6381[None], Redis-UVE:192.168.100.17:6381[None] connection down)
contrail-snmp-collector: active
contrail-topology: active
contrail-analytics-nodemgr: active
root@nodec28:~#

mkheni (mkheni) wrote :

INFO [Native-Transport-Requests-1] 2017-11-09 15:59:11,914 MigrationManager.java:343 - Create new table: org.apache.cassandra.config.CFMetaData@77ba86ad[cfId=d3ba6ca0-c538-11e7-a44e-879b1e3437da,ksName=ContrailAnalyticsCql,cfName=messagetabletimestamp,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cd64306, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.Int32Type, org.apache.cassandra.db.marshal.UUIDType),partitionColumns=[[] | []],partitionKeyColumns=[key, key2],clusteringColumns=[column1, column2],keyValidator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type),columnMetadata=[key2, key, column1, column2],droppedColumns={},triggers=[],indexes=[]]

// no logs for ContrailAnalyticsCql keyspace for ~20 seconds

INFO [Native-Transport-Requests-4] 2017-11-09 15:59:33,968 MigrationManager.java:343 - Create new table: org.apache.cassandra.config.CFMetaData@6026e09d[cfId=e0df9900-c538-11e7-a44e-879b1e3437da,ksName=ContrailAnalyticsCql,cfName=messagetabletimestamp,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@cd64306, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.Int32Type, org.apache.cassandra.db.marshal.UUIDType),partitionColumns=[[] | []],partitionKeyColumns=[key, key2],clusteringColumns=[column1, column2],keyValidator=org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type),columnMetadata=[key2, key, column1, column2],droppedColumns={},triggers=[],indexes=[]]

INFO [MigrationStage:1] 2017-11-09 15:59:36,999 ColumnFamilyStore.java:406 - Initializing ContrailAnalyticsCql.messagetabletimestamp
DEBUG [MigrationStage:1] 2017-11-09 15:59:37,000 Schema.java:425 - Adding org.apache.cassandra.config.CFMetaData@323ed862[cfId=d3ba6ca0-c538-11e7-a44e-879b1e3437da,ksName=ContrailAnalyticsCql,cfName=messagetabletimestamp,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, ...
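
The timing in the log lines above can be checked with a few lines of Python. This is only a sketch: the three timestamps are copied from the log entries quoted in this comment, and seconds_between is a hypothetical helper.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two Cassandra log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


first_create = "2017-11-09 15:59:11,914"       # first "Create new table"
second_create = "2017-11-09 15:59:33,968"      # second "Create new table"
table_initialized = "2017-11-09 15:59:36,999"  # "Initializing ... messagetabletimestamp"

retry_gap = seconds_between(first_create, second_create)
time_to_initialize = seconds_between(first_create, table_initialized)
```

The second create request arrives roughly 22 seconds after the first, while the table is only initialized roughly 25 seconds after the first request, so the second request was issued before the first one had taken effect.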


OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/37425
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/37426
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37425
Committed: http://github.com/Juniper/contrail-controller/commit/9fc9d8088daeef030fb09ca6334a4faaf2983c4c
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 9fc9d8088daeef030fb09ca6334a4faaf2983c4c
Author: mkheni <email address hidden>
Date: Fri Nov 10 15:40:31 2017 -0800

Increase request_timeout for Schema creation.

If a table creation returns a timeout error from library, collector will
retry again after 10 seconds while cassandra will go ahead and create the
table. This may lead to race condition if the second attempt is made before
the first request can finish successfully. Hence, increase the request
timeout for schema creation commands to 40 sec.

Change-Id: I86bbcd357d80c6bf25ae36e9f092ca212088b3e6
closes-bug: #1721416
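
The race described in this commit message can be modelled with simple arithmetic. The sketch below is an illustration, not collector code: retry_races_first_request is a hypothetical function, the 10-second retry delay comes from the commit message, and the 25-second creation time and 12-second original timeout used in the usage note are illustrative values (the report does not state the previous timeout).

```python
def retry_races_first_request(create_duration_s: float,
                              request_timeout_s: float,
                              retry_delay_s: float = 10.0) -> bool:
    """True if a retry would be sent while the first CREATE is still running.

    The client retries only when the first request times out, that is when
    the server needs longer than request_timeout_s. The retry is then sent
    retry_delay_s after the timeout fires.
    """
    if create_duration_s <= request_timeout_s:
        return False  # the first request answers in time, so no retry happens
    retry_sent_at = request_timeout_s + retry_delay_s
    return retry_sent_at < create_duration_s
```

With a table that takes about 25 seconds to create, an assumed 12-second timeout puts the retry at 22 seconds and the two requests overlap; with the 40-second timeout from this commit the first request completes without any retry.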

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/37426
Committed: http://github.com/Juniper/contrail-controller/commit/0bc99a776cfdfb8a1007fcea7b0b12b108260e3a
Submitter: Zuul (<email address hidden>)
Branch: master

commit 0bc99a776cfdfb8a1007fcea7b0b12b108260e3a
Author: mkheni <email address hidden>
Date: Fri Nov 10 15:40:31 2017 -0800

Increase request_timeout for Schema creation.

If a table creation returns a timeout error from library, collector will
retry again after 10 seconds while cassandra will go ahead and create the
table. This may lead to race condition if the second attempt is made before
the first request can finish successfully. Hence, increase the request
timeout for schema creation commands to 40 sec.

Change-Id: I86bbcd357d80c6bf25ae36e9f092ca212088b3e6
closes-bug: #1721416

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/38040
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/38041
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38041
Committed: http://github.com/Juniper/contrail-controller/commit/8ed4466b3c543c54b5de32b5308ff004c5609658
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 8ed4466b3c543c54b5de32b5308ff004c5609658
Author: mkheni <email address hidden>
Date: Thu Nov 30 16:53:45 2017 -0800

increase the schema creation timeout to 2 mins.

Change-Id: Ibb8487b2ebfe33fefd7af8ac996b200a68fd15a1
Closes-bug: #1721416

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38040
Committed: http://github.com/Juniper/contrail-controller/commit/3a866c9ac6054fe40ac91439f45fe56fab487d87
Submitter: Zuul (<email address hidden>)
Branch: master

commit 3a866c9ac6054fe40ac91439f45fe56fab487d87
Author: mkheni <email address hidden>
Date: Thu Nov 30 16:53:45 2017 -0800

increase the schema creation timeout to 2 mins.

Change-Id: Ibb8487b2ebfe33fefd7af8ac996b200a68fd15a1
Closes-bug: #1721416

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/38653
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38653
Committed: http://github.com/Juniper/contrail-controller/commit/dbed80635acc489ddf1f0b0ae2597d62bd24074e
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit dbed80635acc489ddf1f0b0ae2597d62bd24074e
Author: Megh Bhatt <email address hidden>
Date: Fri Sep 29 15:04:04 2017 -0700

Maintain zookeeper lock during the whole database initialization

To ensure cassandra schema consistency, zookeeper lock is
used so that only one collector node is creating schema at
any given point of time. However if the creation of schema
fails then we were releasing the zookeeper lock and retrying.
This resulted in a situation where from cassandra perspective
schema was created concurrently and caused column family ID
mismatch. So now we will only release the zookeeper lock when
the schema creation is successful.

Change-Id: I887db5a5ca4d2c8b40b5c50079ed005c3f4f4b4c
Partial-Bug: #1721416
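
A minimal Python sketch of the locking pattern this commit message describes: the lock is taken once and kept across every retry, instead of being released after each failed attempt. A threading.Lock stands in for the ZooKeeper lock, and initialize_schema and create_schema are hypothetical names. Unlike the behaviour described in the commit, the sketch also releases the lock after a final failure so the example cannot hang.

```python
import threading


def initialize_schema(lock, create_schema, max_attempts=5):
    """Create the schema while holding `lock` across every retry.

    `lock` is any object with acquire()/release(). `create_schema` is a
    callable that returns True on success and False on failure.
    Returns the number of attempts that were needed.
    """
    lock.acquire()
    try:
        for attempt in range(1, max_attempts + 1):
            if create_schema():
                return attempt
        raise RuntimeError(
            "schema creation failed after %d attempts" % max_attempts)
    finally:
        # Released only after this node is done, never between a failed
        # attempt and its retry.
        lock.release()
```

Because no other collector can take the lock between a failed attempt and its retry, Cassandra never sees schema changes from two collectors interleaved.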

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/38825
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38825
Committed: http://github.com/Juniper/contrail-controller/commit/a576c184fab8e5df670836e8fe1fcc731f71f6d8
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit a576c184fab8e5df670836e8fe1fcc731f71f6d8
Author: Megh Bhatt <email address hidden>
Date: Fri Oct 27 11:48:03 2017 -0700

Fix concurrent analytics schema creation

Cassandra does not support concurrent schema creation from multiple
clients and hence contrail-collector uses zookeeper to allow only
one contrail-collector to create the schema. However in case of
multiple nodes, the CQL driver is given all the nodes and hence
it is possible that still the schema creation happens on different
nodes. Further it was observed that even when cassandra returns
failure on schema creation the node still creates the schema and this
causes issues on retry since the retry can happen on another node.

The fix is to create a schema session that only connects to the
first cassandra node given to contrail-collector with the assumption
that all contrail-collectors have the same configuration of cassandra.
This is achieved using the whitelist filtering provided by the CQL
driver. Once the schema is created, we will move to using the regular
session for queries.

Change-Id: I3948ffa22497226241ccbfd6708fe2d8989fa8c2
Closes-Bug: #1721416
(cherry picked from commit 588c8d6752cb7b84d280587a2fb479bffca5c944)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/38972
Submitter: mkheni (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38972
Committed: http://github.com/Juniper/contrail-controller/commit/da8b071da4fc11d15026b9544206f09cdb0afb26
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit da8b071da4fc11d15026b9544206f09cdb0afb26
Author: mkheni <email address hidden>
Date: Fri Nov 10 15:40:31 2017 -0800

Increase request_timeout for Schema creation.

If a table creation returns a timeout error from library, collector will
retry again after 10 seconds while cassandra will go ahead and create the
table. This may lead to race condition if the second attempt is made before
the first request can finish successfully. Hence, increase the request
timeout for schema creation commands to 120 sec.

Change-Id: I86bbcd357d80c6bf25ae36e9f092ca212088b3e6
closes-bug: #1721416
