3073: ubuntu-14-04 mitaka SMLite single node containers restart as zookeeper connection is flapping

Bug #1689376 reported by Sudheendra Rao
Affects              Status          Importance   Assigned to         Milestone
Juniper Openstack    (status tracked in Trunk)
  R4.0               Fix Committed   Critical     Hari Prasad Killi
  Trunk              Fix Committed   Critical     Hari Prasad Killi

Bug Description

The containers (controller and analytics) are restarting because the zookeeper connection is flapping on mainline build 3073, mitaka SMLite single node.

logs of contrail-api:

WARNING:api-0:Connection dropped: socket connection error: Connection refused
INFO:api-0:Connecting to 10.204.216.232:2181
WARNING:api-0:Connection dropped: socket connection error: Connection refused
ERROR:contrail-api:Session Event: TCP Connect Fail
ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodek12 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.232:2181, ] status = Initializing description = >>, << type = Collector name = server_addrs = [ 10.204.216.232:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodek12 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.232:2181, ] status = Initializing description = >>, << type = Collector name = server_addrs = [ 10.204.216.232:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodek12:Config:contrail-api:0 client_info = << status = Idle successful_connections = 0 pid = 30511 http_port = 8084 start_time = 1494268876738826 collector_name = collector_ip = 10.204.216.232:8086 collector_list = [ 10.204.216.232:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 3 >>
INFO:api-0:Connecting to 10.204.216.232:2181
DEBUG:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=400000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
INFO:api-0:Zookeeper connection established, state: CONNECTED

Rabbitmq connection is also down:

=WARNING REPORT==== 8-May-2017::23:22:55 ===
closing AMQP connection <0.5303.0> (127.0.0.1:54454 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-May-2017::23:22:55 ===
closing AMQP connection <0.2167.0> (127.0.0.1:54132 -> 127.0.0.1:5672):
connection_closed_abruptly
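
The "Connection refused" messages above can be cross-checked from the host with a plain TCP probe. The following is only a minimal sketch: the helper name `port_is_open` and the timeout value are assumptions for illustration and are not part of contrail. It only tells whether something is listening on the port, not whether the service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper, not part of contrail. A refused or timed-out
    connection raises OSError (or a subclass) and is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("10.204.216.232", 2181)` would probe the zookeeper address from the contrail-api log, and `port_is_open("127.0.0.1", 5672)` the AMQP port from the rabbitmq log. Running the probe a few times in a row should show whether the listener is flapping.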

Revision history for this message
Abhay Joshi (abhayj) wrote :

This needs to be looked into by the team working on the container internal ansible.

Changed in juniperopenstack:
assignee: Abhay Joshi (abhayj) → Raj Reddy (rajreddy)
tags: added: blocker
removed: server-manager
Revision history for this message
Raj Reddy (rajreddy) wrote :

This is happening because, after the dockers are started, we install the vrouter package on the host, which runs 'chown -R contrail. /var/log/contrail'. That changes the ownership of /var/log/contrail/controller etc., which are mount points for the controller etc. dockers.

The suggested solution is to replace the above with the non-recursive 'chown contrail. /var/log/contrail'.
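
To illustrate the difference in scope between the two commands, here is a rough sketch that lists which paths a recursive and a non-recursive ownership change on a directory would touch. It does not call chown itself (that would need root). The function name `chown_targets` is an assumption made up for this example; the actual fix lives in the packaging scripts, not in Python.

```python
import os


def chown_targets(root, recursive):
    """List the paths whose ownership a chown on `root` would change.

    Hypothetical helper for illustration only. With recursive=False it
    mirrors 'chown contrail. <root>' and touches only `root`. With
    recursive=True it mirrors 'chown -R contrail. <root>' and also
    touches everything below it, including sub-directories that are
    mount points for the containers.
    """
    targets = [root]
    if recursive:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                targets.append(os.path.join(dirpath, name))
    return targets
```

Called on /var/log/contrail with recursive=True, the list would include /var/log/contrail/controller and the files under it, which is the ownership change described above. With recursive=False only /var/log/contrail itself is returned.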

Revision history for this message
Raj Reddy (rajreddy) wrote :
tags: added: vrouter
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/31306
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/31307
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/31307
Committed: http://github.com/Juniper/contrail-packages/commit/d8e4d57bfbcab11aef49354fefb041d6f5ffbede
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit d8e4d57bfbcab11aef49354fefb041d6f5ffbede
Author: Hari Prasad Killi <email address hidden>
Date: Wed May 10 14:34:27 2017 +0530

Do not change the file ownership for /var/log/contrail recursively.

Change-Id: I5ebe7b8dfa4ae7f80db33cbcd2f4020b369f34d6
closes-bug: #1689376

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/31306
Committed: http://github.com/Juniper/contrail-packages/commit/11c86fd2ce83d661f1cf3743ab38b75dac4a1636
Submitter: Zuul (<email address hidden>)
Branch: master

commit 11c86fd2ce83d661f1cf3743ab38b75dac4a1636
Author: Hari Prasad Killi <email address hidden>
Date: Wed May 10 14:34:27 2017 +0530

Do not change the file ownership for /var/log/contrail recursively.

Change-Id: I5ebe7b8dfa4ae7f80db33cbcd2f4020b369f34d6
closes-bug: #1689376
