Collector hangs in zookeeper client

Bug #1551600 reported by Megh Bhatt
This bug affects 2 people
Affects                Status          Importance   Assigned to   Milestone
Juniper Openstack (status tracked in Trunk)
  R3.0                 Fix Committed   High         Megh Bhatt
  Trunk                Fix Committed   High         Megh Bhatt

Bug Description

The collector stores its hostname and a uuid in the data of the /collector znode. If the collector on the node that created the /collector znode is restarted before it deletes that znode, it chooses a new uuid on restart. Because the new uuid does not match the one stored in the znode data, the restarted collector does not delete the /collector znode. As a result, no collector can acquire the zk lock and no progress is made.
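The mismatch described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the collector's actual code: the function names, the `hostname:uuid` layout and the `:` separator are assumptions made for the example.

```python
import uuid


def make_identity(hostname, with_uuid):
    """Build the string a collector would store in the /collector znode.

    The "hostname:uuid" layout is an assumption for illustration.
    """
    if with_uuid:
        # Behaviour described in the report: a fresh uuid on every start.
        return f"{hostname}:{uuid.uuid4()}"
    # Hostname-only data, which stays the same across restarts.
    return hostname


def owns_znode(stored_data, my_identity):
    """A collector treats the znode as its own only on an exact match."""
    return stored_data == my_identity


def restart_can_clean_up(hostname, with_uuid):
    """Return True if a restarted collector would recognise its old znode."""
    stored = make_identity(hostname, with_uuid)         # written before the restart
    after_restart = make_identity(hostname, with_uuid)  # identity after the restart
    return owns_znode(stored, after_restart)
```

With `with_uuid=True` the two identities differ, so the restarted collector leaves the znode in place. With hostname-only data the comparison succeeds and the stale znode can be removed.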

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/18057
Submitter: Megh Bhatt (<email address hidden>)

Revision history for this message
Ankit Jain (ankitja) wrote :

2016-03-01 Tue 10:48:05:798.121 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new Prod
2016-03-01 Tue 10:48:05:798.234 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-0 Err
2016-03-01 Tue 10:48:05:798.253 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-1 Err
2016-03-01 Tue 10:48:05:798.267 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-2 Err
2016-03-01 Tue 10:48:05:798.278 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-3 Err
2016-03-01 Tue 10:48:05:798.290 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-4 Err
2016-03-01 Tue 10:48:05:798.304 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-5 Err
2016-03-01 Tue 10:48:05:798.315 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-6 Err
2016-03-01 Tue 10:48:05:798.326 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-7 Err
2016-03-01 Tue 10:48:05:798.337 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-8 Err
2016-03-01 Tue 10:48:05:798.351 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-9 Err
2016-03-01 Tue 10:48:05:798.362 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-10 Err
2016-03-01 Tue 10:48:05:798.373 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-11 Err
2016-03-01 Tue 10:48:05:798.384 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-12 Err
2016-03-01 Tue 10:48:05:798.396 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-13 Err
2016-03-01 Tue 10:48:05:798.408 IST nodeg20 [Thread 139908419608832, Pid 17447]: Kafka new topic -uve-14 Err
2016-03-01 Tue 10:48:26:891.762 IST nodeg20 [Thread 139908194424576, Pid 17447]: CassLibrary: src/connection.cpp:791 void cass::Connection::notify_error(const string&, cass::Connection::ConnectionError)] Host 10.204.216.17 received invalid protocol response Invalid or unsupported protocol version: 4
2016-03-01 Tue 10:48:26:891.877 IST nodeg20 [Thread 139908194424576, Pid 17447]: CassLibrary: src/control_connection.cpp:204 virtual void cass::ControlConnection::on_close(cass::Connection*)] Lost control connection on host 10.204.216.17
2016-03-01 Tue 10:48:26:891.909 IST nodeg20 [Thread 139908194424576, Pid 17447]: CassLibrary: src/control_connection.cpp:223 virtual void cass::ControlConnection::on_close(cass::Connection*)] Host 10.204.216.17 does not support protocol version 4. Trying protocol version 3...
2016-03-01 Tue 10:48:26:977.719 IST nodeg20 [Thread 139908419608832, Pid 17447]: TCP [SYS_DEBUG]: TcpServerMessageLog: Server 0.0.0.0:19876 Initialization complete controller/src/io/tcp_server.cc 102
2016-03-01 Tue 10:48:26:978.415 IST nodeg20 [Thread 139908419608832, Pid 17447]: SANDESH: No Client: 1456809506978316 SandeshModuleClientTrace: data= [ name = nodeg20:Analytics:contrail-collector:0 client_info= [ status = Idle successful_connections = 0 pid = 17447 http_port = 8089 start_time = 1456809506978200 collector_name = primary = 0.0.0.0:0 secondary = 0.0.0.0:0 rx_socket_stats= [ bytes = 0 calls = 0 average_bytes = 0 blo...

information type: Proprietary → Public
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18057
Committed: http://github.org/Juniper/contrail-controller/commit/57b353118b078c2c94ecf9e8820269df08d39daa
Submitter: Zuul
Branch: R3.0

commit 57b353118b078c2c94ecf9e8820269df08d39daa
Author: Megh Bhatt <email address hidden>
Date: Tue Mar 1 01:01:08 2016 -0800

Zoo node data just contains hostname so that restart of collector can be handled

Change-Id: Ic76c41abeb373f7065198da6bafc479f61c43391
Closes-Bug: #1551600

Megh Bhatt (meghb)
Changed in juniperopenstack:
milestone: r3.0-fcs → r3.0.1.0
Revision history for this message
prasad miriyala (pmiriyala) wrote :

Recovery steps if you hit this bug:

1) /usr/share/zookeeper/bin/zkCli.sh
2) delete /collector
3) quit
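
The manual steps above could also be scripted. The following is a minimal sketch, assuming a client object that offers `exists(path)` and `delete(path)`; `InMemoryZk` is a hypothetical stand-in used only so the example runs without a ZooKeeper server, and `clear_stale_znode` is an illustrative name, not part of contrail. A real client such as kazoo's `KazooClient`, which exposes `exists` and `delete`, could be passed instead, although the report itself only describes the zkCli.sh procedure.

```python
class InMemoryZk:
    """Hypothetical stand-in for a ZooKeeper client, for illustration only."""

    def __init__(self, nodes=None):
        self.nodes = dict(nodes or {})

    def exists(self, path):
        # Return a truthy placeholder stat if the node is present, else None.
        return {"path": path} if path in self.nodes else None

    def delete(self, path):
        del self.nodes[path]


def clear_stale_znode(zk, path="/collector"):
    """Delete the znode at `path` if it exists.

    Returns True if a znode was deleted and False if there was nothing
    to delete. Equivalent in intent to `delete /collector` in zkCli.sh.
    """
    if zk.exists(path) is None:
        return False
    zk.delete(path)
    return True
```

Calling `clear_stale_znode(zk)` a second time returns False, so the helper is safe to rerun.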

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/18230
Submitter: Megh Bhatt (<email address hidden>)

tags: added: releasenote
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18230
Committed: http://github.org/Juniper/contrail-controller/commit/9ccc1e37ec8c29b8c48993552dca33d653700154
Submitter: Zuul
Branch: master

commit 9ccc1e37ec8c29b8c48993552dca33d653700154
Author: Megh Bhatt <email address hidden>
Date: Tue Mar 1 01:01:08 2016 -0800

Zoo node data just contains hostname so that restart of collector can be handled

Change-Id: Ic76c41abeb373f7065198da6bafc479f61c43391
Closes-Bug: #1551600
(cherry picked from commit 57b353118b078c2c94ecf9e8820269df08d39daa)

Changed in juniperopenstack:
milestone: r3.0.1.0 → r3.1.0.0-fcs