Comment 2 for bug 1495663

Rohit Jaiswal (rohit-jaiswal-3) wrote :

Ceilometer uses Tooz for agent coordination, and configurable connection retries would be useful for building resilience against random connection failures.

For example, I see this in the notification agent logs:

(kazoo.client): 2015-09-11 18:49:35,331 DEBUG connection _submit Sending request(xid=2): Create(path=u'/tooz/ceilometer.notification/b279f2ed-fe04-4113-b374-4627745c711c', data='\xc4\x00', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=1)
(kazoo.client): 2015-09-11 18:49:38,485 Level 5 connection _submit Sending request(xid=-2): Ping()
(kazoo.client): 2015-09-11 18:49:41,450 WARNING connection _connect_attempt Connection dropped: outstanding heartbeat ping not received
(kazoo.client): 2015-09-11 18:49:41,450 WARNING connection _connect_attempt Transition to CONNECTING
(kazoo.client): 2015-09-11 18:49:41,450 INFO client _session_callback Zookeeper connection lost
(ceilometer.openstack.common.threadgroup): 2015-09-11 18:49:41,463 ERROR threadgroup wait
Traceback (most recent call last):
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/ceilometer/openstack/common/threadgroup.py", line 145, in wait
    x.wait()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/ceilometer/openstack/common/threadgroup.py", line 47, in wait
    return self.thread.wait()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
    return self._exit_event.wait()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
    return hubs.get_hub().switch()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
    return self.greenlet.switch()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/ceilometer/openstack/common/service.py", line 491, in run_service
    service.start()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/ceilometer/notification.py", line 143, in start
    self.partition_coordinator.join_group(self.group_id)
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/ceilometer/coordination.py", line 125, in join_group
    join_req.get()
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/tooz/drivers/zookeeper.py", line 427, in get
    return self._handler(self._kazoo_async_result, timeout, **self._kwargs)
  File "/opt/stack/venv/ceilometer-20150911T173109Z/lib/python2.7/site-packages/tooz/drivers/zookeeper.py", line 137, in _join_group_handler
    raise coordination.ToozError(utils.exception_message(e))
ToozError
(kazoo.client): 2015-09-11 18:49:41,550 WARNING connection zk_loop Failed connecting to Zookeeper within the connection retry policy.
(kazoo.client): 2015-09-11 18:49:41,551 INFO client _session_callback Zookeeper session lost, state: CLOSED
(kazoo.client): 2015-09-11 18:49:41,551 Level 5 connection zk_loop Connection stopped
(oslo_messaging._drivers.impl_rabbit): 2015-09-11 18:49:42,333 ERROR impl_rabbit _error_callback Failed to consume message from queue:
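
To illustrate the kind of configurable retry requested above, here is a minimal sketch. It is only an assumption about how such a feature could look, not how Ceilometer or Tooz implement it. The names call_with_retries, max_retries, retry_delay and CoordinationError are hypothetical; CoordinationError stands in for the ToozError raised in the traceback so that the snippet runs without Tooz installed.

```python
import time


class CoordinationError(Exception):
    """Hypothetical stand-in for a backend error such as ToozError."""


def call_with_retries(func, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call func(), retrying up to max_retries times on CoordinationError.

    max_retries and retry_delay are hypothetical settings that would come
    from configuration; they are not existing Ceilometer or Tooz options.
    """
    attempt = 0
    while True:
        try:
            return func()
        except CoordinationError:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the original error
            attempt += 1
            # Linear backoff: wait a little longer after each failure.
            sleep(retry_delay * attempt)
```

With something like this, the join_group step that fails in the traceback could be wrapped in a retry loop, so that a single dropped ZooKeeper connection does not end the service thread on the first error.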