Pool manager doesn't start when coordination backend isn't reachable

Bug #1514602 reported by Rahman Syed
Affects: Designate
Status: Fix Released
Importance: Critical
Assigned to: Federico Ceratto

Bug Description

When starting up Designate Pool Manager with no coordination backend running, Pool Manager will refuse to start if the configuration has a coordination backend defined.

In this scenario, I would expect the Pool Manager process to terminate on error. However, the observed behavior is that logging ceases once the connection to the coordination backend fails, but the process lives on.

Configuration element in /etc/designate/designate.conf:

# URL for the coordination backend to use.
[coordination]
backend_url = kazoo://127.0.0.1:2181

designate-pool-manager logs at startup:

2015-11-09_21:49:39.65540 No handlers could be found for logger "oslo_config.cfg"
2015-11-09_21:49:39.78285 2015-11-09 21:49:39.782 25982 WARNING oslo_config.cfg [-] Option "policy_file" from group "DEFAULT" is deprecated. Use option "policy_file" from group "oslo_policy".
2015-11-09_21:49:39.78357 2015-11-09 21:49:39.783 25982 INFO designate.policy [-] Using policy_file found at: /etc/designate/policy.json
2015-11-09_21:49:39.81386 2015-11-09 21:49:39.813 25982 INFO designate.pool_manager.service [-] Using topic pool_manager.794ccc2c-d751-44fe-b57f-8894c9f5c842 for this pool manager instance.
2015-11-09_21:49:39.81439 2015-11-09 21:49:39.814 25982 INFO designate.pool_manager.service [-] Using topic pool_manager.794ccc2c-d751-44fe-b57f-8894c9f5c842 for this pool manager instance.
2015-11-09_21:49:39.84061 2015-11-09 21:49:39.840 25982 INFO designate.policy [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Policy check succeeded for rule 'all_tenants' on target {}
2015-11-09_21:49:39.84113 2015-11-09 21:49:39.840 25982 INFO designate.pool_manager.service [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] 1 targets setup
2015-11-09_21:49:39.84353 2015-11-09 21:49:39.843 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_ha_queues" from group "DEFAULT" is deprecated. Use option "rabbit_ha_queues" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84401 2015-11-09 21:49:39.843 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_hosts" from group "DEFAULT" is deprecated. Use option "rabbit_hosts" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84445 2015-11-09 21:49:39.844 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_password" from group "DEFAULT" is deprecated. Use option "rabbit_password" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84491 2015-11-09 21:49:39.844 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_use_ssl" from group "DEFAULT" is deprecated. Use option "rabbit_use_ssl" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84570 2015-11-09 21:49:39.845 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_userid" from group "DEFAULT" is deprecated. Use option "rabbit_userid" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84612 2015-11-09 21:49:39.845 25982 WARNING oslo_config.cfg [req-934b8a48-0a0e-463f-ae2e-d4e7a5e79bad - - - - -] Option "rabbit_virtual_host" from group "DEFAULT" is deprecated. Use option "rabbit_virtual_host" from group "oslo_messaging_rabbit".
2015-11-09_21:49:39.84826 2015-11-09 21:49:39.848 25982 INFO designate.backend.base [-] Starting backend:agent backend
2015-11-09_21:49:39.86437 2015-11-09 21:49:39.864 25982 INFO designate.service [-] Starting pool_manager service (version: 2.0.0)
2015-11-09_21:49:39.86610 2015-11-09 21:49:39.865 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:40.23062 2015-11-09 21:49:40.230 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:40.77512 2015-11-09 21:49:40.774 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:41.21083 2015-11-09 21:49:41.210 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:42.75401 2015-11-09 21:49:42.753 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:44.64618 2015-11-09 21:49:44.645 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:48.49157 2015-11-09 21:49:48.491 25982 WARNING kazoo.client [-] Connection dropped: socket connection error: ECONNREFUSED
2015-11-09_21:49:49.90838 2015-11-09 21:49:49.908 25982 WARNING kazoo.client [-] Failed connecting to Zookeeper within the connection retry policy.
2015-11-09_21:49:49.93357 Traceback (most recent call last):
2015-11-09_21:49:49.93396 File "/opt/designate/designate/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
2015-11-09_21:49:49.93449 timer()
2015-11-09_21:49:49.93464 File "/opt/designate/designate/local/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
2015-11-09_21:49:49.93499 cb(*args, **kw)
2015-11-09_21:49:49.93517 File "/opt/designate/designate/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2015-11-09_21:49:49.93555 result = function(*args, **kwargs)
2015-11-09_21:49:49.93569 File "/opt/designate/designate/local/lib/python2.7/site-packages/oslo_service/service.py", line 645, in run_service
2015-11-09_21:49:49.93618 service.start()
2015-11-09_21:49:49.93636 File "/opt/designate/designate/local/lib/python2.7/site-packages/designate/pool_manager/service.py", line 141, in start
2015-11-09_21:49:49.93689 super(Service, self).start()
2015-11-09_21:49:49.93715 File "/opt/designate/designate/local/lib/python2.7/site-packages/designate/service.py", line 121, in start
2015-11-09_21:49:49.93756 super(RPCService, self).start()
2015-11-09_21:49:49.93780 File "/opt/designate/designate/local/lib/python2.7/site-packages/designate/coordination.py", line 83, in start
2015-11-09_21:49:49.93809 self._coordinator.start()
2015-11-09_21:49:49.93829 File "/opt/designate/designate/local/lib/python2.7/site-packages/tooz/coordination.py", line 197, in start
2015-11-09_21:49:49.93860 self._start()
2015-11-09_21:49:49.93884 File "/opt/designate/designate/local/lib/python2.7/site-packages/tooz/drivers/zookeeper.py", line 80, in _start
2015-11-09_21:49:49.93918 cause=e)
2015-11-09_21:49:49.93939 File "/opt/designate/designate/local/lib/python2.7/site-packages/tooz/coordination.py", line 540, in raise_with_cause
2015-11-09_21:49:49.93960 excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
2015-11-09_21:49:49.93983 File "/opt/designate/designate/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 139, in raise_with_cause
2015-11-09_21:49:49.94011 six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
2015-11-09_21:49:49.94032 File "/opt/designate/designate/local/lib/python2.7/site-packages/six.py", line 718, in raise_from
2015-11-09_21:49:49.94067 raise value
2015-11-09_21:49:49.94090 ToozConnectionError: operation error: Connection time-out

(with no further log entries or activity shown from Pool Manager)

Thanks,
Rahman
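
The traceback above passes through eventlet's hub and greenthread frames, so the exception appears to have been raised inside a green thread rather than in the main thread, which would be consistent with the process surviving it. The sketch below is only a loose analogy using standard-library threads, not eventlet and not Designate's code. The names start_service and record_failure are hypothetical, and threading.excepthook assumes Python 3.8 or later.

```python
import threading

errors = []


def record_failure(args):
    # Invoked by the threading module when a thread dies from an
    # uncaught exception; here we only record the exception type.
    errors.append(args.exc_type.__name__)


threading.excepthook = record_failure


def start_service():
    # Hypothetical stand-in for a service start() that cannot reach
    # its coordination backend.
    raise ConnectionError("operation error: Connection time-out")


worker = threading.Thread(target=start_service, name="service-start")
worker.start()
worker.join()

# Execution reaches this point: the exception ended the worker,
# not the process that launched it.
process_still_running = True
```

Unless the launcher checks the worker's outcome and exits explicitly, the process stays up with nothing left running inside it, which matches the "logging ceases but the process lives on" symptom.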

Kiall Mac Innes (kiall) wrote :

Agree this is an issue, but disagree with the fix: we should stay running and keep trying to connect, the same as we do for MySQL, RabbitMQ, etc. Otherwise, it forces awkward deployment ordering.
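
A minimal sketch of the "stay running and keep trying" behavior described in this comment. It is not the code that was eventually merged. BackendUnavailable, start_with_retry, and FlakyCoordinator are hypothetical names, and the doubling back-off is an assumption rather than something this report specifies.

```python
import time


class BackendUnavailable(Exception):
    """Hypothetical stand-in for an error such as ToozConnectionError."""


def start_with_retry(start, delay=1.0, max_delay=30.0,
                     sleep=time.sleep, log=print):
    """Call start() until it succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            start()
            return attempts
        except BackendUnavailable as exc:
            log("coordination backend unavailable (attempt %d): %s; "
                "retrying in %.1fs" % (attempts, exc, delay))
            sleep(delay)
            # Double the wait after each failure, up to max_delay.
            delay = min(delay * 2, max_delay)


class FlakyCoordinator:
    """Hypothetical coordinator that fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.started = False

    def start(self):
        if self.failures > 0:
            self.failures -= 1
            raise BackendUnavailable("Connection time-out")
        self.started = True


# Usage: record the requested sleeps instead of actually waiting.
coordinator = FlakyCoordinator(failures=2)
delays = []
attempts = start_with_retry(coordinator.start, delay=0.5,
                            sleep=delays.append, log=lambda msg: None)
```

Logging each failed attempt keeps the service visibly alive in its logs while the backend is down, and retrying without an attempt limit avoids forcing a start-up order between the service and its coordination backend.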

Changed in designate:
status: New → Triaged
importance: Undecided → Critical
milestone: none → mitaka-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/250789

Changed in designate:
assignee: nobody → Federico Ceratto (federico-ceratto)
status: Triaged → In Progress
Federico Ceratto (federico-ceratto) wrote :
Changed in designate:
assignee: Federico Ceratto (federico-ceratto) → Endre Karlson (endre-karlson)
Changed in designate:
assignee: Endre Karlson (endre-karlson) → Federico Ceratto (federico-ceratto)
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.openstack.org/250789
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=c4c421e528a8ffbbe16d016cb9d41acc18567771
Submitter: Jenkins
Branch: master

commit c4c421e528a8ffbbe16d016cb9d41acc18567771
Author: Federico Ceratto <email address hidden>
Date: Fri Nov 27 12:26:43 2015 +0000

    Retry Coordinator start indefinitely

    Change-Id: Iab9a131bf2b606431033236ec40dcda655a9ee78
    Closes-Bug: 1514602

Changed in designate:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote : Fix included in openstack/designate 2.0.0.0b3

This issue was fixed in the openstack/designate 2.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/301612

OpenStack Infra (hudson-openstack) wrote : Change abandoned on designate (stable/liberty)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/301612
