infinite child spawn loop when rabbit_durable_queues differs from rabbit server

Bug #1074132 reported by Brian Waldon
Affects   Status         Importance   Assigned to    Milestone
Glance    Fix Released   High         Brian Waldon
Grizzly   Fix Released   High         Brian Waldon

Bug Description

With a vanilla devstack install, set the following values in /etc/glance/glance-api.conf:

workers = 1
notifier_strategy = rabbit
rabbit_password = secrete
rabbit_durable_queues = False

Restart glance-api - everything should be fine and notifications should be pushed out to rabbit. Now change the following value:

rabbit_durable_queues = True

Restart glance-api - the service will enter an infinite loop where the child process raises an exception that looks like the following and exits, causing the parent to restart it:

2012-11-01 20:43:30 17554 TRACE glance self._connect()
2012-11-01 20:43:30 17554 TRACE glance File "/opt/stack/glance/glance/notifier/notify_kombu.py", line 121, in _connect
2012-11-01 20:43:30 17554 TRACE glance queue.declare()
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 359, in declare
2012-11-01 20:43:30 17554 TRACE glance return (self.name and self.exchange.declare(nowait),
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 151, in declare
2012-11-01 20:43:30 17554 TRACE glance nowait=nowait)
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/kombu/syn.py", line 14, in blocking
2012-11-01 20:43:30 17554 TRACE glance return __sync_current(fun, *args, **kwargs)
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/kombu/syn.py", line 40, in __eblocking__
2012-11-01 20:43:30 17554 TRACE glance return spawn(fun, *args, **kwargs).wait()
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2012-11-01 20:43:30 17554 TRACE glance return self._exit_event.wait()
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2012-11-01 20:43:30 17554 TRACE glance return hubs.get_hub().switch()
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2012-11-01 20:43:30 17554 TRACE glance return self.greenlet.switch()
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2012-11-01 20:43:30 17554 TRACE glance result = function(*args, **kwargs)
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 843, in exchange_declare
2012-11-01 20:43:30 17554 TRACE glance (40, 11), # Channel.exchange_declare_ok
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/amqplib/client_0_8/abstract_channel.py", line 105, in wait
2012-11-01 20:43:30 17554 TRACE glance return amqp_method(self, args)
2012-11-01 20:43:30 17554 TRACE glance File "/usr/local/lib/python2.7/dist-packages/amqplib/client_0_8/channel.py", line 273, in _close
2012-11-01 20:43:30 17554 TRACE glance (class_id, method_id))
2012-11-01 20:43:30 17554 TRACE glance AMQPChannelException: (406, u"PRECONDITION_FAILED - cannot redeclare exchange 'glance' in vhost '/' with different type, durable, internal or autodelete value", (40, 10), 'Channel.exchange_declare')
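
For readers unfamiliar with the 406 error above, here is a toy in-memory model of the redeclare check. It is only a sketch: ToyBroker and PreconditionFailed are hypothetical names made up for illustration, and this is not how RabbitMQ, kombu or amqplib are implemented. It just shows that declaring an existing exchange with a different durable flag is rejected rather than silently accepted.

```python
class PreconditionFailed(Exception):
    """Hypothetical stand-in for the 406 channel error in the traceback."""


class ToyBroker:
    """Toy model of a broker that remembers how each exchange was declared."""

    def __init__(self):
        self._exchanges = {}

    def exchange_declare(self, name, exchange_type="topic", durable=False):
        props = (exchange_type, durable)
        # First declaration wins; later ones must use identical properties.
        existing = self._exchanges.setdefault(name, props)
        if existing != props:
            raise PreconditionFailed(
                "cannot redeclare exchange %r with different properties" % name
            )


broker = ToyBroker()
broker.exchange_declare("glance", durable=False)  # first start: created
broker.exchange_declare("glance", durable=False)  # restart, same config: fine
try:
    broker.exchange_declare("glance", durable=True)  # changed config
except PreconditionFailed as exc:
    print("child would exit here:", exc)
```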

I would expect the service to stop either before each child is created or after each child fails to start properly.

Eoghan Glynn (eglynn) wrote :

Can this issue be worked around simply by setting the rabbit_max_retries config variable to some sane non-zero value (e.g. 10)?

Unfortunately it defaults to zero, which the RabbitStrategy interprets as an infinite number of retries.
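
The "zero means retry forever" convention mentioned here can be sketched roughly as follows. This is not the RabbitStrategy code; connect_with_retries, make_flaky and safety_cap are hypothetical names, and the exact counting (whether N means N attempts or N retries after the first attempt) is an assumption.

```python
def connect_with_retries(connect, max_retries=0, safety_cap=1000):
    """Call connect() until it succeeds.

    max_retries == 0 is treated as "no limit" (the convention described
    above). safety_cap is an assumption added so this sketch cannot spin
    forever.
    """
    failures = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            failures += 1
            if max_retries and failures >= max_retries:
                raise  # bounded mode: give up
            if failures >= safety_cap:
                raise  # sketch-only guard


def make_flaky(failures_before_success):
    """Build a fake connect() that fails a fixed number of times first."""
    state = {"left": failures_before_success}

    def connect():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("broker unavailable")
        return "connected"

    return connect


# With the default of 0 the loop keeps going until the broker answers.
print(connect_with_retries(make_flaky(3), max_retries=0))
```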

Eoghan Glynn (eglynn) wrote :

Sorry, ignore the above, I see now the retry logic is not even engaged in the problematic case.

Jay Pipes (jaypipes) wrote :

Do we have any known workaround for this?

Brian Waldon (bcwaldon) wrote :

You should be able to work around this by matching the rabbit_durable_queues setting to that of the queues that exist on your rabbitmq server.
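
One way to check for such a mismatch before restarting is sketched below. It assumes the server-side flag has been captured as tab-separated "name, durable" lines (for example from something like rabbitmqctl list_exchanges name durable; the exact output format is an assumption), and that the glance settings live under a [DEFAULT] section. The function names are hypothetical.

```python
import configparser


def configured_durable(conf_text):
    """Read rabbit_durable_queues from glance-api.conf style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getboolean("DEFAULT", "rabbit_durable_queues")


def server_durable(listing, exchange="glance"):
    """Find the durable flag for one exchange in a tab-separated listing.

    Returns None when the exchange is not listed. Lines that do not have
    exactly two fields (banners, blank lines) are skipped.
    """
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0] == exchange:
            return fields[1].strip().lower() == "true"
    return None


conf = "[DEFAULT]\nworkers = 1\nrabbit_durable_queues = True\n"
listing = "Listing exchanges ...\nglance\tfalse\n"
if server_durable(listing) not in (None, configured_durable(conf)):
    print("rabbit_durable_queues does not match the existing exchange")
```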

Brian Waldon (bcwaldon)
Changed in glance:
assignee: nobody → Brian Waldon (bcwaldon)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/15783

Changed in glance:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/15783
Committed: http://github.com/openstack/glance/commit/9669067e91ceffcd462f9177d1e2c446dfb347ea
Submitter: Jenkins
Branch: master

commit 9669067e91ceffcd462f9177d1e2c446dfb347ea
Author: Brian Waldon <email address hidden>
Date: Fri Nov 9 13:40:35 2012 -0800

    Prevent infinite respawn of child processes

    Rather than looking for an exit code of 2 to prevent child processes
    from respawning, we look for any non-zero status.

    Fixes bug 1074132.
    Fixes bug 1065197.

    Change-Id: I74ac13737d8a9b8710a8268eb1d67fbe5e75d491
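
The difference between the two respawn policies described in the commit message can be illustrated with a small simulation. This is a sketch, not the glance code: respawn_old, respawn_new and count_spawns are hypothetical names, and the limit argument exists only so the old policy terminates in the example. It relies on the fact that a Python process dying from an unhandled exception exits with status 1.

```python
from itertools import repeat


def respawn_old(exit_code):
    """Old policy as described: only exit code 2 prevents a respawn."""
    return exit_code != 2


def respawn_new(exit_code):
    """New policy as described: any non-zero status prevents a respawn."""
    return exit_code == 0


def count_spawns(exit_codes, should_respawn, limit=50):
    """Count how many children a parent would start for a stream of exits.

    limit is a sketch-only cap standing in for "forever".
    """
    spawns = 0
    for code in exit_codes:
        spawns += 1
        if spawns >= limit or not should_respawn(code):
            break
    return spawns


# A child killed by an unhandled exception exits with status 1 every time.
always_crashing = repeat(1)
print(count_spawns(always_crashing, respawn_old))  # hits the cap
print(count_spawns(always_crashing, respawn_new))  # stops after one child
```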

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
status: Fix Committed → Fix Released