Comment 14 for bug 856764

Kevin Bringard (kbringard) wrote :

So, based on Ask's comment about notifications, I started looking into it. As it turns out, *if* you're running a version of kombu/amqp which supports the channel_errors object (version 2.1.4 seems to be when it was introduced: http://kombu.readthedocs.org/en/latest/changelog.html), the following simple patch resolves the issue (also attached):

--- impl_kombu.py.orig 2013-08-22 21:52:37.727386558 +0000
+++ impl_kombu.py.new 2013-08-22 21:52:54.711337602 +0000
@@ -488,6 +488,7 @@
             self.connection = None
         self.connection = kombu.connection.BrokerConnection(**params)
         self.connection_errors = self.connection.connection_errors
+        self.channel_errors = self.connection.channel_errors
         if self.memory_transport:
             # Kludge to speed up tests.
             self.connection.transport.polling_interval = 0.0
@@ -561,7 +562,7 @@
         while True:
             try:
                 return method(*args, **kwargs)
-            except (self.connection_errors, socket.timeout, IOError), e:
+            except (self.channel_errors, socket.timeout, IOError), e:
                 if error_callback:
                     error_callback(e)
             except Exception, e:

Basically, ensure() needs to catch channel errors rather than connection errors.
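
To illustrate the idea outside of nova, here is a minimal, self-contained sketch of an ensure()-style retry loop. It does not use kombu at all: FakeChannelError, FlakyConsumer and the reconnect callback are hypothetical stand-ins, and the real ensure() in impl_kombu.py does more than this (logging, error callbacks and so on).

```python
import socket


class FakeChannelError(Exception):
    """Hypothetical stand-in for an error listed in connection.channel_errors."""


class FlakyConsumer(object):
    """Hypothetical consumer that fails once with a channel error, then works."""

    def __init__(self):
        self.calls = 0

    def consume(self):
        self.calls += 1
        if self.calls == 1:
            # Simulates the consumer being cancelled by the broker.
            raise FakeChannelError("tag u'2'")
        return "message"


def ensure(method, channel_errors, reconnect, max_attempts=3):
    """Call method(), reconnecting and retrying when a channel error is raised."""
    for _ in range(max_attempts):
        try:
            return method()
        except channel_errors + (socket.timeout, IOError):
            # A channel-level failure: re-establish the connection and retry.
            reconnect()
    raise RuntimeError("gave up after %d attempts" % max_attempts)


reconnects = []
consumer = FlakyConsumer()
result = ensure(consumer.consume, (FakeChannelError,),
                lambda: reconnects.append("reconnected"))
```

In this sketch the first consume() call raises the stand-in channel error, the loop reconnects once, and the second call returns normally. If the except clause did not list the channel error types, the exception would propagate instead of triggering a reconnect.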

I verified this in a two-node RabbitMQ cluster. The nodes are .139 and .141, and .139 is currently the master.

The following is from the nova logs when .139 is stopped (and .141 is promoted to the master):

Notice, we're connected to 192.168.128.141:

2013-08-22 21:27:45.807 INFO nova.openstack.common.rpc.common [req-20aa6610-b0df-4730-9773-6024e47a6da7 None None] Connected to AMQP server on 192.168.128.141:5672
2013-08-22 21:27:45.843 INFO nova.openstack.common.rpc.common [req-c82c8ea0-aa8b-49b0-925c-b79399f011de None None] Connected to AMQP server on 192.168.128.141:5672

...

Then, we stop rabbit on .139 and see the following *channel* error:

2013-08-22 21:28:13.475 20003 ERROR nova.openstack.common.rpc.common [-] Failed to consume message from queue: tag u'2'
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common Traceback (most recent call last):
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 572, in ensure
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common return method(*args, **kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 654, in _consume
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 281, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/local/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common return connection.drain_events(**kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 286, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common return amqp_method(channel, args)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1628, in _basic_cancel_notify
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common raise ConsumerCancel('tag %r' % (consumer_tag, ))
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common ConsumerCancel: tag u'2'
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common

Ensure fails due to the channel error and causes the service to reconnect. It reconnects to the same host (as it is now the only one alive):

2013-08-22 21:28:13.478 20003 INFO nova.openstack.common.rpc.common [-] Reconnecting to AMQP server on 192.168.128.141:5672
2013-08-22 21:28:13.510 20003 INFO nova.openstack.common.rpc.common [-] Connected to AMQP server on 192.168.128.141:5672
2013-08-22 21:28:17.007 INFO nova.openstack.common.rpc.common [req-482627bb-812e-4997-90c0-96fbf3c8de34 None None] Connected to AMQP server on 192.168.128.141:5672

Message processing then continues as per usual.

Running pip install --upgrade kombu works (even on Ubuntu 12.04) to bring kombu up to a version that supports this. However, the ultimate solution will likely need to be more robust than this patch, as we should do our best to support the version shipping in the LTS out of the box.
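
One possible way to tolerate older kombu releases, sketched under assumptions: look the attribute up with getattr() and fall back to an empty tuple when it is missing. OldConnection and NewConnection below are hypothetical stand-ins for connection objects, not real kombu classes, and catching both sets of errors together is an assumption of this sketch rather than what the patch above does.

```python
class OldConnection(object):
    """Hypothetical stand-in for a connection from a kombu without channel_errors."""
    connection_errors = (IOError,)


class NewConnection(OldConnection):
    """Hypothetical stand-in for a connection that also exposes channel_errors."""
    channel_errors = (KeyError,)  # placeholder exception type, for illustration only


def recoverable_errors(connection):
    """Return the tuple of exception types a retry loop could catch."""
    # Older versions have no channel_errors attribute, so default to ().
    channel_errors = getattr(connection, 'channel_errors', ())
    return tuple(connection.connection_errors) + tuple(channel_errors)
```

With the old-style stand-in this returns only the connection errors, so behaviour is unchanged on a kombu that lacks the attribute; with the new-style stand-in the channel errors are included as well.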