Here's a trace of an RPC call that times out after ALICE reconnects to RabbitMQ. After the reconnect, impl_rabbit._consume never receives the RPC reply that BOB had already sent. The full log is here: https://gist.github.com/noelbk/468f852c6f2b78883cc9

1. ALICE sends RPC call to BOB
Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] _basic_publish(Message({'body': '{"oslo.message": "{\\"_context_roles\\": [], \\"_msg_id\\": \\"8504f806c278472cac67288977887170\\", \\"_context_quota_class\\": null, \\"_context_request_id\\": \\"req-27caa60d-f916-4962-820b-5e9741021f50\\", \\"_context_service_catalog\\": [], \\"args\\": {\\"instance_id\\": 3, \\"teardown\\": false, \\"host\\": \\"10.35.0.13\\"}, \\"_unique_id\\": \\"7604a4c7254a49eba26a78251ebe8fb5\\", \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_reply_q\\": \\"reply_a6882628d63d497394316484050dd50b\\", \\"_context_auth_token\\": null, \\"_context_tenant\\": null, \\"_context_instance_lock_checked\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_id\\": null, \\"_context_timestamp\\": \\"2014-07-10T07:50:48.043133\\", \\"_context_user_name\\": null, \\"method\\": \\"setup_networks_on_host\\", \\"_context_remote_address\\": null}", "oslo.version": "2.0"}', 'properties': {'priority': 0, 'application_headers': {'ttl': 60000}, 'delivery_mode': 2, 'content_encoding': 'utf-8', 'content_type': 'application/json'}, 'channel': None}), mandatory=False, routing_key='network', immediate=False, exchange='nova')

2. ALICE loses its connection to rabbit and delays reconnecting for 1 sec
Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit ensure NBK: ensure reconnecting...
Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit _connect Reconnecting to AMQP server on 10.35.0.3:5672
Jul 10 07:50:49 10.35.0.13 2014-07-10 07:50:49.344 9114 140035666425488 oslo.messaging._drivers.impl_rabbit impl_rabbit acknowledge NBK: acknowledge self._raw_message.body=u'{"oslo.message": "{\\"_context_roles\\": [], \\"_msg_id\\": \\"8504f806c278472cac67288977887170\\", \\"_context_quota_class\\": null, \\"_context_request_id\\": \\"req-27caa60d-f916-4962-820b-5e9741021f50\\", \\"_context_service_catalog\\": [], \\"args\\": {\\"instance_id\\": 3, \\"teardown\\": false, \\"host\\": \\"10.35.0.13\\"}, \\"_unique_id\\": \\"7604a4c7254a49eba26a78251ebe8fb5\\", \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_reply_q\\": \\"reply_a6882628d63d497394316484050dd50b\\", \\"_context_auth_token\\": null, \\"_context_tenant\\": null, \\"_context_instance_lock_checked\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_id\\": null, \\"_context_timestamp\\": \\"2014-07-10T07:50:48.043133\\", \\"_context_user_name\\": null, \\"method\\": \\"setup_networks_on_host\\", \\"_context_remote_address\\": null}", "oslo.version": "2.0"}'
Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit _connect Delaying reconnect for 1.0 seconds...

3. BOB receives ALICE's RPC call and publishes the result back to ALICE (to reply_a6882628d63d497394316484050dd50b)
Jul 10 07:50:50 10.35.0.13 2014-07-10 07:50:50.138 BOB kombu.channel debug __wrapped [Kombu channel:1] _basic_publish(Message({'body': '{"oslo.message": "{\\"_unique_id\\": \\"ea8d75513576431d98a9bc6bab620dd7\\", \\"failure\\": null, \\"_msg_id\\": \\"8504f806c278472cac67288977887170\\", \\"result\\": null, \\"ending\\": true}", "oslo.version": "2.0"}', 'properties': {'priority': 0, 'application_headers': {}, 'delivery_mode': 2, 'content_encoding': 'utf-8', 'content_type': 'application/json'}, 'channel': None}), mandatory=False, routing_key=u'reply_a6882628d63d497394316484050dd50b', immediate=False, exchange=u'reply_a6882628d63d497394316484050dd50b') __wrapped /usr/lib64/python2.7/site-packages/kombu/utils/debug.py:56
Jul 10 07:50:50 10.35.0.13 2014-07-10 07:50:50.138 BOB oslo.messaging._drivers.impl_rabbit impl_rabbit ensure NBK: ensure method=oslo.messaging._drivers.impl_rabbit._publish took=0.000 sec retry=0 got_ret=True exceptions=[]
Jul 10 07:50:50 10.35.0.13 2014-07-10 07:50:50.138 BOB kombu.channel debug __wrapped [Kombu channel:1] close() __wrapped /usr/lib64/python2.7/site-packages/kombu/utils/debug.py:56
Jul 10 07:50:50 10.35.0.13 2014-07-10 07:50:50.140 BOB oslo.messaging.rpc.dispatcher dispatcher _dispatch_and_reply NBK: rpc server _dispatch_and_reply method=u'setup_networks_on_host' took=0.210 got_reply=True exceptions=[]

4. ALICE reconnects to rabbit
Jul 10 07:50:50 10.35.0.13 rabbitmq.log: =INFO REPORT==== 10-Jul-2014::07:50:50 ===
Jul 10 07:50:50 10.35.0.13 rabbitmq.log: accepting AMQP connection <0.1272.0> (10.35.0.13:63831 -> 10.35.0.3:5672)
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE connection _start Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright (C) 2007-2013 VMware, Inc.', u'capabilities': {u'exchange_exchange_bindings': True, u'consumer_cancel_notify': True, u'publisher_confirms': True, u'basic.nack': True}, u'platform': u'Erlang/OTP', u'version': u'3.1.3'}, mechanisms: [u'PLAIN', u'AMQPLAIN'], locales: [u'en_US']
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE connection _open_ok Open OK!
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE channel __init__ using channel_id: 1
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE channel _open_ok Channel open
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] exchange_declare(nowait=False, exchange='reply_a6882628d63d497394316484050dd50b', durable=False, passive=False, arguments=None, type='direct', auto_delete=True)
Jul 10 07:50:50 10.35.0.13 cobalt-compute.log: /usr/lib64/python2.7/site-packages/amqp/channel.py:616: VDeprecationWarning: The auto_delete flag for exchanges has been deprecated and will be removed
Jul 10 07:50:50 10.35.0.13 cobalt-compute.log: from py-amqp v1.5.0.
Jul 10 07:50:50 10.35.0.13 cobalt-compute.log: warn(VDeprecationWarning(EXCHANGE_AUTODELETE_DEPRECATED))
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] queue_declare(passive=False, nowait=False, exclusive=False, durable=False, queue='reply_a6882628d63d497394316484050dd50b', arguments={}, auto_delete=True)
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] queue_bind(queue='reply_a6882628d63d497394316484050dd50b', arguments=None, nowait=False, routing_key='reply_a6882628d63d497394316484050dd50b', exchange='reply_a6882628d63d497394316484050dd50b')
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE impl_rabbit _connect Connected to AMQP server on 10.35.0.3:5672

5. ALICE calls impl_rabbit._consume (consuming from reply_a6882628d63d497394316484050dd50b) but never receives BOB's earlier reply
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE impl_rabbit _consume NBK: _consume self.do_consume=True self.consumers=[]
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE impl_rabbit _consume NBK: queues_tail.consume queues_tail=
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] basic_consume(queue='reply_a6882628d63d497394316484050dd50b', consumer_tag='1', nowait=False, no_ack=False, callback=)
Jul 10 07:50:50 10.35.0.13 cobalt-compute: ALICE impl_rabbit _consume NBK: _consume connection.drain_events

6. ALICE times out waiting for the RPC response 60 sec later (ERROR)
Jul 10 07:51:50 10.35.0.13 cobalt-compute: ALICE impl_rabbit _error_callback Timed out waiting for RPC response: timed out
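One possible explanation for the lost reply, offered only as an unconfirmed hypothesis: ALICE declares the reply queue with auto_delete=True, and RabbitMQ deletes an auto-delete queue once its last consumer goes away, including when that consumer's connection drops. If that happened in step 2, BOB's reply in step 3 (published with mandatory=False) had no bound queue to land in and was silently discarded, so the queue ALICE re-declares in step 4 starts out empty. The Python sketch below replays steps 2 to 5 against a hypothetical in-memory ToyBroker class. It is not kombu or py-amqp and models only consumer counts, auto_delete queues and bindings. It assumes ALICE already had a consumer on the reply queue before the disconnect (not shown in this trace) and that the reply exchange still exists when BOB publishes; exchanges are not modelled at all. The replay helper and its auto_delete switch are likewise illustrative.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory model of a few AMQP 0-9-1 rules (not kombu or py-amqp)."""

    def __init__(self):
        self.queues = {}       # queue name -> deque of message bodies
        self.auto_delete = {}  # queue name -> auto_delete flag
        self.consumers = {}    # queue name -> number of active consumers
        self.bindings = {}     # (exchange, routing_key) -> set of bound queue names
        self.dropped = []      # unroutable messages discarded because mandatory=False

    def queue_declare(self, queue, auto_delete):
        # Re-declaring an existing queue keeps its messages and consumer count.
        self.queues.setdefault(queue, deque())
        self.auto_delete[queue] = auto_delete
        self.consumers.setdefault(queue, 0)

    def queue_bind(self, queue, exchange, routing_key):
        self.bindings.setdefault((exchange, routing_key), set()).add(queue)

    def basic_consume(self, queue):
        self.consumers[queue] += 1

    def connection_lost(self, queue):
        # The broker cancels the lost connection's consumer on this queue; an
        # auto_delete queue is deleted when its last consumer goes away, and
        # its bindings disappear with it.
        self.consumers[queue] -= 1
        if self.consumers[queue] == 0 and self.auto_delete[queue]:
            del self.queues[queue]
            del self.auto_delete[queue]
            del self.consumers[queue]
            for bound in self.bindings.values():
                bound.discard(queue)

    def basic_publish(self, body, exchange, routing_key, mandatory=False):
        targets = self.bindings.get((exchange, routing_key), set())
        if not targets:
            if mandatory:
                # Simplification: a real broker sends basic.return to the publisher.
                raise RuntimeError("unroutable message")
            self.dropped.append(body)  # silently discarded, as with mandatory=False
            return
        for queue in targets:
            self.queues[queue].append(body)

    def drain(self, queue):
        messages = list(self.queues[queue])
        self.queues[queue].clear()
        return messages


def replay(auto_delete):
    """Replay steps 2-5 of the trace; assumes ALICE had a reply consumer before step 2."""
    reply_q = "reply_a6882628d63d497394316484050dd50b"
    broker = ToyBroker()
    broker.queue_declare(reply_q, auto_delete=auto_delete)
    broker.queue_bind(reply_q, exchange=reply_q, routing_key=reply_q)
    broker.basic_consume(reply_q)
    # Step 2: ALICE loses its connection and waits before reconnecting.
    broker.connection_lost(reply_q)
    # Step 3: BOB publishes the reply with mandatory=False, as in the log.
    broker.basic_publish('{"ending": true}', exchange=reply_q, routing_key=reply_q)
    # Step 4: ALICE reconnects, then re-declares and re-binds the reply queue.
    broker.queue_declare(reply_q, auto_delete=auto_delete)
    broker.queue_bind(reply_q, exchange=reply_q, routing_key=reply_q)
    # Step 5: ALICE starts consuming from the reply queue again.
    broker.basic_consume(reply_q)
    return broker.drain(reply_q), broker.dropped


print(replay(auto_delete=True))   # ([], ['{"ending": true}'])
print(replay(auto_delete=False))  # (['{"ending": true}'], [])
```

With auto_delete=True the reply ends up in `dropped` and the re-declared queue is empty, which matches the timeout in step 6; with auto_delete=False the queue survives the disconnect and the reply is still waiting when ALICE consumes again. The contrast only illustrates the suspected mechanism and is not a proposed fix.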