nova-conductor infinitely reconnects to rabbit

Bug #1460652 reported by Michael Kazakov
This bug affects 5 people
Affects                    Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)   Invalid       Undecided   Unassigned
oslo.messaging             Fix Released  High        Viktor Serhieiev

Bug Description

1. Exact version of Nova
ii nova-api 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - API frontend
ii nova-cert 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - certificate management
ii nova-common 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - common files
ii nova-conductor 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - conductor service
ii nova-console 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - Console
ii nova-consoleauth 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute - virtual machine scheduler
ii python-nova 1:2014.1.100+git201410062002~trusty-0ubuntu1 all OpenStack Compute Python libraries
ii python-novaclient 1:2.17.0.74.g2598714+git201404220131~trusty-0ubuntu1 all client library for OpenStack Compute API

rabbit configuration in nova.conf:

  rabbit_hosts = m610-2:5672, m610-1:5672
  rabbit_ha_queues = true

2. Relevant log files:
/var/log/nova/nova-conductor.log

 exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 624, in ensure
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 729, in _publish
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 361, in __init__
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit type='direct', **options)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 82, in __init__
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 216, in revive
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 102, in declare
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 612, in exchange_declare
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 75, in wait
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit return self.dispatch_method(method_sig, args, content)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 93, in dispatch_method
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(self, args)
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 232, in _close
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit reply_code, reply_text, (class_id, method_id), ChannelError,
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'
2015-06-01 08:23:56.484 16427 TRACE oslo.messaging._drivers.impl_rabbit
2015-06-01 08:23:56.486 16427 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on m610-2:5672
2015-06-01 08:23:56.486 16425 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on m610-1:5672
2015-06-01 08:23:56.486 16427 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-06-01 08:23:56.489 16425 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_bea18a6133c548f099b85b168fddf83c': Exchange.declare: (404) NOT_FOUND - no exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 624, in ensure
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 729, in _publish
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 361, in __init__
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit type='direct', **options)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 82, in __init__
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 216, in revive
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 102, in declare
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 612, in exchange_declare
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 75, in wait
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit return self.dispatch_method(method_sig, args, content)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 93, in dispatch_method
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(self, args)
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 232, in _close
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit reply_code, reply_text, (class_id, method_id), ChannelError,
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'
2015-06-01 08:23:56.489 16425 TRACE oslo.messaging._drivers.impl_rabbit
2015-06-01 08:23:56.491 16425 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on m610-2:5672
2015-06-01 08:23:56.491 16425 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-06-01 08:23:56.704 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on m610-2:5672
2015-06-01 08:23:56.707 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on m610-1:5672
2015-06-01 08:23:56.709 16429 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c': Exchange.declare: (404) NOT_FOUND - no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 624, in ensure
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 729, in _publish
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 361, in __init__
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit type='direct', **options)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 82, in __init__
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 216, in revive
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 102, in declare
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 612, in exchange_declare
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 75, in wait
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit return self.dispatch_method(method_sig, args, content)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 93, in dispatch_method
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(self, args)
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 232, in _close
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit reply_code, reply_text, (class_id, method_id), ChannelError,
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'
2015-06-01 08:23:56.709 16429 TRACE oslo.messaging._drivers.impl_rabbit
2015-06-01 08:23:56.712 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on m610-1:5672
2015-06-01 08:23:56.712 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-06-01 08:23:56.713 16429 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c': Exchange.declare: (404) NOT_FOUND - no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 624, in ensure
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 729, in _publish
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 361, in __init__
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit type='direct', **options)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 82, in __init__
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 216, in revive
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 102, in declare
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 612, in exchange_declare
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 75, in wait
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit return self.dispatch_method(method_sig, args, content)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 93, in dispatch_method
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(self, args)
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 232, in _close
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit reply_code, reply_text, (class_id, method_id), ChannelError,
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'
2015-06-01 08:23:56.713 16429 TRACE oslo.messaging._drivers.impl_rabbit
2015-06-01 08:23:56.714 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on m610-2:5672
2015-06-01 08:23:56.714 16429 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...

/<email address hidden>

=ERROR REPORT==== 1-Jun-2015::08:56:38 ===
connection <0.300.15>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'",
            'exchange.declare'}

=ERROR REPORT==== 1-Jun-2015::08:56:38 ===
connection <0.294.15>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'",
            'exchange.declare'}

=ERROR REPORT==== 1-Jun-2015::08:56:38 ===
connection <0.297.15>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_bea18a6133c548f099b85b168fddf83c' in vhost '/'",
            'exchange.declare'}

=ERROR REPORT==== 1-Jun-2015::08:56:38 ===
connection <0.303.15>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_7ff0d26e85d94c129c7ae0c2d9fef40c' in vhost '/'",
            'exchange.declare'}

3. Reproduce steps:
Two controller nodes (m610-2, m610-1) running a RabbitMQ HA cluster and the nova services.
RabbitMQ configuration in nova.conf: as shown above.
The problem appears after one of the controllers crashes. Restarting the nova-conductor service does not solve it.

David Medberry (med)
summary: - nova-conductor infinitely reconnets to rabbit
+ nova-conductor infinitely reconnects to rabbit
Mehdi Abaakouk (sileht) wrote :

Restarting an RPC client can lead to connection starvation in the connection pool on the RPC server side.

Steps that lead to this issue:
* The RPC client sends a batch of messages (more than the connection pool size, 30 by default).
* The RPC server receives all these messages and processes them (but has not sent the replies yet).
* The RPC client application is restarted.
* The RPC server tries to reply to all messages received before the restart:
   * the reply queue no longer exists;
   * for each message that needs a reply, the server waits 60 seconds (in case this is caused by a RabbitMQ restart).
* Meanwhile, the new RPC client tries to send messages and expects replies,
   but the RPC server is still waiting for the old RPC client to come back.
   The result is a flood of RPC timeouts until the RPC server has finished processing its pending replies.
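A rough way to see why the pool starves: under a simplified model (an assumption, not how oslo.messaging actually schedules work), every stale reply queued at restart time holds one pool connection for the full 60-second wait, and a fresh reply queues behind them in FIFO order. The hypothetical helper `wait_for_fresh_reply` below computes how long that fresh reply waits for a connection, using the default pool size of 30 mentioned above.

```python
import heapq


def wait_for_fresh_reply(stale_replies, pool_size=30, reply_wait=60.0):
    """Seconds a fresh reply waits for a pool connection (simplified model).

    Assumptions (illustrative only): all stale replies are queued at t=0
    ahead of the fresh one, each holds one connection for the full
    reply_wait because its reply queue is gone, and the pool size is fixed.
    """
    # Min-heap of the times at which each pooled connection becomes free.
    free_at = [0.0] * pool_size
    heapq.heapify(free_at)
    for _ in range(stale_replies):
        start = heapq.heappop(free_at)               # earliest free connection
        heapq.heappush(free_at, start + reply_wait)  # blocked until the wait expires
    # The fresh reply gets whichever connection frees up first.
    return heapq.heappop(free_at)


if __name__ == "__main__":
    for stale in (10, 30, 100):
        print(stale, "stale replies ->", wait_for_fresh_reply(stale), "s")
```

In this model, fewer stale replies than pool connections cost nothing, but 100 stale replies delay a fresh reply by 180 seconds, which would exceed a client-side RPC timeout of 60 seconds (that timeout value is an assumption here, not taken from the report).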

Changed in oslo.messaging:
assignee: nobody → Mehdi Abaakouk (sileht)
Changed in nova:
status: New → Invalid
Changed in oslo.messaging:
status: New → Confirmed
importance: Undecided → High
Changed in oslo.messaging:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/193484

Changed in oslo.messaging:
assignee: Mehdi Abaakouk (sileht) → Victor Sergeyev (vsergeyev)
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/193037
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=286659a38be5db399d0b9f807fac7b980d6c0b7e
Submitter: Jenkins
Branch: master

commit 286659a38be5db399d0b9f807fac7b980d6c0b7e
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Jun 17 18:45:24 2015 +0200

    Don't reply when we known that client is gone

    In case of a broker restart/failover, a reply queue can be
    unreachable for a short period; IncomingMessage.send_reply
    will block for 60 seconds in this case, or until RabbitMQ recovers.

    But when the reply queue is unreachable because the
    RPC client is really gone, we can have a ton of replies to send,
    each waiting 60 seconds.
    This leads to starvation of the connection pool: the RPC server
    takes too much time to send replies, and other RPC clients
    raise TimeoutError because they don't receive their replies in time.

    This change introduces an object cache that stores clients already
    known to be gone, so we don't wait 60 seconds while holding a pool
    connection. Keeping the last 200 gone RPC clients for 1 minute is
    enough and doesn't use too much memory.

    This also no longer raises a frightening exception when we can't
    send a reply to the RPC client; it just logs an info message about
    the missing exchange and a warning about the unsent reply.

    Closes-bug: #1460652

    Change-Id: I928b30c9b5f9ee007532ff703e136640b0e8aaf4
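The idea in the commit message can be sketched as follows. This is not the oslo.messaging code: `GoneClientCache` and `send_reply` are hypothetical names, `publish` is an injected callable, and `LookupError` stands in for the broker's 404 NOT_FOUND, which the real driver only reports after its 60-second retry window. The defaults mirror the commit text (200 entries, 60 seconds).

```python
import logging
import time
from collections import OrderedDict

LOG = logging.getLogger(__name__)


class GoneClientCache:
    """Hypothetical bounded TTL set of reply destinations known to be gone."""

    def __init__(self, maxsize=200, ttl=60.0, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._entries = OrderedDict()  # reply_q -> expiry time, oldest first

    def _purge(self):
        # With a constant TTL, insertion order equals expiry order,
        # so expired entries are always at the front.
        now = self._clock()
        while self._entries:
            _, expiry = next(iter(self._entries.items()))
            if expiry > now:
                break
            self._entries.popitem(last=False)

    def add(self, reply_q):
        self._purge()
        self._entries.pop(reply_q, None)  # re-adding refreshes the expiry
        self._entries[reply_q] = self._clock() + self.ttl
        while len(self._entries) > self.maxsize:
            self._entries.popitem(last=False)  # evict the oldest entry

    def __contains__(self, reply_q):
        self._purge()
        return reply_q in self._entries


def send_reply(reply_q, payload, publish, gone_clients):
    """Hypothetical reply path that skips clients already known to be gone."""
    if reply_q in gone_clients:
        LOG.warning("Reply to %s not sent: client is known to be gone", reply_q)
        return False
    try:
        # In the real driver this is where the server may block for up to 60 s.
        publish(reply_q, payload)
    except LookupError:
        LOG.info("Reply exchange %s not found", reply_q)
        LOG.warning("Reply to %s not sent", reply_q)
        gone_clients.add(reply_q)
        return False
    return True
```

Only the first reply to a vanished client pays the full wait; later replies to the same `reply_q` within the TTL are dropped immediately instead of holding a pool connection.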

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.17.0
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/kilo)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/193484
Reason: Introduces a new requirement not present in Kilo

JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/256203

Mehdi Abaakouk (sileht) wrote :

For people who want a Kilo backport of this: we made a better alternative. The previous change improved the situation by not holding the connection lock when we know that the client is gone. I have recently proposed new changes that avoid holding the connection at all while we wait for a client to come back, and these will be backported to Kilo:

https://review.openstack.org/#/c/252361/
https://review.openstack.org/#/c/255530/
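The difference between the two approaches is whether a pool connection stays checked out while the server waits for a client to come back. The following is a minimal sketch under assumptions, not the code from the linked reviews: `reply_with_retries` is a hypothetical helper, a `queue.Queue` stands in for the connection pool, `publish` is an injected callable, and `LookupError` stands in for the missing reply exchange. The connection is borrowed only for each publish attempt and returned before sleeping, so other RPC traffic can use it during the wait.

```python
import queue
import time


def reply_with_retries(pool, reply_q, payload, publish, timeout=60.0,
                       retry_interval=1.0, clock=time.monotonic,
                       sleep=time.sleep):
    """Retry a reply until `timeout`, never holding a connection while idle.

    Hypothetical helper: `pool` is a queue.Queue of connections and
    `publish(conn, reply_q, payload)` raises LookupError while the
    reply exchange does not exist.
    """
    deadline = clock() + timeout
    while True:
        conn = pool.get()  # borrow a connection only for this attempt
        try:
            publish(conn, reply_q, payload)
            return True
        except LookupError:
            pass  # client not back yet; fall through to the retry logic
        finally:
            pool.put(conn)  # always return the connection before waiting
        if clock() + retry_interval > deadline:
            return False  # give up: the client is probably gone for good
        sleep(retry_interval)


if __name__ == "__main__":
    demo_pool = queue.Queue()
    demo_pool.put("conn-1")

    def always_missing(conn, reply_q, payload):
        raise LookupError(reply_q)

    # Short timeout so the demo finishes quickly.
    print(reply_with_retries(demo_pool, "reply_demo", {}, always_missing,
                             timeout=0.3, retry_interval=0.1))
    print("connections left in pool:", demo_pool.qsize())
```

The wait for a vanished client still costs time on the reply path, but it no longer removes a connection from the pool for the whole retry window.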

OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/kilo)

Change abandoned by ChangBo Guo(gcb) (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/256203
Reason: we don't need this any more, since sileht's patch fixed the issue.
