Failed connection attempt to broker results in many services connecting to undeclared queue

Bug #1576672 reported by Michael H Wilson
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)
10.0.x series: Confirmed / High / assigned to MOS Oslo

Bug Description

Diagnostic snapshot:
http://mos-scale-share.mirantis.com/fuel-snapshot-2016-04-29_08-53-26.tar.gz

For an unknown reason, OpenStack services start failing to report status via RPC.

from nova-compute: http://paste.openstack.org/show/495756/

2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task [req-1dca1efc-a293-475a-b286-da6277c87217 - - - - -] Error during ComputeManager._sync_power_states
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task Traceback (most recent call last):
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task task(self, context)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6180, in _sync_power_states
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task use_slave=True)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 174, in wrapper
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task args, kwargs)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task args=args, kwargs=kwargs)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task retry=self.retry)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task timeout=timeout, retry=retry)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task retry=retry)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 501, in _send
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task result = self._waiter.wait(msg_id, timeout)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 379, in wait
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task message = self.waiters.get(msg_id, timeout=timeout)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 277, in get
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task 'to message ID %s' % msg_id)
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID 68ab268b54604128b8ac09618611fdd0
2016-04-29 11:01:00.232 74189 ERROR oslo_service.periodic_task
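
The MessagingTimeout above is the client-side symptom: oslo.messaging publishes the RPC request and then blocks on a private reply_<uuid> queue until either the reply arrives or the RPC timeout expires. A minimal sketch of that call path (the transport URL, topic and credentials below are placeholders for illustration, not values taken from this environment):

    from oslo_config import cfg
    import oslo_messaging

    # Placeholder transport URL; in this deployment the services talk to
    # the RabbitMQ cluster on port 5673.
    transport = oslo_messaging.get_transport(
        cfg.CONF, url='rabbit://nova:secret@192.168.0.178:5673/')
    target = oslo_messaging.Target(topic='conductor')
    client = oslo_messaging.RPCClient(transport, target, timeout=60)

    try:
        # call() publishes the request and then waits on the reply queue.
        # If the reply can never be delivered (for example because the
        # reply queue disappeared after a broker failover), the call ends
        # in MessagingTimeout, which is what the traceback above shows.
        client.call({}, 'ping')
    except oslo_messaging.MessagingTimeout:
        print('no reply before the timeout, as seen in nova-compute')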

from neutron-l3: http://paste.openstack.org/show/495757/

<167>Apr 29 11:35:36 node-1 neutron-openvswitch-agent: 2016-04-29 11:35:36.791 29786 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a2c5c901-f2b2-42f2-8143-a79cbefa2231 - - - - -] Agent rpc_loop - iteration:63281 completed. Processed ports statistics: {'regular': {'updated': 0, 'added': 0, 'removed': 0}}. Elapsed:0.114 loop_count_and_wait /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1722
<163>Apr 29 11:35:37 node-1 neutron-l3-agent: 2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent [-] Failed reporting state!
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 684, in _report_state
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent True)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 86, in report_state
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent return method(context, 'report_state', **kwargs)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent retry=self.retry)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent timeout=timeout, retry=retry)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent retry=retry)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 501, in _send
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent result = self._waiter.wait(msg_id, timeout)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 379, in wait
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-04-29 11:35:37.917 32638 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/
<164>Apr 29 11:35:37 node-1 neutron-l3-agent: 2016-04-29 11:35:37.918 32638 WARNING oslo.service.loopingcall [-] Function 'neutron.agent.l3.agent.L3NATAgentWithStateReport._report_state' run outlasted interval by 50.01 sec

The RabbitMQ broker is reporting:

=ERROR REPORT==== 29-Apr-2016::11:50:35 ===
Channel error on connection <0.3495.25> (192.168.0.174:39036 -> 192.168.0.178:5673, vhost: '/', user: 'nova'), channel 1:
operation basic.consume caused a channel exception not_found: "no queue 'reply_733637fb32a84ff5bc1c956078930a28' in vhost '/'"

=ERROR REPORT==== 29-Apr-2016::11:50:35 ===
Channel error on connection <0.3495.25> (192.168.0.174:39036 -> 192.168.0.178:5673, vhost: '/', user: 'nova'), channel 1:
operation basic.consume caused a channel exception not_found: "no queue 'reply_733637fb32a84ff5bc1c956078930a28' in vhost '/'"

=ERROR REPORT==== 29-Apr-2016::11:50:35 ===
Channel error on connection <0.3495.25> (192.168.0.174:39036 -> 192.168.0.178:5673, vhost: '/', user: 'nova'), channel 1:
operation basic.consume caused a channel exception not_found: "no queue 'reply_733637fb32a84ff5bc1c956078930a28' in vhost '/'"

Also relevant:

AMQP server on 192.168.0.138:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.
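
Taken together, these point at a reconnect racing with an undeclared reply queue: the reply_<uuid> queues are transient and live on the node where the client declared them, so after the broker force-closes the connection ('CONNECTION_FORCED ... shutdown') the queue may be gone; if the client then reconnects (possibly to another cluster node) and issues basic.consume before the queue has been re-declared, the broker answers with the 404 not_found seen above. A small sketch that appears to reproduce the broker-side error (URL and credentials are placeholders):

    from kombu import Connection

    # Placeholder credentials; host, port, vhost and user mirror the broker log.
    with Connection('amqp://nova:secret@192.168.0.178:5673//') as conn:
        channel = conn.channel()
        try:
            # basic.consume on a reply queue that does not exist on this
            # node reproduces the channel exception from the RabbitMQ log:
            #   not_found: "no queue 'reply_...' in vhost '/'"
            channel.basic_consume(
                queue='reply_733637fb32a84ff5bc1c956078930a28',
                callback=lambda message: None)
        except conn.channel_errors as exc:
            print(exc)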

Steps to reproduce:
1. Deploy Fuel 9.0-244
2. Deploy an environment: 3 controllers, 20 compute+Ceph nodes, 175 compute nodes, VXLAN with DVR, Ceph for all storage, LMA plugin
3. Run the Rally scenario create_and_delete_secgroups (started at 2016-04-28 14:00:11.466)

Workaround:
Restarting the affected OpenStack service forces a reconnection to the broker, after which the error should disappear. This still needs to be verified; one way to check is sketched below.
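
To check whether the restart actually re-created the service's reply queue, the reply_* queues can be listed via the RabbitMQ management API before and after the restart. A sketch, assuming the management plugin is enabled on the controllers (default port 15672) and using placeholder credentials:

    import requests

    # Placeholder endpoint/credentials; '%2F' is the URL-encoded default vhost '/'.
    resp = requests.get('http://192.168.0.178:15672/api/queues/%2F',
                        auth=('nova', 'secret'))
    resp.raise_for_status()
    reply_queues = sorted(q['name'] for q in resp.json()
                          if q['name'].startswith('reply_'))
    print('\n'.join(reply_queues))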

Impact:
Affected services do not report status and fail various tasks.

description: updated
Revision history for this message
Michael H Wilson (geekinutah) wrote :

Added logs inline for better search engine indexing.

description: updated
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

MOS Oslo, could you please take a closer look?

tags: added: area-oslo
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 9.0
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info