RabbitMQ Connection Issues with Nova-api

Bug #1371371 reported by Tyler Wilson
This bug report is a duplicate of: Bug #1371723: Too many reconnects in logs.
This bug affects 2 people
Affects: Mirantis OpenStack
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

{"build_id": "2014-09-16_06-06-28", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "29", "auth_required": true, "api": "1.0", "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8", "production": "docker", "fuelmain_sha": "915152ab06fb33fdf7fb2653cc767609a1de29d9", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-16_06-06-28", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "29", "api": "1.0", "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8", "production": "docker", "fuelmain_sha": "915152ab06fb33fdf7fb2653cc767609a1de29d9", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"}}}, "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"}

1. Create new environment (Ubuntu, HA mode)
2. Choose GRE segmentation
3. Add controller x5 + Ceilometer
4. Add computes + Ceph OSD
5. Shut down the controllers one by one, then restart them
6. Attempt to create 15+ Instances

nova-api shows the following error:

2014-09-19T02:03:43.256383+01:00 err: 2014-09-19 01:03:43.255 22678 ERROR oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Failed to publish message to topic 'notifications.info': [Errno 32] Broken pipe
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 648, in ensure
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 753, in _publish
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 420, in __init__
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit super(NotifyPublisher, self).__init__(conf, channel, topic, **kwargs)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 396, in __init__
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit **options)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 339, in __init__
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 423, in reconnect
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit super(NotifyPublisher, self).reconnect(channel)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 347, in reconnect
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 84, in __init__
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 218, in revive
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 104, in declare
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 613, in exchange_declare
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((40, 10), args)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content,
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit write_frame(1, channel, payload)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit frame_type, channel, size, payload, 0xce,
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 307, in sendall
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit tail = self.send(data, flags)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 293, in send
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit total_sent += fd.send(data[total_sent:], flags)
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 32] Broken pipe
2014-09-19 01:03:43.255 22678 TRACE oslo.messaging._drivers.impl_rabbit
2014-09-19T02:03:43.257473+01:00 info: 2014-09-19 01:03:43.256 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Reconnecting to AMQP server on 192.168.0.4:5673
2014-09-19T02:03:43.258584+01:00 info: 2014-09-19 01:03:43.257 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Delaying reconnect for 1.0 seconds...
2014-09-19T02:03:44.278938+01:00 info: 2014-09-19 01:03:44.278 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Connected to AMQP server on 192.168.0.4:5673
2014-09-19T02:03:46.141801+01:00 info: 2014-09-19 01:03:46.141 22678 INFO nova.osapi_compute.wsgi.server [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 None] 192.168.0.5 "POST /v2/f5ad099f05af4f548633550d7b2a3f41/servers HTTP/1.1" status: 202 len: 731 time: 3.2802272
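
To put a number on how long the reconnect took, the syslog timestamps of the "Reconnecting" and "Connected" lines above can be compared. The following is only a rough sketch: the regular expression and the `reconnect_durations` helper are my own illustration (not part of nova or oslo.messaging), and it assumes the syslog prefix is an ISO 8601 timestamp as in the excerpt.

```python
import re
from datetime import datetime

# Lines copied from the nova-api log excerpt above.
LOG_LINES = [
    "2014-09-19T02:03:43.257473+01:00 info: 2014-09-19 01:03:43.256 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Reconnecting to AMQP server on 192.168.0.4:5673",
    "2014-09-19T02:03:43.258584+01:00 info: 2014-09-19 01:03:43.257 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Delaying reconnect for 1.0 seconds...",
    "2014-09-19T02:03:44.278938+01:00 info: 2014-09-19 01:03:44.278 22678 INFO oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Connected to AMQP server on 192.168.0.4:5673",
]

# Hypothetical pattern: syslog timestamp, then a reconnect/connected event and the server.
EVENT = re.compile(r"^(\S+) info: .* (Reconnecting to|Connected to) AMQP server on (\S+)$")


def reconnect_durations(lines):
    """Return (server, seconds) for each Reconnecting -> Connected pair."""
    started = {}
    durations = []
    for line in lines:
        match = EVENT.match(line.strip())
        if not match:
            continue  # e.g. the "Delaying reconnect" line
        stamp, event, server = match.groups()
        when = datetime.fromisoformat(stamp)
        if event == "Reconnecting to":
            started[server] = when
        elif server in started:
            elapsed = when - started.pop(server)
            durations.append((server, elapsed.total_seconds()))
    return durations


for server, seconds in reconnect_durations(LOG_LINES):
    print(f"{server}: reconnected after {seconds:.3f} s")
```

For this excerpt the reconnect itself completes in about one second, which matches the logged 1.0 second delay. The same request then returns status 202 after 3.28 seconds.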

root@node-1:~# crm status
Last updated: Fri Sep 19 01:07:57 2014
Last change: Fri Sep 19 01:07:42 2014 via crm_attribute on node-3
Stack: classic openais (with plugin)
Current DC: node-3 - partition with quorum
Version: 1.1.10-42f2063
5 Nodes configured, 5 expected votes
32 Resources configured

Online: [ node-1 node-2 node-3 node-4 ]
OFFLINE: [ node-5 ]

 vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-1
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-3
 p_ceilometer-alarm-evaluator (ocf::mirantis:ceilometer-alarm-evaluator): Started node-4
 p_ceilometer-agent-central (ocf::mirantis:ceilometer-agent-central): Started node-2
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-1 ]
     Slaves: [ node-2 node-3 node-4 ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1 node-2 node-3 node-4 ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1 node-2 node-3 node-4 ]
 p_heat-engine (ocf::mirantis:heat-engine): Started node-1
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-1 node-2 node-3 node-4 ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1 node-2 node-3 node-4 ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-3
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-1
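
When the controller restarts are repeated, it may help to check the node state from the `crm status` output automatically instead of reading it by eye. This is a minimal sketch that assumes the `Online: [ ... ]` / `OFFLINE: [ ... ]` line format shown above. The `node_states` helper is hypothetical, not a Pacemaker API.

```python
import re

# The two membership lines from the crm status output above.
CRM_STATUS = """\
Online: [ node-1 node-2 node-3 node-4 ]
OFFLINE: [ node-5 ]
"""

# Matches only the unindented cluster membership lines, not resource
# lines such as "     Masters: [ node-1 ]".
MEMBERSHIP = re.compile(r"^(Online|OFFLINE): \[ (.*?) \]$", re.MULTILINE)


def node_states(text):
    """Map each node name to 'online' or 'offline'."""
    states = {}
    for match in MEMBERSHIP.finditer(text):
        for node in match.group(2).split():
            states[node] = match.group(1).lower()
    return states


states = node_states(CRM_STATUS)
offline = sorted(node for node, state in states.items() if state == "offline")
print(f"{len(states)} nodes seen, offline: {offline}")
```

In the output above, node-5 is still offline even though 5 nodes are configured, so the cluster had not fully recovered when the instances were created.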

Tyler Wilson (loth) wrote :
Nastya Urlapova (aurlapova) wrote :

Thanks for reporting the issue, Tyler. Could you tell us how much time passed between step 5 and step 6? We have a known issue where rebuilding the RabbitMQ cluster takes 15-20 minutes.

Changed in fuel:
status: New → Incomplete
Tyler Wilson (loth) wrote :

Was able to replicate issue after a fresh install with ISO

{"build_id": "2014-09-18_06-04-08", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "31", "auth_required": true, "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-18_06-04-08", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "31", "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}}}, "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}
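
To see which components differ between this ISO and the one in the original description, the top-level fields of the two version blobs can be diffed. This is a sketch only: the dictionaries below copy a subset of the top-level keys from the two JSON blobs in this report, and `changed_keys` is a helper name I made up.

```python
# Top-level fields copied from the version JSON in the bug description (build 29).
BUILD_29 = {
    "build_id": "2014-09-16_06-06-28",
    "build_number": "29",
    "release": "5.1",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8",
    "fuelmain_sha": "915152ab06fb33fdf7fb2653cc767609a1de29d9",
    "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
    "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6",
}

# Top-level fields copied from the version JSON in this comment (build 31).
BUILD_31 = {
    "build_id": "2014-09-18_06-04-08",
    "build_number": "31",
    "release": "5.1",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d",
    "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd",
    "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
    "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079",
}


def changed_keys(old, new):
    """Return the sorted keys whose values differ between two builds."""
    return sorted(key for key in old.keys() | new.keys() if old.get(key) != new.get(key))


for key in changed_keys(BUILD_29, BUILD_31):
    print(f"{key}: {BUILD_29.get(key)} -> {BUILD_31.get(key)}")
```

Between the two builds, nailgun, fuel-main and fuel-library changed, while the ostf and astute SHAs are identical. The problem reproduces on both.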

Was able to replicate the RabbitMQ <-> nova-api timeouts without shutting off any nodes:

<182>Sep 19 19:30:12 node-2 nova-api 2014-09-19 19:30:12.802 17124 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
<180>Sep 19 19:30:12 node-2 nova-api 2014-09-19 19:30:12.838 17124 WARNING keystoneclient.middleware.auth_token [-] Authorization failed for token
<180>Sep 19 19:30:12 node-2 nova-api 2014-09-19 19:30:12.839 17124 WARNING keystoneclient.middleware.auth_token [-] Authorization failed for token
<182>Sep 19 19:30:12 node-2 nova-api 2014-09-19 19:30:12.840 17124 INFO keystoneclient.middleware.auth_token [-] Invalid user token - rejecting request
<182>Sep 19 19:30:12 node-2 nova-api 2014-09-19 19:30:12.842 17124 INFO nova.osapi_compute.wsgi.server [-] 192.168.0.2 "GET /v2/8986a0ce721840bd8cd692d542475940/servers/detail?all_tenants=True&host=node-12 HTTP/1.1" status: 401 len: 194 time: 0.0441570
<179>Sep 19 19:31:57 node-2 nova-api 2014-09-19 19:31:57.543 17121 ERROR oslo.messaging._drivers.impl_rabbit [req-45a12408-2c2b-4438-8d55-2bda1ed2eb20 ] Failed to publish message to topic 'compute.node-40': [Errno 32] Broken pipe
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 648, in ensure
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 753, in _publish
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 396, in __init__
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drivers.impl_rabbit **options)
2014-09-19 19:31:57.543 17121 TRACE oslo.messaging._drive...
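
To judge how widespread the publish failures are, the ERROR lines in a nova-api log can be tallied per topic. This is a minimal sketch that assumes the "Failed to publish message to topic" wording seen in this report. The `count_publish_failures` helper is illustrative only, and the sample input is the two ERROR lines quoted in this bug with paste line wraps removed.

```python
import re
from collections import Counter

# The two ERROR lines from this report (first from the description, second from this comment).
ERROR_LINES = [
    "2014-09-19T02:03:43.256383+01:00 err: 2014-09-19 01:03:43.255 22678 ERROR oslo.messaging._drivers.impl_rabbit [req-2f7835d2-bbc7-4065-a2b6-0dbd6a2d8840 ] Failed to publish message to topic 'notifications.info': [Errno 32] Broken pipe",
    "<179>Sep 19 19:31:57 node-2 nova-api 2014-09-19 19:31:57.543 17121 ERROR oslo.messaging._drivers.impl_rabbit [req-45a12408-2c2b-4438-8d55-2bda1ed2eb20 ] Failed to publish message to topic 'compute.node-40': [Errno 32] Broken pipe",
]

# Captures the topic name and the error text that follows it.
PUBLISH_FAILURE = re.compile(r"Failed to publish message to topic '([^']+)': (.+)$")


def count_publish_failures(lines):
    """Count publish failures keyed by (topic, error text)."""
    counts = Counter()
    for line in lines:
        match = PUBLISH_FAILURE.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


for (topic, error), count in count_publish_failures(ERROR_LINES).items():
    print(f"{count} x {topic}: {error}")
```

In the lines quoted here the broken pipe hits both a notification topic and a compute RPC topic, so it does not look specific to Ceilometer notifications.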

Tyler Wilson (loth) wrote :

Was able to reproduce on CentOS as well.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Incomplete → New
milestone: none → 6.0
no longer affects: fuel