Timed out waiting for a reply via rabbit

Bug #1338732 reported by Noel Burton-Krahn
This bug affects 43 people
Affects                  Status        Importance  Assigned to        Milestone
Ubuntu Cloud Archive     Fix Released  Undecided   Unassigned
oslo.messaging           Fix Released  High        Mehdi Abaakouk
oslo.messaging (Ubuntu)  Fix Released  Undecided   Unassigned
  Trusty                 Fix Released  High        James Page
  Utopic                 Won't Fix     High        Jorge Niedbalski
  Vivid                  Fix Released  Undecided   Unassigned

Bug Description

Icehouse
oslo-messaging 1.3.0 (stable/icehouse branch from github, c21a2c2)
rabbitmq-server 3.1.3

Nova RPC calls often fail after rabbit restarts. I've tracked it down to oslo/rabbit/kombu timing out if it's forced to reconnect to rabbit. The code below times out waiting for a reply if the topic has been used in a previous run. The reply always arrives the first time a topic is used, or if the topic is None. But the second run with the same topic will hang with this error:

MessagingTimeout: Timed out waiting for a reply to message ID ...

This problem seems too basic not to have been caught earlier in oslo, but the program below really does reproduce the same symptoms we see in nova when run against a live rabbit server.

Here's a log from a test run: https://gist.github.com/noelbk/12adcfe188d9f54f971d

#! /usr/bin/python

from oslo.config import cfg
import threading
from oslo import messaging
import logging
import time
log = logging.getLogger(__name__)

class OsloTest():
    def test(self):
        # The code below times out waiting for a reply if the topic
        # has been used in a previous run. The reply always arrives
        # the first time a topic is used, or if the topic is none.
        # But, the second run with the same topic will hang with this
        # error:
        #
        # MessagingTimeout: Timed out waiting for a reply to message ID ...
        #
        topic = 'will_hang_on_second_usage'
        #topic = None # never hangs

        url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
            proto = 'rabbit',
            host = '1.2.3.4',
            password = 'xxxxxxxx',
            user = 'rabbit-mq-user',
            )
        transport = messaging.get_transport(cfg.CONF, url)
        driver = transport._driver

        target = messaging.Target(topic=topic)
        listener = driver.listen(target)
        ctxt={"context": True}
        timeout = 10

        def send_main():
            log.debug("sending msg")
            reply = driver.send(target,
                                ctxt,
                                {'send': 1},
                                wait_for_reply=True,
                                timeout=timeout)

            # times out if topic was not None and used before
            log.debug("received reply=%r" % (reply,))

        send_thread = threading.Thread(target=send_main)
        send_thread.daemon = True
        send_thread.start()

        msg = listener.poll()
        log.debug("received msg=%r" % (msg,))

        msg.reply({'reply': 1})

        log.debug("sent reply")

        send_thread.join()

if __name__ == '__main__':
    FORMAT = '%(asctime)-15s %(process)5d %(thread)5d %(filename)s %(funcName)s %(message)s'
    logging.basicConfig(level=logging.DEBUG, format=FORMAT)
    OsloTest().test()

---- ---- ---- ---- ----

[Impact]

 * This patch along with those from LP #1400268 and LP #1408370 fixes rabbitmq reconnects

 * We are backporting this to Icehouse since oslo.messaging 1.3.0
   fails to reconnect to Rabbit properly; nova-compute is particularly
   affected.

 * This patch, along with its dependencies mentioned above, ensures that
   multiple reconnect attempts happen by having connections time out and
   retry.

[Test Case]

 * Start a service that uses oslo.messaging with rabbitmq, e.g. nova-compute

 * Stop rabbitmq while tailing /var/log/nova/nova-compute.log (tail -F)

 * Observe that nova-compute's AMQP connection times out and that it keeps
   trying to reconnect

 * Restart rabbitmq

 * Observe that the rabbitmq connection is re-established

[Regression Potential]

 * None expected. I have tested this in my local cloud environment and it
   appears to be reliable.


Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

The code above was missing a call to acknowledge(), so it kept reading stale messages. However, with acknowledge() it still reproduces the original bug; a corrected poll loop is sketched below. Here's how I reproduce the original problem: I verify the cluster is working, then kill rabbit, then try again.

    1. start cluster, create vms, migrate: ok
    2. kill and restart rabbit
    3. migrate vm: timeout
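
For reference, here is what the corrected tail of the reproducer's test() method looks like with the missing acknowledge() call added. This is only a sketch against the same driver API used in the script above, not the oslo.messaging source:

def handle_one_request(listener, log):
    # Poll one message from the driver listener set up in the reproducer,
    # acknowledge it so it is not redelivered as a stale message on the
    # next run, then send the reply the caller is waiting for.
    msg = listener.poll()
    log.debug("received msg=%r" % (msg,))
    msg.acknowledge()   # the call that was missing in the original script
    msg.reply({'reply': 1})
    log.debug("sent reply")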

Here's a trace from https://gist.github.com/noelbk/619426fa88c3bdd0534c taken after rabbit was restarted. This pattern repeats a few times during a migration:

    19:24:56 10.35.0.13 sends _msg_id = 29c6579c5de24c00b0b0e55579b8e047
    19:24:56 10.35.0.14 receives _msg_id = 29c6579c5de24c00b0b0e55579b8e047
    19:24:56 10.35.0.14 acknowledges _msg_id = 29c6579c5de24c00b0b0e55579b8e047
    19:25:56 10.35.0.13 Timed out waiting for a reply to message ID 29c6579c5de24c00b0b0e55579b8e047

I'm instrumenting the rpc calls now to see if they all actually do retry and complete after the timeout errors. I'm trying to get a trace of all the rpc calls to see if they're being acknowledged but not replied to in time.

While digging through oslo.messaging, I noticed that in amqpdriver.py, the incoming queues in ReplyWaiter and AMQPListener are plain arrays, not thread-safe Queues. ReplyWaiter does acquire a lock, but I'm not sure if the plain arrays are always thread-safe. Not sure if this is causing my issue.

Still digging...

Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

Here's a trace of an RPC call that times out after a reconnect to rabbit. Somehow impl_rabbit._consume is failing to consume the RPC reply that was sent earlier. Full log is here: https://gist.github.com/noelbk/468f852c6f2b78883cc9

1. ALICE sends RPC call to BOB

  Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE debug __wrapped [Kombu channel:1] _basic_publish(Message({'body': '{"oslo.message": "{\\"_context_roles\\": [], \\"_msg_id\\": \\"8504f806c278472cac67288977887170\\", \\"_context_quota_class\\": null, \\"_context_request_id\\": \\"req-27caa60d-f916-4962-820b-5e9741021f50\\", \\"_context_service_catalog\\": [], \\"args\\": {\\"instance_id\\": 3, \\"teardown\\": false, \\"host\\": \\"10.35.0.13\\"}, \\"_unique_id\\": \\"7604a4c7254a49eba26a78251ebe8fb5\\", \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_reply_q\\": \\"reply_a6882628d63d497394316484050dd50b\\", \\"_context_auth_token\\": null, \\"_context_tenant\\": null, \\"_context_instance_lock_checked\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_id\\": null, \\"_context_timestamp\\": \\"2014-07-10T07:50:48.043133\\", \\"_context_user_name\\": null, \\"method\\": \\"setup_networks_on_host\\", \\"_context_remote_address\\": null}", "oslo.version": "2.0"}', 'properties': {'priority': 0, 'application_headers': {'ttl': 60000}, 'delivery_mode': 2, 'content_encoding': 'utf-8', 'content_type': 'application/json'}, 'channel': None}), mandatory=False, routing_key='network', immediate=False, exchange='nova')

2. ALICE loses connection to rabbit, delays reconnect for 1 sec

  Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit ensure NBK: ensure reconnecting...
  Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit _connect Reconnecting to AMQP server on 10.35.0.3:5672
  Jul 10 07:50:49 10.35.0.13 2014-07-10 07:50:49.344 9114 140035666425488 oslo.messaging._drivers.impl_rabbit impl_rabbit acknowledge NBK: acknowledge self._raw_message.body=u'{"oslo.message": "{\\"_context_roles\\": [], \\"_msg_id\\": \\"8504f806c278472cac67288977887170\\", \\"_context_quota_class\\": null, \\"_context_request_id\\": \\"req-27caa60d-f916-4962-820b-5e9741021f50\\", \\"_context_service_catalog\\": [], \\"args\\": {\\"instance_id\\": 3, \\"teardown\\": false, \\"host\\": \\"10.35.0.13\\"}, \\"_unique_id\\": \\"7604a4c7254a49eba26a78251ebe8fb5\\", \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_reply_q\\": \\"reply_a6882628d63d497394316484050dd50b\\", \\"_context_auth_token\\": null, \\"_context_tenant\\": null, \\"_context_instance_lock_checked\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_id\\": null, \\"_context_timestamp\\": \\"2014-07-10T07:50:48.043133\\", \\"_context_user_name\\": null, \\"method\\": \\"setup_networks_on_host\\", \\"_context_remote_address\\": null}", "oslo.version": "2.0"}'
  Jul 10 07:50:49 10.35.0.13 cobalt-compute: ALICE impl_rabbit _connect Delaying reconnect for 1.0 seconds...

3. BOB received ALICE's RPC ...


Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

The problem is that the rabbit queues declared for RPC are not durable, so if the receiver is reconnecting when the sender replies, the reply will be deleted along with the queue. Here's how it goes:

1. Sender sends RPC call
2. Receiver gets call, acknowledges
3. Rabbit dies
4. Sender gets connection dropped error, waits 1 sec to reconnect
5. Receiver connects (successfully), sends reply to reply_queue and disconnects
6. Rabbit deletes the reply queue when the receiver disconnects because it's not durable and there are no other connections
7. Sender reconnects to rabbit
8. Sender waits for reply (that was deleted earlier), times out, fails, does not retry
9. RPC call fails

I've been looking for a way of setting durability on RPC queues in impl_rabbit.py. It supports an "amqp_durable_queues" config option, but only for TopicConsumer objects. The RPC mechanism uses DirectConsumer objects, and there's no way for an RPC client to pass durable=True down to _declare_consumer, so the durable option cannot be set on a queue during an RPC call.
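
For illustration only (this is not the oslo.messaging code, and the broker URL and queue name are made up): in kombu terms, durability and auto-delete are chosen when the reply queue is declared, which is roughly the knob amqp_durable_queues turns for TopicConsumer but which DirectConsumer never exposes. As the next comment shows, flipping these flags alone turned out not to be enough.

import kombu

# Hypothetical reply queue; real ones show up as 'reply_<uuid>' in the logs.
conn = kombu.Connection('amqp://guest:guest@localhost//')  # assumed broker URL
channel = conn.channel()

exchange = kombu.Exchange('reply_abc123', type='direct',
                          durable=True, auto_delete=False)
queue = kombu.Queue('reply_abc123', exchange=exchange,
                    routing_key='reply_abc123',
                    durable=True, auto_delete=False)

# durable=True / auto_delete=False asks rabbit to keep the queue (and any
# replies sitting in it) when the consumer's connection drops.
queue(channel).declare()
conn.release()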

Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

Unfortunately, rabbit still drops the reply even with auto_delete=False, durable=True

https://gist.github.com/noelbk/f8476952809dac8a295c

All the queues and exchanges are created with auto_delete=False, durable=True (search for exchange_declare and queue_declare above), but the client still fails to consume a message the server published before the client reconnected to rabbit.

I do see that the server calls exchange_declare before publishing, but not queue_declare. What does it mean to publish to an exchange when the client will consume from a queue?

no longer affects: oslo
Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

Found the core problem: producers declare the exchange, but not the queue bound to the exchange. It looks like they're assuming the consumer will declare the queue before the producer replies, which is not the case during a reconnect. On reconnect, the consumer sleeps for a second before reconnecting; the producer then declares the exchange without a queue and the message gets published to nowhere. The fix is to declare the queue and exchange in the producer as well as the consumer.
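
A minimal kombu sketch of the direction described above (the names are made up, and this is not the actual patch): the producer declares and binds the reply queue itself before publishing, so the reply has somewhere to land even if the consumer is still sleeping before its reconnect. The fix that eventually merged took a different route, declaring the publisher's exchange with passive=True (see the commit further down).

import kombu

conn = kombu.Connection('amqp://guest:guest@localhost//')  # assumed broker URL
channel = conn.channel()

reply_exchange = kombu.Exchange('reply_abc123', type='direct',
                                durable=False, auto_delete=True)
reply_queue = kombu.Queue('reply_abc123', exchange=reply_exchange,
                          routing_key='reply_abc123',
                          durable=False, auto_delete=True)

producer = kombu.Producer(channel, exchange=reply_exchange,
                          routing_key='reply_abc123')
# declare=[reply_queue] makes kombu declare the exchange, the queue and the
# binding before publishing, instead of publishing into an unbound exchange.
producer.publish({'result': 'reply payload'}, declare=[reply_queue])
conn.release()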

Revision history for this message
Jens-Christian Fischer (jens-christian-fischer) wrote :

We are (most likely) running into this exact problem (our controller node with Rabbit is running on a VM that hiccups at regular intervals, and we see a lot of rabbit reconnects from all over the place). Our cluster is hardly usable because VMs don't get started or deleted consistently.

Can you point me to a patch or tell me which file(s) you are looking at? I would like to try to hot-fix this and see if that solves (some of) our problems.

Revision history for this message
Noel Burton-Krahn (noelbk) wrote : Re: [Bug 1338732] Re: Timed out waiting for a reply via rabbit

Sure, I'm just making a pull request for it now, but here's the patch:
https://github.com/noelbk/oslo.messaging/commit/960fc26ff050ca3073ad90eccbef1ca95712e82e


Changed in oslo.messaging:
status: New → Confirmed
Changed in oslo.messaging:
assignee: nobody → Noel Burton-Krahn (noel-burton-krahn)
status: Confirmed → In Progress
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

A related bug which is part of this issue as well: https://bugs.launchpad.net/oslo.messaging/+bug/1349301

Revision history for this message
Stan Lagun (slagun) wrote :

Please see https://review.openstack.org/#/c/110058/. It looks like it solves this problem, but I'm not sure.

Revision history for this message
Noel Burton-Krahn (noelbk) wrote :

The review above (https://review.openstack.org/#/c/110058/) does not fix this particular problem.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123115

Changed in oslo.messaging:
assignee: Noel Burton-Krahn (noel-burton-krahn) → Andrey Pavlov (apavlov-e)
Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
status: In Progress → Incomplete
status: Incomplete → In Progress
Revision history for this message
Mehdi Abaakouk (sileht) wrote :

I can't reproduce this issue; the timeout code path of oslo.messaging has been fixed. Can you try with oslo.messaging 1.5 to see whether this is still valid?

Also, the test code in the bug description is wrong; you need to ack the message to make it work:

    msg = listener.poll()
    msg.acknowledge()

Changed in oslo.messaging:
status: In Progress → Incomplete
Changed in oslo.messaging:
assignee: Andrey Pavlov (apavlov-e) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Andrey Pavlov (<email address hidden>) on branch: master
Review: https://review.openstack.org/123115

Revision history for this message
Zhen Qin (zqinit) wrote :

To Andrey Pavlov: could you confirm whether you still see this problem in oslo.messaging 1.5? Thanks.

Revision history for this message
Andrey Pavlov (apavlov-e) wrote :

Unfortunately I cannot. Our deployment is based on the Icehouse version and is not very big.
Also, I ran into this issue in a very strange case that was resolved by re-configuring other cloud software.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

I believe I have reproduced this in 1.5. It is very easy to reproduce in devstack. Steps:

1. launch an instance:
 nova boot --image cirros-0.3.2-x86_64-uec --flavor 1 foo
2. call nova console-log
 nova console-log foo
3. restart rabbit
 sudo service rabbitmq-server restart
4. wait a couple of minutes for compute to reconnect
5. call nova console-log
 nova console-log foo

It will fail the first time. Sometimes it fails the second time as well. Errors in the log:

2014-12-19 11:29:47.758 ERROR oslo.messaging._drivers.impl_rabbit [req-a69080c0-4d4c-4de4-9226-0cc5d89f98bc demo demo] Failed to publish message to topic 'compute.devstack': [Errno 104] Connection reset by peer
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 436, in _ensured
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 508, in __call__
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 714, in _publish
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, channel, topic=topic, **kwargs)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 381, in __init__
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit **options)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 329, in __init__
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 337, in reconnect
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 85, in __init__
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 218, in revive
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 105, in declare
2014-12-19 11:29:47.758 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2014-12-19 11:29:47.7...

Changed in oslo.messaging:
status: Incomplete → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/juno)

Related fix proposed to branch: stable/juno
Review: https://review.openstack.org/143805

Revision history for this message
QingchuanHao (haoqingchuan-28) wrote :

Sorry for disrupting the posting format. The not-reusing-timeout issue, fixed in the commit reviewed above, did indeed solve the hanging problem. But when heartbeats are enabled the problem may still occur: a heartbeat packet satisfies the timeout set by socket.settimeout, and the call then hangs in _next_method in amqp/method_frame.py. Having the publisher declare the queue can help in the case where a message is published before the queue has been declared, which is not guaranteed despite all the ensure() wrappers.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/144498

Changed in oslo.messaging:
assignee: nobody → Vish Ishaya (vishvananda)
status: Confirmed → In Progress
Changed in oslo.messaging:
assignee: Vish Ishaya (vishvananda) → Mehdi Abaakouk (sileht)
Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
importance: Undecided → High
milestone: none → next-kilo
Changed in oslo.messaging:
milestone: 1.6.0 → next-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/109373
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=434b5c8781b36cd65b7b642d723c0502e7093795
Submitter: Jenkins
Branch: master

commit 434b5c8781b36cd65b7b642d723c0502e7093795
Author: Mehdi Abaakouk <email address hidden>
Date: Tue Jul 22 09:42:52 2014 -0700

    Declare DirectPublisher exchanges with passive=True

    If rabbit dies, the consumer can be disconnected before the publisher
    sends, and if the consumer hasn't declared the queue, the publisher
    will send a message to an exchange that's not bound to a queue, and
    the message will be lost. Setting passive=True will cause the
    publisher to fail and retry if the consumer hasn't declared the
    receiving queue yet.

    Co-Authored-By: Noel Burton-Krahn <email address hidden>
    Closes-Bug: #1338732
    Change-Id: I5ba4d311b97236d3a85a9f5badff61f12b08c12d
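
In kombu terms, what the patch relies on looks roughly like the sketch below (the broker URL and exchange name are made up; the real code lives in impl_rabbit's DirectPublisher): a passive declaration only asks the broker whether the exchange already exists, so the publisher gets an error it can retry on instead of silently publishing into an exchange with no bound queue.

import kombu

conn = kombu.Connection('amqp://guest:guest@localhost//')  # assumed broker URL
channel = conn.channel()

# passive=True means "check that this exchange exists" rather than "create it".
reply_exchange = kombu.Exchange('reply_abc123', type='direct',
                                durable=False, auto_delete=True, passive=True)
try:
    reply_exchange(channel).declare()
except Exception as exc:  # typically amqp's NotFound (404) channel error
    # The consumer has not (re)declared its reply exchange yet; fail the
    # publish and let the retry logic try again later.
    print("reply exchange not found, retrying later: %s" % exc)
finally:
    conn.release()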

Changed in oslo.messaging:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Mehdi Abaakouk (<email address hidden>) on branch: master
Review: https://review.openstack.org/144498
Reason: This bug has been closed by https://review.openstack.org/#/c/109373/

Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/juno)

Related fix proposed to branch: stable/juno
Review: https://review.openstack.org/161119

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (stable/juno)

Reviewed: https://review.openstack.org/161119
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=858353394dd15101ebcb1c4b39e672bc4aa2bd86
Submitter: Jenkins
Branch: stable/juno

commit 858353394dd15101ebcb1c4b39e672bc4aa2bd86
Author: Jie Li <email address hidden>
Date: Wed Mar 4 15:42:13 2015 +0800

    Fix _poll_connection not timeout issue (1/2)

    _poll_connection could fall into a loop waiting for a reply message if
    rabbit dies and comes back up. This commit sets up an rpc_response_timeout
    timer for one connection polling, so the rpc will finally jump out with a
    timeout exception, which is expected in such a scenario.

    Related bug: #1338732

    The title of the commit in master was:
      rabbit: more precise iterconsume timeout

    but this was changed since it didn't describe the actual change.
    This commit resolved some conflicts due to cherry-pick.

    Change-Id: I157dab80cdb4afcf9a5f26fa900f96f0696db502
    (cherry picked from commit 023b7f44e2ccd77a7e9ee9ee78431a4646c88f13)
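
The idea, reduced to a sketch (this is not the backported code; poll_once stands in for the driver's per-connection poll): bound the whole reply-polling loop by rpc_response_timeout so a connection that died and came back cannot make the caller wait forever.

import time

class MessagingTimeout(Exception):
    """Stand-in for oslo.messaging's MessagingTimeout."""

def wait_for_reply(poll_once, rpc_response_timeout=60.0):
    # poll_once(timeout) is an assumed helper that polls the reply connection
    # once and returns a message or None.
    deadline = time.time() + rpc_response_timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            raise MessagingTimeout('Timed out waiting for a reply')
        reply = poll_once(timeout=min(remaining, 1.0))
        if reply is not None:
            return reply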

tags: added: in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/icehouse)

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/173717

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/173724

James Page (james-page)
Changed in oslo.messaging (Ubuntu Vivid):
status: New → Fix Released
Changed in oslo.messaging (Ubuntu Utopic):
status: New → Fix Released
Changed in oslo.messaging (Ubuntu Trusty):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/icehouse)

Change abandoned by Edward Hope-Morley (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/173724
Reason: I am going to resubmit this as part of https://review.openstack.org/#/c/173717/ including the other patch mentioned by Mehdi

description: updated
Changed in oslo.messaging (Ubuntu Trusty):
status: Confirmed → In Progress
Mehdi Abaakouk (sileht)
tags: added: icehouse-backport-potential
Revision history for this message
Edward Hope-Morley (hopem) wrote :
Revision history for this message
David Medberry (med) wrote :

At the top I see "Affects" for T, U, and V. Where do the "Affects" for UCA show up? I.e., where's the patch for Juno on Trusty going to appear? Is it being considered? (Filing this in one place, but of course this affects almost any T/U backport for I/J and now K OpenStack releases.) What's the right way to flag a bug for UCA in general? @dosaboy @jamespage

Revision history for this message
James Page (james-page) wrote :

med

UCA will get auto-fed the result of the SRU process, so it's implicit.

Raising a cloud-archive task so it's trackable.

James Page (james-page)
Changed in oslo.messaging (Ubuntu Trusty):
assignee: nobody → James Page (james-page)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/juno)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/143805
Reason: https://review.openstack.org/#/c/161119/

and

https://review.openstack.org/#/c/161120/

have replaced this changes.

Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Noel, or anyone else affected,

Accepted oslo.messaging into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/oslo.messaging/1.3.0-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in oslo.messaging (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Ante Karamatić (ivoks) wrote :

We used 1.3.0-0ubuntu1.1 and we confirm it solves the problem.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package oslo.messaging - 1.3.0-0ubuntu1.1

---------------
oslo.messaging (1.3.0-0ubuntu1.1) trusty; urgency=medium

  * Backport fixes for reliable AMQP reconnect support, ensuring
    nova-compute instances re-connect and message correctly when
    RabbitMQ message brokers disappear in clustered configurations:
    - d/p/0001-rabbit-more-precise-iterconsume-timeout.patch:
      Improve precision of iterconsume timeouts (LP: #1400268).
    - d/p/0002-rabbit-fix-timeout-timer-when-duration-is-None.patch:
      Fix timeout timer when duration is set to None (LP: #1408370).
    - d/p/0003-Declare-DirectPublisher-exchanges-with-passive-True.patch:
      Ensure that message publishers fail and retry if the consumer has
      not yet declared a receiving queue (LP: #1338732).
 -- Edward Hope-Morley <email address hidden> Thu, 23 Apr 2015 15:56:08 +0100

Changed in oslo.messaging (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for oslo.messaging has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page)
Changed in oslo.messaging (Ubuntu Utopic):
status: Fix Released → New
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (stable/icehouse)

Reviewed: https://review.openstack.org/173717
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=b58180210145e1c804ec496576d6bb2caabc68ef
Submitter: Jenkins
Branch: stable/icehouse

commit b58180210145e1c804ec496576d6bb2caabc68ef
Author: Mehdi Abaakouk <email address hidden>
Date: Mon Dec 8 10:56:52 2014 +0100

    rabbit: more precise iterconsume timeout

    The iterconsume code always set the kombu timeout to 1 second, even
    when the requested timeout was more precise or less than 1 second.

    This change fixes that.

    Related bug: #1400268
    Related bug: #1399257
    Related-bug: #1338732

    (cherry picked from commit 023b7f44e2ccd77a7e9ee9ee78431a4646c88f13)

    Conflicts:
     oslo/messaging/_drivers/amqpdriver.py
     oslo/messaging/_drivers/impl_rabbit.py

    Change-Id: I157dab80cdb4afcf9a5f26fa900f96f0696db502
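
Sketched very roughly (this is not the actual patch), the change amounts to passing the caller's remaining timeout, even a sub-second one, down to kombu's drain_events instead of a hard-coded 1 second:

import socket

def drain_once(connection, remaining=None):
    # `connection` is assumed to be a kombu.Connection; before this fix the
    # equivalent loop always waited a fixed 1 second per iteration.
    timeout = 1.0 if remaining is None else min(remaining, 1.0)
    try:
        connection.drain_events(timeout=timeout)
    except socket.timeout:
        # Nothing arrived within the (possibly sub-second) window; the caller
        # decides whether its overall deadline has expired.
        pass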

tags: added: in-stable-icehouse
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/icehouse)

Reviewed: https://review.openstack.org/173724
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=33f3b18539da88ddf984ada4b63e2767d64438f9
Submitter: Jenkins
Branch: stable/icehouse

commit 33f3b18539da88ddf984ada4b63e2767d64438f9
Author: Mehdi Abaakouk <email address hidden>
Date: Tue Jul 22 09:42:52 2014 -0700

    Declare DirectPublisher exchanges with passive=True

    If rabbit dies, the consumer can be disconnected before the publisher
    sends, and if the consumer hasn't declared the queue, the publisher
    will send a message to an exchange that's not bound to a queue, and
    the message will be lost. Setting passive=True will cause the
    publisher to fail and retry if the consumer hasn't declared the
    receiving queue yet.

    Co-Authored-By: Noel Burton-Krahn <email address hidden>
    Closes-Bug: #1338732
    (cherry picked from commit 434b5c8781b36cd65b7b642d723c0502e7093795)

    Conflicts:
     oslo_messaging/_drivers/common.py
     oslo_messaging/tests/test_utils.py

    Change-Id: I5ba4d311b97236d3a85a9f5badff61f12b08c12d

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in oslo.messaging (Ubuntu Utopic):
status: New → Confirmed
Revision history for this message
Viktor Serhieiev (vsergeyev) wrote :

What are the actual steps to reproduce this bug? I want to try this fix - https://review.openstack.org/#/c/110058/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/184473
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=cc4ca1f9efc7f4a935ad31e869e8aae89e98b8fe
Submitter: Jenkins
Branch: master

commit cc4ca1f9efc7f4a935ad31e869e8aae89e98b8fe
Author: Victor Sergeyev <email address hidden>
Date: Wed May 20 15:49:53 2015 +0300

    Add one more functional test for MessagingTimeout

    Add a check that MessagingTimeout is raised on long-running queries
    when the client sends other queries at the same time.

    Added a long_running_task() to TestServerEndpoint and allowed passing a
    message executor into the RpcServerFixture.

    Related bug: #1338732

    Co-Authored-By: Roman Podoliaka <email address hidden>
    Change-Id: Icafb6838e2d9fb76b6d1c202465c09c174a3bed9

Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Noel, or anyone else affected,

Accepted oslo.messaging into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/oslo.messaging/1.4.1-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in oslo.messaging (Ubuntu Utopic):
status: Confirmed → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Revision history for this message
JohnsonYi (yichengli) wrote :

Hello Chris J Arges, I tested your package https://launchpad.net/ubuntu/+source/oslo.messaging/1.4.1-0ubuntu1.1 in my Fuel OpenStack environment, and it didn't work very well.

Please check the information below for details.

[root@mvp-builder ~]# fuel --f
api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-26_14-25-46
build_number: '58'
feature_groups:
- mirantis
- experimental
fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
      build_id: 2014-12-26_14-25-46
      build_number: '58'
      feature_groups:
      - mirantis
      fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
      fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
      production: docker
      release: '6.0'

root@node-3:~# dpkg -l | grep oslo
ii python-oslo.config 1:1.4.0-fuel6.0~mira16 Common code for Openstack Projects (configuration API)
ii python-oslo.db 1.0.1-fuel6.0~mira19 oslo db library
ii python-oslo.i18n 1.0.0-fuel6.0~mira16 Oslo Internationalization Utilities - Python 2.x
ii python-oslo.messaging 1.4.1-fuel6.0~mira18 oslo messaging library
ii python-oslo.rootwrap 1.3.0.0a1-ubuntu2 Fine filtering of shell commands as 'root'
ii python-oslo.serialization 1.0.0-fuel6.0~mira15 utilities for serialization , especially JSON - Python 2.x
ii python-oslo.utils 1.0.0-fuel6.0~mira16 set of utility functions for OpenStack - Python 2.x
ii python-oslo.vmware 0.6.0-fuel6.0~mira16 VMware library for OpenStack projects - Python 2.x

I replaced /usr/share/pyshared/oslo/messaging with the whole folder from oslo.messaging-1.4.1\oslo\messaging, then restarted all 3 controllers. HA didn't work very well: nova service-list failed with an HTTP 503 error, and the Neutron components didn't work very well either.

Please check the error messages below:
<166>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.436 6138 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
<166>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.456 6138 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 127.0.0.1:5673
<163>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.544 6061 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_c2d631eb2efa4fc1b88d2d03f90d6bca': [Errno 32] Broken pipe
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging....


Revision history for this message
Shintaro Mizuno (mizuno-shintaro) wrote :

Hi Chris J Arges,

https://launchpad.net/ubuntu/+source/oslo.messaging/1.4.1-0ubuntu1.1
doesn't seem to have https://review.openstack.org/109373 "Declare-DirectPublisher-exchanges-with-passive-True.patch" included in the patch list.
It is included in the 1.3.0-0ubuntu1.1 package.

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :
Changed in oslo.messaging (Ubuntu Utopic):
assignee: nobody → Jorge Niedbalski (niedbalski)
status: Fix Committed → In Progress
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Attached is the missing patch against the utopic archive.

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Also note that this should be merged into lp:~ubuntu-server-dev/oslo.messaging/juno (nonexistent) as well.

Thanks.

Revision history for this message
George (lmihaiescu) wrote :

Do you know when the missing patch will be merged into python-oslo.messaging 1.4.1-0ubuntu1 (Juno) and an updated version released?

Thank you.

Revision history for this message
Rolf Leggewie (r0lf) wrote :

utopic has seen the end of its life and is no longer receiving any updates. Marking the utopic task for this ticket as "Won't Fix".

Changed in oslo.messaging (Ubuntu Utopic):
status: In Progress → Won't Fix
James Page (james-page)
Changed in cloud-archive:
status: New → Fix Released