After a restart of rabbitmq getting exceptions on reply of rpc

Bug #1852058 reported by Eyal B
This bug affects 2 people
Affects: oslo.messaging
Status: Fix Released
Importance: Undecided
Assigned to: Eyal B
Milestone: (not set)

Bug Description

In Mistral we use RPC to communicate between the API service and the engine service.
If we restart the RabbitMQ server, we get a strange exception on the oslo.messaging server
during reply:

"Unable to connect to AMQP server on 192.168.0.55:5672 after inf tries: 'NoneType' object has no attribute '__getitem__'"

The heartbeat detected the temporary loss of connection and reported that it had reconnected, yet the error persists.
Only after we restart all Mistral services does everything work fine again.

This happens only on CentOS 7, not on Ubuntu, with the latest Train deliverables.

The exception I see (http://paste.openstack.org/show/785937/):

2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 809, in ensure
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit ret, channel = autoretry_method()
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 510, in _ensured
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 586, in __call__
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 798, in execute_method
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit method()
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1278, in _publish_and_raises_on_missing_exchange
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit timeout=timeout, transport_options=transport_options)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1228, in _publish
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit compression=self.kombu_compression)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit exchange_name, declare,
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit mandatory=mandatory, immediate=immediate,
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1789, in basic_publish_confirm
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit self.wait([spec.Basic.Ack, spec.Basic.Nack], callback=confirm_handler)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 80, in wait
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit self.connection.drain_events(timeout=timeout)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 500, in drain_events
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit while not self.blocking_read(timeout):
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 506, in blocking_read
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit return self.on_inbound_frame(frame)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 55, in on_frame
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit callback(channel, method_sig, buf, None)
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 509, in on_inbound_method
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit return self.channels[channel_id].dispatch_method(
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit TypeError: 'NoneType' object has no attribute '__getitem__'
2019-11-10 08:55:31.939 22096 ERROR oslo.messaging._drivers.impl_rabbit
2019-11-10 08:55:31.940 22096 ERROR oslo.messaging._drivers.impl_rabbit [req-881d22cd-9344-4735-8b19-b12e527c8b45 - CloudBandNetworkDirector - - -] Unable to connect to AMQP server on 192.168.0.55:5672 after inf tries: 'NoneType' object has no attribute '__getitem__'
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server [req-881d22cd-9344-4735-8b19-b12e527c8b45 - CloudBandNetworkDirector - - -] Can not send reply for message: MessageDeliveryFailure: Unable to connect to AMQP server on 192.168.0.55:5672 after inf tries: 'NoneType' object has no attribute '__getitem__'
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 178, in _process_incoming
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server message.reply(res)
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 147, in reply
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server self._send_reply(conn, reply, failure)
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 123, in _send_reply
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server conn.direct_send(self.reply_q, rpc_common.serialize_msg(msg))
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1305, in direct_send
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server transport_options=options)
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1185, in _ensure_publishing
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server self.ensure(method, retry=retry, error_callback=_error_callback)
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 831, in ensure
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server raise exceptions.MessageDeliveryFailure(msg)
2019-11-10 08:55:31.940 22096 ERROR oslo_messaging.rpc.server MessageDeliveryFailure: Unable to connect to AMQP server on 192.168.0.55:5672 after inf tries: 'NoneType' object has no attribute '__getitem__'
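
Reading the traceback: the last frame indexes self.channels[channel_id], and the TypeError shows that self.channels was None at that moment. The toy sketch below only illustrates that failure shape. ToyConnection and its methods are hypothetical stand-ins, not py-amqp code, and the assumption that a connection teardown is what cleared the channel table is not confirmed by this report. On Python 3 the same error reads "'NoneType' object is not subscriptable".

```python
class ToyConnection:
    """Hypothetical stand-in for an AMQP connection; not py-amqp code."""

    def __init__(self):
        # Channel table keyed by channel id, as the traceback frame implies.
        self.channels = {1: "channel-1"}

    def collect(self):
        # Assumption: tearing the connection down drops the channel table.
        self.channels = None

    def on_inbound_method(self, channel_id):
        # Same shape as the failing frame: index the table, then dispatch.
        return self.channels[channel_id]


def dispatch_after_teardown():
    """Return the exception type name raised when a frame arrives late."""
    conn = ToyConnection()
    conn.collect()
    try:
        conn.on_inbound_method(1)
    except TypeError as exc:
        return type(exc).__name__
    return None
```

Because the error is a TypeError rather than a connection error, it plausibly is not treated as a normal reconnect case, which would match the "after inf tries" message above.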

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.opendev.org/693704

Changed in oslo.messaging:
assignee: nobody → Eyal B (eyalb1)
status: New → In Progress
Eyal B (eyalb1) wrote :

It seems to be a bug in amqp that was fixed in version 2.5.2.
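
To check whether a given environment already has the fixed library, something like the following sketch may help. It assumes Python 3.8+ (for importlib.metadata), whereas the traceback above comes from Python 2.7. The helper names parse_version, has_fix and installed_amqp_version are made up for illustration, and the comparison looks only at the leading numeric part of the version, so a pre-release such as "2.5.2rc1" would be treated as 2.5.2.

```python
import re

MIN_AMQP = (2, 5, 2)  # minimum amqp version named in this bug report


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (text,))
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(version_text, minimum=MIN_AMQP):
    """True if version_text is at least the minimum version."""
    parts = parse_version(version_text)
    # Pad short versions so that "2.6" compares as (2, 6, 0).
    parts = parts + (0,) * (len(minimum) - len(parts))
    return parts >= minimum


def installed_amqp_version():
    """Return the installed amqp version string, or None if not installed."""
    from importlib import metadata

    try:
        return metadata.version("amqp")
    except metadata.PackageNotFoundError:
        return None
```

For example, has_fix(installed_amqp_version() or "0") reports whether the running environment meets the minimum.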

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/694686

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/694792

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.opendev.org/694686
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=c8d6fed762607fad8f73ade531cb06d6b919b096
Submitter: Zuul
Branch: master

commit c8d6fed762607fad8f73ade531cb06d6b919b096
Author: Eyal <email address hidden>
Date: Sun Nov 17 12:59:06 2019 +0200

    Make sure minimum amqp is 2.5.2

    amqp fixed a bug in 2.5.2 that is needed
    also update kombu to support amqp 2.5.2

    see
    https://review.opendev.org/#/c/693704/
    https://github.com/celery/py-amqp/commit/86cb254dceab75e0240b4fa6b97249de70036a4b

    Change-Id: I4b72d8feb85c2b9b4657510c356cd21e22fe40c2
    Closes-bug: #1852058

Changed in oslo.messaging:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (stable/train)

Change abandoned by Eyal (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/694792

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 10.4.0

This issue was fixed in the openstack/oslo.messaging 10.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Eyal (<email address hidden>) on branch: master
Review: https://review.opendev.org/693704
Reason: fix was done in amqp 2.5.2

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 10.0.0.0b1

This issue was fixed in the openstack/mistral 10.0.0.0b1 development milestone.
