oslo-mssaging _publish_and_raises_on_missing_exchange can't return

Bug #1749187 reported by shen.zhixing
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
oslo.messaging
Expired
Undecided
Unassigned

Bug Description

I encountered a situation using openstack; Rabbitmq and conductor are working at one node, when the node reboot, Check the "nova service-list", there is a certain probability of compute nodes is down, I find conductor process usd 30+ socket connection from "netstat" command. I doubt that the connection is not released when Connection publish msg,so the connection pool Exhausted.

So I add some log , I find "publish" can not return,and the end ,Function "wait" can't return of class AbstractChannel in /amqp/abstract_channel.py . The system flow will go to function "basic_publish_confirm" ,I loged the "method",and find (60,80) was not received.
But I can find out how this happen.

I put two condictions logs below, "Error condition:" and "correct condition:".

rpm packages:
rabbitmq-server-3.6.2
python2-oslo-messaging-5.30.2-1.el7.noarch
python2-kombu-4.0.2-5.el7.noarch
python2-amqp-2.1.4-2.el7.noarch
python-amqplib-1.0.2-7.el7.noarch

/etc/nova/nova.conf
heartbeat_timeout_threshold=0

/amqp/abstract_channel.py, class AbstractChannel, function "wait" :
    def wait(self, method, callback=None, timeout=None, returns_tuple=False):
        p = ensure_promise(callback)
        pending = self._pending
        prev_p = []
        if not isinstance(method, list):
            method = [method]

        for m in method:
            prev_p.append(pending.get(m))
            pending[m] = p

        try:
            while not p.ready:
                self.connection.drain_events(timeout=timeout)

            if p.value:
                args, kwargs = p.value
                return args if returns_tuple else (args and args[0])
        finally:
            for i, m in enumerate(method):
                if prev_p[i] is not None:
                    pending[m] = prev_p[i]
                else:
                    pending.pop(m, None)

Error condition:
2018-02-12 04:56:49.660 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] The reply a7be59f826a24a2fbe86228bd185f67c getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155
2018-02-12 04:56:49.661 23928 ERROR oslo_messaging._drivers.common [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] ConnectionContext get: 156415440
2018-02-12 04:56:49.661 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] sending reply msg_id: a7be59f826a24a2fbe86228bd185f67c reply queue: reply_f9184a83a1c243deaf9c913ab5296970 time elapsed: 0.0179738120023s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:130
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: sending message {'oslo.message': '{"_msg_id": "a7be59f826a24a2fbe86228bd185f67c", "failure": null, "_unique_id": "424800b80c41452a97586931bfc92358", "result": {"nova_object.version": "2.4", "nova_object.name": "InstanceList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}, "ending": true}', 'oslo.version': '2.0'} to Exchange reply_f9184a83a1c243deaf9c913ab5296970(direct) with routing key reply_f9184a83a1c243deaf9c913ab5296970 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: sending message {'oslo.message': '{"_msg_id": "a7be59f826a24a2fbe86228bd185f67c", "failure": null, "_unique_id": "424800b80c41452a97586931bfc92358", "result": {"nova_object.version": "2.4", "nova_object.name": "InstanceList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}, "ending": true}', 'oslo.version': '2.0'} to Exchange reply_f9184a83a1c243deaf9c913ab5296970(direct) with routing key reply_f9184a83a1c243deaf9c913ab5296970 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm after _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1760
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] channel wait channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:98
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:49.952 23928 DEBUG nova_patch.openstack.common.deadlock_monitor [-] full_function_name end nova.servicegroup.drivers.db._report_state _deco /usr/lib/python2.7/site-packages/nova_patch/openstack/common/deadlock_monitor.py:92
2018-02-12 04:56:50.603 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.604 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.604 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] dispatch_method channel_id: 1 rawmsg: (60, 60) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:118
2018-02-12 04:56:50.606 23928 DEBUG amqp [-] dispatch_method channel_id: 1 receive: (60, 60) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:128
2018-02-12 04:56:50.606 23928 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: b90d0d803e4f4268867aa67342ba9411 reply to reply_9bb71474b16240e1befc90971f3d6dbf __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:260
2018-02-12 04:56:50.607 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.620 23928 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: timed out _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1038
2018-02-12 04:56:50.620 23928 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: Timeout while waiting on RPC response - topic: "<unknown>", RPC method: "<unknown>" info: "<unknown>" _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1038
2018-02-12 04:56:50.621 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 1.0 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.627 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-3674b282-0ef1-4492-99ce-ae106619e751 - - - - -] The reply b90d0d803e4f4268867aa67342ba9411 getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155

correct condition:
2018-02-12 04:57:45.775 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] The reply 4ef37e1d2f4a469a8a1c5ab9f5d4a9af getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155
2018-02-12 04:57:45.775 23928 ERROR oslo_messaging._drivers.common [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] ConnectionContext get: 168085968
2018-02-12 04:57:45.776 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] sending reply msg_id: 4ef37e1d2f4a469a8a1c5ab9f5d4a9af reply queue: reply_ebf7c02b6e91434a94e76682c5f06858 time elapsed: 0.279703672s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:130
2018-02-12 04:57:45.776 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:57:45.777 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:57:45.777 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:57:45.778 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: sending message {***************************} to Exchange reply_ebf7c02b6e91434a94e76682c5f06858(direct) with routing key reply_ebf7c02b6e91434a94e76682c5f06858 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:57:45.778 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:57:45.779 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:57:45.779 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm after _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1760
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:98
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] dispatch_method channel_id: 1 rawmsg: (60, 80) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:118
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] dispatch_method channel_id: 1 receive: (60, 80) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:128
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait return p.value channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:106
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait finally channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:114
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm channel wait finish #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1762
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: finish _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1201
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _publish_and_raises_on_missing_exchange _publish finish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1252
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing _connection_lock release ok timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1166
2018-02-12 04:57:45.784 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] direct_send success. msg_id : 4ef37e1d2f4a469a8a1c5ab9f5d4a9af reply_q: reply_ebf7c02b6e91434a94e76682c5f06858 .
2018-02-12 04:57:45.784 23928 ERROR oslo_messaging._drivers.common [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] ConnectionContext connection reset: 168085968
2018-02-12 04:57:45.785 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] The reply 4ef37e1d2f4a469a8a1c5ab9f5d4a9af release connections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:160

shen.zhixing (fooy5460)
description: updated
description: updated
Revision history for this message
Ken Giusti (kgiusti) wrote :

Hi - I'm unable to reproduce this problem locally.

Does the rabbitmq server log show any error reports?

oslo.messaging maintains a pool of connections. The pool's default maximum is 30. netstat will likely show more as o.m. creates other non-pooled connections for subscriptions.

So seeing 30+ probably means at some point the maximum was hit. At that point, any attempt to send a new message will cause the thread to block waiting for a connection to be freed up.

Would it be possible to generate a Guru Meditation report when the system is in that hanged state?

See: https://docs.openstack.org/oslo.reports/queens/user/usage.html

thanks

Changed in oslo.messaging:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for oslo.messaging because there has been no activity for 60 days.]

Changed in oslo.messaging:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.