I encountered a situation using openstack; Rabbitmq and conductor are working at one node, when the node reboot, Check the "nova service-list", there is a certain probability of compute nodes is down, I find conductor process usd 30+ socket connection from "netstat" command. I doubt that the connection is not released when Connection publish msg,so the connection pool Exhausted.
So I add some log , I find "publish" can not return,and the end ,Function "wait" can't return of class AbstractChannel in /amqp/abstract_channel.py . The system flow will go to function "basic_publish_confirm" ,I loged the "method",and find (60,80) was not received.
But I can find out how this happen.
I put two condictions logs below, "Error condition:" and "correct condition:".
rpm packages:
rabbitmq-server-3.6.2
python2-oslo-messaging-5.30.2-1.el7.noarch
python2-kombu-4.0.2-5.el7.noarch
python2-amqp-2.1.4-2.el7.noarch
python-amqplib-1.0.2-7.el7.noarch
/etc/nova/nova.conf
heartbeat_timeout_threshold=0
/amqp/abstract_channel.py, class AbstractChannel, function "wait" :
def wait(self, method, callback=None, timeout=None, returns_tuple=False):
p = ensure_promise(callback)
pending = self._pending
prev_p = []
if not isinstance(method, list):
method = [method]
for m in method:
prev_p.append(pending.get(m))
pending[m] = p
try:
while not p.ready:
self.connection.drain_events(timeout=timeout)
if p.value:
args, kwargs = p.value
return args if returns_tuple else (args and args[0])
finally:
for i, m in enumerate(method):
if prev_p[i] is not None:
pending[m] = prev_p[i]
else:
pending.pop(m, None)
Error condition:
2018-02-12 04:56:49.660 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] The reply a7be59f826a24a2fbe86228bd185f67c getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155
2018-02-12 04:56:49.661 23928 ERROR oslo_messaging._drivers.common [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] ConnectionContext get: 156415440
2018-02-12 04:56:49.661 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] sending reply msg_id: a7be59f826a24a2fbe86228bd185f67c reply queue: reply_f9184a83a1c243deaf9c913ab5296970 time elapsed: 0.0179738120023s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:130
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: sending message {'oslo.message': '{"_msg_id": "a7be59f826a24a2fbe86228bd185f67c", "failure": null, "_unique_id": "424800b80c41452a97586931bfc92358", "result": {"nova_object.version": "2.4", "nova_object.name": "InstanceList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}, "ending": true}', 'oslo.version': '2.0'} to Exchange reply_f9184a83a1c243deaf9c913ab5296970(direct) with routing key reply_f9184a83a1c243deaf9c913ab5296970 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:56:49.662 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: sending message {'oslo.message': '{"_msg_id": "a7be59f826a24a2fbe86228bd185f67c", "failure": null, "_unique_id": "424800b80c41452a97586931bfc92358", "result": {"nova_object.version": "2.4", "nova_object.name": "InstanceList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}, "ending": true}', 'oslo.version': '2.0'} to Exchange reply_f9184a83a1c243deaf9c913ab5296970(direct) with routing key reply_f9184a83a1c243deaf9c913ab5296970 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:56:49.663 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:56:49.664 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] basic_publish_confirm after _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1760
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] channel wait channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:98
2018-02-12 04:56:49.665 23928 DEBUG amqp [req-ef133cd8-9064-459e-958c-8ac3cf2d7495 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:49.952 23928 DEBUG nova_patch.openstack.common.deadlock_monitor [-] full_function_name end nova.servicegroup.drivers.db._report_state _deco /usr/lib/python2.7/site-packages/nova_patch/openstack/common/deadlock_monitor.py:92
2018-02-12 04:56:50.603 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.604 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.604 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:56:50.605 23928 DEBUG amqp [-] dispatch_method channel_id: 1 rawmsg: (60, 60) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:118
2018-02-12 04:56:50.606 23928 DEBUG amqp [-] dispatch_method channel_id: 1 receive: (60, 60) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:128
2018-02-12 04:56:50.606 23928 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: b90d0d803e4f4268867aa67342ba9411 reply to reply_9bb71474b16240e1befc90971f3d6dbf __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:260
2018-02-12 04:56:50.607 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 0.001 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.620 23928 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: timed out _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1038
2018-02-12 04:56:50.620 23928 DEBUG oslo.messaging._drivers.impl_rabbit [-] Timed out waiting for RPC response: Timeout while waiting on RPC response - topic: "<unknown>", RPC method: "<unknown>" info: "<unknown>" _raise_timeout /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1038
2018-02-12 04:56:50.621 23928 DEBUG amqp [-] having_timeout gettimeout: None, timeout: 1.0 having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:56:50.627 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-3674b282-0ef1-4492-99ce-ae106619e751 - - - - -] The reply b90d0d803e4f4268867aa67342ba9411 getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155
correct condition:
2018-02-12 04:57:45.775 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] The reply 4ef37e1d2f4a469a8a1c5ab9f5d4a9af getconnections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:155
2018-02-12 04:57:45.775 23928 ERROR oslo_messaging._drivers.common [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] ConnectionContext get: 168085968
2018-02-12 04:57:45.776 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] sending reply msg_id: 4ef37e1d2f4a469a8a1c5ab9f5d4a9af reply queue: reply_ebf7c02b6e91434a94e76682c5f06858 time elapsed: 0.279703672s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:130
2018-02-12 04:57:45.776 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing pre _connection_lock timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1161
2018-02-12 04:57:45.777 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing get _connection_lock timeout: None, retry: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1164
2018-02-12 04:57:45.777 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _publish_and_raises_on_missing_exchange _publish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1249
2018-02-12 04:57:45.778 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: sending message {***************************} to Exchange reply_ebf7c02b6e91434a94e76682c5f06858(direct) with routing key reply_ebf7c02b6e91434a94e76682c5f06858 _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1190
2018-02-12 04:57:45.778 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: _transport_socket_timeout : None _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1195
2018-02-12 04:57:45.779 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm before _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1758
2018-02-12 04:57:45.779 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm after _basic_publish channel #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1760
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:98
2018-02-12 04:57:45.780 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] having_timeout gettimeout: None, timeout: None having_timeout /usr/lib/python2.7/site-packages/amqp/transport.py:131
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] read_frame channel 1 read_frame /usr/lib/python2.7/site-packages/amqp/transport.py:268
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] dispatch_method channel_id: 1 rawmsg: (60, 80) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:118
2018-02-12 04:57:45.781 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] dispatch_method channel_id: 1 receive: (60, 80) dispatch_method /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:128
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait return p.value channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:106
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] channel wait finally channel_id: 1 wait /usr/lib/python2.7/site-packages/amqp/abstract_channel.py:114
2018-02-12 04:57:45.782 23928 DEBUG amqp [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] basic_publish_confirm channel wait finish #1 basic_publish_confirm /usr/lib/python2.7/site-packages/amqp/channel.py:1762
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] Connection._publish: finish _publish /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1201
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _publish_and_raises_on_missing_exchange _publish finish _publish_and_raises_on_missing_exchange /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1252
2018-02-12 04:57:45.783 23928 DEBUG oslo.messaging._drivers.impl_rabbit [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] _ensure_publishing _connection_lock release ok timeout: None _ensure_publishing /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:1166
2018-02-12 04:57:45.784 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] direct_send success. msg_id : 4ef37e1d2f4a469a8a1c5ab9f5d4a9af reply_q: reply_ebf7c02b6e91434a94e76682c5f06858 .
2018-02-12 04:57:45.784 23928 ERROR oslo_messaging._drivers.common [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] ConnectionContext connection reset: 168085968
2018-02-12 04:57:45.785 23928 DEBUG oslo_messaging._drivers.amqpdriver [req-9c6e6e55-1c2d-47af-82c4-c26d514349d1 - - - - -] The reply 4ef37e1d2f4a469a8a1c5ab9f5d4a9af release connections ......... reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:160
Hi - I'm unable to reproduce this problem locally.
Does the rabbitmq server log show any error reports?
oslo.messaging maintains a pool of connections. The pool's default maximum is 30. netstat will likely show more as o.m. creates other non-pooled connections for subscriptions.
So seeing 30+ probably means at some point the maximum was hit. At that point, any attempt to send a new message will cause the thread to block waiting for a connection to be freed up.
Would it be possible to generate a Guru Meditation report when the system is in that hanged state?
See: https:/ /docs.openstack .org/oslo. reports/ queens/ user/usage. html
thanks