senlin random rabbitmq connection error out

Bug #1909016 reported by Satish Patel
This bug affects 2 people
Affects: senlin
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have deployed senlin using the openstack-ansible Victoria release. Everything works fine, but every few hours I notice something strange in the logs: senlin loses its connection to RabbitMQ and throws the following error.

Whenever this error shows up in the logs, senlin stops working, and the following command throws an error (at this point the only fix is to restart the senlin API services):

$ openstack cluster list
HttpException: 504: Server Error for url: http://10.65.0.121:8778/v1/clusters?global_project=False, The server didn't respond in time.: 504 Gateway Time-out

Full logs output - http://paste.openstack.org/show/801236/

Dec 22 13:56:44 os-lab-infra-1-senlin-container-16f24bbe senlin-wsgi-api[8188]: 2020-12-22 13:56:44.220 8188 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
Dec 22 13:56:44 os-lab-infra-1-senlin-container-16f24bbe senlin-conductor[8250]: 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server [req-3c89475b-89fc-404b-8537-df7a587261d9 462618bed32745d2a9166bcc33fc117e f1502c79c70f4651be8ffc7b844b584f - - -] MessageUndeliverable error, source exception: Basic.return: (312) NO_ROUTE, routing_key: reply_54d93c43fe894ed18ce8092f4497306b, exchange: : : oslo_messaging.exceptions.MessageUndeliverable
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server File "/openstack/venvs/senlin-22.0.0.0b2.dev56/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 184, in _process_incoming
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server message.reply(res)
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server File "/openstack/venvs/senlin-22.0.0.0b2.dev56/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in reply
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server self._send_reply(conn, reply, failure)
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server File "/openstack/venvs/senlin-22.0.0.0b2.dev56/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 125, in _send_reply
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server conn.direct_send(self.reply_q, rpc_common.serialize_msg(msg))
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server File "/openstack/venvs/senlin-22.0.0.0b2.dev56/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1320, in direct_send
                                                                                 2020-12-22 13:56:44.461 8250 ERROR oslo_messaging.rpc.server self._ensure_publishing(self._publish_and_raises_on_missing_exchange,
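
To see how often each senlin process hits these errors, the journal output can be tallied with a short script. This is only a sketch: the substrings are copied from the log lines in this report, while the category labels and the function name `count_rabbit_errors` are made up for illustration and are not part of senlin or oslo.messaging.

```python
import re
from collections import Counter

# Substrings copied from the log lines in this report; the labels are hypothetical.
PATTERNS = {
    "connection_closed": "Server unexpectedly closed connection",
    "no_route": "NO_ROUTE",
    "unreachable": "is unreachable",
}

# journald prefix of the form "<host> <process>[<pid>]:"
PROC_RE = re.compile(r"\s(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:")


def count_rabbit_errors(lines):
    """Count (process, kind) pairs for the RabbitMQ errors seen in this report."""
    counts = Counter()
    for line in lines:
        m = PROC_RE.search(line)
        if m is None:
            # Traceback continuation lines carry no journald prefix; skip them.
            continue
        for kind, needle in PATTERNS.items():
            if needle in line:
                counts[(m.group("proc"), kind)] += 1
    return counts
```

One way to use it is to save the journal output of the senlin container to a file and pass the open file object to `count_rabbit_errors`, then print the resulting counter.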

Tags: victoria
Satish Patel (satish-txt) wrote :
Herve Beraud (herveberaud) wrote :

Note that my original analysis [1] wasn't right.

The mandatory flag works as expected and wasn't the root cause.

[1] https://bugs.launchpad.net/oslo.messaging/+bug/1905965

Satish Patel (satish-txt) wrote :

@herve

Not sure whether my issue is related to this bug, but I am still getting the following error in the logs: senlin-api is losing its connection to RabbitMQ. All other services are doing fine except senlin, so I don't think my RabbitMQ cluster has any issue.

Feb 22 15:29:42 ostack-phx-api-1-1-senlin-container-314663f9 senlin-wsgi-api[3531]: 2021-02-22 15:29:42.001 3531 ERROR oslo.messaging._drivers.impl_rabbit [-] [126429f0-8e25-495c-a612-b4870d00a4a2] AMQP server on 10.65.7.69:5671 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
Feb 22 15:29:42 ostack-phx-api-1-1-senlin-container-314663f9 senlin-wsgi-api[3531]: 2021-02-22 15:29:42.003 3531 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection
Feb 22 15:29:43 ostack-phx-api-1-1-senlin-container-314663f9 senlin-wsgi-api[3531]: 2021-02-22 15:29:43.036 3531 INFO oslo.messaging._drivers.impl_rabbit [-] [126429f0-8e25-495c-a612-b4870d00a4a2] Reconnected to AMQP server on 10.65.7.69:5671 via [amqp] client with port 57032.
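
For anyone comparing their own logs, the time between an "is unreachable" error and the next "Reconnected to AMQP server" line can be measured with something like the following. It is a rough sketch that assumes the oslo.log timestamp format shown above; `reconnect_gaps` is a hypothetical helper name. For the three lines above it gives a gap of about one second.

```python
import re
from datetime import datetime

# oslo.log timestamp as it appears in this report, e.g. "2021-02-22 15:29:42.001"
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def reconnect_gaps(lines):
    """Return the seconds between each 'is unreachable' error and the next reconnect."""
    gaps = []
    down_since = None
    for line in lines:
        m = TS_RE.search(line)
        if m is None:
            continue  # no oslo.log timestamp on this line
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if "is unreachable" in line and down_since is None:
            # First failure of an outage; remember when it started.
            down_since = ts
        elif "Reconnected to AMQP server" in line and down_since is not None:
            gaps.append((ts - down_since).total_seconds())
            down_since = None
    return gaps
```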

Satish Patel (satish-txt) wrote :

Update/Solution:

After talking with Erik Olof Gunnar Andersson, who suggested not using uWSGI with senlin, I removed uWSGI and that resolved my problem. I am no longer seeing any connection issues with senlin-api.

It would be great if senlin-api supported uWSGI, but that is a different issue. For now I am all set.
