INTERNAL_ERROR - Cannot declare a queue during RabbitMQ start
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.messaging | Fix Released | Undecided | Gabriele Santomaggio |
Bug Description
Versions:
1. oslo.messaging=
2. Tested with different RabbitMQ versions 3.6.16 and 3.7.13/14
When a RabbitMQ cluster node comes up, there is a window in which the AMQP socket is ready but the store is not yet available.
In general, it is not a problem but when the `queue_
So even if the oslo.messaging client is connected to a running node, RabbitMQ tries to create the queues on the node that is coming up.
This "rare" condition causes this error:
```
Calling echo ({'arg1': 'test_n_20', 'arg2': 'test_2_20'}) on server=None exchange=
2019-04-02 14:14:30.193 31178 ERROR oslo.messaging.
```
and the message can be lost.
To reproduce the error, you have to:
1- Create a RabbitMQ cluster (you can use my ready-made Vagrant configuration [1]). I used RabbitMQ version 3.7.14.
2- Fill the cluster with 1000/2000 queues.
3- Use the Ken Giusti example:
git clone <email address hidden>
./rpc-server --url rabbit:
for i in {1..20}; do ./rpc-client --method echo --kwargs "arg1=test_n_$i arg2=test_2_$i" --url rabbit:
4- During the test, restart the second node, the one the client is not connected to.
You will see that some messages get lost, even if one or more RabbitMQ nodes are running.
There is a thread on the RabbitMQ user group [2] about that.
I am looking at how to make the queue.declare function more tolerant.
Regards
Gabriele Santomaggio
Developer @SUSE
[1] https:/
[2] https:/
[3] https:/
Changed in oslo.messaging: | |
assignee: | nobody → Gabriele Santomaggio (gsantomaggio) |
status: | New → In Progress |
Reviewed: https://review.opendev.org/649989
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=4d2787227b00b973973554f7387e621d2664c0d8
Submitter: Zuul
Branch: master
commit 4d2787227b00b973973554f7387e621d2664c0d8
Author: Gabriele <email address hidden>
Date: Thu Apr 4 14:56:25 2019 +0200
Retry to declare a queue after internal error
Without this commit, the client can lose the messages, because the
client does not handle the 'AMQP internal error 541',
read here [2] for details.
The fix retries to create the queue after a delay.
When the virtual-host is ready the declare does not fail.
This is a rare condition, please read the bug [1] for details.
Closes-Bug: #1822778
[1] https://bugs.launchpad.net/oslo.messaging/+bug/1822778
[2] https://www.rabbitmq.com/amqp-0-9-1-reference.html
Change-Id: I7ab1f9d21ebb807285bf1422bc14cc6e07dcd32a
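
For readers who want to see the idea from the commit message in isolation, here is a minimal sketch of "retry the queue declare after a delay when the broker answers with AMQP 541". It is not the code merged in oslo.messaging. The names `AMQPError`, `declare_with_retry`, and `FlakyBroker` are hypothetical and exist only for this illustration, and the retry count and delay are arbitrary assumptions.

```python
import time

AMQP_INTERNAL_ERROR = 541  # AMQP 0-9-1 reply code for "internal-error"


class AMQPError(Exception):
    """Hypothetical stand-in for a client library's AMQP exception."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def declare_with_retry(declare, retries=3, delay=1.0, sleep=time.sleep):
    """Call declare(); on AMQP 541, wait `delay` seconds and try again.

    Any other error, or a 541 on the last attempt, is re-raised.
    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return declare()
        except AMQPError as exc:
            if exc.code != AMQP_INTERNAL_ERROR or attempt == retries:
                raise
            sleep(delay)


class FlakyBroker:
    """Simulated broker (assumption): fails with 541 a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def queue_declare(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise AMQPError(AMQP_INTERNAL_ERROR, "virtual host not ready")
        return "declared"


if __name__ == "__main__":
    broker = FlakyBroker(failures=2)
    result = declare_with_retry(broker.queue_declare, retries=3, delay=0.1)
    print(result, "after", broker.calls, "attempts")
```

The sketch retries only on 541, because other reply codes usually signal a problem that waiting will not fix. A fixed delay keeps it short; a real client would probably also cap the total wait.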