Services may lose their classic queues after a RabbitMQ service restart
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
oslo.messaging | New | Undecided | Unassigned | | |
Bug Description
Hi,
I'm testing RabbitMQ HA on an OpenStack Zed kolla-ansible classic installation.
The RabbitMQ service is deployed in containers on Ubuntu 22.04.1, as a 3-host cluster using quorum and classic queues.
I've noticed that when the RabbitMQ service is restarted on the host that owns the classic queue, the queue moves to (is successfully recreated on) another running RabbitMQ host. Next, once the cluster has been correctly restored, we restart the service on the new owner of the queue. In our configuration, the queue essentially migrates between two of the three hosts due to the RabbitMQ queue_master_
After several such iterations of restarting the RabbitMQ service, RabbitMQ tells reconnecting clients (captured with Wireshark) that the queue exists, when in fact it does not. As a result, the clients do not recreate their queues(
Additional context
Tested on:
RabbitMQ 3.12.0, 3.12.6
Erlang 26.1.1
oslo.messaging 14.0.1
amqp 5.1.1
kombu 5.2.4
This is the simple script I use to automate the check:
#!/usr/bin/bash
connect_
host_list=
reply="
iter=1
tries=20
while [[ "${iter}" != "${tries}" ]]; do
    for host in ${host_list}; do
        # The queue counts as alive while a consumer for it is still visible from connect_host.
        if ssh "${connect_host}" "sudo docker exec -u0 rabbitmq rabbitmqctl list_consumers -sq --no-table-headers" | grep -wq -- "${reply}"; then
            echo "iter:${iter} service restart on: ${host}"
            # Restart the broker on the current candidate owner, then give it time to rejoin.
            ssh "${host}" "sudo docker restart rabbitmq" > /dev/null
            sleep 10
        else
            echo "Queue is lost - iter:${iter} service restart on: ${host}"
            exit 0
        fi
    done
    ((++iter))
done
Variables used in the script:
connect_host - the host to which the queue does not move because queue_master_
host_list - the hosts between which the queue moves
reply - the classic queue being monitored
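The check in the script matches the queue name with grep -w anywhere on a line of the consumer listing. A stricter variant is sketched below, under the assumption that the first tab-separated column of the `rabbitmqctl list_consumers` output is the queue name. The function name `queue_has_consumer` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: reads a consumer listing on stdin and succeeds only
# if some line has a first tab-separated field exactly equal to $1.
# Assumption: the first column of the listing is the queue name.
queue_has_consumer() {
  awk -F'\t' -v q="$1" '$1 == q { found = 1 } END { exit !found }'
}
```

It could replace the grep in the loop by piping the same ssh command into `queue_has_consumer "${reply}"`.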
How I see this behavior:
The client has a pool of RabbitMQ nodes to connect to: node1, node2, node3.
The client connects to node1 and creates a classic non-mirrored queue, which appears on node3.
We restart the RabbitMQ service on node3.
The client creates the queue again, and it appears on node2.
We restart the RabbitMQ service on node2 after node3 has successfully rejoined the cluster.
The client creates the queue again, and it appears on node3.
After some iterations like that, the client connected to node1 gets an answer from RabbitMQ node1 that the queue still exists, when in fact it does not, and the client does not redeclare the queue.
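To tell a queue that is no longer listed apart from one that is listed but has no consumer, the two listings can be compared. This is only a diagnostic sketch: the function name `classify_queue` and the idea of saving the listings to files are assumptions, as is the consumer listing having the queue name in its first tab-separated column.

```shell
# Hypothetical helper: classify a queue from two saved listings.
#   $1 - queue name
#   $2 - file with `rabbitmqctl list_queues -q --no-table-headers name` output
#   $3 - file with `rabbitmqctl list_consumers -q --no-table-headers` output
# Prints "missing", "no-consumer", or "ok".
classify_queue() {
  local name="$1" queues_file="$2" consumers_file="$3"
  if ! grep -Fxq -- "$name" "$queues_file"; then
    # The broker does not list the queue at all.
    echo "missing"
  elif ! awk -F'\t' -v q="$name" '$1 == q { f = 1 } END { exit !f }' "$consumers_file"; then
    # The queue is listed, but no consumer is attached to it.
    echo "no-consumer"
  else
    echo "ok"
  fi
}
```

The listings could be captured over ssh in the same way the script above runs `list_consumers`, once per node, so that any disagreement between nodes becomes visible.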
There is also my discussion related to this problem: https://github.com/rabbitmq/rabbitmq-server/discussions/9716