Activity log for bug #2039693

Date | Who | What changed | Old value | New value | Message
2023-10-18 14:27:27 | Andrey Grishin | bug | | | added bug
2023-10-18 14:33:41 | Andrey Grishin | summary | Services may lose their queues after RabbitMQ service restart | Services may lose their classic queues after RabbitMQ service restart |
2023-10-19 07:17:39 | Andrey Grishin | description | (old value below) | (new value below) |

Old value: the same text as the new value below, except that the final sentence did not yet mention the affected queue types (reply, fanout, etc.) or that the lost queues lead to dead OpenStack services.

New value:

Hi, I'm testing RabbitMQ HA with an OpenStack Zed kolla-ansible classic installation. The RabbitMQ service is deployed in containers on Ubuntu 22.04.1, in a 3-host cluster using quorum + classic queues.

I've noticed that when the RabbitMQ service is restarted on the host that owns a classic queue, the queue moves to (is successfully re-created on) another running RabbitMQ host. Next, we restart the service on the new owner of the queue (after the cluster has been correctly restored). In our configuration the queue essentially migrates between two of the three hosts because of the RabbitMQ queue_master_locator = min-masters setting.

After several such iterations of restarting the RabbitMQ service, RabbitMQ tells reconnecting clients (captured with Wireshark) that the queue still exists, although in fact it does not. As a result, the clients do not re-create their queues (reply, fanout, etc.) when reconnecting, which leads to dead OpenStack services.
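Which node currently hosts the classic queue between restarts can be checked with rabbitmqctl; a minimal sketch, assuming the same control host, container name ("rabbitmq") and reply queue as the check script below (all taken from this report; adjust to your deployment):

#!/usr/bin/bash
# Show which node currently hosts the monitored reply queue.
# rabbitmqctl prints the queue pid with the node name embedded, e.g. <rabbit@control-2.3.4197.0>.
connect_host="control-3"
reply="reply_351f90757a75436287b313f0eb076e46"
ssh ${connect_host} "sudo docker exec -u0 rabbitmq rabbitmqctl list_queues -sq --no-table-headers name pid" \
  | grep -w "${reply}" \
  || echo "queue ${reply} is not present on any node"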
Additional context

Tested on:
RabbitMQ 3.12.0, 3.12.6
Erlang 26.1.1
oslo.messaging 14.0.1
amqp 5.1.1
kombu 5.2.4

This is the simple script I'm using to automate the check:

#!/usr/bin/bash
# Keep restarting the RabbitMQ container on the hosts in host_list, in turn,
# as long as the monitored reply queue still has a consumer; stop once it is lost.
connect_host="control-3"
host_list="control-2 control-1"
reply="reply_351f90757a75436287b313f0eb076e46"
iter=1
tries=20
while [[ "${iter}" -lt "${tries}" ]]; do
  for host in ${host_list}; do
    if ssh ${connect_host} "sudo docker exec -u0 rabbitmq rabbitmqctl list_consumers -sq --no-table-headers" | grep -wq "${reply}"; then
      echo "iter:${iter} service restart on: ${host}"
      ssh ${host} "sudo docker restart rabbitmq" > /dev/null
      sleep 10
    else
      echo "Queue is lost - iter:${iter} service restart on: ${host}"
      exit 0
    fi
  done
  ((++iter))
done

Script variables:
connect_host - the host to which the queue does not move (because queue_master_locator = min-masters); it is used as the control point
host_list - the hosts between which the queue moves
reply - the classic queue being monitored

How I see this behavior:
1. The client has a pool of RabbitMQ nodes to connect to: node1, node2, node3.
2. The client connects to node1 and creates a classic non-mirrored queue, which appears on node3.
3. We restart the RabbitMQ service on node3.
4. The client creates the queue again; it appears on node2.
5. We restart the RabbitMQ service on node2 after node3 has rejoined the cluster successfully.
6. The client creates the queue again; it appears on node3.
7. After some iterations like that, the client connected to node1 gets an answer from RabbitMQ node1 that the queue still exists, but in fact it does not, and the client does not redeclare the queue (see the cross-check sketch below).
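One way to cross-check that last step is to ask each node's management HTTP API what it reports for the queue; a minimal sketch, assuming the management plugin listens on port 15672 on every controller, the default vhost "/", and placeholder credentials user:pass (all assumptions; adjust to your deployment):

#!/usr/bin/bash
# Hypothetical cross-check: query each node's management API for the reply queue.
# If the answers disagree (one node returns the queue, the others return 404),
# that matches the "queue exists but in fact does not" symptom described above.
reply="reply_351f90757a75436287b313f0eb076e46"
for node in control-1 control-2 control-3; do
  echo -n "${node}: "
  curl -s -u user:pass "http://${node}:15672/api/queues/%2F/${reply}" \
    | grep -o '"node":"[^"]*"' \
    || echo "queue not found (HTTP 404 or node unreachable)"
done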