Quorum queues stuck on rabbit issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.messaging | Fix Released | Undecided | Arnaud Morin |
Bug Description
When using quorum queues, if the queue declaration fails on the rabbit side, the queue can exist but be in a bad state, like this:
$ rabbitmq-queues quorum_status reply_36dcaa363
┌──────
│ Node Name │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├──────
│ rabbit@rabbit5 │ noproc │ │ │ │ │ │
├──────
│ rabbit@rabbit4 │ noproc │ │ │ │ │ │
├──────
│ rabbit@rabbit6 │ noproc │ │ │ │ │ │
└──────
In such a situation, the only fix is to delete the queue, as stated in the doc [1]:
If a quorum of nodes cannot be recovered (say if 2 out of 3 RabbitMQ nodes are permanently lost) the queue is permanently unavailable and will need to be force deleted and recreated.
It would be nice if oslo_messaging were able to recover from such a situation automatically.
[1] https:/
Changed in oslo.messaging:
assignee: nobody → Arnaud Morin (arnaud-morin)
Changed in oslo.messaging:
status: New → In Progress
Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/889313
Committed: https://opendev.org/openstack/oslo.messaging/commit/8e3c523fd74257a78ceb384063f81db2e92a2ebd
Submitter: "Zuul (22348)"
Branch: master
commit 8e3c523fd74257a78ceb384063f81db2e92a2ebd
Author: Arnaud Morin <email address hidden>
Date: Fri Jul 21 16:51:51 2023 +0200
Auto-delete the failed quorum rabbit queues
When rabbit is failing for a specific quorum queue, the only thing to
do is to delete the queue (as per rabbit doc, see [1]).
So, to avoid the RPC service being broken until an operator eventually
fixes it manually, catch any INTERNAL ERROR (code 541) and trigger
the deletion of the failed queues under those conditions.
So on next queue declare (triggered from various retries), the queue
will be created again and the service will recover by itself.
Closes-Bug: #2028384
Related-bug: #2031497
[1] https://www.rabbitmq.com/quorum-queues.html#availability
Signed-off-by: Arnaud Morin <email address hidden>
Change-Id: Ib8dba833542973091a4e0bf23bb593aca89c5905
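The recovery flow described in the commit message (catch an INTERNAL ERROR with code 541, delete the failed queue, let the next declare recreate it) can be illustrated with a minimal Python sketch. This is not the code from the oslo.messaging change: `FakeBroker`, `InternalError`, and `declare_with_recovery` are hypothetical names invented here, and the broker is an in-memory stand-in rather than a real RabbitMQ connection. Only the reply code 541 and the delete-then-redeclare idea come from the text above.

```python
AMQP_INTERNAL_ERROR = 541  # reply code named in the commit message


class InternalError(Exception):
    """Hypothetical stand-in for a channel error carrying an AMQP reply code."""

    def __init__(self, code, text):
        super().__init__(f"{code}: {text}")
        self.code = code


class FakeBroker:
    """Hypothetical in-memory broker, used only to make the sketch runnable.

    A queue listed in `broken` fails to declare until it has been deleted.
    """

    def __init__(self):
        self.broken = set()
        self.queues = set()
        self.deleted = []

    def queue_declare(self, name):
        if name in self.broken:
            raise InternalError(AMQP_INTERNAL_ERROR, "INTERNAL_ERROR")
        self.queues.add(name)

    def queue_delete(self, name):
        self.broken.discard(name)
        self.queues.discard(name)
        self.deleted.append(name)


def declare_with_recovery(broker, name, max_attempts=3):
    """Declare a queue, deleting it first whenever the broker answers 541.

    Returns the number of declare attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            broker.queue_declare(name)
            return attempt
        except InternalError as exc:
            if exc.code != AMQP_INTERNAL_ERROR:
                raise  # unrelated errors are not handled by this sketch
            # The queue exists but is unusable: drop it so that the next
            # declare attempt creates a fresh one.
            broker.queue_delete(name)
    raise RuntimeError(f"queue {name!r} still failing after {max_attempts} attempts")
```

In this sketch the retry loop lives in one function for brevity; per the commit message, in the real driver the re-declaration is triggered by the retries that already exist elsewhere.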