Controller rolling restart creates rabbitmq DuplicateMessageError messages and some services do not recover
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | Damien Ciabrini |
Bug Description
In our current controller architecture, rabbitmq message recipients are currently configured to be HA queues mirrored across controllers (for failover), but not durable (not persisted to disk for recoveries on server restart). One logical recipient has typically one master queue, and at least one mirror queue.
When a master queue becomes unavailable, a mirror automatically takes over the master role, as long as it is synchronized (i.e. it has received all the messages that the previous master queue had).
Publishing a message to a recipient consists of pushing the message to a rabbitmq "exchange", which has "bindings" (i.e. routes) to the master and mirror queues. The message publishing is acknowledged once all connected queues acknowledge its reception.
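The confirmation semantics can be sketched with a toy model (the class and method names below are illustrative, not part of any RabbitMQ client API): the exchange forwards a message along each binding, and the publish is confirmed only when every bound queue has acknowledged it.

```python
# Toy model of publish confirmation: an exchange forwards a message
# along its bindings, and the publish is confirmed only once every
# bound queue acknowledges receipt. Illustrative names only.

class Queue:
    def __init__(self, name):
        self.name = name
        self.messages = []

    def receive(self, msg):
        self.messages.append(msg)
        return True  # the queue acknowledges reception

class Exchange:
    def __init__(self):
        self.bindings = {}  # queue name -> Queue object

    def bind(self, queue):
        self.bindings[queue.name] = queue

    def publish(self, msg):
        # Confirmed only if every bound queue acknowledges the message.
        return all(q.receive(msg) for q in self.bindings.values())

master, mirror = Queue("master"), Queue("mirror")
ex = Exchange()
ex.bind(master)
ex.bind(mirror)
assert ex.publish("hello")  # both replicas ack, so the publish is confirmed
assert master.messages == mirror.messages == ["hello"]
```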
If one queue replica disappears (e.g. rabbitmq stops, controller node reboots), and reconnects _after_ some messages have been queued but not yet consumed in the remaining replicas, the reconnecting replica becomes an "unsynchronized" mirror. As such, it won't be able to take over the master role automatically if a master failover happens.
So during a rolling restart of all rabbitmq servers, it might happen that all master queues disconnect sequentially and reconnect as unsynchronized mirrors. In such a case, when the last master disconnects, no mirror can take over the master role, and RabbitMQ deletes all the queues for the logical recipient. However, the important detail is that RabbitMQ _does not_ delete the "bindings" to those queues [1].
At this time, when publishing a message to the original logical recipient, rabbitmq will still receive it in the exchange, try to push it to the nonexistent queues via the leftover "bindings", and will never acknowledge the publishing, because there is no longer any queue to publish to.
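A minimal sketch of this failure mode, again as a toy model rather than any actual RabbitMQ API: the queues are deleted but their bindings remain in the exchange, so a publish through a stale binding can never be confirmed.

```python
# Toy model of the rolling-restart bug: RabbitMQ deletes the queues of
# a recipient whose last synchronized replica is gone, but keeps the
# bindings. Publishing through those stale bindings is never confirmed.
# Illustrative names only; this is not a RabbitMQ client API.

class Exchange:
    def __init__(self):
        self.bindings = {}  # queue name -> message list, or None if deleted

    def bind(self, name):
        self.bindings[name] = []

    def delete_queue(self, name):
        # The queue disappears, but its binding stays in the exchange.
        self.bindings[name] = None

    def publish(self, msg):
        for contents in self.bindings.values():
            if contents is None:
                return False  # stale binding: no queue left to ack
            contents.append(msg)
        return True

ex = Exchange()
ex.bind("consumer-1.master")
ex.bind("consumer-1.mirror")
assert ex.publish("m1")        # confirmed while the queues exist

ex.delete_queue("consumer-1.master")
ex.delete_queue("consumer-1.mirror")
assert not ex.publish("m2")    # stale bindings: the publish is never acked
```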
Going back to our OpenStack context: an OpenStack client/service can send a "notification" to many consumers at once (i.e. Pub/Sub idiom) via a "fanout" exchange. Each registered consumer has its own HA queue, which means its own master and mirror queues.
As described above, it might happen that a rolling restart of controller nodes deletes all queues for a consumer. If that consumer never comes back online, the bindings to its queues will linger in the fanout exchange, publishing to this particular consumer will never be acknowledged, and consequently the fanout exchange can never acknowledge the publishing of the message to the OpenStack client.
The OpenStack client is unaware of that condition, so it will retry publishing the same message to the fanout exchange. By then, some consumers have already received and acknowledged the original message, ultimately resulting in the DuplicateMessageError.
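The end-to-end symptom can be sketched as follows (a toy model with illustrative names, not an actual oslo.messaging interface): a fanout exchange with one healthy consumer and one consumer whose queues were deleted but whose binding lingers. The client's retry re-delivers the message to the healthy consumer, which is the duplicate that oslo.messaging surfaces as DuplicateMessageError.

```python
# Toy model of the symptom: the stale binding prevents the publish from
# ever being confirmed, the client retries, and the healthy consumer
# receives the same message twice. Illustrative names only.

class FanoutExchange:
    def __init__(self):
        self.queues = {}  # consumer name -> message list, or None if deleted

    def publish(self, msg):
        confirmed = True
        for queue in self.queues.values():
            if queue is None:
                confirmed = False  # stale binding: this copy is never acked
            else:
                queue.append(msg)
        return confirmed

def publish_with_retry(exchange, msg, attempts=2):
    # The client keeps retrying until the publish is confirmed.
    for _ in range(attempts):
        if exchange.publish(msg):
            return

fanout = FanoutExchange()
fanout.queues["healthy-consumer"] = []
fanout.queues["gone-consumer"] = None  # queues deleted, binding lingers

publish_with_retry(fanout, {"msg_id": "abc", "body": "notify"})
# The healthy consumer received the same message twice, i.e. a duplicate.
assert fanout.queues["healthy-consumer"] == [
    {"msg_id": "abc", "body": "notify"},
    {"msg_id": "abc", "body": "notify"},
]
```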
Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
assignee: Michele Baldessari (michele) → Damien Ciabrini (dciabrin)
status: Triaged → In Progress
tags: added: stein-backport-potential
tags: added: queens-backport-potential rocky-backport-potential
https://review.opendev.org/#/c/649689/