Use expiring queues instead of auto-delete ones in RabbitMQ driver

Bug #1495568 reported by Dmitry Mescheryakov
This bug affects 5 people
Affects: oslo.messaging
Status: Fix Released
Importance: Wishlist
Assigned to: Dmitry Mescheryakov

Bug Description

Assume the following scenario:

1. An RPC server is the only consumer of a RabbitMQ queue which has the auto-delete flag set to True. The server gets disconnected from RabbitMQ due to a network glitch.
2. RPC client sends a request to the server's queue.
3. RabbitMQ detects that the only consumer of the server's queue is disconnected and deletes the queue because it has auto-delete flag set.
4. RPC server reconnects, recreates the queue and continues working as normal.
5. RPC client never gets a response and fails with MessagingTimeout, because the request was deleted with the queue in step #3.

This is a bug which we probably cannot fix if we continue using auto-delete queues. Other issues caused by auto-delete queues were fixed previously by the following change requests:
https://review.openstack.org/#/c/193037/
https://review.openstack.org/#/c/180905/
https://review.openstack.org/#/c/103157/

Luckily, there is an easy alternative that avoids most of the downsides of auto-delete queues: expiring queues. See
http://www.rabbitmq.com/ttl.html#queue-ttl

Basically, expiring queues behave like auto-delete queues, except that RabbitMQ waits for a user-defined grace period before deleting the queue. If a consumer reconnects during that period, the deletion is canceled. The only other difference is that expiring queues are RabbitMQ-specific and are not defined in the AMQP 0.9.1 specification. But since we are working specifically on the RabbitMQ driver, we can neglect that.
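
To make the difference concrete, here is a minimal toy model of the scenario from the top of this report. It is only a sketch: ToyQueue and run_scenario are hypothetical names invented for illustration, and the model approximates the behaviour described here rather than RabbitMQ's real implementation.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for one broker queue (illustrative only)."""

    def __init__(self, auto_delete=False, expires=None):
        self.auto_delete = auto_delete  # delete as soon as the last consumer goes
        self.expires = expires          # seconds the queue may stay unused, or None
        self.messages = deque()
        self.exists = True
        self.unused_since = None        # time at which the last consumer was lost

    def publish(self, message):
        if self.exists:
            self.messages.append(message)

    def consumer_lost(self, now):
        if self.auto_delete:
            self._delete()              # queue and its messages vanish at once
        else:
            self.unused_since = now     # grace period starts counting

    def tick(self, now):
        """Delete the queue if it has stayed unused for `expires` seconds."""
        if (self.exists and self.expires is not None
                and self.unused_since is not None
                and now - self.unused_since >= self.expires):
            self._delete()

    def consumer_reconnects(self, now):
        self.tick(now)
        if not self.exists:
            # The server redeclares the queue; earlier messages are gone.
            self.exists = True
        self.unused_since = None        # a consumer is back, expiry is canceled
        return list(self.messages)

    def _delete(self):
        self.exists = False
        self.messages.clear()


def run_scenario(queue, outage_seconds):
    """Replay steps 2-4 and return the messages the server sees afterwards."""
    queue.publish("request")            # step 2: client sends during the outage
    queue.consumer_lost(now=0)          # step 3: broker notices the disconnect
    return queue.consumer_reconnects(now=outage_seconds)  # step 4
```

With a 5 second outage, run_scenario(ToyQueue(auto_delete=True), 5) returns an empty list, while run_scenario(ToyQueue(expires=600), 5) still returns the request. An outage longer than the grace period loses the message in both cases.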

Using queues with an expiration time of 10-30 minutes will fix the problem outlined at the beginning of this description, and it will mostly fix the other problems addressed by the referenced change requests.
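
As a rough sketch of what such a declaration could look like, the helper below builds the optional queue arguments. RabbitMQ reads the queue TTL from the x-expires argument, expressed in milliseconds. The function name expiring_queue_arguments is hypothetical and is not part of oslo.messaging.

```python
def expiring_queue_arguments(ttl_seconds):
    """Return declare arguments for a queue that expires when left unused.

    Hypothetical helper for illustration; not taken from the driver.
    """
    if ttl_seconds <= 0:
        raise ValueError("queue TTL must be a positive number of seconds")
    # RabbitMQ expects x-expires as a positive integer in milliseconds.
    return {"x-expires": int(ttl_seconds * 1000)}


# 10 and 30 minutes, the range suggested above.
ten_minutes = expiring_queue_arguments(10 * 60)
thirty_minutes = expiring_queue_arguments(30 * 60)
```

Assuming kombu is used to declare the queue, a dictionary like this would typically be passed as queue_arguments to kombu.Queue, with auto_delete left as False.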

Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
importance: Undecided → Wishlist
status: New → Triaged
Changed in oslo.messaging:
assignee: nobody → Dmitry Mescheryakov (dmitrymex)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/243845
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=10625eed87b4c7f980bd5cd7cacbc4caa2dec197
Submitter: Jenkins
Branch: master

commit 10625eed87b4c7f980bd5cd7cacbc4caa2dec197
Author: John Eckersberg <email address hidden>
Date: Fri Nov 20 17:25:58 2015 -0500

    Kombu: make reply and fanout queues expire instead of auto-delete

    Right now fanout and reply queues are unconditionally created with
    auto-delete flag which causes a number of problems listed in bug
    1495568. Replacing auto-delete with queue expiration with some sane
    timeout should fix all these issues at once.

    Another problem being fixed is that auto-delete flag does not cause
    the queue to be deleted if it never had consumers. An orphaned fanout
    queue might appear that way and it will grow indefinitely until
    somebody manually removes it. See bug 1515278 for details.

    A new rabbit_transient_queues_ttl config parameter is introduced which
    configures the TTL for reply and fanout queues. It is a positive
    integer representing timeout in seconds. By default it is set to 10
    minutes. That should be enough for application to reconnect or
    for server to send reply to client which already died. At the same
    time, it seems that not so many messages could be accumulated in
    fanout queues during that time.

    DocImpact
    With this change RabbitMQ driver defines reply and fanout queues
    differently compared with the previous release: now they are defined
    with queue TTL (https://www.rabbitmq.com/ttl.html#queue-ttl) instead
    of auto-delete flag. That helps avoid a number of issues, see commit
    description for details. A new rabbit_transient_queues_ttl parameter
    is defined which controls the TTL value. It is set to 10 minutes by
    default. The change does not affect upgrade in any way.

    Closes-bug: #1495568
    Closes-bug: #1515278

    Co-Authored-by: Dmitry Mescheryakov <email address hidden>
    Change-Id: I83a8d09dc0cdae24c12d7043ec810529a9ce57ab
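
For illustration only, the sketch below shows how the new parameter might appear in a service configuration file and how the value in seconds maps to the millisecond value RabbitMQ expects. The [oslo_messaging_rabbit] section name is an assumption not stated in this report, the standard-library configparser stands in for the real oslo.config machinery, and read_transient_queues_ttl is a hypothetical helper.

```python
import configparser

# Assumed layout: the section name is not given in the commit message.
SAMPLE_CONF = """
[oslo_messaging_rabbit]
rabbit_transient_queues_ttl = 600
"""

DEFAULT_TTL_SECONDS = 600  # 10 minutes, the default named in the commit


def read_transient_queues_ttl(conf_text):
    """Return the TTL in seconds for reply and fanout queues."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    ttl = parser.getint("oslo_messaging_rabbit",
                        "rabbit_transient_queues_ttl",
                        fallback=DEFAULT_TTL_SECONDS)
    if ttl < 1:
        raise ValueError("rabbit_transient_queues_ttl must be a positive integer")
    return ttl


ttl_seconds = read_transient_queues_ttl(SAMPLE_CONF)
ttl_milliseconds = ttl_seconds * 1000  # unit used by RabbitMQ's x-expires
```

When the option is absent from the file, this helper falls back to the 10 minute default described in the commit message.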

Changed in oslo.messaging:
status: In Progress → Fix Released