Fanout queues might grow pretty big before they expire

Bug #1606213 reported by Dmitry Mescheryakov
This bug affects 4 people
Affects: oslo.messaging
Status: Confirmed
Importance: Undecided
Assigned to: Kirill Bespalov

Bug Description

The issue was originally reported by Sam Morrison in this thread: http://markmail.org/message/yxkcpydkjuf2d5pb

OpenStack version: Liberty

Steps to reproduce:
 * Restart all Neutron agents in an OpenStack deployment one by one.
 * Notice that a number of fanout queues without consumers are left in RabbitMQ. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr'
 * The queues will live until they expire (the time is controlled by the rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages destined for the agents.
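
To pick the leftover queues out of the 'rabbitmqctl list_queues consumers messages name' output, a small filter along these lines might help. This is only a sketch: it assumes fanout queue names contain "_fanout_" (as oslo.messaging's RabbitMQ driver appears to name them), and the function name and the sample queue names are made up for illustration.

```python
import re

# Assumption: fanout queues are named "<topic>_fanout_<unique id>".
FANOUT_NAME = re.compile(r".+_fanout_.+")


def orphaned_fanout_queues(listing):
    """Return (messages, name) pairs for fanout queues that have no
    consumers, largest backlog first.

    `listing` is the text printed by
    `rabbitmqctl list_queues consumers messages name`.
    """
    orphans = []
    for line in listing.splitlines():
        parts = line.split(None, 2)
        # Skip banner/header lines such as "Listing queues ..."
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        consumers, messages, name = int(parts[0]), int(parts[1]), parts[2].strip()
        if consumers == 0 and FANOUT_NAME.fullmatch(name):
            orphans.append((messages, name))
    return sorted(orphans, reverse=True)


# Illustrative input only; these queue names and counts are invented.
sample = (
    "Listing queues ...\n"
    "0\t15000\tq-agent-notifier-port-update_fanout_abc123\n"
    "1\t0\tq-agent-notifier-port-update_fanout_def456\n"
    "0\t3\tq-plugin\n"
)
print(orphaned_fanout_queues(sample))
```

Only the first sample queue is reported: the second still has a consumer, and the third is not a fanout queue.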

That produces a pretty big load on RabbitMQ in large environments. Here is some math from Sam's email:

"A bad scenario is when you make a change to your cloud that means all your 1000
neutron agents are restarted, this causes a couple of dead queues per agent to
hang around. (port updates and security group updates) We get around 25 messages
/ second on these queues and so you can see after 10 minutes we have a ton of
messages in these queues.

1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise."
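
The same arithmetic can be written as a small function to try other cluster sizes or expiration times. This is a rough sketch: the function and parameter names are invented here, and it assumes a constant message rate and that a dead queue stops accumulating once it expires. The 30-minute value is the queue expiration time mentioned in the fix's commit message.

```python
def peak_backlog(agents, dead_queues_per_agent, msgs_per_sec, elapsed_sec, ttl_sec):
    """Messages piled up in consumerless queues after `elapsed_sec`,
    capped at the point where the queues expire (`ttl_sec`)."""
    return agents * dead_queues_per_agent * msgs_per_sec * min(elapsed_sec, ttl_sec)


THIRTY_MIN = 30 * 60

# Sam's numbers: 1000 agents, 2 dead queues each, 25 msg/s, 10 minutes.
print(peak_backlog(1000, 2, 25, 10 * 60, THIRTY_MIN))    # 30000000

# Same cloud if the queues live for the full 30 minutes.
print(peak_backlog(1000, 2, 25, THIRTY_MIN, THIRTY_MIN))  # 90000000

# Same cloud with the expiration time lowered to 1 minute.
print(peak_backlog(1000, 2, 25, 10 * 60, 60))             # 3000000
```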

Aside from suggesting that people lower the expiration time to reduce the impact, we can also help by deleting fanout queues on graceful shutdown. That will not help when a service dies forcibly, but it covers the pretty natural case where all agents are restarted after Neutron is updated.
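
The idea can be illustrated with a toy model. This is not oslo.messaging code: InMemoryBroker and FanoutConsumer are hypothetical stand-ins, used only to show that a consumer that deletes its own queue in stop() leaves nothing behind, while one that is killed leaves a queue that keeps collecting messages.

```python
class InMemoryBroker:
    """Hypothetical stand-in for RabbitMQ: a dict of queue name -> messages."""

    def __init__(self):
        self.queues = {}

    def declare(self, name):
        self.queues.setdefault(name, [])

    def delete(self, name):
        self.queues.pop(name, None)

    def publish_fanout(self, message):
        # A fanout publish lands in every bound queue, consumed or not.
        for messages in self.queues.values():
            messages.append(message)


class FanoutConsumer:
    """Hypothetical agent-side consumer that owns one fanout queue."""

    def __init__(self, broker, queue_name):
        self.broker = broker
        self.queue_name = queue_name
        broker.declare(queue_name)

    def stop(self):
        # Graceful shutdown: delete our queue instead of waiting for expiry.
        self.broker.delete(self.queue_name)


broker = InMemoryBroker()
graceful = FanoutConsumer(broker, "port-update_fanout_a")
crashed = FanoutConsumer(broker, "port-update_fanout_b")

graceful.stop()  # clean restart: the queue is removed
# `crashed` never calls stop(), as if the process had been killed.

broker.publish_fanout("port updated")
print(sorted(broker.queues))  # ['port-update_fanout_b']
```

The queue of the killed consumer is still there and has received the message, which is why lowering the expiration time remains useful alongside this change.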

description: updated
Changed in oslo.messaging:
assignee: nobody → Kirill Bespalov (k-besplv)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/346732
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=a6f0aaed3ff2e2ce40665d4e90e92da9d3b3c753
Submitter: Jenkins
Branch: master

commit a6f0aaed3ff2e2ce40665d4e90e92da9d3b3c753
Author: Kirill Bespalov <email address hidden>
Date: Mon Jul 25 15:11:53 2016 +0300

    Delete fanout queues on gracefully shutdown

    No reasons to kept fanout queues in case then
    a rpc server is gracefully shutdown. The expiration
    time of the fanout queue is too long (30 mins), so for
    large scales it can accumulate a lot of messages before it be removed

    Closes-Bug: 1606213
    Change-Id: Ieaa35c454df542042f3a5424d70f87d486693024

Changed in oslo.messaging:
status: In Progress → Fix Released
Sam Morrison (sorrison) wrote :

I think this helps but doesn't completely fix the issue; it would've been good if the patch had used a partial-fix tag. Can someone reopen this ticket?

Dmitry Mescheryakov (dmitrymex) wrote :

Reopened per Sam's comment. Sam, FYI, any registered user can change the bug status in OpenStack projects.

Changed in oslo.messaging:
status: Fix Released → Confirmed
Sam Morrison (sorrison) wrote :

As discussed on the ML thread, splitting out the config options would be another good step for this.

(also normal plebs can't change a ticket status if it's in Fix Released)

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/oslo.messaging 5.6.0

This issue was fixed in the openstack/oslo.messaging 5.6.0 release.

Sam Morrison (sorrison) wrote :

Just a note to avoid confusion about the above comment: this bug is partially fixed, but more work is needed.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (feature/amqp-dispatch-router)

Fix proposed to branch: feature/amqp-dispatch-router
Review: https://review.openstack.org/356468

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (feature/amqp-dispatch-router)

Reviewed: https://review.openstack.org/356468
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=39c3901b8c1709a197915f12767737298da1d80c
Submitter: Jenkins
Branch: feature/amqp-dispatch-router

commit ee8fff03d989e1a73068262d072f483b8c779163
Author: OpenStack Proposal Bot <email address hidden>
Date: Fri Aug 12 00:24:13 2016 +0000

    Updated from global requirements

    Change-Id: Ibef43ee38fc395b3d9d55f5e0f820e5c0d0308b1

commit 20a07e7f480ec434cb06cf13156d7c785f70b414
Author: Gevorg Davoian <email address hidden>
Date: Wed Jul 6 11:49:01 2016 +0300

    [zmq] Implement retries for unacknowledged CASTs

    This patch tries to implement a mechanism of acknowledgements and
    retries via proxy for CAST messages.

    Change-Id: I83919382262b9f169becd09f5db465a01a0ccb78
    Partial-Bug: #1497306
    Closes-Bug: #1515269

commit 7c5d039fd355e60e099a0a36408c85a08bfcc2ad
Author: Oleksii Zamiatin <email address hidden>
Date: Thu Aug 4 15:31:45 2016 +0300

    Move zmq driver options into its own group

    ZeroMQ driver options are current stored into the DEFAULT group.
    This change makes the zmq configuration clearer by putting its
    options into oslo_messaging_zmq group.

    Change-Id: Ia00fda005b1664750d2646f8c82ebdf295b156fb
    Closes-bug: #1417040
    Co-Authored-By: Oleksii Zamiatin <email address hidden>

commit 51652c57d2b6fa040a0b88d20bafc0026253a516
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Aug 4 02:40:46 2016 +0000

    Updated from global requirements

    Change-Id: I5b87131404d34b69dab22564eccb8f1e1a141761

commit 2003a52a16105b22bb3696ccc4aac9b1b561f8bd
Author: OpenStack Proposal Bot <email address hidden>
Date: Wed Aug 3 09:06:36 2016 +0000

    Updated from global requirements

    Change-Id: Ibdebbd59e62297de8ddd6fbec7743e3c66d1108f

commit d946fb1862dacf3569d7d47759427d1460b64d11
Author: Gevorg Davoian <email address hidden>
Date: Tue Jul 19 12:53:27 2016 +0300

    Fix pika functional tests

    Change-Id: I05f2cbd914857da7a75ca068e99614156797d1ed
    Closes-Bug: #1599777
    Depends-On: Ic6acc5d006344e08c219724e488fc9222786d849

commit 9e61efa67d2d461626f79c1937dec6c50499568f
Author: ozamiatin <email address hidden>
Date: Tue Jul 26 12:52:11 2016 +0300

    [zmq] Use zmq.IMMEDIATE option for round-robin

    This options helps to prevent message loss by scheduling
    messages only to a connected queue. If there is no connections
    socket hangs waiting.

    Change-Id: I87b97c8b77887f53599a28e0d05fc2c71c149499
    Closes-Bug: #1606272

commit 75764971962252ffd2720036e3b564b5d0ae76f1
Author: maoshuai <fwsakura@163.com>
Date: Fri Jul 29 10:59:34 2016 +0800

    fix a typo in impl_rabbit.py

    Change-Id: I75f99d7e3a6b193e30d8d9baad6a939fbdd0ca6d

commit 12886219a6855a109609aaf009a96a9a2a19ffd2
Author: OpenStack Proposal Bot <email address hidden>
Date: Fri Jul 29 02:33:54 2016 +0000

    Updated from global requirements

    Change-Id: Iae11896324f16164fd62a53c089ebd2948437098

commit 317641c42f006eaec644c3524da63b63ab6771e1
Autho...

tags: added: in-feature-amqp-dispatch-router