2016-07-25 12:15:50 |
Dmitry Mescheryakov |
description |
The issue was originally reported by Sam Morrison in the thread http://markmail.org/message/yxkcpydkjuf2d5pb
OpenStack version: Liberty
Steps to reproduce:
* Restart all Neutron agents in an OpenStack deployment one by one.
* Notice that a number of fanout queues without consumers are left in RabbitMQ. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr'
* The queues will live until they expire (the time is controlled by the rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages destined for the agents.
That puts a heavy load on RabbitMQ in large environments. Here is some math from Sam's email:
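To pick the orphaned queues out of that listing automatically, something like the following could work. It is only a sketch: it assumes the three columns come in the order requested above (consumers, messages, name), that queue names contain no whitespace, and that fanout queues can be recognised by '_fanout_' in their name. The function name is made up for illustration.

```python
def find_orphaned_fanout_queues(listing):
    """Return (name, messages) pairs for fanout queues with no consumers.

    `listing` is the text printed by
    'rabbitmqctl list_queues consumers messages name'.
    """
    orphaned = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect exactly three columns: consumers, messages, name.
        # Banner and header lines do not start with two integers; skip them.
        if len(fields) != 3:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        consumers, messages, name = int(fields[0]), int(fields[1]), fields[2]
        # Assumption: fanout queue names contain '_fanout_'.
        if consumers == 0 and "_fanout_" in name:
            orphaned.append((name, messages))
    # Largest backlog first.
    orphaned.sort(key=lambda item: item[1], reverse=True)
    return orphaned
```

Feeding it the output of the command above (for example, captured with subprocess.check_output) gives the dead queues sorted by how many messages they have piled up.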
"A bad scenario is when you make a change to your cloud that means all your 1000
neutron agents are restarted, this causes a couple of dead queues per agent to
hang around. (port updates and security group updates) We get around 25 messages
/ second on these queues and so you can see after 10 minutes we have a ton of
messages in these queues.
1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise."
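The same arithmetic as a small Python sketch, which also makes it easy to try other values. The function and parameter names are made up for illustration, and the 60-second figure is a hypothetical expiration time, not a recommendation. It assumes a dead queue accumulates messages at a constant rate until it expires.

```python
def fanout_backlog(agents, dead_queues_per_agent, msgs_per_second, seconds):
    """Messages piled up in consumerless fanout queues after `seconds`."""
    return agents * dead_queues_per_agent * msgs_per_second * seconds


# Numbers from Sam's email: 1000 agents, 2 dead queues each,
# 25 messages/second, 10 minutes.
print(fanout_backlog(1000, 2, 25, 600))  # 30000000

# Hypothetical 60-second expiration: each dead queue stops growing
# (it is removed) after one minute.
print(fanout_backlog(1000, 2, 25, 60))  # 3000000
```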
Aside from suggesting that people lower the expiration time to reduce the impact, we can also help by deleting fanout queues on graceful shutdown. That will not help when a service dies forcibly, but it covers the fairly natural case in which all agents are restarted after Neutron is updated. |
|