Activity log for bug #1606213

Date Who What changed Old value New value Message
2016-07-25 12:15:39 Dmitry Mescheryakov bug added bug
2016-07-25 12:15:50 Dmitry Mescheryakov description
Old value:
The issue originally reported by Sam Morrison in thread http://markmail.org/message/yxkcpydkjuf2d5pb
OpenStack version: Liberty
Steps to reproduce:
 * Restart all Neutron agents in an OpenStack one by one.
 * Notice that a number of fanout queues without consumers were left in rabbitmq. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr'
 * The queues will live until they expire (time is controlled by rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages designated to the agents.
That produces a pretty big load on RabbitMQ on big environment, here is some math from Sam's email: "A bad scenario is when you make a change to your cloud that means all your 1000 neutron agents are restarted, this causes a couple of dead queues per agent to hang around. (port updates and security group updates) We get around 25 messages / second on these queues and so you can see after 10 minutes we have a ton of messages in these queues. 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise."
Aside from suggesting people to lower the expiration time to reduce impact, we can also help by deleting fanout queues on graceful shutdown. That will not help in case a service forcibly dies, but that covers pretty natural case when all agents are restarted after Neutron is updated.
New value:
The issue is originally reported by Sam Morrison in thread http://markmail.org/message/yxkcpydkjuf2d5pb
OpenStack version: Liberty
Steps to reproduce:
 * Restart all Neutron agents in an OpenStack one by one.
 * Notice that a number of fanout queues without consumers were left in rabbitmq. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr'
 * The queues will live until they expire (time is controlled by rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages designated to the agents.
That produces a pretty big load on RabbitMQ on big environment, here is some math from Sam's email: "A bad scenario is when you make a change to your cloud that means all your 1000 neutron agents are restarted, this causes a couple of dead queues per agent to hang around. (port updates and security group updates) We get around 25 messages / second on these queues and so you can see after 10 minutes we have a ton of messages in these queues. 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise."
Aside from suggesting people to lower the expiration time to reduce impact, we can also help by deleting fanout queues on graceful shutdown. That will not help in case a service forcibly dies, but that covers pretty natural case when all agents are restarted after Neutron is updated.
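The arithmetic in Sam's estimate can be reproduced with a short sketch. The function name and its parameters are hypothetical, introduced only for illustration. It assumes a constant message rate per queue and that an orphaned queue stops accumulating once its expiration time elapses, which is a simplification of how rabbit_transient_queues_ttl behaves in practice.

```python
def stale_fanout_backlog(agents, queues_per_agent, msgs_per_sec,
                         elapsed_seconds, ttl_seconds=None):
    """Rough estimate of messages piled up in consumerless fanout queues.

    Hypothetical helper: assumes every agent leaves the same number of
    dead queues behind and each queue receives messages at a fixed rate.
    """
    # A queue only accumulates messages while it still exists, so cap the
    # accumulation window at the expiration time when one is given.
    window = elapsed_seconds
    if ttl_seconds is not None:
        window = min(elapsed_seconds, ttl_seconds)
    return agents * queues_per_agent * msgs_per_sec * window


# Figures quoted in the report: 1000 agents, 2 dead queues per agent,
# about 25 messages/second per queue, observed for 10 minutes (600 s).
print(stale_fanout_backlog(1000, 2, 25, 600))  # 30000000

# Under the same assumptions, a 60-second expiration caps the backlog.
print(stale_fanout_backlog(1000, 2, 25, 600, ttl_seconds=60))  # 3000000
```

Under these assumptions the backlog grows linearly with the expiration window, which is why lowering the expiration time reduces the impact but does not remove it; deleting the queues on graceful shutdown avoids the accumulation for that case altogether.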
2016-07-25 12:29:24 OpenStack Infra oslo.messaging: status New In Progress
2016-07-25 12:29:24 OpenStack Infra oslo.messaging: assignee Kirill Bespalov (k-besplv)
2016-07-25 21:17:28 Nobuto Murata bug added subscriber Nobuto Murata
2016-07-25 23:15:50 Sam Morrison bug added subscriber Sam Morrison
2016-07-26 23:44:51 OpenStack Infra oslo.messaging: status In Progress Fix Released
2016-07-28 12:26:44 Dmitry Mescheryakov oslo.messaging: status Fix Released Confirmed
2016-08-18 02:49:20 OpenStack Infra tags in-feature-amqp-dispatch-router