oslo.messaging

Activity log for bug #1606213

Date	Who	What changed	Old value	New value	Message
2016-07-25 12:15:39	Dmitry Mescheryakov	bug			added bug
2016-07-25 12:15:50	Dmitry Mescheryakov	description	The issue originally reported by Sam Morrison in thread http://markmail.org/message/yxkcpydkjuf2d5pb OpenStack version: Liberty Steps to reproduce: * Restart all Neutron agents in an OpenStack one by one. * Notice that a number of fanout queues without consumers were left in rabbitmq. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr' * The queues will live until they expire (time is controlled by rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages designated to the agents. That produces a pretty big load on RabbitMQ on big environment, here is some math from Sam's email: "A bad scenario is when you make a change to your cloud that means all your 1000 neutron agents are restarted, this causes a couple of dead queues per agent to hang around. (port updates and security group updates) We get around 25 messages / second on these queues and so you can see after 10 minutes we have a ton of messages in these queues. 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise." Aside from suggesting people to lower the expiration time to reduce impact, we can also help by deleting fanout queues on graceful shutdown. That will not help in case a service forcibly dies, but that covers pretty natural case when all agents are restarted after Neutron is updated.	The issue is originally reported by Sam Morrison in thread http://markmail.org/message/yxkcpydkjuf2d5pb OpenStack version: Liberty Steps to reproduce: * Restart all Neutron agents in an OpenStack one by one. * Notice that a number of fanout queues without consumers were left in rabbitmq. You can see them by running 'rabbitmqctl list_queues consumers messages name | sort -nr' * The queues will live until they expire (time is controlled by rabbit_transient_queues_ttl parameter) _and_ they will accumulate all fanout messages designated to the agents. That produces a pretty big load on RabbitMQ on big environment, here is some math from Sam's email: "A bad scenario is when you make a change to your cloud that means all your 1000 neutron agents are restarted, this causes a couple of dead queues per agent to hang around. (port updates and security group updates) We get around 25 messages / second on these queues and so you can see after 10 minutes we have a ton of messages in these queues. 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise." Aside from suggesting people to lower the expiration time to reduce impact, we can also help by deleting fanout queues on graceful shutdown. That will not help in case a service forcibly dies, but that covers pretty natural case when all agents are restarted after Neutron is updated.
2016-07-25 12:29:24	OpenStack Infra	oslo.messaging: status	New	In Progress
2016-07-25 12:29:24	OpenStack Infra	oslo.messaging: assignee		Kirill Bespalov (k-besplv)
2016-07-25 21:17:28	Nobuto Murata	bug			added subscriber Nobuto Murata
2016-07-25 23:15:50	Sam Morrison	bug			added subscriber Sam Morrison
2016-07-26 23:44:51	OpenStack Infra	oslo.messaging: status	In Progress	Fix Released
2016-07-28 12:26:44	Dmitry Mescheryakov	oslo.messaging: status	Fix Released	Confirmed
2016-08-18 02:49:20	OpenStack Infra	tags		in-feature-amqp-dispatch-router
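
The backlog estimate quoted in the bug description (1000 x 2 x 25 x 600) can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name and its parameters are hypothetical and are not part of oslo.messaging or RabbitMQ, and it assumes every stale fanout queue receives messages at the same steady rate until it expires.

```python
def stale_fanout_backlog(agents, dead_queues_per_agent, msgs_per_second, seconds):
    """Estimate messages piled up in consumer-less fanout queues.

    Hypothetical helper for illustration only. Assumes each dead queue
    receives msgs_per_second for the whole interval, for example until
    the queue expiration time elapses.
    """
    return agents * dead_queues_per_agent * msgs_per_second * seconds


if __name__ == "__main__":
    # Figures from Sam Morrison's email: 1000 agents, 2 dead queues each,
    # about 25 messages per second per queue, over 10 minutes.
    total = stale_fanout_backlog(1000, 2, 25, 10 * 60)
    print(f"{total:,} messages")  # prints "30,000,000 messages"
```

Because the backlog grows linearly with the time the queues survive, halving the expiration time halves the estimate under the same assumptions, which is why lowering the TTL reduces the impact but does not remove it.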