Comment 0 for bug 1871850

LIU Yulong (dragon889) wrote: [L3] existing router resources are partially deleted unexpectedly when MQ is gone

ENV: we hit this issue on our stable/queens deployment, but the master branch has the same code logic.

When the L3 agent receives a router update notification, it tries to retrieve the router info from the server side (DB) via RPC [1]. If the message queue is down or unreachable at that moment, the RPC call raises a messaging-related exception, and the agent schedules a resync of the router [2]. In my experience, a RabbitMQ cluster is sometimes not easy to recover. If the MQ takes a long time to come back, the router info sync RPC keeps failing until the update hits the max retry limit [3]. Then the bad thing happens: the L3 agent removes the router [4]. This basically shuts down all existing L3 traffic of this router.
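To make the failure sequence concrete, here is a minimal, self-contained model of the control flow described above. It is a simplified sketch, not Neutron's code: every name in it (`RouterUpdate`, `FakeAgent`, `MessagingUnavailable`, `MAX_RETRIES`, `rpc_get_router`) is hypothetical, and details of the real agent such as backoff timestamps and queue priorities are left out. It only shows how "fetch fails, resync, retry limit reached, remove router" turns an MQ outage into a router teardown.

```python
"""Simplified model of the L3 agent resync-then-remove behaviour.

All names are hypothetical; this illustrates the control flow only and is
not Neutron's implementation.
"""
from collections import deque

MAX_RETRIES = 5  # hypothetical retry limit


class MessagingUnavailable(Exception):
    """Stands in for the exception the RPC layer raises when MQ is down."""


class RouterUpdate:
    def __init__(self, router_id):
        self.router_id = router_id
        self.tries = 0  # number of fetch attempts made so far

    def hit_retry_limit(self):
        return self.tries >= MAX_RETRIES


class FakeAgent:
    def __init__(self, rpc_get_router):
        self.rpc_get_router = rpc_get_router  # callable: router_id -> info
        self.queue = deque()
        self.routers = {}  # router_id -> local state (namespaces, rules, ...)
        self.events = []

    def process_updates(self):
        while self.queue:
            update = self.queue.popleft()
            update.tries += 1
            try:
                info = self.rpc_get_router(update.router_id)
            except MessagingUnavailable:
                # MQ is unreachable: schedule a resync instead of failing.
                self._resync_router(update)
                continue
            self.routers[update.router_id] = info
            self.events.append(("synced", update.router_id))

    def _resync_router(self, update):
        if update.hit_retry_limit():
            # The problematic step: a pure RPC/MQ failure ends up treated
            # like a deleted router, so its local resources are torn down.
            self.routers.pop(update.router_id, None)
            self.events.append(("removed", update.router_id))
            return
        self.queue.append(update)  # retry later


if __name__ == "__main__":
    def mq_down(router_id):
        raise MessagingUnavailable("AMQP server unreachable")

    agent = FakeAgent(mq_down)
    agent.routers["r1"] = {"status": "ACTIVE"}  # router already serving traffic
    agent.queue.append(RouterUpdate("r1"))
    agent.process_updates()
    print(agent.routers)  # {}
    print(agent.events)   # [('removed', 'r1')]
```

In this model, an MQ outage that lasts longer than `MAX_RETRIES` attempts leaves the agent with no local state for a router that still exists on the server, while an outage that ends before the limit lets the update sync normally.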

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L705
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L710
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L666
[4] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L671