In large, busy clusters, the RabbitMQ clustering mechanism may
need to synchronize queues. During this period, agents connected
to that RabbitMQ instance cannot communicate with neutron-server,
so the server considers them dead and moves their resources away.
After an agent becomes active again, it needs to clean up its
state entries and synchronize its state with neutron-server.
The solution is to make agents aware of their state from
neutron-server's point of view. This is done by changing state
reports from cast to call, so that each report returns the
agent's status. When an agent that was considered dead becomes
alive again, it receives the special AGENT_REVIVED status,
indicating that it should refresh its local data, which it would
not do otherwise.
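The mechanism described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the code from the actual change: FakeServer, FakeAgent, needs_full_sync, the injected clock, and the status string values are all hypothetical names chosen for the example, and the RPC transport is replaced by a plain method call. Only the AGENT_REVIVED name and the idea of a state report that returns a status come from the commit message.

```python
import time

# Status values are illustrative assumptions; only the AGENT_REVIVED
# name is taken from the commit message.
AGENT_NEW = "new"
AGENT_ALIVE = "alive"
AGENT_REVIVED = "revived"


class FakeServer:
    """Hypothetical stand-in for neutron-server's agent bookkeeping."""

    def __init__(self, agent_down_time=75.0, clock=time.monotonic):
        # Seconds of silence after which an agent is considered dead
        # (the default here is an assumption for the example).
        self.agent_down_time = agent_down_time
        self.clock = clock
        self.last_heartbeat = {}  # agent id -> time of last report

    def report_state(self, agent_id):
        """Record a heartbeat and return the agent's status.

        Returning a value is what requires a blocking 'call' instead
        of a fire-and-forget 'cast'.
        """
        now = self.clock()
        previous = self.last_heartbeat.get(agent_id)
        self.last_heartbeat[agent_id] = now
        if previous is None:
            return AGENT_NEW
        if now - previous > self.agent_down_time:
            # The server had considered this agent dead.
            return AGENT_REVIVED
        return AGENT_ALIVE


class FakeAgent:
    """Hypothetical agent that resyncs when told it was revived."""

    def __init__(self, agent_id, server):
        self.agent_id = agent_id
        self.server = server
        self.needs_full_sync = False

    def report_state(self):
        status = self.server.report_state(self.agent_id)
        if status == AGENT_REVIVED:
            # Flag a full refresh of local data on the next loop pass.
            self.needs_full_sync = True
        return status
```

With a cast the agent would never see the return value, so it could not learn that the server had declared it dead in the meantime. In this sketch the agent only sets a flag; the actual resync work would happen in the agent's main loop.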
Reviewed: https://review.openstack.org/232661
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3b6bd917e4b968a47a5aacb7f590143fc83816d9
Submitter: Jenkins
Branch: master
commit 3b6bd917e4b968a47a5aacb7f590143fc83816d9
Author: Eugene Nikanorov <email address hidden>
Date: Mon Oct 12 13:59:01 2015 +0400
Resync L3, DHCP and OVS/LB agents upon revival
Closes-Bug: #1505166
Change-Id: Id28248f4f75821fbacf46e2c44e40f27f59172a9