Add a configuration-enabled periodic check to examine the
status of all L3 agents with routers scheduled to them and
admin_state_up set to True. If the agent is dead, the router
will be rescheduled to an alive agent.
Neutron considers and agent 'dead' when the server doesn't
receive any heartbeat messages from the agent over the
RPC channel within a given number of seconds (agent_down_time).
There are various false positive scenarios where the agent may
fail to report even though the node is still forwarding traffic.
This is configuration driven because a dead L3 agent with active
namespaces forwarding traffic and responding to ARP requests may
cause issues. If the network backend does not block the dead
agent's node from using the router's IP addresses, there will be
a conflict between the old and new namespace.
This conflict should not break east-west traffic because both
namespaces will be attached to the appropriate networks and
either can forward the traffic without state. However, traffic
being overloaded onto the router's external network interface
IP in north-south traffic will be impacted because the matching
translation for port address translation will only be present
on one router. Additionally, floating IPs associated to ports
after the rescheduling will not work traversing the old
namespace because the mapping will not be present.
Reviewed: https:/ /review. openstack. org/110893 /git.openstack. org/cgit/ openstack/ neutron/ commit/ ?id=9677cf87cb8 31cfd344bc63cbe e23be52392c300
Committed: https:/
Submitter: Jenkins
Branch: master
commit 9677cf87cb831cf d344bc63cbee23b e52392c300
Author: Kevin Benton <email address hidden>
Date: Wed Jul 30 15:49:59 2014 -0700
Option to remove routers from dead l3 agents
Add a configuration- enabled periodic check to examine the
status of all L3 agents with routers scheduled to them and
admin_state_up set to True. If the agent is dead, the router
will be rescheduled to an alive agent.
Neutron considers and agent 'dead' when the server doesn't
receive any heartbeat messages from the agent over the
RPC channel within a given number of seconds (agent_down_time).
There are various false positive scenarios where the agent may
fail to report even though the node is still forwarding traffic.
This is configuration driven because a dead L3 agent with active
namespaces forwarding traffic and responding to ARP requests may
cause issues. If the network backend does not block the dead
agent's node from using the router's IP addresses, there will be
a conflict between the old and new namespace.
This conflict should not break east-west traffic because both
namespaces will be attached to the appropriate networks and
either can forward the traffic without state. However, traffic
being overloaded onto the router's external network interface
IP in north-south traffic will be impacted because the matching
translation for port address translation will only be present
on one router. Additionally, floating IPs associated to ports
after the rescheduling will not work traversing the old
namespace because the mapping will not be present.
DocImpact
Partial-Bug: #1174591 dd46b7616c09693 19afc0bb589
Change-Id: Id7d487f54ca54f