L3 agent unable to update HA router state after race between HA router creating and deleting
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | LIU Yulong |
Kilo | Fix Released | Undecided | Unassigned |
Bug Description
The router L3 HA binding process does not take into account the fact that the port it is binding to the agent can be concurrently deleted.
Details:
When the neutron server deletes all the resources of an
HA router, the L3 agents are not aware of it immediately, so a race
can happen in a procedure like this:
1. The neutron server deletes all resources of an HA router.
2. The RPC fanout reaches L3 agent 1, on which
the HA router was in the master state.
3. On L3 agent 2, the 'backup' router sets itself to master
and sends the neutron server an HA router state change notification.
4. PortNotFound is raised in the function that updates HA router states.
(It seems the DB error no longer exists.)
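The server side of steps 1 and 4 can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `FakeServer`, its methods, and the local `PortNotFound` class are hypothetical stand-ins, not neutron's real API, and the tolerant variant shows one possible way to handle the late notification, not necessarily the fix that was merged.

```python
class PortNotFound(Exception):
    """Stand-in for the server-side 'port no longer exists' error."""


class FakeServer:
    """Hypothetical in-memory model; not neutron's real plugin API."""

    def __init__(self):
        self.ha_ports = {}  # router_id -> {host: HA state}

    def create_ha_router(self, router_id, hosts):
        self.ha_ports[router_id] = {host: "backup" for host in hosts}

    def delete_ha_router(self, router_id):
        # Step 1: all resources of the router, HA ports included, go away.
        self.ha_ports.pop(router_id, None)

    def update_ha_state(self, router_id, host, state):
        ports = self.ha_ports.get(router_id, {})
        if host not in ports:
            # Step 4: the notification arrives after the port was deleted.
            raise PortNotFound(f"no HA port for {router_id} on {host}")
        ports[host] = state

    def update_ha_state_tolerant(self, router_id, host, state):
        # Assumed handling: treat a missing port as "router already deleted".
        try:
            self.update_ha_state(router_id, host, state)
        except PortNotFound:
            return False
        return True


server = FakeServer()
server.create_ha_router("r1", ["agent1", "agent2"])
server.delete_ha_router("r1")
# Late state change from agent 2 (step 3) hits a router that is gone.
applied = server.update_ha_state_tolerant("r1", "agent2", "master")
```

Here `applied` is `False`, while calling `update_ha_state` directly with the same arguments raises `PortNotFound`.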
How do steps 2 and 3 happen?
Suppose L3 agent 2 hosts many more HA routers than L3 agent 1,
or for any other reason receives or processes the deletion
RPC later than L3 agent 1. When L3 agent 1 removes the HA router's
keepalived process, the backup router on L3 agent 2 soon detects
this via VRRP. At that point the router deletion RPC is still in
the RouterUpdate queue, or at some step of the HA router deletion
procedure, and router_info still contains the router.
So L3 agent 2 runs the state change procedure, i.e. it notifies
the neutron server to update the router state.
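The agent-side ordering described above can be sketched as a toy model. The names `FakeAgent`, `update_queue`, `enqueue_delete`, and `on_vrrp_transition` are hypothetical and chosen for illustration only; the sketch assumes the agent reports a state change for any router still present in its `router_info`.

```python
from collections import deque


class FakeAgent:
    """Hypothetical model of an L3 agent; not the real agent code."""

    def __init__(self, name):
        self.name = name
        self.router_info = {}        # routers the agent still knows about
        self.update_queue = deque()  # pending RPC-driven router updates
        self.notifications = []      # state changes reported to the server

    def add_router(self, router_id, state):
        self.router_info[router_id] = state

    def enqueue_delete(self, router_id):
        # The deletion RPC has arrived but has not been processed yet.
        self.update_queue.append(("delete", router_id))

    def process_queue(self):
        while self.update_queue:
            action, router_id = self.update_queue.popleft()
            if action == "delete":
                self.router_info.pop(router_id, None)

    def on_vrrp_transition(self, router_id, new_state):
        # Only routers still present in router_info trigger a notification.
        if router_id in self.router_info:
            self.router_info[router_id] = new_state
            self.notifications.append((router_id, new_state))


agent2 = FakeAgent("l3-agent-2")
agent2.add_router("r1", "backup")
agent2.enqueue_delete("r1")               # deletion is queued, not applied
agent2.on_vrrp_transition("r1", "master") # VRRP failover wins the race
notified_during_race = list(agent2.notifications)

agent2.process_queue()                    # the deletion is finally applied
agent2.on_vrrp_transition("r1", "master") # no router_info entry, no notify
```

In this model the first transition produces a notification for a router the server has already deleted; once the queued deletion is processed, later transitions are ignored.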
summary: |
- L3 agent unable to update HA router state race after between HA router
+ L3 agent unable to update HA router state after race between HA router creating and deleting |
Changed in neutron: | |
assignee: | nobody → LIU Yulong (dragon889) |
status: | New → In Progress |
tags: | added: kilo-backport-potential |
tags: | added: liberty-backport-potential |
tags: | added: l3-ha |
description: | updated |
description: | updated |
description: | updated |
Changed in neutron: | |
importance: | Undecided → Medium |
description: | updated |
tags: | removed: kilo-backport-potential |
tags: | added: in-stable-liberty removed: liberty-backport-potential |
Can you show an example TRACE?