L3 HA: multiple agents are active at the same time
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | High | Corey Bryant |
Mitaka | Fix Released | High | Unassigned |
Newton | Fix Released | High | Unassigned |
Ocata | Fix Released | High | Corey Bryant |
Pike | Fix Released | High | Corey Bryant |
Queens | Fix Released | High | Corey Bryant |
neutron | Fix Released | High | venkata anil |
neutron (Ubuntu) | Fix Released | High | Corey Bryant |
Xenial | Fix Released | High | Unassigned |
Zesty | Won't Fix | High | Corey Bryant |
Artful | Fix Released | High | Corey Bryant |
Bionic | Fix Released | High | Corey Bryant |
Bug Description
OS: Xenial, Ocata from Ubuntu Cloud Archive
We have three neutron-gateway hosts with L3 HA enabled (a minimum of 2 and a maximum of 3 agents per router). There are approximately 400 routers defined.
At some point (we weren't monitoring exactly when) a number of the routers changed from having one active agent and one or more standby agents to having more than one active agent. Each of the 'active' namespaces had the same IP addresses allocated, which caused traffic problems reaching instances.
Removing the routers from all but one agent, and then re-adding them, resolved the issue. Restarting one l3 agent also appeared to resolve the issue, but very slowly. We needed the system back up faster than that, so we reverted to removing and re-adding.
At the same time, a number of routers were listed without any active agent at all. This situation appears to have been resolved by adding the routers to agents, after several minutes of downtime.
I'm finding it very difficult to find relevant keepalived messages to indicate what's going on, but what I do notice is that all the agents have equal priority and are configured as 'backup'.
I am trying to figure out a way to reproduce this; it might be that we need a large number of routers configured on a small number of gateways.
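While working on a reproducer, a check along these lines might help flag routers in either bad state described above (more than one active agent, or none). This is only a minimal sketch: the function name `classify_ha_routers` and the input mapping are hypothetical, and it assumes the per-agent HA states for each router have already been collected into a dict, for example from the `ha_state` column of `neutron l3-agent-list-hosting-router <router>`.

```python
from collections import Counter


def classify_ha_routers(ha_states_by_router):
    """Group routers by how many hosting agents report 'active'.

    ha_states_by_router is an assumed input format: a dict mapping a
    router ID to the list of ha_state strings reported by its agents.
    """
    report = {"multiple_active": [], "no_active": [], "healthy": []}
    for router_id, states in sorted(ha_states_by_router.items()):
        active = Counter(states)["active"]
        if active > 1:
            # More than one agent believes it is master for this router.
            report["multiple_active"].append(router_id)
        elif active == 0:
            # No agent is active (or the router is hosted nowhere).
            report["no_active"].append(router_id)
        else:
            report["healthy"].append(router_id)
    return report


# Made-up sample data, purely for illustration.
sample = {
    "router-a": ["active", "standby", "standby"],
    "router-b": ["active", "active", "standby"],
    "router-c": [],
}
report = classify_ha_routers(sample)
print(report)
```

Running something like this periodically against all routers would also record roughly when the state flipped, which we were not monitoring at the time.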
tags: | added: l3-ha |
Changed in neutron: | |
assignee: | nobody → venkata anil (anil-venkata) |
Changed in cloud-archive: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Corey Bryant (corey.bryant) |
Changed in neutron (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Corey Bryant (corey.bryant) |
Changed in neutron (Ubuntu Artful): | |
status: | New → Triaged |
Changed in neutron (Ubuntu Zesty): | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Corey Bryant (corey.bryant) |
Changed in neutron (Ubuntu Artful): | |
assignee: | nobody → Corey Bryant (corey.bryant) |
importance: | Undecided → High |
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
no longer affects: | keepalived (Ubuntu Artful) |
no longer affects: | keepalived (Ubuntu Zesty) |
See https://bugs.launchpad.net/neutron/+bug/1597461, which could be related, but we're running 10.0.3-0ubuntu1~cloud0.
Keepalived is 1.2.19-1ubuntu0.2