2018-01-18 13:21:28 |
Corey Bryant |
bug |
|
|
added bug |
2018-01-18 13:23:40 |
Corey Bryant |
description |
This is the same issue as https://bugs.launchpad.net/neutron/+bug/1731595, however that bug is 'Fix Released' and the issue is still occurring. There are a lot of details in the linked bug so I won't add them here unless it's useful. |
This is the same issue as https://bugs.launchpad.net/neutron/+bug/1731595, however that bug is 'Fix Released' and the issue is still occurring. There are a lot of details in the linked bug so I won't add too many here.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
|
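The hypothesis in the description above, that under load keepalived's VRRP advertisements stop arriving in time and a backup promotes itself while the old master is still alive, can be sketched roughly as follows. This is an illustrative model only, not neutron or keepalived code; the 3x master-down timeout echoes the VRRP scheme, and all names here are hypothetical.

```python
# Illustrative sketch (hypothetical, not keepalived source): a VRRP-style
# backup promotes itself to master once it has missed advertisements for
# longer than the master-down timeout.

ADVERT_INTERVAL = 2.0                      # seconds between master advertisements
MASTER_DOWN_TIMEOUT = 3 * ADVERT_INTERVAL  # backup tolerates ~3 missed adverts

def should_promote(now, last_advert_seen):
    """A backup becomes master when no advertisement has been processed
    within the master-down timeout."""
    return (now - last_advert_seen) > MASTER_DOWN_TIMEOUT

# Under CPU or I/O pressure a healthy master may fail to send adverts on
# time, or the backup may fail to process them, so two nodes can both
# conclude they should be master at once.
print(should_promote(now=10.0, last_advert_seen=9.0))  # False: advert is fresh
print(should_promote(now=10.0, last_advert_seen=2.0))  # True: adverts missed
```

This suggests why the problem scales with router count: each HA router is its own keepalived instance, so hundreds of routers per gateway host multiply the advertisement traffic and processing that must stay on schedule.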
2018-01-18 13:24:30 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu) |
|
2018-01-18 13:26:29 |
Corey Bryant |
description |
This is the same issue as https://bugs.launchpad.net/neutron/+bug/1731595, however that bug is 'Fix Released' and the issue is still occurring. There are a lot of details in the linked bug so I won't add too many here.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
- |
|
2018-01-18 13:26:37 |
Corey Bryant |
bug task deleted |
neutron |
|
|
2018-01-18 13:26:52 |
Corey Bryant |
summary |
L3 HA: multiple agents are active at the same time |
- |
|
2018-01-18 13:26:58 |
Corey Bryant |
neutron (Ubuntu): status |
New |
Incomplete |
|
2018-01-18 13:28:47 |
Corey Bryant |
summary |
- |
L3 HA: multiple agents are active at the same time |
|
2018-01-18 13:29:40 |
Corey Bryant |
description |
- |
This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
|
2018-01-18 13:29:48 |
Corey Bryant |
neutron (Ubuntu): status |
Incomplete |
Triaged |
|
2018-01-18 13:29:56 |
Corey Bryant |
neutron (Ubuntu): importance |
Undecided |
High |
|
2018-01-18 13:30:02 |
Corey Bryant |
bug task added |
|
neutron |
|
2018-01-18 13:30:14 |
Corey Bryant |
bug task added |
|
cloud-archive |
|
2018-01-18 13:30:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/queens |
|
2018-01-18 13:30:26 |
Corey Bryant |
bug task added |
|
cloud-archive/queens |
|
2018-01-18 13:30:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/ocata |
|
2018-01-18 13:30:26 |
Corey Bryant |
bug task added |
|
cloud-archive/ocata |
|
2018-01-18 13:30:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/pike |
|
2018-01-18 13:30:26 |
Corey Bryant |
bug task added |
|
cloud-archive/pike |
|
2018-01-18 13:30:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/mitaka |
|
2018-01-18 13:30:26 |
Corey Bryant |
bug task added |
|
cloud-archive/mitaka |
|
2018-01-18 13:30:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/newton |
|
2018-01-18 13:30:26 |
Corey Bryant |
bug task added |
|
cloud-archive/newton |
|
2018-01-18 13:30:54 |
Corey Bryant |
nominated for series |
|
Ubuntu Xenial |
|
2018-01-18 13:30:54 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Xenial) |
|
2018-01-18 13:30:54 |
Corey Bryant |
nominated for series |
|
Ubuntu Bionic |
|
2018-01-18 13:30:54 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Bionic) |
|
2018-01-18 13:30:54 |
Corey Bryant |
nominated for series |
|
Ubuntu Artful |
|
2018-01-18 13:30:54 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Artful) |
|
2018-01-18 13:31:08 |
Corey Bryant |
cloud-archive/mitaka: importance |
Undecided |
High |
|
2018-01-18 13:31:08 |
Corey Bryant |
cloud-archive/mitaka: status |
New |
Triaged |
|
2018-01-18 13:31:18 |
Corey Bryant |
cloud-archive/newton: importance |
Undecided |
High |
|
2018-01-18 13:31:18 |
Corey Bryant |
cloud-archive/newton: status |
New |
Triaged |
|
2018-01-18 13:31:31 |
Corey Bryant |
cloud-archive/ocata: importance |
Undecided |
High |
|
2018-01-18 13:31:31 |
Corey Bryant |
cloud-archive/ocata: status |
New |
Triaged |
|
2018-01-18 13:31:41 |
Corey Bryant |
cloud-archive/pike: importance |
Undecided |
High |
|
2018-01-18 13:31:41 |
Corey Bryant |
cloud-archive/pike: status |
New |
Triaged |
|
2018-01-18 13:31:51 |
Corey Bryant |
cloud-archive/queens: importance |
Undecided |
High |
|
2018-01-18 13:31:51 |
Corey Bryant |
cloud-archive/queens: status |
New |
Triaged |
|
2018-01-18 13:32:01 |
Corey Bryant |
neutron (Ubuntu Xenial): importance |
Undecided |
High |
|
2018-01-18 13:32:01 |
Corey Bryant |
neutron (Ubuntu Xenial): status |
New |
Triaged |
|
2018-01-18 13:32:14 |
Corey Bryant |
neutron (Ubuntu Artful): importance |
Undecided |
High |
|
2018-01-18 13:32:14 |
Corey Bryant |
neutron (Ubuntu Artful): status |
New |
Triaged |
|
2018-01-18 13:33:55 |
Corey Bryant |
description |
This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring; I can't change it back to 'New', so it seems best to open a new bug.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
|
2018-01-18 15:57:02 |
Alvaro Uria |
tags |
|
canonical-bootstack |
|
2018-01-18 15:57:17 |
Alvaro Uria |
bug |
|
|
added subscriber The Canonical Sysadmins |
2018-01-18 15:57:24 |
Alvaro Uria |
bug |
|
|
added subscriber Legacy - Canonical WTFB |
2018-01-18 22:17:04 |
John George |
bug |
|
|
added subscriber Canonical Field Critical |
2018-01-24 22:47:50 |
John George |
removed subscriber Canonical Field Critical |
|
|
|
2018-01-24 22:49:04 |
John George |
bug |
|
|
added subscriber Canonical Field High |
2018-02-08 19:10:00 |
Ryan Beisner |
removed subscriber Canonical Field High |
|
|
|
2018-02-14 14:32:29 |
James Troup |
bug |
|
|
added subscriber Canonical Field Critical |
2018-02-14 15:10:33 |
James Troup |
bug |
|
|
added subscriber Canonical Field High |
2018-02-14 15:10:36 |
James Troup |
removed subscriber Canonical Field Critical |
|
|
|
2018-02-17 07:40:35 |
Dominique Poulain |
bug |
|
|
added subscriber Dominique Poulain |
2018-04-12 10:11:25 |
Joris S'heeren |
bug |
|
|
added subscriber Joris S'heeren |
2018-04-13 20:39:46 |
James Troup |
removed subscriber Canonical Field High |
|
|
|
2018-07-03 12:47:27 |
Corey Bryant |
bug task added |
|
keepalived (Ubuntu) |
|
2018-07-03 12:47:38 |
Corey Bryant |
bug task deleted |
keepalived (Ubuntu Artful) |
|
|
2018-07-03 12:47:54 |
Corey Bryant |
keepalived (Ubuntu): importance |
Undecided |
High |
|
2018-07-03 12:47:54 |
Corey Bryant |
keepalived (Ubuntu): status |
New |
Triaged |
|
2018-07-03 12:48:10 |
Corey Bryant |
keepalived (Ubuntu Xenial): importance |
Undecided |
High |
|
2018-07-03 12:48:10 |
Corey Bryant |
keepalived (Ubuntu Xenial): status |
New |
Triaged |
|
2018-07-03 12:48:23 |
Corey Bryant |
keepalived (Ubuntu Bionic): importance |
Undecided |
High |
|
2018-07-03 12:48:23 |
Corey Bryant |
keepalived (Ubuntu Bionic): status |
New |
Triaged |
|
2018-07-03 12:48:35 |
Corey Bryant |
bug task deleted |
cloud-archive/newton |
|
|
2018-07-03 12:48:49 |
Corey Bryant |
bug task deleted |
neutron (Ubuntu Artful) |
|
|
2018-07-03 15:40:55 |
Corey Bryant |
summary |
L3 HA: multiple agents are active at the same time |
[SRU] L3 HA: multiple agents are active at the same time |
|
2018-07-03 15:48:08 |
Corey Bryant |
description |
This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring; I can't change it back to 'New', so it seems best to open a new bug.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
It seems to me that something is being pushed to its limit, and possibly once that limit is hit, master router advertisements aren't being received, causing a new master to be elected. If this is the case, it would be great to get to the bottom of what resource is getting constrained. |
[Impact]
This is the same issue reported in https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked as 'Fix Released' and the issue is still occurring; I can't change it back to 'New', so it seems best to open a new bug.
It seems as if this bug surfaces due to load issues. While the fix provided by Venkata in https://bugs.launchpad.net/neutron/+bug/1731595 (https://review.openstack.org/#/c/522641/) should help clean things up at the time of l3 agent restart, issues seem to come back later down the line in some circumstances. xavpaice mentioned he saw multiple routers active at the same time when they had 464 routers configured on 3 neutron gateway hosts using L3HA, and each router was scheduled to all 3 hosts. However, jhebden mentions that things seem stable at the 400 L3HA router mark, and it's worth noting this is the same deployment that xavpaice was referring to.
keepalived has a patch upstream in 1.4.0 that provides a fix for removing left-over addresses if keepalived aborts. That patch will be cherry-picked to Ubuntu keepalived packages.
[Test Case]
The following SRU process will be followed:
https://wiki.ubuntu.com/OpenStackUpdates
In order to avoid regression of existing consumers, the OpenStack team will run their continuous integration test against the packages that are in -proposed. A successful run of all available tests will be required before the proposed packages can be let into -updates.
The OpenStack team will be in charge of attaching the output summary of the executed tests. The OpenStack team members will not mark ‘verification-done’ until this has happened.
[Regression Potential]
The regression potential is lowered as the fix is cherry-picked without change from upstream. In order to mitigate the regression potential, the results of the aforementioned tests are attached to this bug.
[Discussion] |
|
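The core symptom tracked by this bug, an L3HA router reported "active" on more than one agent at once, can be spotted mechanically. The sketch below is a hypothetical helper, not a neutron API call: the data shape loosely mimics the output of `neutron l3-agent-list-hosting-router`, and the field names are assumptions.

```python
# Hypothetical sketch: flag L3HA routers that report an "active" HA state
# on more than one gateway host (the split-brain symptom in this bug).
from collections import defaultdict

def find_split_brain(rows):
    """rows: iterable of (router_id, agent_host, ha_state) tuples.
    Returns {router_id: [hosts]} for routers active on 2+ hosts."""
    active = defaultdict(list)
    for router_id, host, state in rows:
        if state == "active":
            active[router_id].append(host)
    return {r: hosts for r, hosts in active.items() if len(hosts) > 1}

# Example: r1 is healthy (one active, two standby); r2 is split-brained.
rows = [
    ("r1", "gw1", "active"), ("r1", "gw2", "standby"), ("r1", "gw3", "standby"),
    ("r2", "gw1", "active"), ("r2", "gw2", "active"),  ("r2", "gw3", "standby"),
]
print(find_split_brain(rows))  # {'r2': ['gw1', 'gw2']}
```

In a deployment like the one described (hundreds of routers scheduled to 3 gateway hosts), a periodic check of this kind makes it possible to observe whether the keepalived 1.4.0 cherry-pick actually eliminates the duplicate-master state or merely reduces it.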
2018-07-03 15:48:18 |
Corey Bryant |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2018-07-03 15:48:38 |
Corey Bryant |
neutron (Ubuntu): status |
Triaged |
New |
|
2018-07-03 15:48:42 |
Corey Bryant |
neutron (Ubuntu): importance |
High |
Undecided |
|
2018-07-03 15:49:04 |
Corey Bryant |
neutron (Ubuntu Xenial): importance |
High |
Undecided |
|
2018-07-03 15:49:04 |
Corey Bryant |
neutron (Ubuntu Xenial): status |
Triaged |
New |
|
2018-07-03 15:49:15 |
Corey Bryant |
neutron (Ubuntu Bionic): importance |
High |
Undecided |
|
2018-07-03 15:49:15 |
Corey Bryant |
neutron (Ubuntu Bionic): status |
Triaged |
New |
|
2018-07-03 16:28:57 |
Launchpad Janitor |
keepalived (Ubuntu): status |
Triaged |
Fix Released |
|
2018-07-09 12:04:02 |
Łukasz Zemczak |
keepalived (Ubuntu Bionic): status |
Triaged |
Fix Committed |
|
2018-07-09 12:04:09 |
Łukasz Zemczak |
bug |
|
|
added subscriber SRU Verification |
2018-07-09 12:04:17 |
Łukasz Zemczak |
tags |
canonical-bootstack |
canonical-bootstack verification-needed verification-needed-bionic |
|
2018-07-13 04:12:34 |
Xav Paice |
bug |
|
|
added subscriber Canonical Field High |
2018-07-16 13:18:25 |
Edward Hope-Morley |
tags |
canonical-bootstack verification-needed verification-needed-bionic |
canonical-bootstack sts-sru-needed verification-needed verification-needed-bionic |
|
2018-07-16 14:01:57 |
Corey Bryant |
cloud-archive/queens: status |
Triaged |
Fix Committed |
|
2018-07-16 14:02:01 |
Corey Bryant |
tags |
canonical-bootstack sts-sru-needed verification-needed verification-needed-bionic |
canonical-bootstack sts-sru-needed verification-needed verification-needed-bionic verification-queens-needed |
|
2018-07-25 11:00:56 |
Edward Hope-Morley |
tags |
canonical-bootstack sts-sru-needed verification-needed verification-needed-bionic verification-queens-needed |
canonical-bootstack sts-sru-needed verification-done-bionic verification-needed verification-queens-needed |
|
2018-07-25 16:34:57 |
Corey Bryant |
bug task deleted |
cloud-archive/pike |
|
|
2018-07-25 16:35:05 |
Corey Bryant |
bug task deleted |
cloud-archive/ocata |
|
|
2018-07-26 11:44:12 |
James Page |
neutron (Ubuntu): status |
New |
Invalid |
|
2018-07-26 11:44:27 |
James Page |
neutron (Ubuntu Xenial): status |
New |
Invalid |
|
2018-07-26 11:44:44 |
James Page |
neutron (Ubuntu Bionic): status |
New |
Invalid |
|
2018-07-30 16:05:25 |
Łukasz Zemczak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2018-07-30 16:15:31 |
Launchpad Janitor |
keepalived (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2018-08-06 12:40:12 |
Corey Bryant |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2018-08-06 13:56:51 |
Łukasz Zemczak |
keepalived (Ubuntu Xenial): status |
Triaged |
Fix Committed |
|
2018-08-06 13:56:58 |
Łukasz Zemczak |
tags |
canonical-bootstack sts-sru-needed verification-done-bionic verification-needed verification-queens-needed |
canonical-bootstack sts-sru-needed verification-done-bionic verification-needed verification-needed-xenial verification-queens-needed |
|
2018-08-09 15:29:12 |
Edward Hope-Morley |
tags |
canonical-bootstack sts-sru-needed verification-done-bionic verification-needed verification-needed-xenial verification-queens-needed |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-queens-needed |
|
2018-08-10 08:26:20 |
Edward Hope-Morley |
tags |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-queens-needed |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-queens-done |
|
2018-08-15 01:06:31 |
Launchpad Janitor |
keepalived (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2018-08-17 13:10:01 |
Corey Bryant |
cloud-archive/mitaka: status |
Triaged |
Fix Committed |
|
2018-08-17 13:10:04 |
Corey Bryant |
tags |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-queens-done |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-mitaka-needed verification-queens-done |
|
2018-08-20 14:36:12 |
Edward Hope-Morley |
tags |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-mitaka-needed verification-queens-done |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-mitaka-done verification-queens-done |
|
2018-08-21 12:25:43 |
Corey Bryant |
cloud-archive/mitaka: status |
Fix Committed |
Fix Released |
|
2018-08-21 12:32:50 |
Corey Bryant |
cloud-archive/queens: status |
Fix Committed |
Fix Released |
|
2018-08-28 11:17:23 |
Dr. Jens Harbott |
bug |
|
|
added subscriber Dr. Jens Harbott |
2018-09-24 13:28:15 |
Edward Hope-Morley |
cloud-archive: status |
Triaged |
Fix Released |
|
2018-10-22 13:46:04 |
Edward Hope-Morley |
tags |
canonical-bootstack sts-sru-needed verification-done verification-done-bionic verification-done-xenial verification-mitaka-done verification-queens-done |
canonical-bootstack sts-sru-done verification-done verification-done-bionic verification-done-xenial verification-mitaka-done verification-queens-done |
|
2020-06-23 20:06:05 |
Corey Bryant |
neutron: status |
New |
Invalid |
|
2020-06-23 20:06:31 |
Corey Bryant |
neutron: status |
Invalid |
Incomplete |
|
2020-10-01 12:28:55 |
Michael Skalka |
removed subscriber Canonical Field High |
|
|
|