2015-12-04 22:14:33 |
Assaf Muller |
bug |
|
|
added bug |
2015-12-07 06:56:26 |
venkata anil |
neutron: assignee |
|
venkata anil (anil-venkata) |
|
2015-12-08 05:04:40 |
Koji Iida |
bug |
|
|
added subscriber Koji Iida |
2015-12-09 12:44:05 |
OpenStack Infra |
neutron: status |
New |
In Progress |
|
2016-01-06 13:22:16 |
Assaf Muller |
description |
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA. |
Note: This is a soft requirement for DVR + L3 HA.
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA. |
|
2016-02-21 14:13:23 |
Tomoko Inoue |
bug |
|
|
added subscriber Tomoko Inoue |
2016-06-01 14:08:27 |
venkata anil |
description |
Note: This is a soft requirement for DVR + L3 HA.
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA. |
Note: This is a soft requirement for DVR + L3 HA.
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA.
As HA ports are distributed ports like DVR ports, we will follow the DVR approach to port binding and revert https://review.openstack.org/#/c/141114/
1) The DVR approach allows multiple port bindings for a distributed port, and l2pop will be notified for each binding to create tunnels to that agent.
2) When agents are restarted, or a router is added to a new agent (or removed from an agent), l2pop will be notified and flows will be updated properly.
3) We can make use of the existing DVR implementation for "port binding and l2pop", and can avoid complex DB operations for finding the agents of HA ports. |
|
2016-06-30 13:37:43 |
Ihar Hrachyshka |
tags |
l2-pop l3-ha |
l2-pop l3-ha neutron-proactive-backport-potential |
|
2016-07-05 18:02:50 |
Armando Migliaccio |
neutron: status |
In Progress |
Incomplete |
|
2016-07-05 18:02:53 |
Armando Migliaccio |
neutron: assignee |
venkata anil (anil-venkata) |
|
|
2016-07-09 08:22:45 |
OpenStack Infra |
neutron: status |
Incomplete |
In Progress |
|
2016-07-09 08:22:45 |
OpenStack Infra |
neutron: assignee |
|
venkata anil (anil-venkata) |
|
2016-08-22 18:34:20 |
venkata anil |
description |
Note: This is a soft requirement for DVR + L3 HA.
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA.
As HA ports are distributed ports like DVR ports, we will follow the DVR approach to port binding and revert https://review.openstack.org/#/c/141114/
1) The DVR approach allows multiple port bindings for a distributed port, and l2pop will be notified for each binding to create tunnels to that agent.
2) When agents are restarted, or a router is added to a new agent (or removed from an agent), l2pop will be notified and flows will be updated properly.
3) We can make use of the existing DVR implementation for "port binding and l2pop", and can avoid complex DB operations for finding the agents of HA ports. |
Note: This is a soft requirement for DVR + L3 HA.
L3 HA did not work with l2pop at all, and that was fixed here:
https://bugs.launchpad.net/neutron/+bug/1365476 via https://review.openstack.org/#/c/141114/.
However, the solution is suboptimal because it assumes the control plane is operational for failover to work correctly.
Without l2pop, L3 HA can fail over successfully even if the database, messaging server, neutron-server and destination L3 agent are dead. With l2pop, all four are needed. This is because, for failover to work, the destination L3 agent notices that a router has transitioned to master and notifies neutron-server via RPC, at which point neutron-server updates the 'binding:host' value of all of the router's internal ports to point to the target node, and l2pop code is executed in order to update the L2 agents.
Instead, I'd like failover to rely solely on the data plane regardless of whether l2pop is on or off. One such solution would be something similar to patch set 9 of the patch: https://review.openstack.org/#/c/141114/9//COMMIT_MSG. The idea is to tell l2pop to treat HA router ports as replicated ports (which they are), so that tunnel endpoints would be created against all nodes that host replicas of the router, and the destination MAC address of the port would not be learned via l2pop, but via the fallback regular MAC learning mechanism. This means that we lose some of the advantage of l2pop, but I think it is essential to the correct operation of L3 HA. |
|
2016-08-22 18:54:52 |
Assaf Muller |
neutron: milestone |
|
newton-3 |
|
2016-08-22 18:54:57 |
Assaf Muller |
neutron: importance |
Medium |
High |
|
2016-09-01 20:08:48 |
Armando Migliaccio |
neutron: milestone |
newton-3 |
newton-rc1 |
|
2016-09-08 22:28:46 |
OpenStack Infra |
neutron: assignee |
venkata anil (anil-venkata) |
Carl Baldwin (carl-baldwin) |
|
2016-09-09 10:43:10 |
OpenStack Infra |
neutron: status |
In Progress |
Fix Released |
|
2016-09-14 19:29:58 |
Swaminathan Vasudevan |
tags |
l2-pop l3-ha neutron-proactive-backport-potential |
l2-pop l3-ha mitaka-backport-potential neutron-proactive-backport-potential |
|
2016-10-07 15:37:22 |
Ihar Hrachyshka |
tags |
l2-pop l3-ha mitaka-backport-potential neutron-proactive-backport-potential |
l2-pop l3-ha mitaka-backport-potential |
|
2016-11-14 18:01:43 |
OpenStack Infra |
tags |
l2-pop l3-ha mitaka-backport-potential |
in-stable-mitaka l2-pop l3-ha mitaka-backport-potential |
|