[RFE] DVR support for Allowed_address_pair port that are bound to multiple ACTIVE VM ports

Bug #1583694 reported by Swaminathan Vasudevan on 2016-05-19
This bug affects 21 people
Affects: neutron — Importance: Wishlist — Assigned to: Swaminathan Vasudevan

Bug Description

DVR support for allowed_address_pair ports with a FloatingIP that are unbound and assigned to multiple ACTIVE VMs.

Problem Statement:

When a FloatingIP is assigned to an allowed_address_pair port that is in turn assigned to multiple ACTIVE VMs connected to DVR (Distributed Virtual Router) routers, the FloatingIP is not functional.
The use case here is to provide redundancy for the VMs that are serviced by the DVR routers.
This feature works correctly with legacy (centralized) routers.

Theory:
Distributed Virtual Routers were designed for scalability and performance and to reduce the load on the single network node.

Distributed Virtual Routers are created on each Compute node dynamically on demand and removed when not required. Distributed Virtual Routers heavily depend on the port binding to identify the requirement of a DVR service on a particular node.

Today we only create/update/delete a floating IP based on the router and the host on which the floating IP service is required, so the 'host' is critical to the operation of DVR.

In the above use case we are dealing with an allowed_address_pair port, which is not bound to any specific host and is assigned to multiple VMs that are ACTIVE at the same time.

Today we have a workaround that inherits the parent VM port's binding properties for the allowed_address_pair port if the parent VM's port is ACTIVE. Its limitation is the assumption that there is only one ACTIVE VM port associated with the allowed_address_pair port.
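As a hedged sketch (the helper name is invented, this is not the actual neutron code), the workaround's decision logic boils down to:

```python
# Sketch of the binding-inheritance workaround: an allowed_address_pair
# port has no host binding of its own, so borrow the binding of its
# parent VM port, but only when exactly one parent port is ACTIVE.

def inherit_host_binding(parent_ports):
    """Return the host to bind the allowed_address_pair port to,
    or None if the binding would be ambiguous.

    parent_ports: list of dicts like {"status": "ACTIVE", "host": "compute-1"}
    """
    active_hosts = [p["host"] for p in parent_ports if p["status"] == "ACTIVE"]
    if len(active_hosts) == 1:
        return active_hosts[0]   # unambiguous: a single ACTIVE parent
    return None                  # zero or multiple ACTIVE parents


# Two ACTIVE VMs sharing the pair -> no single host binding is possible
print(inherit_host_binding([
    {"status": "ACTIVE", "host": "compute-1"},
    {"status": "ACTIVE", "host": "compute-2"},
]))  # None
```

This is exactly the ambiguity the rest of this RFE is about: with two or more ACTIVE parents there is no single host to inherit.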

The reason is that if multiple ACTIVE VM ports are associated with the same allowed_address_pair port, and that port has a FloatingIP associated with it, we can't provide the FloatingIP service on all the nodes where the VM ports are ACTIVE. The same FloatingIP would be advertised (via GARP) from all of those nodes, so users on the external network could not tell where the actual ACTIVE port is.

Why is it working with Legacy Routers:

In the case of legacy routers, the router always lives on the network node and DNAT is also done in the router namespace on that node. Legacy routers do not depend on host binding, since all traffic has to flow through the centralized router on the network node. There is also no FloatingIP GARP issue, since the GARP always comes from a single node.

In the background, the allowed_address_pair port's MAC is dynamically switched from one VM to another by the keepalived instance that runs in the VMs. Neutron does not need to know about any of this, and it works as expected.
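For context, the "GARP" mentioned throughout this thread is a gratuitous ARP: an ARP request whose sender and target protocol addresses are both the VIP, broadcast to announce which MAC currently owns the IP. A minimal sketch of such a frame in Python (the MAC and IP values are made-up examples):

```python
# Build a gratuitous ARP frame per the classic ARP layout (RFC 826).
# Sender and target IP are both the VIP; the target MAC is left zeroed,
# which is the conventional GARP-request form keepalived emits on failover.
import struct
import socket

def build_garp(vip, mac):
    mac_b = bytes.fromhex(mac.replace(":", ""))
    ip_b = socket.inet_aton(vip)
    eth = b"\xff" * 6 + mac_b + b"\x08\x06"          # broadcast dst, src MAC, ARP ethertype
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # Ethernet/IPv4, opcode 1 = request
    arp += mac_b + ip_b                              # sender: new owner's MAC + the VIP
    arp += b"\x00" * 6 + ip_b                        # target: zero MAC, same VIP
    return eth + arp

frame = build_garp("10.0.0.5", "fa:16:3e:aa:bb:cc")  # 42-byte frame
```

Upstream switches and router namespaces update their ARP/FDB tables on seeing this frame, which is why the failover needs no neutron API call.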

Why it is not working with DVR Routers:
1. Allowed_address_pair does not have host-binding.
2. If we were to inherit the host binding from the VMs, there are multiple ACTIVE VMs, so we can't derive a single host binding for these allowed_address_pair ports.
3. Even if we ignore the port binding on the allowed_address_pair port and provide the FloatingIP plumbing on every node hosting an associated VM, the same FloatingIP would be GARPed from different compute nodes, which would confuse the external network.

How we can make it to work with DVR:

Option 1:
Neutron should have some visibility into the state of the VM port when the switch between ACTIVE and STANDBY happens. Today the switch is done by keepalived inside the VM, so it is not recorded anywhere.
If keepalived could record the event on the neutron port, neutron could use it to decide when to allow or block FloatingIP traffic on a particular node, and then send the GARP from the respective node. This introduces some failover delay as well.
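As an illustration of Option 1: keepalived can run an arbitrary script on state transitions via its `notify_master` directive. The script path and what it would do (e.g. update the neutron port from inside the VM) are hypothetical here, not something the thread has settled on:

```
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.5
    }
    # Hypothetical hook: on becoming master, record the event against the
    # neutron port so neutron knows where to realize the FloatingIP.
    notify_master "/usr/local/bin/notify-neutron.sh master"
}
```

The delay noted above comes from this round trip: the notify script has to reach the neutron API before neutron can re-plumb and re-advertise the FIP.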

(Desired) Low-hanging fruit.

Option 2:

Option 2 basically negates the distributed nature of DVR and makes North-South traffic centralized.
The other option is to centralize the FloatingIP functionality for such features. This would be more complex, since we would need to introduce config options for the agents and floating IPs. Also, we can't support both local and centralized floating IPs on the same node: a compute node can have one or the other.

Complex (negates the purpose of DVR).

References:
References to the patches we already have to support a single-ACTIVE use case for allowed_address_pair with FloatingIP in DVR:

https://review.openstack.org/#/c/254439/
https://review.openstack.org/#/c/301410/
https://review.openstack.org/#/c/304905/

tags: added: rfe
Changed in neutron:
importance: Undecided → Wishlist
status: New → Confirmed
Michael Johnson (johnsom) wrote :

My initial comment is that this breaks anyone using systems like VRRP with DVR enabled, not just Octavia

Michael Johnson (johnsom) wrote :

Also, to clarify, VRRP does not "move" the allowed address pairs MAC. The MACs on the allowed address pair ports stay the same.

What happens in a VRRP fail over is the MAC that is advertised as owning the IP address in the allowed address pair is changed by issuing GARPs from the new master and only responding to ARPs for the IP from the current master. No change is made to the neutron ports to accomplish a fail over.

Making changes to the neutron port would be too slow and defeat the purpose of using a protocol such as VRRP.

Michael Johnson (johnsom) wrote :

Shouldn't DVR be honoring GARP/ARP for the IP addresses? If the host answering ARP for the IP changes, shouldn't DVR update its tables?

I think this is option 3 and matches how physical networking behaves.

Yes, the DVR router namespace will update its ARP table when a GARP is issued for the allowed_address_pair IP.
But the issue here is that if we create the FloatingIP on all nodes, the GARP for the FloatingIP will be sent from all nodes; we need a way to send the FloatingIP GARP only from the particular node where the IP is active.

How do we trigger it?

summary: [RFE] DVR support for Allowed_address_pair port that are bound to
- multiple ACTIVE VM ports used by Octavia
+ multiple ACTIVE VM ports

I used Octavia as an example, but let us remove it from the RFE and keep it generic.

Kevin Benton (kevinbenton) wrote :

@Michael, the difference is that DVR needs logic to move which compute node the floating IP is realized on. If we just adjust the DVR logic to follow ARP to find the other side of the floating IP, it will work enough to get traffic flowing, but it will hair-pin through whichever compute node the floating IP was originally scheduled to. This is similar to Option 2, except that instead of being on the centralized router, it would be on one of the compute-node routers.

Michael Johnson (johnsom) wrote :

@kevinbenton I agree, there needs to be a mechanism to move the FIP as opposed to hair-pinning through the first host.

Another option would be to have the FIP on all of the nodes with the allowed address pairs and have only the currently active compute node answer ARP for the FIP, much like VRRP does with the allowed_address_pair IP.

Under the move scenario it gets a little messy as you don't want to reject traffic during the move.

Just trying to throw ideas out.

Yes, identifying the active compute node is the key from neutron's perspective.

Samuel Bercovici (samuelb) wrote :

In addition to VRRP, VIP addresses may move from VM to VM using the vNIC MACs. In this case the active VM will ARP/GARP the VIP as assigned to its own MAC; in other words, when the VIP moves, the MAC may change.

Yes, the router namespace on the compute host will be able to capture the GARP message from the vNIC and update its ARP table.
But the problem here is the GARP for the FloatingIP from the fip namespace, since we would have the FloatingIP configured on multiple hosts.

We would need another keepalived-like process running in the fip namespace that can somehow monitor the VM MAC change and then send a GARP to the external network announcing that the FloatingIP is now served from the new node.

After analyzing both options: if HA or VRRP applications cannot accept the delay in migrating from one fip namespace to another, then we should probably fall back to the centralized model for ports that are unbound.

So here is my proposal. We introduce a "neutron.conf" configuration option, "dvr_unbound_port_fip_use_cvr=True". When set, it overrides the default L3 agent and neutron server behavior for DVR routers and configures the floating IPs for these unbound ports within the SNAT namespace. VMs associated with the unbound allowed_address_pair port (or any other unbound port) will then send their traffic from their node to the network node, and the floating IP translation will happen within the snat namespace, similar to CVR.
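The proposed option (which, later in this thread, is dropped based on review feedback) would have looked something like this in neutron.conf:

```
[DEFAULT]
# Hypothetical option as proposed in this comment, never merged: when
# true, configure FIPs for unbound ports in the centralized SNAT
# namespace instead of a compute-node fip namespace.
dvr_unbound_port_fip_use_cvr = True
```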

Any GARP update messages that are sent out from the VMs through the keepalived will reach the SNAT namespace and the traffic will get forwarded to the respective router interfaces.

The disadvantage of this feature is that we lose the distributed nature of the floating IP and make the agent's job more complex.

Please let me know your thoughts.

Kevin Fox (kevpn) wrote :

So, we have VIPs and FIPs... The VIP is only on one VM at a time, managed by keepalived. The FIP is a neutron construct, also assumed to be on a single host. It can't be created on more than one host without problems, but the VIP can only be on one at a time as well.

We need some way to track the VIP and move the FIP to match. If they are not on the same host, it should be OK for a little while? This would let stateful connections continue to operate properly?

So how about this: keepalived in the VM sends out a GARP whenever it moves the VIP. The hypervisor running that port can intercept the GARP. If it sees one, it can ask neutron to rebind the FIP associated with the VIP to itself. Then all the normal FIP movement code kicks in.

Regarding #11, I'd shy away from the config option, as I don't see why one would make this configurable, and globally at that. That said, I am not really a fan of the proposal: tying the FIP back to the network node for N/S traffic seems a setback. As I understand it, this is Option 2 in the bug description, right?

Changed in neutron:
status: Confirmed → Triaged

At this point the choice is pretty much open between Option 1 and Option 2.
As Kevin mentioned, if the users of the allowed_address_pair port are not able to update neutron with which VM is ACTIVE, then we should find a way to identify the ACTIVE VM port ourselves: intercept the GARP message sent by keepalived, query the port based on the MAC in the GARP, and find the host binding from that.

Then, based on that host binding, we can trigger the GARP to be sent from the fip namespace on the migrating host.
At this point, though, it might not be as fast as what we see today with CVR by just toggling the allowed_address_pair MAC.

As Kevin mentioned, Option 2 might be a temporary solution for hybrid cases that involve the unbound port.

Here is the list of items we need to figure out:
1. Flag the IP to be monitored for GARP on all the nodes.
2. When a GARP arrives, then from either the host or the router namespace where the ARP entry for that IP changes, we should be able to send a message back to neutron or the fip namespace to trigger a GARP from the fip namespace for that particular IP.
3. This again involves creating the floating IP on all the nodes where the VM is hosted, and triggering the GARP for the floating IP only on the node where the VM is active.
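A hedged sketch of the coordination logic behind these items (all names are invented for illustration; this is not a neutron API): pre-create the FIP plumbing on every candidate host, then advertise the FIP only from the host where the VIP's GARP was last observed.

```python
# Track which host last saw the VIP's GARP and decide where the FIP
# should be advertised from, suppressing the previously active host.

class FipFailoverCoordinator:
    def __init__(self, fip, candidate_hosts):
        self.fip = fip
        self.hosts = set(candidate_hosts)  # hosts with FIP plumbing pre-created
        self.active_host = None

    def on_garp_seen(self, host, vip_mac):
        """Called when a router namespace on `host` observes a GARP for the VIP.

        Returns an action dict when the active host changes, else None.
        """
        if host not in self.hosts:
            return None
        previous, self.active_host = self.active_host, host
        if previous != host:
            # only the newly active host should GARP the FIP externally
            return {"send_fip_garp_from": host, "suppress_on": previous}
        return None
```

The hard part, as the comment says, is not this bookkeeping but plumbing the GARP observation event from the data plane back to whoever runs this logic.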

Kevin, I need some advice from you on the GARP intercept: where and how to proceed, if that is the optimal solution.

tags: removed: neutron

Fix proposed to branch: master
Review: https://review.openstack.org/323618

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
status: Triaged → In Progress
Changed in neutron:
status: In Progress → Confirmed

Let me see how I can get rid of the 'config_option'.

Changed in neutron:
status: Confirmed → In Progress

Based on the feedback I have removed the config option.

Changed in neutron:
status: In Progress → Confirmed
Changed in neutron:
status: Confirmed → Triaged

Can we have a summary of this approach? I glanced at the patches and I am having a hard time parsing them.

https://review.openstack.org/#/c/320669/
https://review.openstack.org/#/c/323618/

To summarize the proposal: if port binding for a DVR service port (LOADBALANCER) does not exist and a floating IP is associated with that port, then we start the floating IP service in the network node's snat namespace.

So any floating IP traffic for the highly available VM can flow from the compute node to the network node and vice versa.
This is based on Option 2 mentioned above.
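For illustration, a centralized FIP amounts to installing NAT rule pairs of roughly this shape in the snat namespace (the chain names are modeled on the L3 agent's conventions but not copied from the patch; the addresses are examples):

```
# DNAT inbound traffic for the FIP to the unbound port's fixed IP, and
# SNAT the return/outbound traffic back to the FIP (example addresses).
-A neutron-l3-agent-PREROUTING -d 172.24.4.10/32 -j DNAT --to-destination 10.0.0.5
-A neutron-l3-agent-float-snat -s 10.0.0.5/32 -j SNAT --to-source 172.24.4.10
```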

So allegedly Option 2 should bring the best of both worlds: for N/S (both DNAT and SNAT) a VM would go out through the network node; for E/W, the VM would use the local compute node. Am I correct?

Now let me tell you what my fear is: I fear that this added complexity and lack of extensive testing will lead to bug reports from folks who experience weird traffic errors, and we'll keep piling up code to handle corner cases, etc. Result: DVR quality regression.

It's not like we have not seen this type of situation in the past, and I am worried we're going down the rabbit hole. As it stands, the proposed code already does not look pretty at all, so I am slightly towards a rejection, unless someone can prove me that my fears are totally unfounded.

Yes you are right, for East/West it would use the compute node and for North South it would use the Network Node.

At present there is no other option for a quick failover between instances. Even if we go down the route of traditional DVR N/S, there would be a failover time window between the VM instances.

Brian Haley was also involved in the discussion that we had with Octavia team on their requirement to work with DVR. So he can fill in the details, if I am missing something.

Ok, unless someone can convince me otherwise, I am a soft reject. I know it's not great for who is after this use case, but the 'best of both worlds' to me is a hybrid I am not comfortable with. I could potentially see this use case solved by means of a slightly modified network topology where an extra hop is introduced to offload the network node for distributed E/W traffic.

We need a few more iterations to figure out how to nail down this use case.

I think we can find a better solution, more brainstorming needed.

I took the time to look into this issue and I want to make sure I fully understand the context here.

To recap: bug 1445255 was filed to complain about the lack of DVR support for VRRP use cases as described in blog post [1]. That was fixed and backported. However, this bug report was also filed in relation to a use case for LBaaS/Octavia.

It is my understanding that the fix for bug 1445255 is only effective if the VRRP failover is triggered by manually marking the admin status DOWN on the port with which the faulty Octavia instance is associated, and that it is totally ineffective otherwise. Involving the Neutron API in failover is clearly not acceptable. However, it is also my understanding that when using FIP/VRRP in conjunction with centralized routing, no manual failover is necessary; in other words, a keepalived instance can go down and the cluster will adjust itself seamlessly irrespective of the Neutron port status.

I hope someone can confirm that's the current state of affairs.

Now, I might argue that the fix for bug 1445255 is itself inadequate, and that we should not treat this report as an RFE per se. Having said that: the user's intention is to use a Floating IP (FIP) with multiple ports at any given time, leveraging allowed address pairs as a binding mechanism between the FIP and the ports via an unbound (utility) port that carries the keepalived VIP. The distributed nature of the routing function supporting the FIP has unveiled a clear limitation of that model. Perhaps this RFE calls for figuring out new ways to achieve the use case such that the relationship between the FIP and the ports involved is more clearly defined, which would make centralizing the FIP function for DVR less of an issue, especially if the FIP ends up being reused for other use cases across its life cycle.

Thoughts?

[1] http://superuser.openstack.org/articles/implementing-high-availability-instances-with-neutron-using-vrrp

Michael Johnson (johnsom) wrote :

+1
Yes Armando, that is my understanding of the current situation. I think we really need to focus on a solution for anyone using allowed address pairs, distributed virtual router, and floating IPs. Targeting only LBaaS will just lead to future bug reports.

After last week, I had a chance to talk with Carl about this in more detail. He may provide his own perspective on the problem.

I am wondering whether a new resource, let's call it a redundant FIP for lack of a better name, may come in handy here to lock this construct down to the centralized SNAT and help simplify the changes required to the DVR control plane. By nature, a FIP for DVR may move along with the VM, whereas in this case we pretty much want the FIP to stay put regardless of how many VMs use it and how/whether they move.

Spec is probably the format that would allow us to reason on design options.

Silence: is there a hint for moving it to the backburner?

Armando, yes, I can write up the spec for this feature. But can you confirm whether it should be based on Option 1 or Option 2?

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/323618
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/320669
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

tags: removed: lbaas
Changed in neutron:
status: Triaged → In Progress

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/320669
Reason: I have a new version of this patch; let us take it from there.

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/323618
Reason: A new patch has been uploaded, so we can abandon this.

Fix proposed to branch: master
Review: https://review.openstack.org/448084

Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Oleg Bondarev (obondarev)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.openstack.org/448084
Reason: This violates the case where a dvr_snat agent is running on a compute node. It also breaks the assumption in the code that dvr_edge_router is an extension of dvr_local_router.

Changed in neutron:
assignee: Oleg Bondarev (obondarev) → Swaminathan Vasudevan (swaminathan-vasudevan)

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/437970
Reason: Use this for further review:

https://review.openstack.org/#/c/466434/

Leandro Reox (leandro-reox) wrote :

Hi guys, any progress on this bug? It breaks many VRRP scenarios for us, including Octavia.


Hi Leandro, yes, we have a couple of patches in review and work is in progress.
Thanks,
Swaminathan Vasudevan


Miguel Lavalle (minsel) on 2017-07-13
Changed in neutron:
milestone: none → pike-3

Reviewed: https://review.openstack.org/466434
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cced31c6b9cef33022a2cfb97d5aac6d02e75cb8
Submitter: Jenkins
Branch: master

commit cced31c6b9cef33022a2cfb97d5aac6d02e75cb8
Author: Swaminathan Vasudevan <email address hidden>
Date: Tue May 24 14:03:39 2016 -0700

    DVR: Server side patch to schedule an unbound port with Floating IP

    Unbound ports that are associated with a Floating IP and connected to
    DVR Routers will not be serviced by the DVR Routers, unless we bind it
    to a valid host.

    This server side patch allows the neutron server to schedule the
    unbound port Floating IP on the network node or the node with dvr_snat
    agent where the SNAT functionality resides.

    The DNAT rules for the unbound ports will be configured in the SNAT
    namespace on the network node.

    Related-Bug: #1583694
    Change-Id: I05d0bfb3fa275b1e4e479928000cf8494da858f6

Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
milestone: pike-3 → pike-rc1
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Akihiro Motoki (amotoki)
Akihiro Motoki (amotoki) on 2017-08-05
Changed in neutron:
assignee: Akihiro Motoki (amotoki) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)

Reviewed: https://review.openstack.org/437986
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8b4bb9c0b057da175f2d773f8257de3e571aed4e
Submitter: Jenkins
Branch: master

commit 8b4bb9c0b057da175f2d773f8257de3e571aed4e
Author: Swaminathan Vasudevan <email address hidden>
Date: Tue May 31 15:21:37 2016 -0700

    DVR: Configure centralized floatingips to snat_namespace.

    This patch is the agent side patch that takes care of configuring
    the centralized floatingips for the unbound ports in the snat_namespace.

    Change-Id: I595ce4d6520adfd57bacbdf20ed03ffefd0b190a
    Closes-Bug: #1583694

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 11.0.0.0rc1 release candidate.
