[RFE]"Fast exit" for compute node egress flows when using DVR

Bug #1577488 reported by Ryan Tidwell on 2016-05-02
This bug affects 5 people
Affects: neutron
Importance: Wishlist
Assigned to: Swaminathan Vasudevan

Bug Description

In its current state, distributed north-south flows with DVR can only be achieved when a floating IP is bound to a fixed IP. Without a floating IP associated, the north-south flows are steered through the centralized SNAT node, even if you are directly routing the tenant network without any SNAT. When DVR is combined with either BGP or IPv6 proxy neighbor discovery, it becomes possible to route traffic directly to a fixed IP by advertising the FIP gateway port on a compute node as the next-hop. For packets egressing the compute node, we need the ability to bypass redirection of packets to the central SNAT node in cases where no floating IP is associated with a fixed IP. Enabling this data flow on egress from a compute node leaves the operator with the option of not running any SNAT nodes. Distributed SNAT is not a consideration, as the targeted use cases involve scenarios where the operator does not want to use any SNAT.

It is important to note that the use cases this would support are use cases where the operator has no need for SNAT. In the scenarios that would be supported by this RFE, the operator intends to run a routing protocol or IPv6 proxy neighbor discovery to directly route the fixed IPs of their tenants. It is also important to note that this RFE does not specify what technology the operator would use for routing their north-south DVR flows. The intent is simply to enable operators who have the infrastructure in place to handle north-south flows in a distributed fashion for their tenants.

To enable this functionality, we have the following options:

1. The semantics surrounding the "enable_snat" flag when set to "False" on a distributed router could use some refinement. We could use this flag to enable SNAT node bypass (fast-exit). This approach has the benefit of cleaning up some semantics that seem loosely defined, and allows us to piggyback on an existing attribute without extending the model. The drawback is that this field is exposed to tenants who most likely are not aware of how their network traffic is routed by the provider network. Tenants probably don't need to be made aware through the API that they are receiving "fast exit" treatment, and it may not make sense to place the burden on them to set this flag appropriately.

2. Add a new L3 agent mode called "dvr_fast_exit". When the L3 agent is run in this mode, all router instances hosted on that L3 agent will send egress traffic directly out through the FIP namespace and out to the gateway, completely disabling SNAT support on all routers hosted on the agent. This approach involves a simple change to skip programming the "steal" rule that sends traffic to the SNAT node when run in this mode. This is likely the least invasive change, but it also has some drawbacks: switching to this mode requires an agent restart, and all agents should be run in this mode. This approach would be well suited to green-field deployments, but doesn't work well with brown-field deployments.

3. There could be a third option I haven't considered yet. It could be hashed out in a spec.

In addition to the work discussed above, we need to be able to instantiate the FIP namespace and gateway port immediately when a router gateway is created instead of waiting for the first floating IP association on a node.

Related WIP patches
- https://review.openstack.org/#/c/297468/
- https://review.openstack.org/#/c/283757/

tags: added: l3-dvr-backlog l3-ipam-dhcp rfe
summary: - "Fast exit" for compute node egress flows when using DVR
+ [RFE]"Fast exit" for compute node egress flows when using DVR
Doug Wiegley (dougwig) on 2016-05-02
Changed in neutron:
importance: Undecided → Wishlist
status: New → Confirmed
Carl Baldwin (carl-baldwin) wrote :

The use case for fast exit is real. I'd like to see this work so that we can enable DVR routers connecting external gateways to routed provider networks and taking advantage of BGP routing.

As for the implementation details...

As an operator, I would expect fast exit from my DVR routers when the address scopes on the respective internal and external networks match *and* I have some mechanism turned on to route back (like BGP host routes). As a user, I'd just like to know that my router is doing the best job possible to avoid extra hops and bottlenecks and I don't need to know all of the details.

Do we already know everything we need to know to turn on fast exit?

Changed in neutron:
status: Confirmed → Triaged
Doug Wiegley (dougwig) wrote :

FWIW, I agree. I'd go so far as to say that the use case for north-south general SNAT traffic is stronger than FIP traffic.

Carl Baldwin (carl-baldwin) wrote :

Interest in this was expressed by a couple of operators in the operator pain points session.

@Doug This is for north / south DVR for tenant networks without SNAT (straight routing). If the operator sticks a SNAT box on the gateway to the internet outside of Neutron, then we could get something like what you're proposing.

My suggestion for the RFE description is to stick to the facts and leave the implementation options for later, but that's a trivial point.

What I am curious about is how you envision this routing mechanism working: globally, on a per-router basis, or on a per-tenant basis, and in conjunction with other routing mechanisms.

It would be good to show with a few diagrams and perhaps some pseudo instructions how you envision the use case.

When I think about Neutron routers (either DVR or HA or legacy), I think of the self-service workflow where as a tenant I go get myself a private network, and I stick a router in front of it if from my VM I need to get 'out' either via SNAT or DNAT.

What you're suggesting is that you want neither SNAT nor DNAT and yet be able to leverage your provider router for north-south and your software router for east-west. That's like bringing the best of both worlds (using resources that are both tenant and admin provisioned). Am I barking up the wrong tree?

Ryan Tidwell (ryan-tidwell) wrote :

I will link to some diagrams shortly. This RFE makes it possible for tenants to access VMs via both fixed and floating IPs. That's all the tenant needs to know; the details of how traffic is routed only need to be understood by the operator. If you imagine a world without NAT, this would optimize the north-south data path when you use DVR. Operators don't need SNAT when their external network and their tenant networks are in the same address scope. Focusing on SNAT and DNAT really clouds the issue. NAT of any kind should not be a requirement for operators to allow tenants to access "outside" networks. Operators should have a choice between direct routed access and SNAT/FIP. We don't give operators much of a choice when DVR is involved because all *routed* north-south traffic must currently be routed through the SNAT node when it doesn't have to be. Again, I'll link to some diagrams shortly.

Carl Baldwin (carl-baldwin) wrote :

I think there are a few misconceptions. Hopefully Ryan's diagrams will help but I'll try to set a few things straight.

This wouldn't be replacing the software router with the provider router for N/S. Both routers are in the N/S datapath in all cases. However, you're right in thinking that we are looping in the provider router and taking advantage of its capabilities.

Address scopes give us the capability to take NAT out of the picture when routing N/S. My recent contribution to the networking guide [1] (which merged!) explains how that works.

Consider this with BGP dynamic routing. It gives us the option to peer Neutron with the provider router. We could send host routes to it so that it will send southbound traffic directly to the DVR router on the correct compute host with the internal port. We left this out of scope for Mitaka (except when floating IPs are used) for two reasons. 1) The fip namespace was not constructed properly when we needed it because it was only constructed on demand to service floating ips. 2) The northbound traffic would still get redirected to the network node creating asymmetric routing paths for northbound and southbound.

This RFE is about allowing a fast exit datapath for northbound traffic that could match the host route path for southbound traffic and keep the paths symmetric. In my view, this requires some knowledge on Neutron's part of the routing provider (BGP) that is setting up the host routes.

[1] https://review.openstack.org/#/c/286294/

Ryan Tidwell (ryan-tidwell) wrote :

Below are links to diagrams illustrating the current state of north-south flows when using DVR [1], and the desired state [2]. Keep in mind both diagrams consider the scenario where fixed IPs are routable by the operator and some sort of routing mechanism is in place to direct incoming traffic to the appropriate next-hop for the VM. This could be static routes or BGP, it doesn't matter. Simply assume something is in place to communicate next-hops to the upstream router. No SNAT needs to occur, and to simplify the discussion let's assume floating IPs are not being used. Using floating IPs in this scenario doesn't change the data flow in the desired state [2].

[1] https://www.dropbox.com/s/hnuddjf534rebs9/slow_exit_dvr.png?dl=0
[2] https://www.dropbox.com/s/372gt8abukpirha/fast_exit_dvr.png?dl=0

Ryan Tidwell (ryan-tidwell) wrote :

One more link showing the data flow if BGP (or another routing mechanism) is not made aware that the FIP gateway port can be used as the next-hop for a fixed IP. What you would get is asymmetric data flow where northbound traffic gets DVR treatment, but southbound traffic still gets routed through the centralized router. Once this is in place, a quick patch can be made to neutron-dynamic-routing to get BGP announcements that use the FIP gateway as the next-hop.

https://www.dropbox.com/s/wnc97uwhkkzwtp6/fast_exit_dvr_no_bgp.png?dl=0

In the description you stated:

The semantics surrounding the "enable_snat" flag when set to "False" on a distributed router could use some refinement. We could use this flag to enable SNAT node bypass (fast-exit). This approach has the benefit of cleaning up some semantics that seem loosely defined, and allows us to piggyback on an existing attribute without extending the model. The drawback is that this field is exposed to tenants who most likely are not aware of how their network traffic is routed by the provider network.

The policy framework can be leveraged to restrict access to this flag, and I believe it is already possible for operators to prevent tenants from seeing it.

Can you elaborate if this is the only limitation you see?

Carl Baldwin (carl-baldwin) wrote :

So, discussed this in the drivers' meeting today. What we'd like to do is have the router figure out when to do fast exit. It already knows when it is going to do SNAT (enable_snat combined with address scopes). So, it should do fast exit any time it would not do SNAT.
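The rule stated in this comment can be written down as a small predicate. This is only an illustrative sketch: `RouterPort`, `scopes_match`, `would_snat`, and `use_fast_exit` are hypothetical names, not Neutron code, and it assumes that matching address scopes on the internal and gateway ports mean no SNAT regardless of `enable_snat`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouterPort:
    """Hypothetical stand-in for a router port; not a Neutron class."""
    name: str
    address_scope: Optional[str]  # None: the port's subnet has no address scope


def scopes_match(internal: RouterPort, gateway: RouterPort) -> bool:
    """True when both ports sit in the same (non-empty) address scope."""
    return (internal.address_scope is not None
            and internal.address_scope == gateway.address_scope)


def would_snat(internal: RouterPort, gateway: RouterPort,
               enable_snat: bool) -> bool:
    """Assumed rule: matching scopes mean no SNAT; otherwise enable_snat decides."""
    if scopes_match(internal, gateway):
        return False
    return enable_snat


def use_fast_exit(internal: RouterPort, gateway: RouterPort,
                  enable_snat: bool) -> bool:
    """Fast exit any time the router would not do SNAT."""
    return not would_snat(internal, gateway, enable_snat)
```

Under these assumptions a port in the gateway's address scope gets fast exit even with `enable_snat=True`, while a port outside that scope gets it only when `enable_snat` is off.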

Carl Baldwin (carl-baldwin) wrote :

I meant to add some more before submitting. The assumption is that asymmetric routing is ok. Work in BGP can be done alongside this to cause upstream to route back on the fast path.

Kevin also mentioned the possibility of putting routes to the internal ports through the external network via their fast path. This would presumably cause the central router to generate ICMP redirects when the upstream router sends traffic over the slow path. This is intriguing, but I think it would fall outside the scope of this RFE and into another RFE.

@Ryan: please elaborate further on what gap you think the existing L3 model has.

My last understanding is as such:

a) no api changes are required to address this use case
b) fast exit will have to be addressed with a combination of address/scope and BGP
c) changes are required in L3 to make DVR routers fast-exit capable
d) this RFE unveiled a conflict between ext-gw-modes and address-scope extensions. We'd need to document the conflict and ensure that if a user is trying to use them together, he/she is properly alerted of the conflict.

On this basis, this RFE would be granted approval, but I am not sure if the level of implementation details is necessary for a spec. Probably a devref.

Thoughts?

Ryan Tidwell (ryan-tidwell) wrote :

I think devref is sufficient. What this amounts to is building the proper forwarding chains in iptables on each compute node. We're not talking about API or model changes, so a spec would seem to be overkill for this effort.

Can you confirm that the points I made in comment #17 are all valid?

Ryan Tidwell (ryan-tidwell) wrote :

Your comments in comment #17 are mostly accurate, the exception being point b:

a) no api changes are required

b) BGP is NOT a pre-requisite for fast-exit. With BGP you can achieve synchronous data flow directly to and from the compute host. Fast-exit still takes the network node out of the outbound data path even when BGP is not advertising host routes, so you still get some benefit without BGP.

c) changes are required in L3 agent

d) yes, there is a conflict between the use of address scopes and the enable_snat attribute on routers that requires attention. However, in my opinion it is not a DVR-specific issue and should not be thought of as a blocker for DVR fast-exit.
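Point b) can be illustrated with a toy model of which node each direction of traffic passes through. This is a sketch for discussion only; the constants and function names are made up here and do not correspond to anything in Neutron or neutron-dynamic-routing.

```python
# Hypothetical labels for the two possible nodes on the north-south path.
NETWORK_NODE = "network node (centralized router)"
COMPUTE_FIP = "compute node (FIP gateway port)"


def northbound_hop(fast_exit: bool) -> str:
    """Egress leaves via the compute node only when fast exit is on."""
    return COMPUTE_FIP if fast_exit else NETWORK_NODE


def southbound_hop(host_routes_advertised: bool) -> str:
    """Ingress reaches the compute node directly only if upstream has host routes."""
    return COMPUTE_FIP if host_routes_advertised else NETWORK_NODE


def same_path_both_ways(fast_exit: bool, host_routes_advertised: bool) -> bool:
    """True when both directions traverse the same node."""
    return northbound_hop(fast_exit) == southbound_hop(host_routes_advertised)
```

With fast exit on and no host routes advertised, outbound traffic skips the network node while inbound traffic still crosses it, which is the partial benefit described in b).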

I didn't say BGP was a prerequisite, but BGP was touted as a requirement for symmetric routing and I got the impression that's what most people wanted to stick with. To this point, did you mean symmetric rather than synchronous in the comment above?

Can you elaborate on the plan to address this need with and without BGP (can start putting notes together on the devref if you like), and which path you intend to pursue, if not both?

Carl Baldwin (carl-baldwin) wrote :

We skipped this one in yesterday's Drivers' meeting because of the ongoing discussion between Ryan and Armando.

@Armando, I got the opposite impression. It seemed that consensus was leaning toward it not being that important to guarantee symmetric routing but BGP could give the operator the option to keep it symmetric. But, there still might be some thinking to do for how this might affect something like lbaas.

As for the "conflict" between address scopes and enable_snat, ... Basically matching address scopes supersede enable_snat, rendering it irrelevant. I don't see it as a conflict but a feature that just needs to be documented. But, I agree with Armando that it should be documented. We could alert the user somehow, I guess. How would that be accomplished?

Ryan Tidwell (ryan-tidwell) wrote :

Yes, I meant symmetric not synchronous :) I'm not sure how symmetric routing can be achieved without the use of BGP. As has been discussed, asymmetric routing will still work and isn't the worst thing in the world. I have the RFE to add fixed IP host route announcements here https://bugs.launchpad.net/neutron/+bug/1585770. BGP announcements are gated on having https://bugs.launchpad.net/neutron/+bug/1557290 fixed. I wouldn't gate the fast-exit work on BGP, even though the BGP changes aren't too tricky.

https://bugs.launchpad.net/neutron/+bug/1557290 needs more attention if we are to have the BGP announcements. Without it, the L3 agent won't set up the FIP namespace to route properly and packets will just black-hole when they hit the FIP namespace.

My hope is to be able to pursue both fast-exit and BGP independently since they really don't have to be tied together. I'll make a push on the BGP announcements as soon as https://bugs.launchpad.net/neutron/+bug/1557290 is fixed.

Ok, I thought we have enough knowledge about this one and I thought we had agreed to allow it. Shall we push the button?

Carl Baldwin (carl-baldwin) wrote :

@Armando +1

tags: added: rfe-approved
removed: rfe
Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
status: Triaged → In Progress
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)

Reviewed: https://review.openstack.org/283757
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3162846a7b42d860533df6dc32d9c553669df45b
Submitter: Jenkins
Branch: master

commit 3162846a7b42d860533df6dc32d9c553669df45b
Author: Swaminathan Vasudevan <email address hidden>
Date: Mon Feb 22 15:20:46 2016 -0800

    DVR: Create router to fip namespace connection based on gateway state

    In order to route traffic between the internal subnets and the
    external subnet that belong to the same address_scopes we need
    to create the gateway port and the fip namespace irrespective of
    the configured floatingips for the internal subnet.

    This will consume an additional IP from the external subnet on
    all nodes, but with the introduction of service_type networks,
    this will not be an issue any more.

    This patch is the first in series that creates the agent gateway
    port and the fip namespace on every node when the gateway is set
    for the router. For every router created it will connect the
    router namespace to the fip namespace.

    Partial-Bug: #1577488
    DocImpact: Document the change in behavior for fip-agent-gw create
    Change-Id: I30c4f7fc250e486fe9a71b68540e783e90a6cf15

Reviewed: https://review.openstack.org/355062
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fb2093c3655ecd15f48e841c0fc6f9ccb7697a34
Submitter: Jenkins
Branch: master

commit fb2093c3655ecd15f48e841c0fc6f9ccb7697a34
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Aug 12 11:05:46 2016 -0700

    DVR: Add forwarding routes based on address_scopes

    When we create agent gateway port on all the nodes irrespective
    of the floatingips we can basically use that agent gateway port to
    forward traffic in and out of the nodes if the address_scopes match,
    since we don't need SNAT functionality if address scopes match.

    If a gateway is configured and if it has internal ports that belong
    to the same address_scopes then no need to add the redirect rules.
    At the same we should also add a static route in the fip namespace
    for every interface that is connected to the router that belongs to
    the same address scope.

    Change-Id: Iaf6d3b38b1fb45772cf0b88706586c057ddb0230
    Closes-Bug: #1577488
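
The route selection described in this commit message can be sketched as follows. This is not the code from the patch: `Interface`, `fip_namespace_routes`, and the `router_link_ip` next hop (the router-side address of the link between the router namespace and the fip namespace) are assumptions made for illustration, and the addresses in the usage note are invented.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class Interface(NamedTuple):
    """Hypothetical view of a router's internal interface."""
    cidr: str
    address_scope: Optional[str]  # None: subnet has no address scope


def fip_namespace_routes(interfaces: List[Interface],
                         gateway_scope: Optional[str],
                         router_link_ip: str) -> List[Tuple[str, str]]:
    """Return (destination, next_hop) pairs for subnets sharing the gateway's scope."""
    if gateway_scope is None:
        return []  # no scope on the external side, so nothing can match
    routes = []
    for iface in interfaces:
        if iface.address_scope == gateway_scope:
            network = ipaddress.ip_network(iface.cidr)  # validates the CIDR
            routes.append((str(network), router_link_ip))
    return routes
```

For example, with interfaces `10.0.1.0/24` (scope-a), `192.168.5.0/24` (no scope), and `10.0.2.0/24` (scope-a) behind a gateway in scope-a, only the two `10.0.x.0/24` subnets would get a static route in the fip namespace; the unscoped subnet would keep its redirect to the SNAT node.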

Changed in neutron:
status: In Progress → Fix Released

We landed a revert, so I am reopening the bug.

Changed in neutron:
status: Fix Released → Confirmed

This issue was fixed in the openstack/neutron 11.0.0.0b2 development milestone.

Fix proposed to branch: master
Review: https://review.openstack.org/474007

Changed in neutron:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/474007
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=dba107be0eebcbf9cc87b01fdc3b17ad4e013ef4
Submitter: Jenkins
Branch: master

commit dba107be0eebcbf9cc87b01fdc3b17ad4e013ef4
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Aug 12 11:05:46 2016 -0700

    DVR: Add forwarding routes based on address_scopes

    When we create agent gateway port on all the nodes irrespective
    of the floatingips we can basically use that agent gateway port to
    forward traffic in and out of the nodes if the address_scopes match,
    since we don't need SNAT functionality if address scopes match.

    If a gateway is configured and if it has internal ports that belong
    to the same address_scopes then no need to add the redirect rules.
    At the same we should also add a static route in the fip namespace
    for every interface that is connected to the router that belongs to
    the same address scope.

    Change-Id: I617e2fc5a70852c6f2e925ac7244f2a205d60de4
    Closes-Bug: #1577488

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 11.0.0.0b3 development milestone.

Akihiro Motoki (amotoki) on 2018-02-28
Changed in neutron:
milestone: none → pike-3