[RFE] Limit VXLAN to within Neutron availability zones

Bug #1808062 reported by Dan Sneddon on 2018-12-11
This bug affects 2 people
Affects: neutron
Importance: Wishlist
Assigned to: Kailun Qin

Bug Description

Creating multiple Neutron availability zones (AZs) allows the operator to schedule DHCP and L3 agents within a single AZ. However, Neutron still tries to form a VXLAN mesh between all nodes in all AZs, which creates inter-AZ dependencies and may not work when strict firewalls are placed between AZs.

This behavior should be configurable, so that L2 may be limited to a particular AZ and no tunnels are formed between different AZs. This would prevent Neutron from trying to form tunnels that cannot function, and may enhance security when AZs are in different security zones.

The desired end-state configuration would have separate DHCP and L3 agents hosted in each AZ, with tunnels formed only inside each AZ. This would allow, for instance, multiple edge sites within a single deployment that each perform local networking only. Any particular Neutron network would be limited to one AZ. A new flag would allow AZs to be truly autonomous and remove cross-AZ dependencies.

Example: Suppose there are two AZs, AZ1 (control plane 10.1.1.0/24) and AZ2 (control plane 172.16.2.0/24).

Here is example output from a node in AZ1, which is forming tunnels to members of both AZs. In the desired configuration, VXLAN tunnels would only be formed between endpoints in the same AZ.

    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "vxlan-1e0094c8"
            Interface "vxlan-1e0094c8"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="10.1.1.20", out_key=flow, remote_ip="10.1.1.200"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-1e0094d6"
            Interface "vxlan-1e0094d6"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="10.1.1.20", out_key=flow, remote_ip="172.16.2.214"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
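To make the desired end state concrete, the following Python sketch sorts the two tunnels in the output above into same-AZ tunnels (keep) and cross-AZ tunnels (drop). It identifies each AZ by the control-plane subnet given in the example; that subnet-to-AZ mapping is purely an illustrative assumption and not how Neutron records AZ membership.

```python
import ipaddress

# Illustrative assumption: each AZ is identified by its control-plane subnet,
# taken from the example (AZ1 = 10.1.1.0/24, AZ2 = 172.16.2.0/24).
AZ_SUBNETS = {
    "AZ1": ipaddress.ip_network("10.1.1.0/24"),
    "AZ2": ipaddress.ip_network("172.16.2.0/24"),
}


def az_of(ip):
    """Return the AZ whose control-plane subnet contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for az, net in AZ_SUBNETS.items():
        if addr in net:
            return az
    return None


def classify_tunnels(local_ip, remote_ips):
    """Split remote tunnel endpoints into same-AZ (keep) and cross-AZ (drop)."""
    local_az = az_of(local_ip)
    keep, drop = [], []
    for remote in remote_ips:
        # An endpoint in an unknown AZ is treated as cross-AZ, to be conservative.
        if local_az is not None and az_of(remote) == local_az:
            keep.append(remote)
        else:
            drop.append(remote)
    return keep, drop


# The two tunnels from the br-tun output above, seen from 10.1.1.20 in AZ1.
keep, drop = classify_tunnels("10.1.1.20", ["10.1.1.200", "172.16.2.214"])
print("keep:", keep)  # keep: ['10.1.1.200']
print("drop:", drop)  # drop: ['172.16.2.214']
```

With the requested behavior, only the vxlan port toward 10.1.1.200 would exist on this node.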

tags: added: rfe
Dan Sneddon (dsneddon) wrote :

Note that it appears that NSX-T has a concept called "Transport Zones" that enables the feature that is being requested here. Compute nodes within a given transport zone will only be able to communicate with compute nodes within that same transport zone. This prevents network traffic from being sent between zones. More information here:

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.3/com.vmware.nsxt.install.doc/GUID-F47989B2-2B9D-4214-B3BA-5DDF66A1B0E6.html

NSX-T also supports Availability Zones, but it appears that those are separate from the Transport Zone functionality:

https://docs.vmware.com/en/VMware-Integrated-OpenStack/5.0/com.vmware.openstack.admin.doc/GUID-37F0E9DE-BD19-4AB0-964C-D1D12B06345C.html

It's possible that limiting tunneling traffic to a particular AZ may be outside the intended functions of Neutron AZs, but I think this is a valid use case.

Miguel Lavalle (minsel) wrote :

I think this makes sense. Let's move this to the triaged stage so it can be discussed by the Neutron drivers team.

tags: added: rfe-triaged
removed: rfe
Miguel Lavalle (minsel) on 2019-01-11
Changed in neutron:
importance: Undecided → Wishlist
Miguel Lavalle (minsel) wrote :

@Dan,

We discussed this RFE during today's drivers meeting. We had a few questions:

1) In the example you show above, was L2pop enabled? Would l2pop be a solution for this?
2) Are you proposing to constrain networks to span only 1 AZ when the behavior proposed is enabled?
3) What would be the impact for the deployer in terms of networking nodes, etc?

Dan Sneddon (dsneddon) wrote :

1) In the example you show above, was L2pop enabled? Would l2pop be a solution for this?

I don't believe that l2pop would be a solution for this. The goal is to provide complete isolation for networks within an AZ, not only for resiliency but also for security. For that reason, the intent is to keep L2 from leaking from one AZ to another. If anything, I would think that l2pop for a given network would ideally be limited to a single AZ in this scenario.

2) Are you proposing to constrain networks to span only 1 AZ when the behavior proposed is enabled?

Yes, the goal is to constrain networks to a single AZ.

3) What would be the impact for the deployer in terms of networking nodes, etc?

The goal is to reduce the number of networking nodes. In the current architecture, if I want to create networks that only live within an AZ (for instance an edge site), I have to have dedicated networking nodes for that AZ. The goal would be to have centralized networking nodes that can support networks that exist only in one site.

For instance, suppose the following architecture:

1 central site with controllers and networking nodes
3 edge sites with compute and baremetal nodes (no networker nodes)

Each of the 3 edge sites will act as a Neutron AZ, with its own DHCP and L3 agents. External connectivity will be provided by one or more provider networks in the edge sites. Internal connectivity within the edge site will be provided by VXLAN networks. Each VXLAN or provider network will only exist within an edge site AZ. Connectivity back to the central site is provided over L3 routes.

Another use case is within a single datacenter that has multiple security "zones". For security reasons, networks should only exist within a particular zone. This allows the creation of a production AZ, a staging AZ, and a DMZ AZ, with no shared networking between the 3 AZs. This can be achieved with separate networking nodes in each AZ, but it would be better if this could be done with only one set of networking nodes.

Slawek Kaplonski (slaweq) wrote :

One question about the last use case you mentioned: will Nova also somehow restrict VMs so that they are always spawned in the same zone?
What if a user has, let's say, a network that works only in AZ 1, with a couple of VMs connected to it, and then creates a VM on a compute node in a different AZ? How should Neutron behave then? Do you want it to fail to bind such a port, or to bind it properly, in which case the VM simply will not have connectivity to the other VMs on the same network because there is no tunnel to it?

Dan Sneddon (dsneddon) wrote :

@slaweq, spawning VMs in the correct Neutron AZ is important, and in the use case I wanted to use this I would be using Nova Host Aggregates to separate compute nodes by Nova AZ.

I would expect that if an operator had set up Neutron AZs but had not set up Nova AZs to match, the Nova scheduler would attempt to bind a port to a random hypervisor, and that binding would fail if the network were not available to that compute node.

In any case, I think it is acceptable to require associating a Nova AZ on a host aggregate group when dividing the deployment into multiple Neutron AZs.
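Requiring Nova AZs (via host aggregates) to line up with Neutron AZs suggests a simple consistency check a deployer could run before relying on this feature. The Python sketch below assumes both host-to-AZ mappings are already available as plain dictionaries; `find_az_mismatches`, the host names, and the AZ names are hypothetical, and collecting the data from Nova and Neutron is left out.

```python
from typing import Dict, List, Optional, Tuple


def find_az_mismatches(
    nova_az_by_host: Dict[str, str],
    neutron_az_by_host: Dict[str, str],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (host, nova_az, neutron_az) for every host where the two disagree.

    A host missing from either mapping is reported with None on that side.
    """
    mismatches = []
    for host in sorted(set(nova_az_by_host) | set(neutron_az_by_host)):
        nova_az = nova_az_by_host.get(host)
        neutron_az = neutron_az_by_host.get(host)
        if nova_az != neutron_az:
            mismatches.append((host, nova_az, neutron_az))
    return mismatches


# Hypothetical deployment with one misconfigured compute node.
nova = {"compute-1": "edge1", "compute-2": "edge1", "compute-3": "edge2"}
neutron = {"compute-1": "edge1", "compute-2": "edge2", "compute-3": "edge2"}
print(find_az_mismatches(nova, neutron))
# [('compute-2', 'edge1', 'edge2')]
```

Any host reported here is one where the Nova scheduler could place a VM whose port would then fail to bind, or bind without connectivity.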

Miguel Lavalle (minsel) wrote :

We discussed this RFE today and it is approved. We think the changes are sizable, so we are requesting a spec as the next step. We want this spec to be scoped to tunnels in general, not only VXLAN. Thanks for the submission.

tags: added: rfe-approved
removed: rfe-triaged
Dan Sneddon (dsneddon) wrote :

Adding some of my comments from related bug https://bugs.launchpad.net/neutron/+bug/1808594

---

It appears to me from reading some ML2 plugin code that the list of tunnel peers is obtained via RPC. If it's possible to limit the list of tunnel peers that is sent to the ML2 plugin agent, or if we could fail to bind a port if a compute is in the wrong AZ, I think perhaps that could be done in a way that worked for multiple ML2 plugins. Someone already suggested doing filtering in the l2_pop driver, but l2_pop doesn't work in all deployment scenarios.

I can think of several ways to implement this, which can be discussed in a spec:

Method 1) A global flag for Neutron for limiting traffic within AZs. When set, compute nodes would only form tunnels with other computes in the same AZ. If it were possible to limit the list of remote compute nodes via RPC (one queue per AZ?), perhaps this could be implemented in a way that worked for multiple Neutron drivers. This wouldn't prevent binding two ports on the same network in different AZs, but the computes would only be able to pass East-West traffic within their local AZ (and to the L3 and DHCP agents for the network).

Method 2) One-way association between network and autonomous zone. A network could be assigned to one particular AZ, and would only work within that AZ. Networks that were not associated with a particular AZ would function as normal and could exist in all AZs. This would work for most use cases, but would require networks to be assigned to AZs in the DB. Perhaps binding a port would fail if the compute were not in the specified AZ.

Method 3) Many-to-many association between autonomous zone and the network. A network could be assigned to more than one autonomous zone, and a compute could only bind to that network if it were in one of the assigned AZs. This would require a network-to-AZ multi-way association in the DB and agents would need to be aware of this mapping.

For reference, I think this is where that filtering would be relevant if it were done in the openvswitch-agent:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1843

And I think this is where the filtering would be relevant if it were done in the l2_pop driver:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/l2pop/rpc_manager/l2population_rpc.py#L310
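Two of the methods above can be sketched in a few lines of Python, under stated assumptions. `Endpoint`, `peers_for_host`, and `can_bind` are hypothetical names, not existing Neutron code. `peers_for_host` illustrates Method 1: the server trims the tunnel peer list it returns for a host to same-AZ peers, plus hosts running L3/DHCP agents (simplified here to one set of hosts rather than per network). `can_bind` covers Methods 2 and 3, which differ only in how many AZs a network may be associated with.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical record of a tunnel endpoint as the server might track it."""
    host: str
    ip: str
    az: Optional[str]


def peers_for_host(host: str, endpoints: List[Endpoint],
                   network_node_hosts: Set[str]) -> List[Endpoint]:
    """Method 1 sketch: tunnel peers the server would return for `host`.

    Keeps peers in the same AZ as `host`, plus any host running L3/DHCP
    agents. `host` must appear in `endpoints`. Hosts with no AZ (None)
    only match other hosts with no AZ.
    """
    local_az = next(ep.az for ep in endpoints if ep.host == host)
    return [ep for ep in endpoints
            if ep.host != host
            and (ep.az == local_az or ep.host in network_node_hosts)]


def can_bind(host_az: Optional[str], network_azs: FrozenSet[str]) -> bool:
    """Methods 2 and 3 sketch: may a port on the network bind on this host?

    An empty set means the network has no AZ association and works in every
    AZ, as today. Method 2 is the special case of a one-element set.
    """
    return not network_azs or host_az in network_azs


endpoints = [
    Endpoint("compute-a", "10.1.1.20", "AZ1"),
    Endpoint("compute-b", "10.1.1.200", "AZ1"),
    Endpoint("compute-c", "172.16.2.214", "AZ2"),
    Endpoint("net-1", "192.0.2.10", "central"),  # central networking node
]
print([ep.host for ep in peers_for_host("compute-a", endpoints, {"net-1"})])
# ['compute-b', 'net-1']
print(can_bind("AZ2", frozenset({"AZ1"})))         # False
print(can_bind("AZ2", frozenset()))                # True
print(can_bind("AZ2", frozenset({"AZ1", "AZ2"})))  # True
```

Filtering on the server means every agent consuming the peer list sees the trimmed list, which is what makes that approach attractive across ML2 drivers. The caveat noted for Method 1 still applies: nothing in `peers_for_host` stops ports on one network from binding in several AZs; that is what `can_bind` adds.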

Kailun Qin (kailun.qin) on 2019-04-15
Changed in neutron:
assignee: nobody → Kailun Qin (kailun.qin)