[RFE] Add segment support to Neutron

Bug #1458890 reported by Kyle Mestery on 2015-05-26
This bug affects 15 people
Affects: neutron | Importance: Wishlist | Assigned to: Carl Baldwin

Bug Description

This is feedback from the Vancouver OpenStack Summit.

During the large deployment team (Go Daddy, Yahoo!, NeCTAR, CERN, Rackspace, HP, BlueBox, among others) meeting, there was a discussion of the network architectures that we use to deliver OpenStack. As we talked, it became clear that there are a number of challenges around networking.

In many cases, our data center networks are architected with a differentiation between layer 2 and layer 3. Said another way, there are distinct network "segments" which are only available to a subset of compute hosts. These topologies are typically necessary to manage network resource capacity (IP addresses, broadcast domain size, ARP tables, etc.). Network topologies like these are not possible to describe with Neutron constructs today.

The traditional solution to this is tunneling and overlay networks, which make all networks available everywhere in the data center. However, overlay networks represent a large increase in complexity that can be very difficult to troubleshoot. For this reason, many large deployers are not using overlay networks at all (or only for specific use cases like private tenant networks.)

Because Neutron does not have constructs that accurately describe our network architectures, we'd like to see the notion of a network "segment" in Neutron. A "segment" could mean an L2 domain, an IP block boundary, or some other partition. Operators could use this new construct to build accurate models of network topology within Neutron, making it much more usable.

    Example: The typical use case is L2 segments that are constrained to a single rack (or some subset of compute hosts) but are still part of a larger L3 network. In this case, the overall Neutron network would describe the L3 network, and the network segments would be used to describe the L2 segments.

With the network segment construct (which is not intended to be exposed to end users), there is also a need for some scheduling logic around placement and addressing of instances on an appropriate network segment based on availability and capacity. This also implies a means, via API, to report the IP capacity of networks and segments, so we can filter out segments without capacity and the compute nodes that are tied to those segments.

    Example: The end user chooses the Neutron network for their instance, which is actually comprised of several lower level network segments within Neutron. Scheduling must be done such that the network segment chosen for the instance is available to the compute node on which the instance is placed. Additionally, the network segment that's chosen must have available IP capacity in order for the instance to be placed there.

Also, scheduling for resize, migrate, etc. should consider only the compute nodes allowed in the "network segment" where the VM is placed.
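The capacity-aware placement described above can be sketched in a few lines of Python. All names here (Segment, ComputeHost, candidate_hosts) are invented for illustration; this is not a Neutron or Nova API:

```python
# Hypothetical sketch of segment-aware placement: a compute host remains a
# valid scheduling candidate only if at least one segment attached to it
# still has free IP capacity.
from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    total_ips: int
    used_ips: int = 0

    def has_capacity(self) -> bool:
        return self.used_ips < self.total_ips

@dataclass
class ComputeHost:
    name: str
    segments: list          # segments physically reachable from this host

def candidate_hosts(hosts):
    """Filter out hosts whose reachable segments are all out of addresses."""
    return [h for h in hosts if any(s.has_capacity() for s in h.segments)]

rack1 = Segment("rack1", total_ips=2, used_ips=2)     # exhausted
rack2 = Segment("rack2", total_ips=254, used_ips=10)
hosts = [ComputeHost("cn-01", [rack1]), ComputeHost("cn-02", [rack2])]
print([h.name for h in candidate_hosts(hosts)])       # -> ['cn-02']
```

The same filter would also apply to resize and migrate, since those operations must likewise land on a host attached to the VM's segment.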

https://etherpad.openstack.org/p/Network_Segmentation_Usecases

Kyle Mestery (mestery) on 2015-05-26
tags: added: rfe
Carl Baldwin (carl-baldwin) wrote :

My request to add routing networks [1] is basically this idea applied to external networks and with a somewhat more limited scope. It would allow an external network to sit on top of multiple backing networks or segments which are not visible to normal tenants. My blueprint has the drawback that VMs would not work connected directly to such a network. Only neutron routers would connect. It also requires dynamic routing to maintain floating IP mobility across these "segments."

We should have a discussion soon about these two requests and how they relate. Maybe one could be built on the other.

[1] https://bugs.launchpad.net/neutron/+bug/1453906

Eugene Nikanorov (enikanorov) wrote :

Why isn't this a blueprint?

Changed in neutron:
status: New → Opinion
Kyle Mestery (mestery) wrote :

Per the new specs process [1], this is filed as a feature request. The requestors (large ops deployers) have noted this as a feature they want to have, but they don't have the manpower to develop it themselves. Thus, it's an RFE for now.

[1] https://review.openstack.org/177342

Changed in neutron:
status: Opinion → Confirmed
Mike Dorman (mdorman-m) wrote :

@carl-baldwin, I read through that other spec and I think I am at about 80% comprehension. :) Do you have any drawings or anything to share that could help illustrate it?

I agree it makes sense to discuss this among the large deployments team. Next LDT meeting is 6/18 1600 UTC, but that's a few weeks away. Instead we could probably get a decent LDT contingent on the L3 subteam meeting some week. I'm happy to work on organizing people if we can agree on a day/time. Maybe your meeting on 6/4? Just a suggestion.

Guidelines, details on how to write RFEs, and the process for handling features for which you have already submitted specs in the past but which are yet to be completed, can be found here:

https://github.com/openstack/neutron/blob/master/doc/source/policies/blueprints.rst

For more details, please reach out on #openstack-neutron or openstack-dev ML [neutron].

Neil Jerram (neil-jerram) wrote :

I'm interested in this. At the Neutron API level, it seems like there could be some commonality here with:

- my own project's requirement, for specifying a network that only provides L3 connectivity between VMs

- similar requirements that I heard about at Vancouver - apparently from Cisco/Huawei/ODP/Ed Warnicke - but have not yet been able to track down in detail.

In summary, whereas in the Neutron API today, a Network is a concept that provides uniform L2 (broadcast) and L3 connectivity, it appears that we could generalize this to a concept with uniform L3 connectivity, within which some areas are L2 segments.

- The back-compatible case would be where the L2 segment area is the same as the whole network.

- The L3-only case (e.g. my project) would be where there are no L2 segments defined.

- The LDT case would be intermediate between these two extremes - with L2 segments defined for some parts of the containing L3 network.

Does that make sense at all? If it does, I think the first step is to propose that API enhancement in detail, and I would be happy to take the lead on doing that.

Ed Warnicke (hagbard-0) wrote :

This sounds remarkably similar to an issue we ran into in OpenDaylight with the Forwarding Model for Policy.

In our case (which may differ from what you are discussing here), we found that endpoints (think ports) always need to live in *some* kind of 'network context'. That 'network context' can be:

- A Flood Domain
- A Bridge Domain
- An L3-Domain (think VRF)

A neutron network is basically defined as a 'bridge domain' out of the box. It sounds like you are saying that you also have L3-Domains you'd like to be able to model as well.

Does this sound similar to what you are asking for here?

Andy Hill (hillad) wrote :

The way we've accomplished this at Rackspace is with the quark plugin[1] and modifications to Nova's neutron calls. Segments are only used for provider networks that instances receive by default on boot. Segments have a 1:1 relationship with Nova's cells construct.

At a high level:

- Subnets in Neutron/Quark have a segment ID attribute
- nova-compute is aware of its segment ID
- When Nova requests ports for provider networks, the segment ID is passed in the request

[1] https://github.com/rackerlabs/quark
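The Quark-style flow above can be sketched as follows: subnets carry a segment ID, and the port request from nova-compute is tagged with the host's segment so address allocation only draws from matching subnets. Field names are assumptions for illustration, not Quark's actual schema:

```python
# Illustrative sketch: subnets tagged with a segment ID, and port creation
# restricted to subnets visible from the requesting host's segment.
SUBNETS = [
    {"id": "subnet-a", "cidr": "10.1.0.0/24", "segment_id": "cell01-rack1"},
    {"id": "subnet-b", "cidr": "10.2.0.0/24", "segment_id": "cell01-rack2"},
]

def pick_subnet(subnets, segment_id):
    """Return a subnet on the requesting compute host's segment."""
    for subnet in subnets:
        if subnet["segment_id"] == segment_id:
            return subnet
    raise LookupError("no subnet on segment %r" % segment_id)

def build_port_request(network_id, segment_id):
    """Shape of a port-create body carrying the caller's segment ID."""
    return {"port": {"network_id": network_id, "segment_id": segment_id}}

req = build_port_request("provider-net", "cell01-rack2")
subnet = pick_subnet(SUBNETS, req["port"]["segment_id"])
print(subnet["id"])    # -> subnet-b
```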

Neil Jerram (neil-jerram) wrote :

@Ed - Yes, I think that what you describe does sound similar to what I'm interested in.

@Andy - Sounds interesting, but don't you think that it needs to be expressed explicitly on the Neutron API, if a 'Neutron network' does not have uniform L2 connectivity between all ports on that network? For example, if a 'Neutron network' is partitioned into three L2 segments, something needs to change about how DHCP agents+servers are positioned and scheduled, such that there is a (or at least one) DHCP agent+server in each segment.

Or, in the case that I'm interested in, where connectivity is L3 and there is effectively no L2 broadcast between VMs, DHCP agents+servers need to be run in a different way (on each compute host) so as to provide a DHCP service to unbridged TAP interfaces.

Thanks - Neil

Mike Dorman (mdorman-m) wrote :

@Ed I agree as well, I think this is more or less the model most of us are looking for.

Go Daddy has a similar setup as Rackspace, although our network segments are scoped to a host aggregate (for us it boils down to a DC rack), rather than a cell. But same idea. We are not using Quark, but have local Neutron and Nova patches to accomplish the same thing.

Ed Warnicke (hagbard-0) wrote :

@Andy @Mike

It sounds like you guys are avoiding L2 constructs entirely, just L3 from the edge with some kind of 'aggregate' of hosts that doesn't necessarily correspond to an L2 or L3?

Do I understand you correctly?

Mike Dorman (mdorman-m) wrote :

More or less. For us (Go Daddy), the L2 boundary is that host aggregate/rack level. We call them "pods", which are defined as all the compute hosts tied to a single top-of-rack access switch pair. L2 stops at the access layer and everything beyond that is L3 only.

Today we create a Neutron provider network for each pod, but users are unaware of this. We transparently schedule instances to a network based on what host they get scheduled to. Said another way, users do not have the opportunity to choose their network.

Ultimately, each of those per-pod/per-L2-domain Neutron networks is part of a security zone within our network (which really is an L3 VRF). It's _that_ level that we want to give users a choice of (what security zone their instance goes to). So we'd like some construct in Neutron to be able to describe that the L3 network (for us, a security zone) is comprised of many underlying L2 network segments.

I think that at a basic level this is what most other large deployers are doing. @Neil/Calico project is a similar setup, as I understand it, except the L2 boundary/segment is per-host and it's all L3 up from there.
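A toy model of the topology Mike describes: a user-visible "security zone" (really an L3 VRF) composed of per-rack L2 provider networks ("pods"). All names here are invented for illustration:

```python
# Toy data model: the user picks a zone; the operator's per-pod L2 segments
# underneath it are hidden. Placement must wire the instance into whichever
# pod its compute host belongs to.
ZONES = {
    "zone-internal": {
        "pod-a1": {"cidr": "10.10.1.0/24", "hosts": ["cn-a1-01", "cn-a1-02"]},
        "pod-a2": {"cidr": "10.10.2.0/24", "hosts": ["cn-a2-01"]},
    },
}

def pod_for_host(zone, host):
    """Find which L2 pod (segment) a scheduled compute host belongs to."""
    for pod, info in ZONES[zone].items():
        if host in info["hosts"]:
            return pod
    raise LookupError("host %r is not in zone %r" % (host, zone))

# The instance lands on cn-a2-01, so it must attach to pod-a2's network.
print(pod_for_host("zone-internal", "cn-a2-01"))    # -> pod-a2
```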

Cedric Brandily (cbrandily) wrote :

Such an L2 boundary implies some constraints on the Nova scheduler, doesn't it? The scheduler would have to take the L2 boundary into account when placing VMs.

Ed Warnicke (hagbard-0) wrote :

@Cedric:

Does the Nova scheduler presume that a 'network', being an L2 segment, has certain latency characteristics?
I ask because, increasingly, as tunneling tech like VXLAN gets more popular, that presumption need not be true...

Ed Warnicke (hagbard-0) wrote :

@Mike:

Thanks for the Calico pointer :)

Do you see L3 to the host gaining traction?

Mike Dorman (mdorman-m) wrote :

@Cedric yes, there are scheduler implications of this, which really is the root of the issue for us.

@Ed Re: tunneling, this has been the design assumption of Neutron all along: you can get a "network" (L2 domain) anywhere and everywhere via tunneling. However, we believe in keeping things simple and avoiding the complexity of lots of tunneling and overlay networks, so we have tried to massage Neutron into understanding our physical network topology. And thus how we have arrived here.

I do think L3 to the host over time will be more popular. We don't have any plans at the moment to go that direction, but I can envision it happening eventually. @Neil / Calico may have more data/info about how widespread that architecture is becoming.

Itsuro Oda (oda-g) wrote :

@Neil,
>For example, if a 'Neutron network' is partitioned into three L2 segments, something needs to change about how DHCP >agents+servers are positioned and scheduled, such that there is a (or at least one) DHCP agent+server in each segment.

I think the spec https://review.openstack.org/#/c/169612/ (Add availability_zone support) tries to address this.
I think "availability_zone" and "segment" of this RFE are similar (though the availability_zone spec seems to focus on network nodes rather than compute nodes).

Cedric Brandily (cbrandily) wrote :

In my understanding, a Neutron network is implemented by a segment which is available only on a subset of all Neutron agents and Nova computes; DHCP servers and routers should be allocated on dhcp/l3-agents in this subset, and likewise VMs on nova-computes.

Kris Lindgren (klindgren) wrote :

@ed, We (Go Daddy) currently have no need to schedule networks based upon some presumed latency characteristics. We just need to schedule a network based on where that network actually lives, be that at a host, a rack, a set of racks, or the entire cloud. It may become a requirement to try to limit the number of VXLAN extensions in the network at some point, though, since current switches are limited in the number of VNIs they can support. I.e., prefer hosts that already have this network segment trunked to them, versus trunking a new segment through the network.

@Cedric, For us this is correct. In our configuration we also eliminated l3-agents from our implementation. We handle floating IPs by injecting a route into the network whose next hop is the fixed IP, and then binding the floating IP to a non-ARPing interface locally. Though technically we could implement this via an L3 agent that talks BGP to the switches to handle these advertisements, right now we are using our own internal automation system to inject routes into the network.

Ian Wells (ijw-ubuntu) wrote :

This proposal skirts an issue of address allocation. There are three cases, in practice:

1. you don't actually care what address your VM gets and just want it to be somewhere on the network in any broadcast domain that suits. You can't do this today, but what this means in theory is that you would like a Neutron port that gets an address on binding, not creation, so that it could potentially land on any segment. We've done work in IPv6 where we had to add this feature to the port because we made a /64 per host with what we were doing and any fixed address pins you to a host (so you can't use them). The joy of v6 is your scheduling problems largely go away, mind you.
2. you allocate an address to a port and then run a VM on it. This may be something you wish to prevent an unprivileged user from doing; this effectively pre-selects a segment and therefore forces the instance into a certain section of the datacentre, which is not very cloudy in the sense that address domains don't really define anything about your cloud that a user should be using for placement purposes.
3. You're live migrating a VM. You need to reschedule it based on its address, which is equivalent to (2).

The shortcomings here would be:

- scheduling is a Nova-specific thing that doesn't account for Neutron inputs very well today - this is a bother in cases (2) and (3), and an issue in case (1) if addresses in a single segment get exhausted.
- There's no 'this has no address yet but it will' type of port.
- subnets are suddenly weird, because you want to allocate an IP from the network subnet but you can't. Actually, this may not be as bad as it sounds, because networks can indeed have multiple subnets and it's really just a matter of having different subnets apply to specific areas of the DC, a mapping that doesn't exist today. You just need to link a subnet to a segment in the case that a network is segmented.

The problem can be trimmed by preventing live migration for a VM on a segment and mandating the specially addressed port type for segmented networks. This obviously doesn't suit everyone.
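Ian's case (1) - a port that has no address until it is bound to a host, and therefore to a segment - together with his subnet-to-segment mapping, can be sketched like this. Names and the toy allocator are invented; this is not the Neutron data model:

```python
# Sketch: a port created without an address receives one only at bind time,
# drawn from the subnet linked to the binding host's segment.
import ipaddress

SUBNET_BY_SEGMENT = {                 # Ian's subnet <-> segment link
    "seg-a": ipaddress.ip_network("10.0.1.0/24"),
    "seg-b": ipaddress.ip_network("10.0.2.0/24"),
}
SEGMENT_BY_HOST = {"cn-01": "seg-a", "cn-02": "seg-b"}
_next_free = {"seg-a": 2, "seg-b": 2}  # toy allocator state, per segment

class Port:
    def __init__(self):
        self.fixed_ip = None           # no address until binding

    def bind(self, host):
        """Allocate an address from the subnet of the host's segment."""
        seg = SEGMENT_BY_HOST[host]
        net = SUBNET_BY_SEGMENT[seg]
        self.fixed_ip = str(net.network_address + _next_free[seg])
        _next_free[seg] += 1
        return self.fixed_ip

p = Port()
print(p.bind("cn-02"))    # -> 10.0.2.2, from the segment's subnet
```

Pre-allocating an address at port creation (Ian's case 2) would instead pin the port to seg-b before scheduling runs, which is exactly the placement constraint he describes.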

Kyle Mestery (mestery) wrote :

Let's mark this as triaged.

Changed in neutron:
status: Confirmed → Triaged
Kevin Benton (kevinbenton) wrote :

Is this an appropriate summary?

"Neutron provides virtual networking. We don't want virtual networking. We want a networking API that maps to our physical infrastructure."

Neil Jerram (neil-jerram) wrote :

@Kevin - To answer for myself and the Calico project... Not exactly, no. To be completely honest, I'm not 100% sure what you mean by virtual networking, so I'm not sure that it would be correct for me to say that I don't want it.

- If virtual means implemented by overlays over the physical infrastructure: Calico doesn't want this, at least for data centers where the vast majority of traffic is IP-based between VMs with IP addressing in a shared address space. (I.e. where overlapping IPv4 addresses aren't needed for most traffic.) Calico's key proposition is that, for a data center where those assumptions apply, it is simpler and more scalable to provide connectivity between VMs using standard IP routing, treating the host-VM links at the same level as host-host.

- If virtual means supporting multiple tenants, with isolation between their VMs: Calico very much _does_ still want this. Calico implements this using iptables programming at the relevant compute hosts, instead of by using per-tenant overlay networks between the compute hosts.

- If virtual means - as in the current Neutron API - the semantics that VMs attached to a Neutron network don't initially have any connectivity to IP addresses outside that network, until that network is connected to a virtual Neutron router object: I'm not sure. In general I think the Neutron API should describe connectivity intent in a way that isn't closely tied to the in-tree reference software implementation; such as to allow alternative implementations of the same intent. In Calico at the moment we advise our users that they don't need to bother configuring any Neutron routers, because Calico already allows (subject to security configuration) connectivity between different Neutron networks. However that is a slightly uncomfortable departure from the established semantics, and I'd be open to either making this explicit somehow on the API, or modifying Calico so as to better implement the intent of the existing API.

I hope that helps! Please do ask if you have further questions.

Mike Dorman (mdorman-m) wrote :

@Kevin, for our use case, and I think most of the other operators represented here, that is an accurate statement.


At Rackspace, we have two bridged provider networks and N tunneled tenant
networks. Segmentation is how we scale bridged networks.

-AH

@Mike, are you saying you are literally not interested in providing tenant abilities to deploy their own topologies? Or you want that, but you also want Neutron to manage (or at least interact with) your underlying infrastructure?

Fundamentally I think we're confusing the logical models and abstractions Neutron provides with the underlying implementation. Throw in some management of the underlying infrastructure and it becomes a mess.

Mike Dorman (mdorman-m) wrote :

Yeah, that's fair. To be completely honest, since we have never used tenant networks, I have a fairly minimal view of the capabilities of Neutron. So if some things that I say don't make any sense, that may be why!

Today, we don't care about private tenant networking. However I can imagine a time where we might (for a public cloud product, for example, this is probably important.)

But we definitely do want Neutron to be able to interact with the underlying infrastructure. I don't think we have an ask for Neutron to manage it ... we have that under control with other systems. We just want to be able to "plug in" to it, so to speak, using native Neutron.

So it's your second statement which applies best to us, I think. (Want the possibility of tenant topologies, but also want to interact with existing network infrastructure.)

Sam Morrison (sorrison) wrote :

We have a similar setup to Rackspace: we have two bridged provider networks and we want to scale them.

We have groups of compute nodes that are each connected to different L2 networks, but to the user we want all these networks to appear as one entity (well, two, since we have a "public" and a "private" network) in Neutron, as the user doesn't care which one their instance gets attached to. The only thing that matters is that the compute node is attached to only one of these L2 networks.


@Itsuro
This is the feature we (letv.com) are interested in.
Now we have a single private "segment" per availability zone.


Robert Kukura (rkukura) wrote :

This bug was just brought to my attention on IRC, and I haven't read the comments yet, but a quick search shows no mention of the existing multiprovider extension, or of ML2, which implements multi-segment L2 networks using this extension. ML2 also implements hierarchical port binding, which can dynamically manage segments within a rack or at any level of a network hierarchy. Are these potentially relevant? In what ways are they inadequate?


@Robert, "With the network segment construct (which is not intended to be exposed to end users), there is also a need for some scheduling logic around placement and addressing of instances on an appropriate network segment based on availability and capacity."
AFAIK, neither Nova nor ML2 supports the above feature now.

A Related blueprint:
Add availability_zone support for API and DB
https://review.openstack.org/#/c/183369/


Carl Baldwin (carl-baldwin) wrote :

@Robert I have considered both the multi-provider extension and hierarchical port binding for this purpose. In fact, just hours before you added your comment to this bug, I wrote a message to the ML [1] soliciting feedback on a few possible approaches. In that post, I said the following about this:

"Overlay networks are not the answer to this. The goal of this effort
is to scale very large networks with many connected ports by doing L3
routing (e.g. to the top of rack) instead of using a large continuous
L2 fabric. Also, the operators interested in this work do not want
the complexity of overlay networks.

"It was suggested that hierarchical port binding could help here but I
see it as orthogonal to this. Hierarchical port binding extends the
L2 properties of a port to a hierarchical infrastructure to achieve
continuous L2 connectivity. It is also intended for overlay networks.
That isn't what we're doing here and I don't think it fits.

"I have also considered the multi-provider extension for this.
This is not yet clear to me either. First, my understanding was that
this extension describes multi-segment continuous L2 fabrics. Second,
there doesn't seem to be any host binding aspect to the multi-provider
extension. Third, not all L2 plugins support this extension. It
seems silly to require L2 plugin support in order to enable routing
between segments."

Hierarchical port binding looks great for overlaying smaller tenant-private L2 networks onto a large L3-routed infrastructure. But what the operators want here is to plug directly into the L3-routed infrastructure without the complexity of an overlay. Another way to think about this: we're not after a multi-segmented L2 network; we're after a multi-segmented L3 network where the L2 traffic is confined to a segment. The use cases for this don't require guaranteed L2 connectivity.

[1] http://lists.openstack.org/pipermail/openstack-dev/2015-July/070028.html

description: updated
Assaf Muller (amuller) wrote :

Please see: https://review.openstack.org/#/c/205631 and https://launchpad.net/bugs/1478100, which aims to solve the issue with DHCP agent scheduling being oblivious to the physical_network tag of a Neutron network.

Discussion to continue on the spec proposal.

tags: added: rfe-approved
removed: rfe
Changed in neutron:
importance: Undecided → Wishlist
Changed in neutron:
milestone: none → mitaka-1
Changed in neutron:
assignee: nobody → Carl Baldwin (carl-baldwin)
Changed in neutron:
milestone: mitaka-1 → mitaka-2
Neil Jerram (neil-jerram) wrote :

I realized last night that I had a hole in my understanding of how the
data path between VMs works, with a routed/segmented network. Then
while writing this comment I think I worked out the answer - so now
this is just a request for people to review and check that the
following is correct.

> Example: The typical use case is L2 segments that are restrained to a
> single rack (or some subnet of compute hosts), but are still part of a
> larger L3 network. In this case, the overall Neutron network would
> describe the L3 network, and the network segments would be used to
> describe the L2 segments.

So here is the data path between two VMs that are in the same L3
network but on different L2 segments (=> different racks or pods):

  VM A ---- Host B ------------- router ------------- Host B ---- VM D
10.0.1.2        L2 segment #1            L2 segment #2          10.0.2.2
                  10.0.1/24                10.0.2/24

My worry last night was: When a data packet is sent from A to D,
doesn't it need to be routed on B - as opposed to being bridged - in
order to know that its next hop is the router?

But actually I suppose routing must happen even before that, on VM A,
and everything will work, with B bridging, if the routes on A look
like:

10.0.1/24 dev eth0
default via 10.0.1.1

Is that all correct? Thanks - Neil

Neil Jerram (neil-jerram) wrote :

Grr, ASCII art didn't come out properly. The diagram in that last comment is supposed to show:

- VM A having IP address 10.0.1.2
- VM D having IP address 10.0.2.2
- L2 segment #1 between VM A and the router, with IP subnet 10.0.1/24
- L2 segment #2 between the router and VM D, with IP subnet 10.0.2/24
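Neil's routing-table reasoning can be checked with a toy longest-prefix match over VM A's routes (purely illustrative, not any real code): a packet to 10.0.2.2 misses the on-link 10.0.1/24 route and falls through to the default route, i.e. the first routing decision happens on the VM itself.

```python
# Toy longest-prefix-match lookup over VM A's routing table.
import ipaddress

routes = [
    (ipaddress.ip_network("10.0.1.0/24"), "dev eth0"),     # on-link segment
    (ipaddress.ip_network("0.0.0.0/0"), "via 10.0.1.1"),   # default -> router
]

def lookup(dest):
    """Return the action of the most specific route matching dest."""
    dest = ipaddress.ip_address(dest)
    best = max((net for net, _ in routes if dest in net),
               key=lambda n: n.prefixlen)
    return dict(routes)[best]

print(lookup("10.0.1.5"))    # same segment: delivered directly (dev eth0)
print(lookup("10.0.2.2"))    # other segment: next hop is the router (via 10.0.1.1)
```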

Carl Baldwin (carl-baldwin) wrote :

@Neil, I'm assuming that in general the second "Host B" in your example should be "Host C". Hosts B and C *could* be the same host but in general they won't be.

Routing doesn't happen in the compute host. The compute host provides whatever L2 bridging is necessary to get traffic from the VM's vnic to the provider network. The details of this L2 transport depend on the plugin and network type in use. The router (with gateway 10.0.1.1) will receive the L2 traffic -- typically on a VLAN from the compute host -- and then route it.

I imagine that the "router" in your case is the logical combination of the top-of-rack routers for the two racks and whatever routing happens between them. But for this case, we can treat the logical effect of these routers as just one router.

Your final statement is correct: the routing will happen on the VM, and the VM's routing tables will look like your example, with the default gateway pointing to the router local to the segment (10.0.1.1).

In my current proposal, each segment is modeled as a self-contained provider network. It has a local router for the gateway; the gateway IP is that of the Subnet on the provider Network. There would be a DHCP server in each segment to provide the unique gateway to the VMs on that segment.

The proposal may shift toward using a single Network with multiple Segments to model the segments. In this case, since there is no L2 continuity between segments, we'll have to figure out how DHCP factors in.

Neil Jerram (neil-jerram) wrote :

Thanks @Carl. That's all clear now.

Changed in neutron:
milestone: mitaka-2 → mitaka-3
Cedric Brandily (cbrandily) wrote :

It's premature to set a milestone on this RFE at the moment.

Changed in neutron:
milestone: mitaka-3 → none

Thanks Armando.

  Original Message
From: Armando Migliaccio
Sent: Thursday, 28 January 2016 23:31
To: Neil Jerram
Reply To: Bug 1458890
Subject: [Bug 1458890] Re: Add segment support to Neutron

status update:
http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-01-28-22.20.log.txt



  With the network segment construct (which is not intended to be
  exposed to end users), there is also a need for some scheduling
  logic around placement and addressing of instances on an
  appropriate network segment based on availability and capacity.
  This also implies a means via API to report IP capacity of
  networks and segments, so we can filter out segments without
  capacity and the compute nodes that are tied to those segments.

      Example: The end user chooses the Neutron network for their
  instance, which is actually comprised of several lower-level
  network segments within Neutron. Scheduling must be done such
  that the network segment chosen for the instance is available to
  the compute node on which the instance is placed. Addition...


Henry Gessau (gessau) on 2016-03-24
summary: - Add segment support to Neutron
+ [RFE] Add segment support to Neutron

We are also planning a large-scale OpenStack deployment. We are assuming that in the case of DVR we do not need network segments/clusters. Please correct me if I am wrong.

Akash (taloleakash) wrote :

We also require this for multi-rack/pod deployments.

Sam Morrison (sorrison) wrote :

This was implemented in Newton, but I think it only became really usable in Ocata. It's called Routed Networks.

See https://docs.openstack.org/ocata/networking-guide/config-routed-networks.html
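For reference, the routed-networks workflow from that guide looks roughly like this (network, segment, subnet, and physical-network names here are illustrative; run against an actual deployment with the Newton/Ocata-era `openstack` client or later):

```shell
# Create a shared provider network; its first segment is created implicitly.
openstack network create --share \
  --provider-physical-network provider1 \
  --provider-network-type vlan --provider-segment 2016 \
  multisegment1

# (The guide renames the implicit segment to "segment1" with
#  `openstack network segment set --name segment1 <segment-id>`.)

# Add a second segment on a different physical network, e.g. another rack.
openstack network segment create --physical-network provider2 \
  --network-type vlan --segment 2017 \
  --network multisegment1 segment2

# Give each segment its own subnet; the gateway and DHCP are per segment.
openstack subnet create --network multisegment1 \
  --network-segment segment1 \
  --subnet-range 10.0.1.0/24 multisegment1-segment1-v4
openstack subnet create --network multisegment1 \
  --network-segment segment2 \
  --subnet-range 10.0.2.0/24 multisegment1-segment2-v4
```

This matches the discussion above: one Neutron network modeling the L3 network, with per-rack L2 segments each carrying its own subnet and gateway.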
