[RFE] routed network for hypervisor

Bug #1846285 reported by Yang Youseok on 2019-10-02
This bug affects 1 person
Affects: neutron · Importance: Undecided · Assigned to: Unassigned

Bug Description

Hi.

I want to discuss a further extension of routed networks, and wonder what the community thinks about it.

From my current understanding, a routed network restricts the L2 domain to a specific segment, usually at the rack level.

I think that by restricting the L2 domain even further, routed networks could become a more generic solution. My naive idea is:

- Make a segment for each hypervisor.
  - The admin does not have to deal with the segment API.
    - Make a new type driver to achieve that.

- Make some changes to the Neutron agents.
  - e.g. distributed DHCP
  - e.g. metadata proxy at DHCP
  - e.g. simplified DVR L3 agent

- Add a kernel static host route (/32) via the integration bridge and enable proxy ARP to route traffic to the VM.
  - The hypervisor then holds several routes (one per VM).
    - Make a new L2 extension to achieve that.

- (Optional) Make a new BGP agent to propagate the kernel static routes to the underlay ToR using iBGP.
  - This is to automate the underlay network.
    - Neutron does not touch the ToR; the ToR BGP peer should be pre-defined by the admin.

- (Optional/Future) Support tenant networks using this concept.
  - This is also the original future work for routed networks.
    - It can be simplified further in this concept, since it no longer cares about segments.
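The /32 host-route and proxy-ARP step above can be sketched roughly as follows. This is a toy Python sketch that only builds the command strings a hypothetical L2 agent extension might run when a VM port is wired; `host_route_commands` and the `br-int` default are illustrative names, and a real extension would more likely use pyroute2 or privsep than shell out.

```python
def host_route_commands(vm_ip, bridge="br-int"):
    """Commands a hypothetical L2 extension could run when a VM port is
    plugged: a /32 kernel route toward the VM through the integration
    bridge, plus proxy ARP so the hypervisor answers ARP on its behalf."""
    return [
        # static host route (/32) pointing at the VM via the bridge
        f"ip route add {vm_ip}/32 dev {bridge}",
        # answer ARP requests for routed addresses on behalf of the VM
        f"sysctl -w net.ipv4.conf.{bridge}.proxy_arp=1",
        # let the hypervisor forward traffic between the VM and the uplink
        "sysctl -w net.ipv4.ip_forward=1",
    ]

for cmd in host_route_commands("10.0.0.15"):
    print(cmd)
```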

The gains from this:
- The admin does not have to care about the segment concept.
- The network model is simplified; no concerns about VM migration.
- A simplified L3 agent. In this model the L3 agent only needs to provide specific features (DNAT, ...); most of its code for achieving L3 connectivity could be removed.
- Tenant networks are achievable without major changes.
- Truly scalable: one giant network could hold over a million ports.

Implementation will not be very easy, but Neutron already has enough building blocks. My biggest concern is whether having a large number of segments in a network (on the same order as the VM count) would be a problem or not.

We have actually maintained this architecture for several years, and it is quite stable. However, since it was built in-house, it probably has defects caused by not developing with the community. So I want to make our system more generic by implementing it upstream.

Any comments will be appreciated.

Thanks.

Tags: rfe
Bernard Cafarelli (bcafarel) wrote :

Thanks for filing this RFE bug. You are correct that current routed networks have L2 separation at the segment level (rack or remote site).
I suppose this would be a separate "mode" for hypervisor separation?
In any case, I added the rfe tag so it can be discussed in an upcoming neutron drivers meeting (or earlier, if the drivers have additional questions for you).

Changed in neutron:
status: New → Confirmed
tags: added: rfe
Yang Youseok (ileixe) wrote :

@Bernard

Thanks for the comments. I will also be at the drivers meeting today. :)

Slawek Kaplonski (slaweq) wrote :

Hi,

Thx for proposing this change. I have a couple of questions about it:

1. You mentioned a new type driver - so this will not work with all network types which we currently support? If not, what will it support? Only vlan/flat networks? Or maybe something else too?

2. You mentioned "some" changes to agents. E.g. a metadata proxy in DHCP is possible today - what exactly do You want to add/change there? Can You explain those required changes in a bit more detail?

3. What do You mean by adding a /32 static route on the integration bridge?

4. You mentioned a "new BGP" agent - can't You use the agent from neutron-dynamic-routing for that?

I will also ask our L3 experts to take a look into that RFE.

Yang Youseok (ileixe) wrote :

Hi, Slawek. Thanks for the comments.

Here are my answers based on my current understanding. I'm afraid some of them may be wrong due to gaps in my understanding of Neutron concepts; if so, please let me know.
I should also note that our cluster currently has no network nodes at all, since the lack of L2 forces all agents to be distributed across the hypervisors. Please keep that in mind for better understanding.

1. After your question, I realized there is no need to add a new type driver. I think the network types supported by routed networks would also work for per-hypervisor routed networks. In that case, though, we would need some 'flag' to distinguish three conceptually different kinds of network (the current routed network, the per-hypervisor routed network, and a normal network). Currently, an admin creates a routed network by specifying segmentation IDs; there should be some way to flag the new per-hypervisor routed network, since the admin does not deal with segments in this concept.

2. I need to clarify the specific agent changes. There should be only minor changes on the agent side to support this concept.

- Distributed DHCP agent.
Of course an admin can provision a DHCP agent on each hypervisor, but that is cumbersome. I already filed a request about it (https://bugs.launchpad.net/neutron/+bug/1806390).

- Metadata proxy.
The metadata proxy has to live on the hypervisor, so we have to impose config restrictions for it to work normally (force_metadata, and use_gateway_ips = True in LinuxInterfaceDriver).

- L3 agent.
Delete the SNAT code. We override router_info using this API (https://specs.openstack.org/openstack/neutron-specs/specs/stein/router-factory-with-l3-extension.html) and strip out many unused things.

3. Since this concept has no L2, we have to 'route' VM traffic to the upstream switch to ensure connectivity. The current routed network does not guarantee L3 connectivity at the ToR switch level. At the hypervisor level, we have to add routes to ensure connectivity there. That's why I mentioned adding /32 routes.

4. As I understood neutron-dynamic-routing before, it only propagates routes for Neutron routers. I'm not sure, though; I could use that agent if it's flexible enough to achieve our purpose.

Thanks.

Yang Youseok (ileixe) wrote :

After looking at the detailed implementation of routed networks, I see that the biggest challenge is changing the current subnet/segment schema.

Today a subnet has only one segment_id, and for this concept we have to change that 1:1 mapping to 1:N. To be specific, one subnet should have as many segments as there are hypervisors.
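As a toy illustration of that schema change (class and field names here are illustrative, not Neutron's actual ORM models), the 1:N subnet-to-segment mapping might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    id: str
    host: str  # one segment per hypervisor under this proposal

@dataclass
class Subnet:
    id: str
    cidr: str
    # today a subnet carries a single optional segment_id;
    # the proposal turns this into a list, one entry per hypervisor
    segments: list = field(default_factory=list)

subnet = Subnet("subnet-1", "10.0.0.0/16")
for n in range(3):
    subnet.segments.append(Segment(f"seg-{n}", f"compute-{n}"))

# DHCP scheduling now needs (subnet, host_id), not just the subnet
by_host = {s.host: s for s in subnet.segments}
print(by_host["compute-1"].id)  # → seg-1
```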

DHCP agents for this architecture should be scheduled not only by segment-bound subnets but also by the bound host_id. My earlier work on the distributed DHCP agent would become invalid, since for that we actually changed the agent notifier RPC.

I expect the Neutron folks are already familiar with this approach, but for those who are not, I leave some references on this architecture for better understanding.

- https://engineering.linecorp.com/en/blog/verda-at-cloudnative-openstack-days-2019-2-2/
- https://netdevconf.info/1.2/slides/oct7/09_andrew_kong_netdefconf_2016.pdf
- http://info.tigera.io/rs/805-GFH-732/images/ProjectCalico-Datasheet.pdf

Miguel Lavalle (minsel) wrote :

Hi,

Thanks for your proposal. This is routed networks taken to its ultimate extreme, which is really eradicating L2 domains from the deployment and having all the VMs' traffic handled at L3. Would this be a fair characterization of your proposal? But if that is the case, why fake complete L3 on top of a Neutron implementation that is geared towards L2? Furthermore, why not use Calico, which does everything at L3? Have you thought of the performance implications:

1) For routed networks, the Neutron server maintains a segment-host mapping. The overhead of maintaining this for a huge number of segments may be very high. Have you measured the impact?

2) The Neutron server also updates Nova, via Placement, with the IPv4 inventory available in each segment. This data is used to schedule VMs to the correct compute hosts. Again, have you thought of the performance impact? Have you measured it?

Yang Youseok (ileixe) wrote :

Hi, Miguel. Thanks for your questions. They have led me to think about what I need to consider.

I will speak a little bluntly to convey exactly what I think. Please point out any misunderstandings that stem from my shallow experience.

Would this be a fair characterization of your proposal?
>> Yes, I think that's a fair statement of my request.

why fake complete L3 on top of a Neutron implementation that is geared towards L2?
>> Because I assume Neutron does not have to stay with an implementation that depends on the L2 domain size. I think routed networks themselves came from the same idea, to solve several problems (mainly scalability, I thought). I'm not saying I want to bypass every routine in ML2 to fake L2 for L3-only routing; my idea is rather the opposite. ML2's implementation is flexible enough that I could add some logic to configure the L2 domain size.

Furthermore, why not use Calico, which does everything at L3?
>> The answer is related to my previous answer. What I want is just to resize the L2 scope while still using Neutron functionality. Even setting aside the implementation details of networking-calico, many other things in Neutron (e.g. LBaaS) are not supported in that implementation; that is the first practical reason I do not use it. There were several other issues with using it (maintenance, ownership, etc.), but I think those are also out of scope for this request. We have already been using a routing implementation of this kind, and what I suggest is integrating it into Neutron itself, if that is the right direction.

Have you thought of the performance implications:
>> That is an open question. Nothing that integrates this with routed networks has been done yet. I am willing to get started once this request is conceptually acceptable to everyone; before that, it is too early to talk about performance.

But first, let me tell you what I currently think:

1) The overheaded of maintaining this for a huge number of segments may be very high. Have you measured the impact?
>> Segment (+ HostMapping) is not a complex data structure, and IMHO the increased load is (the number of hypervisors in a rack) * (the load of the original routed network). Since the number of hypervisors in a rack is roughly below 100, there should be only that much overhead by this calculation. The only extra metadata to consider is 'host', which is already maintained as a DB index, so I think it's fair enough.

2) Again, have you thought of the performance impact? Have you measured it?
>> I understand this restriction comes from the fact that Nova should not choose a host outside the IP scope (segment). But in this concept there is no such restriction between subnet and segment (because a subnet has many segments), so this is no longer an issue: Nova can choose any host in a subnet.

Thanks.

Miguel Lavalle (minsel) wrote :

Hi,

To continue the conversation, let's use a one-network example:

1) You are going to have one giant network across the entire deployment.

2) This network has one segment per hypervisor

3) Each subnet is associated to all the segments

Is this what you are proposing?

Yang Youseok (ileixe) wrote :

Hi, Miguel.

Yes, 1, 2 and 3 are all correct.


Slawek Kaplonski (slaweq) wrote :

I think we have enough info to discuss this at one of the next drivers meetings.

tags: added: rfe-triaged
removed: rfe
Miguel Lavalle (minsel) wrote :

I still have questions that I think have to be clarified by the submitter:

1) In note #4 above you describe a series of changes to the agents and other parts of the code. You use wording like "Delete SNAT codes" and "extract many unused things". While some deployers / users might embrace this proposal, the vast majority of them want to preserve the current behavior of L2 and routed networks. Under this proposal, what happens to that vast majority of users / deployers? What is the plan to maintain backwards compatibility?

2) In #7 the submitter claims that maintaining the segment host mapping shouldn't create a major performance penalty. However, we have indications from deployers that the segment host mapping incurs a performance penalty which, in some situations, might not be acceptable. Please look at https://bugs.launchpad.net/neutron/+bug/1799328, which motivated https://review.opendev.org/#/c/612624/. In #9 the submitter confirms that under this proposal there will be one segment per hypervisor. Why does the submitter think there won't be a discernible performance penalty?

3) Furthermore, when assigning IPs to ports, we currently query the segment host mapping here: https://github.com/openstack/neutron/blob/f5a827c2be06f24a1f8025f120f16c12eb1b1f55/neutron/objects/subnet.py#L300-L364. Has the submitter considered the performance impact of the proposal on the IPAM functionality?

4) When binding ports, we start the process with the network's segments: https://github.com/openstack/neutron/blob/f5a827c2be06f24a1f8025f120f16c12eb1b1f55/neutron/plugins/ml2/managers.py#L795-L796. Has the submitter given any thought to how the binding process is going to behave when a network has hundreds or thousands of segments (one per hypervisor)?

Yang Youseok (ileixe) wrote :

@Miguel

Thanks for keeping attention for this issue.

1) IMHO, it's quite difficult to keep the original Neutron concept, in which a Neutron router provides connectivity among different networks, because this proposal assumes that L3 connectivity is already provided at the hypervisor level. Currently I think I would have to fix routers at snat_enabled=False for this configuration: while DNAT (floating IPs) and other router concepts (FWaaS) can still be applied, it's hard to imagine providing SNAT easily without a major new concept like L3 isolation (VRF, MPLS, ...).

2) 3) 4) I don't claim there will be no performance problems at all with a small tweak to the current implementation. What I want to say is that the model is simplified, reducing complexity.

Conceptually, the problem you are worried about comes from the relation below:

Subnet1 - Segment1 - Host1,2,3
Subnet2 - Segment2 - Host4,5,6

In this case, a lot of logic (IPAM, host mapping, port binding, ...) has to consider how a segment relates to specific hosts. If we can change this to:

Subnet1 - Segment 1 - Host1
        - Segment 2 - Host2
        - Segment 3 - Host3
        ....

Subnet2 - Segment 1 - Host1
        - Segment 2 - Host2
        - Segment 3 - Host3
        ...

We no longer have to worry about how a segment relates to specific hosts, because the mapping is simply 1:1, and there is no difference between subnets, since they all share the same segment layout.
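A toy sketch of the contrast above (illustrative data, not Neutron code): with rack-level segments, finding a host's segment means searching a subnet's segments, while with per-hypervisor segments it collapses to a single network-wide 1:1 lookup.

```python
# Today: a segment spans several hosts, so placing a port means
# searching which of a subnet's segments covers the target host.
rack_mapping = {
    "segment-1": {"host1", "host2", "host3"},
    "segment-2": {"host4", "host5", "host6"},
}

def segment_for_host_rack(mapping, host):
    # linear scan over segments; cost grows with segments per subnet
    for seg, hosts in mapping.items():
        if host in hosts:
            return seg
    return None

# Proposed: one segment per hypervisor, so the lookup is a direct
# 1:1 index and is identical for every subnet in the network.
host_mapping = {f"host{n}": f"segment-{n}" for n in range(1, 7)}

print(segment_for_host_rack(rack_mapping, "host5"))  # → segment-2
print(host_mapping["host5"])                         # → segment-5
```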

Besides, the more I talk, the more I feel how difficult it is to consider everything. Personally I think it's worth a try, but I'm not sure it's the right change for the community. Please consider whether it is a feasible change.

Many thanks

Slawek Kaplonski (slaweq) wrote :

Thx Miguel and Yang Youseok for working on this proposal. I marked it as triaged wrongly, so now I'm putting it back into the not-yet-triaged rfe state. Let's wait until Miguel really triages it first.

tags: added: rfe
removed: rfe-triaged
Miguel Lavalle (minsel) wrote :

To be clear, I think the proposal is interesting and has merit. However, based on the information the submitter has provided so far, it seems that implementing this in Neutron would force every OpenStack user to deploy Neutron in the way this RFE proposes. We cannot make that commitment. So let's continue exploring the idea and try to find a way to implement it.

Yang Youseok (ileixe) wrote :

Thank you for your interest. I will clarify the detailed implementation. I now understand that the most important consideration is to preserve how Neutron originally works. I am going to work out some possible ways to achieve that.

Yang Youseok (ileixe) wrote :

@Miguel

Hi Miguel. For me, one unclear thing about routed networks is how they assure L3 connectivity between ToRs.

The documentation (https://docs.openstack.org/neutron/pike/admin/config-routed-networks.html) says: 'In the future, implementation of dynamic routing protocols may ease configuration of routed networks.' How does an admin configure the infra network today, with or without a dynamic routing protocol?

I think integrating the L3 network into the underlay (or infra?) network is the key point for implementing this extended routed network model, because without it I cannot find an ideal way to follow Neutron's core concepts.

My current naive ideas are:
1. Abstract the underlay network using the current encapsulation mechanisms (VXLAN, GRE, ...) and propagate routing entries to each hypervisor over the tunnels. This looks like the Calico way, except that Calico does not encapsulate tenant traffic.

2. Pre-configure the ToR with BGP and have the hypervisors set up some encapsulation (MPLS, ...) between the ToR and the hypervisors. This looks like the networking-bagpipe L3VPN implementation.

I think neither way will be easy, since current Neutron abstracts the underlay network away and treats it as a black box.

Any advice would be appreciated. Thanks.
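For the optional BGP piece discussed in this thread, a minimal sketch of what route propagation could look like, assuming an ExaBGP-style text API where the speaker process reads `announce route ...` lines on stdin and the ToR peering itself is pre-configured by the admin (so Neutron never touches the ToR). The `announcements` helper and addresses are illustrative only.

```python
def announcements(vm_routes, next_hop):
    """Build ExaBGP-style announce lines for each local VM's /32 route.

    vm_routes: iterable of /32 prefixes for VMs hosted on this hypervisor.
    next_hop: the hypervisor's own underlay address, advertised to the ToR.
    """
    return [f"announce route {prefix} next-hop {next_hop}" for prefix in vm_routes]

# In a real agent these lines would be written to the BGP speaker's API;
# here we just print them.
for line in announcements(["10.0.0.15/32", "10.0.0.23/32"], "192.0.2.10"):
    print(line)
```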
