[RFE] Distributed DHCP agent

Bug #1806390 reported by Yang Youseok
Affects: neutron
Status: Opinion
Importance: Wishlist
Assigned to: Yang Youseok
Milestone: (none)

Bug Description

This is a very old issue that previously ended marked as an invalid feature, but since I could not find an ideal solution I am raising it again. I wonder what others think of it.

It is heavily related to the old issue (https://bugs.launchpad.net/neutron/+bug/1468236), which I reconstruct below from my understanding.

Problems
- A giant shared provider network with more than 10000 ports.
- Several DHCP agents serve the network, even one per hypervisor in the Calico project.
- A scalability issue occurs: the DHCP lease file is not yet updated by the time the VM boots, so the VM fails to get an IP.

Solutions from the reporter
1. Add a distributed flag to the DHCP agent, and provision a DHCP agent on every compute node.
2. Change the DHCP agent notifier to target the DHCP agent per host.
3. Do not spread DHCP traffic outside the local hypervisor.

Conclusion
- Rejected because
- Solution step (2) adds significant complexity to the agent notifier RPC.
- (3) is not a general solution.
- It is even worse for migration; there are many side effects to take care of.
- There are existing building blocks with which we can achieve the purpose. (This was mentioned on IRC, but I still do not understand which building blocks were meant.)

Our private cluster is very much like Calico: we have a giant provider network, make it routable with Quagga, and run a DHCP agent per compute node. Judging from approaches like routed networks, I believe the community has formed some consensus that this kind of architecture handles scale issues well.

And to achieve this architecture without L2, modifying the DHCP agent cannot be avoided, since its default HA behavior causes critical DB performance issues.

But at the same time, I absolutely agree with the comment warning about the unnecessary complexity of a distributed approach like DVR.

So what I suggest is:
- Do not modify current DHCP agent behaviors such as the notifier-side API, so migration logic is not harmed.
- Do not change the DHCP HA concept or the L2 agent at all.
- Just add a distributed flag for the DHCP agent, and add host-filtering logic to the handler-side RPC (get_active_network_info, get_network_info) only when the DHCP agent is distributed (see the sketch after this list).
- Operators get a slightly new concept of distributed DHCP, where the agent only serves ports on its local hypervisor.
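
A minimal sketch of the handler-side filtering I mean (the handler name matches the existing DhcpRpcCallback.get_active_network_info, but the 'distributed' kwarg and the 'binding:host_id' port filter are my assumptions, not actual Neutron code):

    def get_active_network_info(self, context, host=None,
                                distributed=False, **kwargs):
        """Return networks with their subnets and ports for one agent."""
        networks = self._get_active_networks(context, host=host)
        for network in networks:
            port_filters = {'network_id': [network['id']]}
            if distributed:
                # Only fetch ports bound to the calling agent's own
                # hypervisor instead of every port in the giant network.
                port_filters['binding:host_id'] = [host]
            network['subnets'] = self.plugin.get_subnets(
                context, filters={'network_id': [network['id']]})
            network['ports'] = self.plugin.get_ports(
                context, filters=port_filters)
        return networks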

Then we can achieve the following from the change:
- Reduce the performance overhead. I found the performance penalty is mostly on the DB side (fetching ports with get_active_network_info() and completing the provisioning step with dhcp_ready_on_ports()); the RPC fanout is minor.
- Introduce a new concept in which the DHCP agent failure domain is split per host.

Any comments are appreciated.

Tags: rfe
Yang Youseok (ileixe)
tags: added: rfe
description: updated
Revision history for this message
Bence Romsics (bence-romsics) wrote :

Thank you for your RFE report.

Please bring this to the neutron-drivers IRC meeting (every Friday at 14:00 UTC):

http://eavesdrop.openstack.org/#Neutron_drivers_Meeting

Changed in neutron:
importance: Undecided → Wishlist
Revision history for this message
Yang Youseok (ileixe) wrote :

@Bence Thank you for the guidance. I will bring this up at the meeting. :)

Revision history for this message
Brian Haley (brian-haley) wrote :

I thought with DVR enabled you could also start the dhcp-agent on each compute node and achieve this, but it's been a while since I tested that configuration, so I could be mis-remembering. It's worth a try.

Revision history for this message
Yang Youseok (ileixe) wrote :

@Brian Hi Brian. Yes, actually there is already no problem with the concept of a dhcp-agent on each compute node. The issue is the performance degradation, not whether it works.

I was silly to just wait for discussion of this issue at the last meeting, since I did not know the meeting's procedures... T_T

I hope I can discuss the issue at the meeting, but let me push an initial code commit first. It can help reviewers understand better.

Revision history for this message
Miguel Lavalle (minsel) wrote :

@Yang,

I have some questions:

1) In your description you reference a prior discussion that took place some years ago. Can you state, in your own words, the problem that you are trying to solve?

2) In comment #4 above, you indicated that you were going to propose a patch, to facilitate the discussion. I searched in Gerrit and couldn't find any patches. Do you still plan to propose one?

Revision history for this message
Yang Youseok (ileixe) wrote :

@Miguel

Hello Miguel. Thanks for the attention.

1) The problem we encounter is that the DHCP lease file is not yet updated when the VM instance boots, so the VM does not get an IP from time to time. (And the larger the scale, the more often this occurs.) It comes from our unusual deployment architecture (a DHCP agent on every compute node), which generates DB overhead, and what I suggest is to ease that DB overhead.

I did not profile, so I am not 100% sure of the root cause, but my empirical guess is that the DB overhead I mentioned generally comes from the two functions below.

- get_active_network_info(): this function queries every port in the giant network. We currently have more than 20000 ports in a network, so whenever a new port is added, almost 1000 (number of DHCP agents) * 20000 (number of ports) = 20,000,000 queries are generated. (See the back-of-the-envelope sketch below.)

- dhcp_ready_on_ports(): even worse, this function makes further queries for every port returned from get_active_network_info(). What I found is that after upgrading to the Newton release (which adopted the provisioning_block scheme), the missing-DHCP-lease problem accelerated a lot.
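
A back-of-the-envelope illustration of these numbers (they mirror the figures above; nothing here is measured):

    agents = 1000              # one DHCP agent per hypervisor
    ports_per_network = 20000  # ports in the giant shared network

    # Today: a port event fans out to every agent, and each agent's
    # resync via get_active_network_info() pulls every port.
    rows_today = agents * ports_per_network        # 20,000,000

    # With host filtering each agent pulls only its own local ports.
    ports_per_host = ports_per_network // agents   # ~20 if spread evenly
    rows_filtered = agents * ports_per_host        # 20,000

    print(rows_today, rows_filtered)               # 20000000 20000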

2) I did not propose a patch yet. We actually have custom code to avoid the situation, and I want to add more tests to justify the solution. Since it is a scale issue, I plan to add a Rally test case for our use.

At the same time, I am currently focusing on upgrading the OpenStack cluster in our company (we are way behind...), so I have not made it yet.

But I will definitely make a patch, since I don't want to maintain custom code and I want our code to be reviewed by the community.

I hope my explanation would be a little bit helpful.

Thanks!

Revision history for this message
Miguel Lavalle (minsel) wrote :

@Yang,

Big networks like the one you describe are exactly the use case that routed networks were intended to address. Here's the presentation Carl Baldwin and I gave on this topic in Barcelona: https://www.youtube.com/watch?v=HwQFmzXdqZM&t=1235s.

In routed networks, each segment has a DHCP agent associated with it. That is a middle ground between one or two DHCP agents for the entire network and your deployment, where you have one per compute node. What do you think?

Revision history for this message
Miguel Lavalle (minsel) wrote :

I just emailed to your gmail address a copy of the slides Carl and I used in that presentation.

Revision history for this message
Yang Youseok (ileixe) wrote :

@Miguel,

Hi Miguel. Thanks for the kindness.

Actually, I found the routed network concept you mentioned and investigated the architecture recently. I was glad, since I thought I could migrate our network plugins to the upstream way.

But there were a few issues that we encountered.

I understand a 'routed network' conceptually pushes the L3 domain down to the rack level (shrinking the L2 domain at the same time). Our concept (/32 networks, similar to what Calico does) is a more aggressive change, pushing the L3 domain down to the hypervisor level (with no L2 domain at all).

Having operated a cluster with this network concept for several years, I find it has several benefits that solve troublesome issues:

- No migration restrictions: we don't care at all which rack a VM is provisioned on.
- No operational overhead: we don't need any mapping metadata for L2, and no action is needed for segment mapping. It means users cannot use L2 protocols in VMs, but my personal experience says there is no demand for that at all.
- No NAT for L3: since we already connect our provider network to the infra network by BGP, we don't need any NAT at L3.

So my conclusion is to maintain our codebase, but I am willing to take action if we can solve the problem I mentioned in a more proper way.

Anyway, even though this proposal came from our unusual architecture, I think it is a more general problem. (And I naively expect routed networks will also encounter the DHCP overhead problem soon, but I am not sure.)

So... thanks for the comment! I was glad to discuss the limitation with a maintainer. :)

Revision history for this message
zhaobo (zhaobo6) wrote :

Nice to see this old problem again, ;-)

Revision history for this message
Miguel Lavalle (minsel) wrote :

@Yang,

Is it still your plan to propose a proof-of-concept patch to facilitate the discussion of this RFE?

Revision history for this message
Yang Youseok (ileixe) wrote :

@Miguel,

Definitely, I will make a patch, but I am not sure how soon I can get to it... T_T

Changed in neutron:
assignee: nobody → Yang Youseok (ileixe)
status: New → In Progress
Revision history for this message
Yang Youseok (ileixe) wrote :

@Miguel Sorry for the delay. I posted a PoC patch, so please review it when you can. Thanks.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi @Yang, can You give us a link to Your PoC patch?

Also Your proposal from one of above comments is:

> - Do not modify current DHCP agent behaviors such as the notifier-side API, so migration logic is not harmed.
> - Do not change the DHCP HA concept or the L2 agent at all.
> - Just add a distributed flag for the DHCP agent, and add host-filtering logic to the handler-side RPC (get_active_network_info, get_network_info) only when the DHCP agent is distributed.

So if I have DHCP HA (e.g. a network should be on at least 3 dhcp agents), how will You select the hosts on which information about a port will be stored?

> - Operators get a slightly new concept of distributed DHCP, where the agent only serves ports on its local hypervisor.

What about the API https://developer.openstack.org/api-ref/network/v2/#dhcp-agent-scheduler ? What will it report in case this "distributed" flag is set to True?

I think this is quite a complex issue and a spec would probably be useful to discuss and address such cases.

Revision history for this message
Yang Youseok (ileixe) wrote :

@Slawek

Oh, I did not notice that the Gerrit hook did not work. The PoC is here:

https://review.opendev.org/#/c/649219/

> So if I have DHCP HA (e.g. a network should be on at least 3 dhcp agents), how will You select the hosts on which information about a port will be stored?

For the first approach, I did not consider HA for 'distributed' DHCP agents. Since I assume distributed DHCP is deployed on every hypervisor, the failure domain is already narrow, which should be acceptable to those who care about it.

To be more detailed: even if we have 3 DHCP agents for a network, in the very unlucky situation where all 3 agents die, the networks bound to those agents stop working. With distributed DHCP, even if a DHCP agent dies on a hypervisor, it only affects the VMs on that hypervisor. Of course, we can think more about HA + distributed DHCP agents, but imho that could be a 2nd implementation phase, since it would increase code complexity a lot.

> What about the API https://developer.openstack.org/api-ref/network/v2/#dhcp-agent-scheduler ? What will it report in case this "distributed" flag is set to True?

The PoC does not touch the API you mentioned, though I think it should change so that a 'distributed' DHCP agent reports only its host-specific networks, since it does not serve other hypervisors. Also, I imagine users' attempts to manually bind a network to a distributed DHCP agent should be rejected.

Maybe it seems like quite a naive idea, and... that's why I want it reviewed. Thanks for the feedback! I will be waiting for your response.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi again,

I have one additional question about live-migration.

IIUC Your proposal, You want to have dhcp agents deployed on all compute nodes and let each dhcp agent configure only the ports which are on its own compute node, am I right?
How do You want to handle e.g. migration of a vm? Currently the DHCP agent doesn't need to be aware of anything like that at all. But in Your case it will be an additional step which needs to be done, right?

And also, a question about alternative solutions. I know that e.g. networking-ovn does DHCP based on OpenFlow rules and it is done locally, so there are no dhcp agents at all in that case.
Maybe that would be a more scalable solution which You could use alternatively?

Revision history for this message
Yang Youseok (ileixe) wrote :

Hi Slawek again,

Thanks for attention of this request.

Here are my answers to what you asked.

1. IIUC Your proposal, You want to have dhcp agents deployed on all compute nodes and let each dhcp agent configure only the ports which are on its own compute node, am I right?
> Yes, right.

How do You want to handle e.g. migration of a vm? Currently the DHCP agent doesn't need to be aware of anything like that at all. But in Your case it will be an additional step which needs to be done, right?
> No. In fact it does not need an additional step for that. This is the major difference from the original request (https://review.opendev.org/#/c/184423/).

I found the original patch changes the agent notifier side (changing the default action from notifying all DHCP-bound agents to notifying one specific agent). As you can see, past reviewers of that patch already warned about the complexity, because it is difficult to keep port binding and agent notification in sync in many situations (e.g. migration...).

I agreed with the reviewer and wondered if there was another way. Then I realized I don't have to change the notifier side (it keeps sending fanout to all agents) if I make a small change to the agent piggyback API (get_active_networks_info), since the major DB scale problem was caused by 'provisioning_block'.

So my point is that since every distributed DHCP agent still receives the fanout messages, we don't have to worry about port-binding update timing. Distributed dhcp agents work normally because they receive a port update event whenever the port binding is updated (see the sketch below).
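
A condensed sketch of what I mean on the agent side (not the actual DhcpAgent code; 'distributed_dhcp' is a hypothetical config option):

    def port_update_end(self, context, payload):
        # Same fanout RPC the agent handles today.
        port = payload['port']
        if (self.conf.distributed_dhcp
                and port['binding:host_id'] != self.conf.host):
            # Not our hypervisor's port, nothing to resync locally.
            # Since the skip is decided agent-side, live migration needs
            # no new notifier logic: the destination host's agent picks
            # the port up from the same fanout once the binding moves.
            return
        network = self.cache.get_network_by_id(port['network_id'])
        self.call_driver('reload_allocations', network)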

2. Maybe that would be a more scalable solution which You could use alternatively?
> Personally, I think OVN is one of the most scalable solutions currently available. I would definitely use it if we were building new clusters rather than changing existing ones. T_T But there are several issues preventing us from just using it now.

First of all, we do not use Open vSwitch. I could explain why we prefer linuxbridge to openvswitch, but it seems off-topic, so I will omit it.

And just for 'distributed' DHCP, we don't want to give up the agent-based implementation, since imho Neutron already has enough features to do it. (Again, this patch is just to optimize operation/performance.)

Thanks!

Revision history for this message
Yang Youseok (ileixe) wrote :

After some investigation of routed networks, I found out I could achieve distributed DHCP if a segment is created for each hypervisor.

There is already scheduling logic using SegmentHostMapping, so I think I don't have to change the RPC at all.

Maybe what I need is just to implement a routed network per hypervisor (https://bugs.launchpad.net/neutron/+bug/1846285); a sketch below illustrates the idea.
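
For illustration, a hedged sketch with openstacksdk of what a "segment per hypervisor" layout could look like (hypervisor names, physnet names, and CIDRs are made up; the SegmentHostMapping entries themselves come from each host's L2 agent config, e.g. linuxbridge physical_interface_mappings, not from the API):

    import openstack

    conn = openstack.connect(cloud='mycloud')  # assumes a configured cloud

    net = conn.network.create_network(name='giant-provider')
    for i, hv in enumerate(['hv-001', 'hv-002']):
        # One segment per hypervisor; the host's L2 agent maps the
        # matching physnet, which populates SegmentHostMapping.
        seg = conn.network.create_segment(
            network_id=net.id,
            network_type='flat',
            physical_network='physnet-%s' % hv)
        # Each segment gets its own small subnet, so DHCP is scheduled
        # per segment, i.e. effectively per hypervisor.
        conn.network.create_subnet(
            network_id=net.id,
            segment_id=seg.id,
            ip_version=4,
            cidr='10.%d.0.0/24' % i)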

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Hi Yang Youseok,

So do we still need to discuss this RFE? Or maybe You can achieve what You need in a different way and it's not needed anymore?

Revision history for this message
Yang Youseok (ileixe) wrote :

Hi Slawek.

The architecture I want to achieve with this request presumes a specific setup (a routed network per hypervisor), and honestly I feel it is very hard to imagine the prerequisite request (https://bugs.launchpad.net/neutron/+bug/1846285) being implemented for now.

Also, I found the community is trying to solve this kind of problem using OVN, and I think that is the more worthwhile place to invest time, even for me.

Anyway, thanks for keeping an eye on this issue. :) I changed this to Opinion.

Yang Youseok (ileixe)
Changed in neutron:
status: In Progress → Opinion
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/649219
