Enable neutron to support distributed DHCP agents

Bug #1468236 reported by shihanzhang
This bug affects 6 people
Affects: neutron
Status: Won't Fix
Importance: Wishlist
Assigned to: shihanzhang
Milestone: (none)

Bug Description

The current DHCP service in Neutron is centralized, and it suffers from several ailments in large-scale scenarios:
1. VMs can't get an IP at boot time; most seriously, this means the metadata service can't work.
2. A DHCP agent needs a long time to restart if it has been serving a large number of VMs.
3. The network node hosts a large number of namespaces, especially in a public cloud with many tenants and private networks.

I think we can run the dhcp-agent on all compute nodes, but not exactly like the current DVR; the main differences are as below:
1. It simplifies the dhcp-agent scheduler in neutron-server: when we create a VM, neutron-server just sends the RPC message to the agent on the port's host (see the sketch after this list).
2. The dhcp-agent running on a compute node serves only the VMs on that compute node; if this dhcp-agent goes down, it only affects the VMs running on that node.
3. Move the network-to-dhcp-agent binding from neutron-server to the dhcp-agent; this removes the race that happens between neutron-server's multiple workers.
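
A minimal sketch of what point 1 could look like (the notifier call, topic layout and payload keys here are illustrative assumptions, not Neutron's actual RPC API):

    def notify_dhcp_agent(rpc_client, port):
        # Target only the dhcp-agent on the compute node the port is bound
        # to, instead of fanning out to every agent scheduled to the network.
        host = port.get('binding:host_id')
        if not host:
            return  # port not bound to a host yet, nothing to target
        rpc_client.cast('dhcp_agent.%s' % host, 'port_create_end',
                        payload={'port': port})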

Tags: rfe
tags: added: rfe
Revision history for this message
Kevin Benton (kevinbenton) wrote :

Why not run a dhcp agent on every compute node?

Revision history for this message
Assaf Muller (amuller) wrote :

You do; this spec is about optimizing that. You can read the spec for more details.

Revision history for this message
shihanzhang (shihanzhang) wrote :

Hi Assaf Muller, thanks for your explanation. You are right, this proposal will optimize that; if we just run a dhcp agent on every compute node without the changes in this proposal, it has some problems, as this spec describes.

Revision history for this message
Kyle Mestery (mestery) wrote :

Seems quite reasonable.

Changed in neutron:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/205429

Revision history for this message
Assaf Muller (amuller) wrote :

@shihanzhang, can you update the bug's description with a problem statement, and the high level architectural choices made? For example, why did you decide to make changes to the OVS agent? What are the pieces involved and why, etc.

Revision history for this message
Nell Jerram (neil-jerram) wrote :

FWIW, this is also what we do with networking-calico, i.e. run a DHCP agent on every compute node.

Changed in neutron:
importance: Undecided → Wishlist
Changed in neutron:
assignee: nobody → shihanzhang (shihanzhang)
Revision history for this message
Akihiro Motoki (amotoki) wrote :

Agree with Assaf. Could you update the bug description with a problem statement and a higher-level approach,
particularly the problems with the current multiple-DHCP-agents approach?

Running the dhcp-agent on each compute node itself sounds reasonable to me. It is similar to the nova-network multi-host approach.

On the other hand, we need to assess the complexity and our code stability when it is introduced.

Revision history for this message
shihanzhang (shihanzhang) wrote :

amotoki, thanks very much for your suggestion. I have updated the description; I hope this can be done in mitaka-1.

description: updated
Revision history for this message
Nell Jerram (neil-jerram) wrote :

On the other hand, anyone working on this bug should be aware that there are also scaling issues with running a DHCP agent on every compute node.

(networking-calico does this, not because of the reasons stated in the bug description above, but because in the calico scenario there is no bridging of networks between compute nodes. So we have gained a little experience of it already.)

Broadly we have seen two issues with running a reference Neutron DHCP agent on every compute node.

1. Each DHCP agent still receives MAC/IP/hostname mapping updates for every VM on the relevant networks - rather than for just the VMs on its own compute node - and so can get behind in its updating of the Dnsmasq config; and that eventually results in DHCP not being ready when the VM is booting, and the VM not getting its IP addresses. This is the problem covered in more detail at https://bugs.launchpad.net/neutron/+bug/1453350. When running a DHCP agent on every compute node, however, the problem could also be mitigated substantially if there was a way for the Neutron server to send only the mapping updates that are relevant to each DHCP agent.

2. As there are more DHCP agents overall - 1 per compute node, instead of 1 or 2 per network - we see problems with load on the Neutron servers, when the number of compute nodes exceeds 250. Specifically, with 10 Neutron servers and >250 compute nodes, we see nova-compute->Neutron requests timing out (after 30s timeout), and hence VM launch failures. We have never exactly pinned down the nature of that loading, but we suspect a combination of (i) handling DHCP agent status reporting, (ii) handling DHCP agent resync requests, and (iii) fanning out port updates to such a large number of agents. We do know for sure that it's caused in some sense by the overall number of DHCP agents, because:

- 10 servers + 240 nodes with DHCP agents + 260 other compute nodes but with no DHCP agent => no VM boot problems

- 10 servers + 500 nodes with DHCP agents => lots of VM launch failures.

In addition there is a specific resync storm issue, which can be easily seen if lots of DHCP agents are started at about the same time. When that happens, the Neutron servers are overloaded and so can't answer many of the resync requests within 30s - so all of the DHCP agents whose requests were failed ask for a resync again...

I can dig out more detail of our exact observations, if that would be helpful - just ask. In the next comment I'll write more about how we've so far tried to address these problems for networking-calico.

Revision history for this message
Nell Jerram (neil-jerram) wrote :

In networking-calico we've tried to solve most of these problems by creating a Calico-specific replacement for the reference Neutron DHCP agent, and you can see the code for this (work in progress) at https://review.openstack.org/#/c/241310/8/networking_calico/agent/dhcp_agent.py. The Calico DHCP agent shares most of the architecture of the reference Neutron DHCP agent, so we are retaining most of the existing value there; the difference is just replacing the top-level script (neutron-dhcp-agent -> calico-dhcp-agent) and class (DhcpAgentWithStateReport -> CalicoDhcpAgent), with CalicoDhcpAgent getting its information about MAC/IP/hostname mappings from Calico's etcd database instead of by RPC from the Neutron server.
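
Very roughly, the shape of such an agent is as in the sketch below; the etcd client interface, key layout and file paths are assumed purely for illustration, and the real code is in the review linked above:

    import os
    import signal

    HOSTS_FILE = '/var/run/calico-dhcp/hosts'       # illustrative path
    PID_FILE = '/var/run/calico-dhcp/dnsmasq.pid'   # illustrative path

    def rewrite_hosts(endpoints):
        # endpoints: (mac, ip, hostname) tuples for the local compute node only.
        with open(HOSTS_FILE, 'w') as f:
            f.writelines('%s,%s,%s\n' % (mac, ip, name)
                         for mac, ip, name in endpoints)

    def hup_dnsmasq():
        # dnsmasq re-reads its --dhcp-hostsfile on SIGHUP.
        with open(PID_FILE) as f:
            os.kill(int(f.read().strip()), signal.SIGHUP)

    def run(local_endpoint_watch):
        # local_endpoint_watch is assumed to yield the full endpoint set for
        # this host each time the relevant etcd subtree changes.
        for endpoints in local_endpoint_watch:
            rewrite_hosts(endpoints)
            hup_dnsmasq()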

...

Revision history for this message
Nell Jerram (neil-jerram) wrote :

We then have a system where there is a Calico DHCP agent on each compute node, but the Neutron server is not aware of those because they don't report any agent state; hence the server does not have to handle DHCP agent state reports, or resync requests, or to fan out port updates to many agents. Instead the Calico plugin/mech driver writes information into a distributed etcd database, and the DHCP agents get the information that they need from that database.

(We do still need occasional resyncs between the Neutron DB and the etcd DB - but that is a problem that we already handle.)

Revision history for this message
shihanzhang (shihanzhang) wrote :

Hi Neil, thanks very much for your good suggestions! For your questions:
1. This proposal will change the notification that neutron-server sends to the dhcp-agent. When Nova creates a VM, it creates a port with the compute host id; when neutron-server receives this API request, it will send the notification only to the compute host where that VM runs, and other dhcp-agents will not receive it, so your first problem will not exist in this proposal.

2. This proposal will change the RPC request from the dhcp-agent to neutron-server: when a dhcp-agent restarts, it will only fetch the info for the ports on its own compute node (see the sketch after this list).

3. neutron-server now has separate agent state reporting queues, so I think your second problem will not exist.
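
To make point 2 concrete, the resync on agent restart could look roughly like the sketch below; the RPC method name and filter are assumptions for illustration, not the existing dhcp-agent RPC API:

    import socket

    def sync_state(plugin_rpc):
        host = socket.gethostname()
        # Hypothetical server-side call: return only the ports bound to this
        # compute node, instead of every port on every scheduled network.
        ports = plugin_rpc.get_ports_on_host(host=host)
        networks = {p['network_id'] for p in ports}
        return networks, ports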

If you have other concerns, feel free to comment on this bug.

Revision history for this message
Akihiro Motoki (amotoki) wrote :

First I would like to clarify the problem statement.

Looking at the problem description, I see two problems:
* The number of ports used by dhcp-agent
* dnsmasq reloading time depending on the number of tenant IPs

I am still not sure how much these problems affect large-scale deployments.
If we introduce a distributed dhcp-agent, it increases the complexity.
I think it is important to assess the balance between the complexity and
the benefit we can get.

I think Neil has more experience with this because networking-calico tackles a similar problem.

Revision history for this message
Akihiro Motoki (amotoki) wrote :

These are some follow-up comments on the current problem statement.

Almost all of my previous comments have been answered by Neil's comments above.
I think the problem statement can be simpler.

(If the description is changed, we cannot track the discussion
context, so I would like to quote the current description here)

> The existing DHCP agent suffers from several ailments in the
> large-scale scenarios:
> 1. Centralized DHCP agent can't serve well for VMs
> * VM can't get IP at first booting time(For a new port,
> neutron-server will send notification to dhcp-agent, then
> dhcp-agent receives this notification and re-load dnsmasq, during
> dnsmasq re-loading, it can't handle new request)
> * DHCP agent need much time to reboot if it has served for a large
> VMs

Is the second item a cause of the problem?
If so, there are several solutions. One is to reduce the number of VMs
which one dnsmasq instance serves. Another is to use a different DHCP
server (we have a proposal to use the ISC DHCP server).

> 2. When not running HA it has the problem of single point of failure.

Isn't it a deployer choice whether the deployer needs HA or not?
The proposed 'distributed' way is one of the ways to improve HA,
so it is not a problem.

> 3. When running HA you get redundancy but at a price:
> * More IPs consume(every DHCP agent will consume a IP in this
> network)

If we have more agents for a network, they need more neutron ports.
It is a problem if dhcp_agents_per_network is large, but
I wonder how many DHCP agents per network you expect?

> * Wasted resources (like CPU and Memory, these DHCP agents are in
> ACTIVE/ACTIVE mode, they have all IP/MAC info in this network).
> * More network load (Neutron notifications, Guest traffic).

Theoretically true, but I am not sure this is a real problem.
DHCP traffic is intermittent. How much does it affect a deployment?

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Personally, I'm opposed to this idea.
An implementation similar to DVR, involving the ovs-agent, would severely complicate the code and IMO is not worth the benefit.

Revision history for this message
Akihiro Motoki (amotoki) wrote :

Regarding the approach, we can discuss it in a spec review once we have a consensus on this kind of demand.
In general, it sounds like there is a problem and some action is required.

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Eugene Nikanorov, 'similar to DVR' just means running the dhcp-agent on every compute node, but it has the benefits below:
1. It simplifies the dhcp-agent scheduling in neutron-server: when we create a VM, neutron-server just sends the RPC message to the agent on the port's host.
2. The dhcp-agent running on a compute node serves only the VMs on that compute node; if this dhcp-agent goes down, it only affects the VMs running on that node.
3. At large scale, even if there is a large number of VMs in the cloud, each compute node still hosts only a bounded number of VMs.
4. It will not complicate the ovs-agent code; it may just need to add some flows.

Revision history for this message
shihanzhang (shihanzhang) wrote :

@Eugene Nikanorov, @amotoki, with the current dhcp-agent, what could we do for this scenario: in our public cloud we have 2000 compute nodes, 5 controller nodes and 5 network nodes, with 10000 VMs running on it, and we have faced the same problems as Neil:
1. When a tenant boots a VM, sometimes the VM can't get its fixed IP.
2. When a network node reboots, the dhcp-agent needs a long time to recover its service.

We have implemented this in our public cloud, and it solves our problems.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

shihanzhang, I can understand the scalability problems. By the way, we've successfully tested a cloud with just 3 controllers running 12k simple VMs on 200+ nodes. DHCP was not an issue.

For your case you could just run DHCP agents on some compute nodes, or even on every compute node.
That would be fine. However, 'distributing' DHCP ports as in the DVR approach will be just as complex.
I don't think accessing the DHCP server locally is worth such effort.

Revision history for this message
shihanzhang (shihanzhang) wrote :

Hi enikanorov, thanks for your comments!
You can read Neil's comments in #10; if we just run the dhcp-agent on each compute node without changing the dhcp-agent scheduling and RPC methods, there are many problems.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Yes, I have read it. The failures Neil is seeing are not connected to the fact that each DHCP agent receives the full mapping.
So I'll reply to his points:

1. https://bugs.launchpad.net/neutron/+bug/1453350 has nothing to do with the number of VMs or the number of DHCP agents. Distributed DHCP will not solve this problem.

2. Load on servers.
Distributed DHCP will make this problem worse, especially coupled with DVR, which puts 2 additional agents per compute node.
In fact, it is a much more complicated query to fetch the ports on a particular compute node than to fetch the ports belonging to a network.

The problems with server load should have been fixed by moving to a separate state-reports queue.

Revision history for this message
shihanzhang (shihanzhang) wrote :

Hi enikanorov, regarding bug #1453350 that you mentioned: you are right that distributed DHCP will not fully solve this problem, but it can greatly reduce the race, because the number of VMs on one compute node is bounded and will not be very large.

2. Why do you think "it is much more complicated query to fetch ports on particular compute"? To query the ports we just need to filter by host_id and network_id; do you think that is much more complicated than filtering by network_id alone? (A sketch of the two queries is below.)
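
To illustrate the question in point 2: the per-host query is roughly the per-network query plus one extra join and equality predicate. The model and column names below are assumptions that only loosely mirror Neutron's port and port-binding tables:

    def ports_for_network(session, Port, network_id):
        return session.query(Port).filter(Port.network_id == network_id).all()

    def ports_for_host_on_network(session, Port, PortBinding, network_id, host):
        return (session.query(Port)
                .join(PortBinding, PortBinding.port_id == Port.id)
                .filter(Port.network_id == network_id,
                        PortBinding.host == host)
                .all())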

description: updated
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am not convinced that going fully distributed is the answer to the scaling issues being reported here, but if we really wanted to go fully distributed we shouldn't use an agent-based approach.

Let's start the discussion at the drivers meeting. I am sure this will take a few rounds.

Changed in neutron:
status: Confirmed → Triaged
Revision history for this message
Nell Jerram (neil-jerram) wrote :

Armando's plan sounds good to me. I'd like to clarify a few things, though, as I'm not sure I fully explained _why_ I wrote various comments above.

- I wrote about my experience of scaling problems with a distributed DHCP in order to say 'Please be careful, don't assume that distribution will magically fix everything, because actually it can _create_ other scaling problems.' Then I wrote some detail about what I think those problems are, and how we are addressing them in a Calico-specific DHCP agent; but I did not mean to imply that exactly similar approaches would work for the Neutron DHCP agent.

- There is one scaling issue that is, currently, exactly the same for distributed and non-distributed DHCP: When there is a large number of VMs, it takes a long time for neutron.agent.linux.Dnsmasq to rewrite dnsmasq's config files and then send a HUP signal to dnsmasq. Therefore if there is a steady stream of port_updates coming to the DHCP agent (for example, if VMs are constantly being created and terminated), it is possible for a queue of unprocessed port_updates to build up, with an increasing time lag between when a port_update was initiated by the server, and when DHCP for that port is actually ready. This is the problem described by https://bugs.launchpad.net/neutron/+bug/1453350, and I believe that it _is_ related to the total number of VMs, but _not_ (as the code currently stands) to the number of DHCP agents. https://review.openstack.org/#/c/220758/ is one approach that helps with this, but there are broadly three other options as well: (1) use distributed DHCP and only send a subset of port_updates to each DHCP agent, for the ports that that DHCP agent should be responsible for; (2) use a different DHCP server that provides a more dynamic update interface, or a better Dnsmasq interface if there is one; (3) arrange somehow that Nova will not start booting the VM until DHCP is known to be ready for that VM's port(s).
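
One way to picture why the backlog forms and why coalescing helps: each port_update costs a full hosts-file rewrite plus a dnsmasq reload, so draining the queue in batches bounds the number of reloads. This is purely an illustration of the idea, not the code in the review above:

    import queue

    def process_updates(update_queue, rewrite_and_hup):
        while True:
            batch = [update_queue.get()]      # block until at least one update arrives
            while True:
                try:
                    batch.append(update_queue.get_nowait())  # drain the backlog
                except queue.Empty:
                    break
            rewrite_and_hup(batch)            # one config rewrite + HUP per batch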

- Looking again at _this_ bug, I think it needs more clarification of what the specific problems are. Firstly, it is a bit of a problem that there are 2 competing overall statements, in the Bug Description and in comment #15. It would be good for shihanzhang and amotoki to agree on an overall statement, and then put that in the Bug Description. Secondly, there are parts of both current statements that I don't understand, for example:

  "network node has a large number of namespaces, especially in public cloud, there are so many tenants and private networks." Why is this a problem?

  "The number of ports used by dhcp-agent" What is the problem here?

Thanks; I hope this helps and doesn't just confuse further.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Neil, the way you are deploying the agent is not the same as running it on every compute node with the normal setup, though. Your topology will have scaling issues because you want every compute node to serve the VMs it hosts locally.

In a normal setup, if you run on every compute node, any given number of agents will have a small subset of the networks scheduled to it.

Say you have 100 VMs per network, 100 networks, and 100 compute nodes.

With a normal deployment of the dhcp agent on every node and a setting of 2 dhcp agents per network, that makes 2 networks per dhcp agent, so 200 dnsmasq entries that each agent has to manage. This is an easy amount for each agent to handle, and that's assuming a packing efficiency of 100 VMs per node.
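
Working through those numbers (dhcp_agents_per_network is the existing Neutron option; the figures are the ones above):

    vms_per_network = 100
    networks = 100
    compute_nodes = 100            # one reference DHCP agent per node
    dhcp_agents_per_network = 2

    network_agent_bindings = networks * dhcp_agents_per_network    # 200
    networks_per_agent = network_agent_bindings // compute_nodes   # 2
    entries_per_agent = networks_per_agent * vms_per_network       # 200 dnsmasq entries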

So the Calico deployment, where each node responds to all the VMs it hosts, has completely different scaling considerations from each agent just having responsibility for the networks scheduled to it.

Revision history for this message
Nell Jerram (neil-jerram) wrote :

Thanks, @Kevin; I already realized that the calico approach was different from normal Neutron DHCP scheduling and HA, but I hadn't fully understood how, so your explanation here is much appreciated.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Some discussion happened during today's drivers meeting [1].

The neutron dhcp architecture already allows a deployer to increase the number of dhcp agents to deal with scale and HA. In some extreme cases, a dhcp agent can run on each compute node, but it's clear that even though this topology is possible, it is not to be promoted, because the extra control plane load may have a negative impact on the end-to-end system. Using an agent-based model to support distributed dhcp as advised here (one dhcp agent per compute host) can lead to code and deployment complexity that I don't think we should encourage.

On the other hand, there are distributed strategies that do not rely on agents to serve dhcp traffic at all (see dragonflow or ovn) that the author should be aware of.

[1] http://eavesdrop.openstack.org/meetings/neutron_drivers/2015/neutron_drivers.2015-12-08-15.03.log.html

Changed in neutron:
status: Triaged → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-specs (master)

Change abandoned by shihanzhang (<email address hidden>) on branch: master
Review: https://review.openstack.org/205429
Reason: won't fix in neutron

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by shihanzhang (<email address hidden>) on branch: master
Review: https://review.openstack.org/184423
Reason: won't fix in neutron
