neutron

DHCP port rescheduling causes ports to grow, internal DNS to be broken

Bug #1864711 reported by Arjun Baindur on 2020-02-25

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
neutron	New	Undecided	Unassigned

Bug Description

Suppose the number of DHCP servers per network is 2, and the number of DHCP agents is greater than 2.

During a period of network instability, RabbitMQ issues, or even a DHCP host temporarily going down, the DHCP port gets rescheduled.

Except it looks like the port is not really rescheduled: instead, a brand new port with a new IP/MAC is created on a new host. The old port is only updated and marked as reserved, not deleted.

This causes two issues:

1. The number of DHCP ports grows. Even when the old host starts heartbeating again, its port is not deleted. For example, we had an environment with 3 DHCP servers per network and a dozen or so DHCP hosts. For some networks, we observed 10+ DHCP ports allocated.

2. DNS is temporarily broken for VMs that still point to the old IPs. /etc/resolv.conf can only hold 3 nameservers, and either way, Linux's default 5-second DNS timeout means that the first server, or the first two servers, going down causes a delay of 5+ or 10+ seconds respectively, which breaks many other apps.
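To make the arithmetic in point 2 concrete, here is a rough sketch of the delay a single lookup sees. It is a simplified model, not the resolver's actual code: it assumes servers are tried in order, each dead server costs one full timeout, and it ignores retries and options such as rotate. The function name and the sample addresses are made up for illustration.

```python
RESOLV_MAX_NAMESERVERS = 3  # resolv.conf honours at most 3 nameserver lines
DEFAULT_TIMEOUT_S = 5       # default per-server resolver timeout, in seconds


def lookup_delay(nameservers, dead, timeout_s=DEFAULT_TIMEOUT_S):
    """Seconds spent waiting on dead servers before a live one is reached.

    Simplified model: servers are tried in order and each dead server
    costs one full timeout. Returns None if none of the usable servers
    is alive during the first pass.
    """
    usable = nameservers[:RESOLV_MAX_NAMESERVERS]
    delay = 0
    for server in usable:
        if server not in dead:
            return delay
        delay += timeout_s
    return None


# Hypothetical DHCP port IPs handed out to a VM as its DNS servers.
servers = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]
one_dead = lookup_delay(servers, dead={"10.0.0.2"})
two_dead = lookup_delay(servers, dead={"10.0.0.2", "10.0.0.3"})
```

Under this model, every lookup pays the penalty again until the VM renews its lease and picks up the new DHCP port IPs.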

I'm not sure if this is a bug or by design. For example, if the same IP/MAC were reused, we could have a conflict on the data plane. Neutron-server has no idea whether DHCP/DNS services are actually down; it only knows that it is not receiving heartbeats over the control plane. Is that why a new port is allocated, to mitigate the risk of a conflict?

As for why the old ports aren't deleted or scaled down when connectivity is restored, is this by design too?
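For anyone wanting to check whether their environment shows the same growth, the following is a minimal sketch that counts DHCP ports per network from a list of port records, however they were obtained (for example from the Neutron API's port listing). It assumes DHCP ports carry device_owner "network:dhcp" and that the old, reserved ports carry device_id "reserved_dhcp_port"; verify both against your deployment. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

DHCP_OWNER = "network:dhcp"
# Assumption: the device_id that marks a DHCP port as reserved.
RESERVED_DEVICE_ID = "reserved_dhcp_port"


def summarize_dhcp_ports(ports, expected_per_network):
    """Return the networks that hold more DHCP ports than expected.

    ports: iterable of dicts with network_id, device_owner and device_id.
    Result maps network_id -> {"total": n, "reserved": m}.
    """
    counts = defaultdict(lambda: {"total": 0, "reserved": 0})
    for port in ports:
        if port.get("device_owner") != DHCP_OWNER:
            continue  # skip non-DHCP ports
        entry = counts[port["network_id"]]
        entry["total"] += 1
        if port.get("device_id") == RESERVED_DEVICE_ID:
            entry["reserved"] += 1
    return {
        net: dict(entry)
        for net, entry in counts.items()
        if entry["total"] > expected_per_network
    }


# Made-up sample: net-a has accumulated two reserved ports, net-b is healthy.
sample_ports = [
    {"network_id": "net-a", "device_owner": "network:dhcp", "device_id": "dhcp-host1"},
    {"network_id": "net-a", "device_owner": "network:dhcp", "device_id": "dhcp-host2"},
    {"network_id": "net-a", "device_owner": "network:dhcp", "device_id": "reserved_dhcp_port"},
    {"network_id": "net-a", "device_owner": "network:dhcp", "device_id": "reserved_dhcp_port"},
    {"network_id": "net-a", "device_owner": "compute:nova", "device_id": "vm-1"},
    {"network_id": "net-b", "device_owner": "network:dhcp", "device_id": "dhcp-host1"},
    {"network_id": "net-b", "device_owner": "network:dhcp", "device_id": "dhcp-host3"},
]
oversized = summarize_dhcp_ports(sample_ports, expected_per_network=2)
```

The sketch only reports; whether it is safe to delete the reserved ports it finds is exactly the open question in this report.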

YAMAMOTO Takashi (yamamoto) wrote on 2020-02-27:

To my recollection, that's how it works. I don't know the rationale for it, though.

Bence Romsics (bence-romsics) wrote on 2020-03-02:

Point #2 may be a duplicate of this (the attempted fix was never merged, but the discussion may be relevant):

https://bugs.launchpad.net/neutron/+bug/1739219

Bence Romsics (bence-romsics) wrote on 2020-03-02:

For point #1 the rationale seems to be this:

https://bugs.launchpad.net/neutron/+bug/1288923
https://review.opendev.org/79018
