Failover of a network from one dhcp agent to another breaks DNS

Bug #1288923 reported by Ed Bak
This bug affects 6 people
Affects    Status         Importance   Assigned to   Milestone
neutron    Fix Released   Medium       Ed Bak
Icehouse   New            Undecided    Unassigned

Bug Description

Failing over a network from one DHCP agent to another results in a new IP address for the DHCP port. This breaks DNS for all VMs on that network. To reproduce, run "neutron dhcp-agent-network-remove" followed by "neutron dhcp-agent-network-add" and observe that the DHCP port's IP address changes.

Ed Bak (ed-bak2)
Changed in neutron:
assignee: nobody → Ed Bak (ed-bak2)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/79018

Changed in neutron:
status: New → In Progress
Carl Baldwin (carl-baldwin) wrote :

There was never a guarantee that moving the DHCP server from one network host to another would result in getting the same IP address. For example, if an agent's host failed and the agent went away without warning, it would not have the opportunity to release its DHCP port and a new agent -- presumably started by some HA setup -- would not be able to pick it up.

With that said, it used to be more likely that an intentional move of the DHCP server to another host would preserve the IP address. That was when IP addresses were returned to the availability pool immediately rather than waiting for the pool to be exhausted. It was this change that made this scenario more likely to happen.
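
The difference between the two recycling behaviours described above can be illustrated with a toy model. This is only a sketch: the class names `ImmediateReusePool` and `DeferredReusePool` and the helper `move_dhcp_port` are hypothetical, and the code does not reproduce Neutron's actual IP allocation logic.

```python
from collections import deque


class ImmediateReusePool:
    """Toy pool: a released address is the next one handed out."""

    def __init__(self, addresses):
        self._free = deque(addresses)

    def allocate(self):
        return self._free.popleft()

    def release(self, address):
        # Put the address back at the front so it is reused right away.
        self._free.appendleft(address)


class DeferredReusePool:
    """Toy pool: released addresses wait until the fresh range is exhausted."""

    def __init__(self, addresses):
        self._free = deque(addresses)
        self._recycled = deque()

    def allocate(self):
        if not self._free:
            # Only now do previously released addresses become available.
            self._free, self._recycled = self._recycled, deque()
        return self._free.popleft()

    def release(self, address):
        self._recycled.append(address)


def move_dhcp_port(pool):
    """Allocate an address, release it, allocate again; return both."""
    before = pool.allocate()
    pool.release(before)
    after = pool.allocate()
    return before, after


addresses = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(move_dhcp_port(ImmediateReusePool(addresses)))  # address is preserved
print(move_dhcp_port(DeferredReusePool(addresses)))   # address changes
```

Under the first model a delete-and-recreate of the DHCP port tends to get its old address back; under the second it gets the next unused one, which matches the behaviour reported in this bug.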

Carl Baldwin (carl-baldwin) wrote :

It occurs to me that some might argue that it doesn't matter what IP address the DHCP server has or whether it changes at some point in time. This might have been a motivation for not reserving DHCP IP address(es) like we do for the default router IP address.

There is a real problem with that argument. The problem is that it ignores the fact that our DHCP server doubles as the DNS server for the local subnet. It is critical that the DNS server's IP address be consistent and predictable because running instances lose their DNS server if it changes. This is an oversight in the current design.

Kyle Mestery (mestery)
Changed in neutron:
importance: Undecided → Medium
milestone: none → juno-1
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/79018
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d5c0a37999f9e3a611a322baacabebc06b13283b
Submitter: Jenkins
Branch: master

commit d5c0a37999f9e3a611a322baacabebc06b13283b
Author: Ed Bak <email address hidden>
Date: Fri Mar 7 17:16:15 2014 +0000

    Provide way to reserve dhcp port during failovers

    This change provides a way to save the dhcp port when failing
    over a network from one dhcp agent to another. When a
    dhcp-agent-network-remove is issued, the dhcp port device_id is
    marked as reserved which causes it to not be deleted. When a
    subsequent dhcp-agent-network-add is issued, the reserved port
    is used and the device_id is corrected. This is desirable
    in order to maintain the dhcp port ip address so that dns doesn't
    get impacted. Unit test added.

    Change-Id: I531d7ffab074b01adfe186d2c3df43ca978359cd
    Closes-Bug: #1288923

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Miguel Angel Ajo (mangelajo) wrote :

Isn't the merged solution incomplete?

It only works for one concurrent remove/add if I'm not wrong. If you do several removes, then several adds, the ports will get mixed.

I'd probably add the original UUID to the reserved port ID if that fits in the database schema.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/125919

Ed Bak (ed-bak2) wrote :

I believe that this is a complete solution.

The dhcp port is reserved by simply changing the device_id to "reserved_dhcp_port". The relationship to the network and subnet is maintained. Doing this keeps the allocated ip from returning to the pool and avoids allocating a new ip for a new dhcp port. You can do a dhcp-agent-network-remove followed by a dhcp-agent-network-add or you could do several removes followed by several adds. Doing several removes and then several adds would imply that a network is hosted by multiple dhcp agents. This could certainly be a valid situation. You would then end up with a pool of reserved dhcp ports after several removes. When you do the adds, the ports can get added back in any order and you will end up with your network hosted on new dhcp agents but with ports which have maintained their ip addresses. I don't think there is a problem. If I'm misunderstanding the use case, let me know.
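
The reservation scheme described in this comment can be sketched roughly as follows. Only the `"reserved_dhcp_port"` value comes from the comment; `ToyPlugin`, `Port`, the method names, and the address generator are hypothetical stand-ins and do not mirror the real Neutron code.

```python
import itertools
from dataclasses import dataclass

RESERVED_DEVICE_ID = "reserved_dhcp_port"  # value quoted in the comment above


@dataclass
class Port:
    network_id: str
    ip_address: str
    device_id: str


class ToyPlugin:
    """Hypothetical in-memory stand-in for the port database."""

    def __init__(self):
        self.ports = []
        self._next_host = itertools.count(2)

    def _new_ip(self):
        # Illustrative only: hand out 10.0.0.2, 10.0.0.3, ...
        return f"10.0.0.{next(self._next_host)}"

    def remove_network_from_agent(self, network_id, agent_device_id):
        # Keep the port (and its IP); just mark it as reserved.
        for port in self.ports:
            if port.network_id == network_id and port.device_id == agent_device_id:
                port.device_id = RESERVED_DEVICE_ID

    def add_network_to_agent(self, network_id, agent_device_id):
        # Prefer a reserved port on the same network over a new allocation.
        for port in self.ports:
            if port.network_id == network_id and port.device_id == RESERVED_DEVICE_ID:
                port.device_id = agent_device_id
                return port
        port = Port(network_id, self._new_ip(), agent_device_id)
        self.ports.append(port)
        return port


plugin = ToyPlugin()
original = plugin.add_network_to_agent("net-A", "dhcp-host1")
plugin.remove_network_from_agent("net-A", "dhcp-host1")
moved = plugin.add_network_to_agent("net-A", "dhcp-host2")
print(original.ip_address == moved.ip_address)  # the address survives the move
```

With several agents per network, several removes leave a small pool of reserved ports, and later adds pick them up in whatever order they are found, so the set of DHCP addresses on the network stays the same.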

Miguel Angel Ajo (mangelajo) wrote :

Hi Ed, I believe you are talking about multiple dhcp agents for the same network.
I'm talking about different dhcp agents for different networks.
Imagine you have networks

A,B,C,D with agents a,b,c,d

then you remove networks A,B,C,D from agents a,b,c,d on their serving host (H1)

then you add them to serve on a different host (H2), but in a different order: b,c,a,d ... or a,b,c,d again...

Wouldn't you end up with mixed ports for different networks? Or is the net/subnet used when you look up
the port at re-association time?

Re-reading the code in a minute.

Miguel Angel Ajo (mangelajo) wrote :

You look up the reserved ports with "for port in network.ports: ....", so at worst you could just mix
up ports within a single network when running HA, which is perfectly fine.

Ed, thank you for taking the time to reply and explain, and sorry for the wrong evaluation.
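
The point about the lookup being scoped to one network can be shown with a small sketch. The `Network` and `Port` classes and the `claim_reserved_port` helper are hypothetical; only the `for port in network.ports` loop and the `"reserved_dhcp_port"` marker are taken from the discussion.

```python
from dataclasses import dataclass, field

RESERVED_DEVICE_ID = "reserved_dhcp_port"


@dataclass
class Port:
    ip_address: str
    device_id: str


@dataclass
class Network:
    name: str
    ports: list = field(default_factory=list)


def claim_reserved_port(network, new_device_id):
    """Return a reserved port from this network only, or None if there is none."""
    for port in network.ports:
        if port.device_id == RESERVED_DEVICE_ID:
            port.device_id = new_device_id
            return port
    return None


# Four networks, each left with one reserved DHCP port after a remove.
networks = {
    name: Network(name, [Port(f"10.0.{i}.2", RESERVED_DEVICE_ID)])
    for i, name in enumerate("ABCD")
}

# Re-adding in a different order cannot cross networks, because each
# lookup only iterates over the ports of the network being added.
for name in "BCAD":
    port = claim_reserved_port(networks[name], f"dhcp-H2-{name}")
    print(name, port.ip_address)
```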

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/125919
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1386415271bd7e478a12e132f67a42aed2f41dae
Submitter: Jenkins
Branch: stable/icehouse

commit 1386415271bd7e478a12e132f67a42aed2f41dae
Author: Ed Bak <email address hidden>
Date: Fri Mar 7 17:16:15 2014 +0000

    Provide way to reserve dhcp port during failovers

    This change provides a way to save the dhcp port when failing
    over a network from one dhcp agent to another. When a
    dhcp-agent-network-remove is issued, the dhcp port device_id is
    marked as reserved which causes it to not be deleted. When a
    subsequent dhcp-agent-network-add is issued, the reserved port
    is used and the device_id is corrected. This is desirable
    in order to maintain the dhcp port ip address so that dns doesn't
    get impacted. Unit test added.

    Closes-Bug: #1288923
    (cherry picked from d5c0a37999f9e3a611a322baacabebc06b13283b)
    Conflicts:
     neutron/common/utils.py
     neutron/tests/unit/test_db_plugin.py

    Change-Id: I531d7ffab074b01adfe186d2c3df43ca978359cd

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-1 → 2014.2