dhcp_release not called causing new VMs to fail DHCP

Bug #1250644 reported by Carl Baldwin
This bug affects 4 people
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   Medium       Carl Baldwin
Havana    Fix Released   Undecided    Unassigned

Bug Description

I've found some situations where dhcp_release is not called when a port is deleted. When this happens, dnsmasq refuses to give out the IP to a new port once the IP address is recycled. The result is that the VM with the new port cannot get its IP address on boot.

There are a few conceivable scenarios that lead to this. I will attempt to describe some in the comments.

Changed in neutron:
assignee: nobody → Carl Baldwin (carl-baldwin)
Carl Baldwin (carl-baldwin) wrote :

One situation that leads to this:

2013-10-18 19:52:11.987 port created
2013-10-18 19:52:22.747 port added to the network cache
2013-10-18 19:52:49.580 port is in the cache at this time
2013-10-18 19:57:18.999 Sync state starts, dhcp_agent network cache is being rebuilt
2013-10-18 19:57:19.005 port deleted from db by the api server
dhcp-agent network cache is completely rebuilt and doesn't contain the deleted port
2013-10-18 19:57:33.944 port_delete_end rpc received by dhcp-agent
2013-10-18 19:58:12.295 port_delete_end function is called
2013-10-18 19:58:12.295 Port not found in the cache so dhcp_release is not called
2013-10-18 19:58:12.314 cache dumped but port is not there

In summary, the port delete is sent to the DHCP agent as an RPC message. Before the agent acts on this message, the periodic sync state operation starts and fetches the current state, which no longer includes the port. The sync state operation does not detect that dhcp_release should be called for the missing port.

By the time the RPC message is acted on, the port is no longer in the local cache and so dhcp_release cannot be called because we don't know enough from the cache to call it.
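
The ordering above can be replayed with a small toy model. This is only a sketch and not the agent's actual code: the class, its method bodies, and the example IP and MAC values are hypothetical, and the `released` list stands in for real calls to dhcp_release.

```python
class CacheBasedAgent:
    """Toy model of an agent that needs its local cache to release a lease."""

    def __init__(self):
        self.cache = {}      # port_id -> (ip, mac)
        self.released = []   # (ip, mac) pairs that would go to dhcp_release

    def port_create_end(self, port_id, ip, mac):
        self.cache[port_id] = (ip, mac)

    def sync_state(self, db_ports):
        # Rebuild the cache from the server's current view of the ports.
        self.cache = dict(db_ports)

    def port_delete_end(self, port_id):
        entry = self.cache.pop(port_id, None)
        if entry is None:
            # Port unknown to the cache: no IP/MAC available, so no release.
            return
        self.released.append(entry)


def replay(sync_before_delete):
    """Replay a port create/delete, optionally with a sync in between."""
    agent = CacheBasedAgent()
    db = {"port-1": ("10.0.0.5", "fa:16:3e:00:00:01")}  # hypothetical values
    agent.port_create_end("port-1", *db["port-1"])
    del db["port-1"]              # API server deletes the port from the db
    if sync_before_delete:
        agent.sync_state(db)      # sync runs before the RPC is acted on
    agent.port_delete_end("port-1")
    return agent.released
```

With `sync_before_delete=False` the lease is released as expected; with `True` the model releases nothing, which mirrors the timeline in this comment.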

Carl Baldwin (carl-baldwin) wrote :

If the agent is down while a port is deleted this will happen.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/56263

Changed in neutron:
status: New → In Progress
Isaku Yamahata (yamahata) wrote :

This case was hit when restarting dhcp_agent, right?
If so, it shouldn't be assumed that updating the hosts file and executing dhcp_release are done atomically.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/56263
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=965542bfac90194bd032e5e6aeb6a507dcb11088
Submitter: Jenkins
Branch: master

commit 965542bfac90194bd032e5e6aeb6a507dcb11088
Author: Carl Baldwin <email address hidden>
Date: Tue Nov 12 22:52:47 2013 +0000

    Use information from the dnsmasq hosts file to call dhcp_release

    Certain situations can cause the DHCP agent's local cache to get out
    of sync with the leases held internally by dnsmasq. This method of
    detecting when to call dhcp_release is idempotent and not dependent on
    the cache. It is more robust.

    Change-Id: I4eafd9cfb94a77a2f0229f89de5483dad23725cf
    Closes-Bug: #1250644
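
The idea in the commit message can be sketched as a set difference between what the previous hosts file says and what the ports look like now. This is an illustration under assumptions, not the merged change: the function names are hypothetical, the hosts file is assumed to hold one `mac,hostname,ip` entry per line, and the commands are only built, not executed. The argument order follows the dhcp_release utility's usage of interface, address, MAC address.

```python
def parse_hosts(text):
    """Return the set of (ip, mac) pairs found in 'mac,hostname,ip' lines."""
    leases = set()
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        mac, ip = parts[0], parts[2]
        leases.add((ip, mac))
    return leases


def leases_to_release(old_hosts_text, current_ports):
    """Entries in the old hosts file that no current port accounts for."""
    old = parse_hosts(old_hosts_text)
    new = {(ip, mac) for ip, mac in current_ports}
    return sorted(old - new)


def release_commands(interface, stale):
    """Build one dhcp_release argument list per stale (ip, mac) pair."""
    return [["dhcp_release", interface, ip, mac] for ip, mac in stale]
```

Because the result depends only on the old file and the current ports, repeating the comparison after the hosts file has been rewritten yields nothing further to release, which is the idempotent behavior the commit message describes.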

Changed in neutron:
status: In Progress → Fix Committed
Changed in neutron:
importance: Undecided → Medium
milestone: none → icehouse-3
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → 2014.1
Byron McCollum (byron-mccollum) wrote :

Please backport to Havana. This is causing considerable issues in our environment.

tags: added: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/114328

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/havana)

Reviewed: https://review.openstack.org/114328
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cda2c446aad40254b7c527298db843b473e4907b
Submitter: Jenkins
Branch: stable/havana

commit cda2c446aad40254b7c527298db843b473e4907b
Author: Carl Baldwin <email address hidden>
Date: Tue Nov 12 22:52:47 2013 +0000

    Use information from the dnsmasq hosts file to call dhcp_release

    Certain situations can cause the DHCP agent's local cache to get out
    of sync with the leases held internally by dnsmasq. This method of
    detecting when to call dhcp_release is idempotent and not dependent on
    the cache. It is more robust.

    Change-Id: I4eafd9cfb94a77a2f0229f89de5483dad23725cf
    Closes-Bug: #1250644
    (cherry picked from commit 965542bfac90194bd032e5e6aeb6a507dcb11088)

tags: added: in-stable-havana