DHCP scheduling is racy with first port creation

Bug #1431105 reported by Kevin Benton
This bug affects 8 people
Affects | Status       | Importance | Assigned to  | Milestone
neutron | Fix Released | Medium     | Kevin Benton |

Bug Description

Because scheduling of a network to a DHCP agent is deferred until the first VM port is created, the VM port can be created and start issuing DHCP requests before the DHCP agent has had a chance to create its own port on the network, launch its dnsmasq process, etc.

By scheduling the agent at subnet creation time instead of the first port creation, we can buy extra time to help avoid this situation (DHCP scheduling notification sent before VM port creation rather than after). Additionally, scripts can watch the DHCP port state after creating the subnet to know when it should be safer to proceed.
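A script that waits on the DHCP port after creating the subnet might look roughly like the sketch below. It is a minimal sketch under stated assumptions: `list_dhcp_ports` is a hypothetical callable supplied by the caller (for example, something that lists the network's ports with `device_owner` equal to `network:dhcp`), and an `ACTIVE` port status is treated as the "safer to proceed" signal. As the next paragraph points out, an `ACTIVE` DHCP port does not prove that dnsmasq is already serving the new configuration.

```python
import time
from typing import Callable, Iterable, Mapping

# Hypothetical caller-supplied function: given a network ID, return that
# network's DHCP ports as mappings that carry a "status" key.
ListDhcpPorts = Callable[[str], Iterable[Mapping[str, str]]]


def wait_for_dhcp_port(
    network_id: str,
    list_dhcp_ports: ListDhcpPorts,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least one DHCP port on the network reports ACTIVE.

    Returns True as soon as an ACTIVE DHCP port is seen, or False once the
    timeout expires. sleep and clock are injectable to make testing easy.
    """
    deadline = clock() + timeout
    while True:
        ports = list(list_dhcp_ports(network_id))
        if any(port.get("status") == "ACTIVE" for port in ports):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting the port lister keeps the polling logic independent of any particular client library. How to list the ports is left to the caller.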

Ultimately we will need a way to know that the DHCP agent is truly ready (dnsmasq process running with updated config), but this will require new state to be stored and passed from the agent which will probably require a spec so we can address 'readiness' in a broader fashion.

Tags: l3-ipam-dhcp
Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
Changed in neutron:
status: New → In Progress
YAMAMOTO Takashi (yamamoto) wrote :

Have you seen the race in the field,
or just through code inspection?

Kevin Benton (kevinbenton) wrote :

I hit the race with a heavily loaded DHCP agent during scale testing.

The instance usually eventually retries for DHCP but it could take several minutes.

Akihiro Motoki (amotoki) wrote :

Sounds reasonable. I believe the current behavior exists to let operators decide agent assignment, but I don't think that is a common case.
Even if a network is scheduled when a subnet is created, operators can still reassign it to DHCP agents as they want, so I don't see any problem with changing the timing of DHCP agent scheduling.

Kevin Benton (kevinbenton) wrote :

If operators want to manually schedule, they can also do it after the network is created but before the subnet is created.

Changed in neutron:
importance: Undecided → Low
Changed in neutron:
importance: Low → Medium
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/163672
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=05f234481474aa05f59c4af459b4343d21397afc
Submitter: Jenkins
Branch: master

commit 05f234481474aa05f59c4af459b4343d21397afc
Author: Kevin Benton <email address hidden>
Date: Wed Mar 11 18:32:52 2015 -0700

    Schedule net to a DHCP agt on subnet create

    Change the DHCP notifier behavior to schedule a network
    to a DHCP agent when a subnet is created rather than
    waiting for the first port to be created.

    This reduces the chance that a VM port gets created
    and sends a DHCP request before the DHCP agent is
    ready. Before, the network would be scheduled to an agent
    as a result of the API call to create the VM port, so the
    DHCP port wouldn't be created until after the VM port.
    After this patch, the network will have been scheduled to
    a DHCP agent before the first VM port is created.

    There is still a possibility that the DHCP agent could be
    responding so slowly that it doesn't create its port and
    activate the dnsmasq instance before the VM sends traffic.
    A proper fix that ensures the dnsmasq instance is truly
    ready to serve requests for a new port will require
    significantly more code for barriers (on subnet creation,
    port creation, or the nova boot process), which is too
    complex to add this late in the cycle.

    This patch also eliminates the logic in the n1kv plugin that
    was already doing the same thing.

    Closes-Bug: #1431105
    Change-Id: I1c1caed0fdda6b801375a07f9252a9127058a07e
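The ordering change described in the commit message can be illustrated with a toy model. This is not Neutron's notifier code. `ToyDhcpScheduler`, its methods, and the single agent name `dhcp-agent-1` are all hypothetical. The model only records the order of events. In it, port creation still calls the same "ensure scheduled" helper as a fallback, and the report does not say whether the real notifier does exactly that. An existing binding is left alone, which mirrors the point that operators can schedule manually between network and subnet creation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDhcpScheduler:
    """Hypothetical model of *when* a network gets bound to a DHCP agent.

    schedule_on_subnet_create=False mimics the old behaviour (first port
    triggers scheduling); True mimics the behaviour after the fix.
    """

    schedule_on_subnet_create: bool
    agents: List[str] = field(default_factory=lambda: ["dhcp-agent-1"])
    bindings: Dict[str, str] = field(default_factory=dict)  # network -> agent
    events: List[str] = field(default_factory=list)

    def _ensure_scheduled(self, network_id: str) -> None:
        # Keep any existing binding, e.g. one an operator made manually.
        if network_id not in self.bindings:
            agent = self.agents[0]
            self.bindings[network_id] = agent
            self.events.append(f"schedule {network_id} -> {agent}")

    def create_subnet(self, network_id: str) -> None:
        self.events.append(f"subnet created on {network_id}")
        if self.schedule_on_subnet_create:
            self._ensure_scheduled(network_id)

    def create_vm_port(self, network_id: str) -> None:
        # The VM port exists (and may start sending DHCP requests) before
        # any scheduling that this call triggers.
        self.events.append(f"VM port created on {network_id}")
        self._ensure_scheduled(network_id)


if __name__ == "__main__":
    for flag in (False, True):
        toy = ToyDhcpScheduler(schedule_on_subnet_create=flag)
        toy.create_subnet("net1")
        toy.create_vm_port("net1")
        print(flag, toy.events)
```

With the flag off, the scheduling event comes after the VM port event. With it on, scheduling happens first, which is the extra time the fix aims to buy. Agent readiness (dnsmasq actually running) is outside this model.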

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
tags: added: l3-ipam-dhcp
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0