neutron

DHCP scheduling is racy with first port creation

Bug #1431105 reported by Kevin Benton on 2015-03-12

This bug affects 8 people

Affects	Status	Importance	Assigned to	Milestone
neutron	Fix Released	Medium	Kevin Benton	neutron 2015.1.0 "kilo"

Bug Description

Because scheduling of networks to DHCP agents is deferred until the first VM port is created, the VM port can be created and start issuing DHCP requests before the DHCP agent gets a chance to create its own port on the network, launch its dnsmasq process, etc.

By scheduling the agent at subnet creation time instead of at first port creation, we buy extra time to help avoid this situation (the DHCP scheduling notification is sent before VM port creation rather than after). Additionally, scripts can watch the DHCP port state after creating the subnet to know when it should be safer to proceed.
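As a rough illustration of the "scripts can watch the DHCP port state" idea, here is a minimal polling sketch. It is not part of the fix. `list_ports` is a hypothetical callable (an assumption) that returns the network's ports as dicts; in practice it would wrap whatever Neutron client the script already uses. The sketch assumes the DHCP port carries `device_owner` set to `network:dhcp` and reports `status` `ACTIVE` once it is wired up. An active port only suggests it is safer to proceed; it does not prove dnsmasq is serving.

```python
import time

# Assumption: Neutron marks DHCP agent ports with this device_owner value.
DHCP_DEVICE_OWNER = "network:dhcp"


def wait_for_dhcp_port(list_ports, network_id, timeout=60.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until a DHCP port on network_id is ACTIVE.

    list_ports is a hypothetical callable: list_ports(network_id) -> list of
    port dicts. Returns the first ACTIVE DHCP port, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        for port in list_ports(network_id):
            if (port.get("device_owner") == DHCP_DEVICE_OWNER
                    and port.get("status") == "ACTIVE"):
                return port
        if clock() >= deadline:
            return None  # Caller decides whether to boot the VM anyway.
        sleep(interval)
```

A script would call this right after creating the subnet and before booting the first VM, treating a `None` result as "proceed at your own risk".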

Ultimately we will need a way to know that the DHCP agent is truly ready (dnsmasq process running with updated config), but this will require new state to be stored and passed from the agent which will probably require a spec so we can address 'readiness' in a broader fashion.

Kevin Benton (kevinbenton) on 2015-03-12

Changed in neutron:
assignee:	nobody → Kevin Benton (kevinbenton)

OpenStack Infra (hudson-openstack) on 2015-03-12

Changed in neutron:
status:	New → In Progress

YAMAMOTO Takashi (yamamoto) wrote on 2015-03-12:

Have you seen the race in the field, or is this just from code inspection?

Kevin Benton (kevinbenton) wrote on 2015-03-12:

The race showed up with a heavily loaded DHCP agent during scale testing.

The instance usually retries DHCP eventually, but that can take several minutes.

Akihiro Motoki (amotoki) wrote on 2015-03-12:

Sounds reasonable. I believe the current behavior exists to let operators decide agent assignment, but I think that is not a usual case.
Even if a network is scheduled when a subnet is created, operators can still move DHCP agents as they want, so I don't see any problem with changing the timing of DHCP agent scheduling.

Kevin Benton (kevinbenton) wrote on 2015-03-12:

If operators want to manually schedule, they can also do it after the network is created but before the subnet is created.

Armando Migliaccio (armando-migliaccio) on 2015-03-12

Changed in neutron:
importance:	Undecided → Low

Armando Migliaccio (armando-migliaccio) on 2015-03-13

Changed in neutron:
importance:	Low → Medium

Kevin Benton (kevinbenton) on 2015-03-17

description:	updated

OpenStack Infra (hudson-openstack) wrote on 2015-03-17: Fix merged to neutron (master)

Reviewed: https://review.openstack.org/163672
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=05f234481474aa05f59c4af459b4343d21397afc
Submitter: Jenkins
Branch: master

commit 05f234481474aa05f59c4af459b4343d21397afc
Author: Kevin Benton <email address hidden>
Date: Wed Mar 11 18:32:52 2015 -0700

    Schedule net to a DHCP agt on subnet create

    Change the DHCP notifier behavior to schedule a network
    to a DHCP agent when a subnet is created rather than
    waiting for the first port to be created.

    This will reduce the chance that a VM port is created
    and sends a DHCP request before the DHCP agent is
    ready. Before, the network would be scheduled to an agent
    as a result of the API call to create the VM port, so the
    DHCP port wouldn't be created until after the VM port.
    After this patch, the network will have been scheduled to
    a DHCP agent before the first VM port is created.

    There is still a possibility that the DHCP agent could be
    responding so slowly that it doesn't create its port and
    activate the dnsmasq instance before the VM sends traffic.
    A proper fix that ensures the dnsmasq instance is truly
    ready to serve requests for a new port will require
    significantly more code; barriers (either on the subnet
    creation, port creation, or the nova boot process) are too
    complex to add this late in the cycle.

    This patch also eliminates the logic in the n1kv plugin that
    was already doing the same thing.

    Closes-Bug: #1431105
    Change-Id: I1c1caed0fdda6b801375a07f9252a9127058a07e
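To make the ordering change in the commit message concrete, here is a toy model. It is not Neutron code: `ToyDhcpNotifier`, `boot_sequence`, and the `schedule_network` log entry are hypothetical names, and the event strings are used only as illustrative labels. The model only shows when scheduling happens relative to VM port creation under each trigger.

```python
class ToyDhcpNotifier:
    """Toy stand-in for a DHCP notifier; schedules a network on one event."""

    def __init__(self, schedule_on):
        self.schedule_on = schedule_on  # event name that triggers scheduling
        self.scheduled = set()          # network ids already scheduled
        self.log = []                   # ordered record of what happened

    def notify(self, event, network_id):
        self.log.append(event)
        if event == self.schedule_on and network_id not in self.scheduled:
            self.scheduled.add(network_id)
            self.log.append("schedule_network")


def boot_sequence(notifier, network_id="net-1"):
    """Replay the API calls made when setting up a network and its first VM."""
    for event in ("network_create_end", "subnet_create_end", "port_create_end"):
        notifier.notify(event, network_id)
    return notifier.log


before = boot_sequence(ToyDhcpNotifier("port_create_end"))
after = boot_sequence(ToyDhcpNotifier("subnet_create_end"))
print(before)  # scheduling happens only after the VM port exists
print(after)   # scheduling happens before the VM port is created
```

In the "after" ordering the agent gets a head start, but nothing in the model (or in the patch, per the commit message) guarantees the agent has finished its setup by the time the VM sends traffic.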

Changed in neutron:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2015-03-19

Changed in neutron:
milestone:	none → kilo-3
status:	Fix Committed → Fix Released

Eugene Nikanorov (enikanorov) on 2015-03-20

tags:	added: l3-ipam-dhcp

Thierry Carrez (ttx) on 2015-04-30

Changed in neutron:
milestone:	kilo-3 → 2015.1.0

Duplicates of this bug

Bug #1219795
