Inconsistent DHCP port creation

Bug #1219795 reported by Brian D. Burns on 2013-09-02
This bug affects 7 people
Affects: neutron
Importance: Medium
Assigned to: mandar

Bug Description

When a subnet is created, the DHCP port is not created until another port is
created, unless ports have previously been created on the network.
In addition, a call to create a subnet returns before the DHCP port is established
(if that occurs at all), which produces a race condition when a port is created immediately after the subnet.

This is easier to explain by example :)

The following can be run from a shell script to reproduce the issue:

neutron net-create test_net
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
sleep 5 # just to be sure
neutron port-list
# dhcp port has not been created yet
neutron port-create test_net --name test_port
# the dhcp port is created after the test_port.
# apparently before the port-create command returns,
# as no sleep is needed before the port-list.
neutron port-list
# test_port assigned 198.18.0.2 and 2001:db8:abc:123::2
# dhcp port assigned 198.18.0.3 and 2001:db8:abc:123::3
neutron port-delete test_port
neutron net-delete test_net

neutron net-create test_net
# this add/remove port before subnet creation causes the dhcp port
# to be set up when the subnets are created.
neutron port-create test_net --name test_port
neutron port-delete test_port
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
sleep 5 # subnet-create returns before the dhcp port is set up
neutron port-list
# dhcp port assigned 198.18.0.2 and 2001:db8:abc:123::2
neutron port-create test_net --name test_port
neutron port-list
# test_port assigned 198.18.0.3 and 2001:db8:abc:123::3
neutron port-delete test_port
neutron net-delete test_net

# this creates a condition where if you create the test_port
# after the subnets without sleeping, the test_port gets the ::2 ipv6
# address before the dhcp port.
neutron net-create test_net
neutron port-create test_net --name test_port
neutron port-delete test_port
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
neutron port-create test_net --name test_port
sleep 5 # in order to see the dhcp port
neutron port-list
# test_port assigned 198.18.0.3 and 2001:db8:abc:123::2
# dhcp port assigned 198.18.0.2 and 2001:db8:abc:123::3
neutron port-delete test_port
neutron net-delete test_net
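
The address assignments in the three scripts above are consistent with each subnet handing out its lowest free address first. The following is a minimal Python sketch of that reading, not Neutron code: the `Pool` class and the `replay` function are hypothetical, and the assumption that allocation starts at `.2` / `::2` (with the first host address held for the gateway) is inferred from the outputs above. It replays the second and third scripts as sequences of (port, subnet) allocation events.

```python
import ipaddress


class Pool:
    """Hypothetical model: hand out the lowest free address in a subnet."""

    def __init__(self, cidr):
        net = ipaddress.ip_network(cidr)
        # Assumption: the first host address (.1 / ::1) is kept for the gateway.
        self._next = net.network_address + 2

    def allocate(self):
        addr = self._next
        self._next += 1
        return str(addr)


def replay(events):
    """Apply (port_name, subnet_key) allocation events in the given order."""
    pools = {
        "v4": Pool("198.18.0.0/24"),
        "v6": Pool("2001:db8:abc:123::/64"),
    }
    assigned = {}
    for port, subnet in events:
        assigned.setdefault(port, []).append(pools[subnet].allocate())
    return assigned


# Second script: the dhcp port is fully set up before test_port is created.
ordered = replay([
    ("dhcp", "v4"), ("dhcp", "v6"),
    ("test_port", "v4"), ("test_port", "v6"),
])

# Third script: test_port reaches the IPv6 subnet before the dhcp port does.
racy = replay([
    ("dhcp", "v4"),
    ("test_port", "v4"), ("test_port", "v6"),
    ("dhcp", "v6"),
])

print(ordered)
print(racy)
```

Under these assumptions, only the order of the events differs between the two runs, which is the race described here.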

It seems the solution would be to make sure the DHCP port check is performed
after a subnet is created/updated, and not return until that is completed.
It doesn't seem like this would be necessary when creating/updating a port.
(unless it's the DHCP port that's being modified?)

Brian D. Burns (iosctr) on 2013-09-02
description: updated
tags: added: l3-ipam-dhcp
Changed in neutron:
assignee: nobody → Sean McCully (sean-mccully)
Changed in neutron:
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/45082

Changed in neutron:
status: Confirmed → In Progress
Sean McCully (sean-mccully) wrote :

Fix proposed: if no port exists, the DHCP agent will create a new port attached to the dhcp device. If a port exists that is not attached to a device, that port is attached to the DHCP device.

Brian D. Burns (iosctr) wrote :

I tried the patch, but I'm still seeing the same behavior.
Additionally, now if I create 2 ports, the first port is taken for DHCP. If I'm creating a port via the API, I intend to use it. I think it was better when a new port was created.

Sean McCully (sean-mccully) wrote :

https://review.openstack.org/#/c/23252/2

After the first port is created for a subnet with dhcp enabled, a dhcp port is either newly created or an available port is consumed for it.

In what way are you still seeing the same behaviour?
If you create a port via the API, then you will use it? I don't see how this is not the case?

Brian D. Burns (iosctr) wrote :

The original race condition still exists. From above:

neutron net-create test_net
neutron port-create test_net --name test_port
neutron port-delete test_port
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
neutron port-create test_net --name test_port
sleep 5 # in order to see the dhcp port
neutron port-list
# test_port assigned 198.18.0.3 and 2001:db8:abc:123::2
# dhcp port assigned 198.18.0.2 and 2001:db8:abc:123::3

Also, since the new patch attempts to consume an unused port for dhcp,
it's consuming the port I'm creating:

neutron net-create test_net
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
neutron port-create test_net --name test_port
neutron port-show test_port

Also, note that while the device_id was set when consuming this port,
the device_owner was not (should be 'network:dhcp'). As a result,
the existence of this port will prevent deleting the network.

The only example that works correctly is the 2nd initial example above:

neutron net-create test_net
neutron port-create test_net --name test_port_temp
neutron port-delete test_port_temp
neutron subnet-create test_net 198.18.0.0/24
neutron subnet-create test_net 2001:db8:abc:123::/64 --ip-version 6
sleep 5
neutron port-list
# dhcp port assigned 198.18.0.2 and 2001:db8:abc:123::2
neutron port-create test_net --name test_port
neutron port-list
# test_port assigned 198.18.0.3 and 2001:db8:abc:123::3

There are 2 reasons why this works.

1. Adding and removing a port from the network before the subnets are created
creates a condition where the dhcp port is created when the subnet is created.
Without that add/remove port, this does not occur.

2. You must sleep after creating the subnet to give the dhcp port time to be
created before creating your port to avoid the race condition.
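
As an alternative to a fixed `sleep 5`, a script could poll until a port owned by `network:dhcp` (the device_owner value mentioned above) shows up. This is only a sketch: `wait_for_dhcp_port` and the `list_ports` callable are hypothetical names, and `list_ports` is assumed to return the network's ports as dictionaries with a `device_owner` key. It addresses reason 2 only; if the dhcp port is never created on subnet creation (reason 1), the call simply times out.

```python
import time


def wait_for_dhcp_port(list_ports, timeout=30.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll list_ports() until a port owned by 'network:dhcp' appears.

    list_ports is a caller-supplied callable (hypothetical) that returns
    an iterable of port dictionaries. Returns the dhcp port dictionary,
    or None if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        for port in list_ports():
            if port.get("device_owner") == "network:dhcp":
                return port
        if clock() >= deadline:
            return None
        sleep(interval)
```

A caller would create the subnets, call `wait_for_dhcp_port(...)`, and only then create its own port.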

Sean McCully (sean-mccully) wrote :

1. I provided a link to the changes where this was implemented.
2. The dhcp port creation is determined by the dhcp agent, if the port is not explicitly created. This happens through async communication, which can appear delayed.
3. Some of the behaviour you described is set to work that way, from the initial bug report.
4. The patch sets out to fix the case where an additional port is created on top of the first port created for a subnet/network.

Sean McCully (sean-mccully) wrote :

This is the change being proposed based on the current workflow; much of this works as expected. The behaviour that becomes problematic is when an additional port is created that may or may not be intended for use other than for dhcp.

Maybe that clarifies a little better.

Brian D. Burns (iosctr) wrote :

Why do these two operations have different results?

- create network
- create subnet
No DHCP port will have been created at this point.

- create network
- add a port, then delete it
- create subnet
A DHCP port will have been created.

Both of these should cause the DHCP port to be allocated.

If the DHCP port check is performed whenever subnets are created/updated,
then I don't understand why these operations would be needed when creating a port.

Sean McCully (sean-mccully) wrote :

----- Reference Change ---
https://review.openstack.org/#/c/23252/2

As for why the DHCP port is not created initially: that was part of the change referenced above. This patch changes things so that an unnecessary port is not created when it is not needed for a DHCP port to be created during initial network creation.

Brian D. Burns (iosctr) wrote :

So you're saying that after I create the network, I must create/delete a port in order to "trigger" initial network creation. Once this is done, then a call to create a subnet will create the dhcp port if needed. Otherwise, the dhcp port will not be created until the first port is created.

As for the referenced commit, I think I would have rather seen a "defer" option that could be set when creating a network if desired.

As for this patch, which will allocate an existing unattached port, the problem I have is that it may use the port you're creating. I'd rather see it create the dhcp port, then create the port requested, so it's not considered when looking for unused ports. I realize the dhcp allocation is async, which makes this difficult. If you create the requested port first, is there a way to exclude it from the dhcp port selection? Problem is, if you create 2 ports, even the second may get created before the dhcp port is allocated.

Also, as I mentioned above, when the dhcp port is set up on the create port call, it's setting the device_id on the port, but not the device_owner. This works if the dhcp port is set up on the create subnet call.

Sean McCully (sean-mccully) wrote :

Yes, and I see your point about the deferred option. I don't think that's a bad idea, and it may at least be entertained if submitted as a feature request.

So the problem I see with creating an additional port is that it becomes a requirement that you create a port that may or may not get used. Before the dhcp becomes active, this method is preferred: that way, if you only need the dhcp port, then you're not creating ports that are not needed.
If you are explicit in creating the first port with the device it is going to be used by, then you will see an additional port created for DHCP.

Will add the device owner. Also, if the initial port has a device owner that is not dhcp, then an additional port should be created for dhcp as well.

Brian D. Burns (iosctr) wrote :

Wouldn't you just skip the dhcp port creation if the subnet is not dhcp-enabled?

Also, another idea would be to reserve a port for dhcp. Like next-to-last in subnet.
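
A rough Python sketch of the reservation idea, to show how it would take the dhcp address out of the race. Everything here is an assumption for illustration: "next-to-last" is read as the address just before the last address of the CIDR, the function names are hypothetical, and the `.2` / `::2` starting point for other ports is inferred from the outputs earlier in this report.

```python
import ipaddress


def dhcp_reserved_address(cidr):
    """Return the next-to-last address of the subnet (assumed reading)."""
    net = ipaddress.ip_network(cidr)
    return net[-2]


def tenant_addresses(cidr, count):
    """Return the first `count` addresses available to non-dhcp ports.

    Allocation starts at the second host address and never reaches the
    reserved dhcp address, so port creation order no longer decides
    which address the dhcp port receives.
    """
    net = ipaddress.ip_network(cidr)
    reserved = dhcp_reserved_address(cidr)
    addresses = []
    addr = net.network_address + 2
    while len(addresses) < count and addr < reserved:
        addresses.append(str(addr))
        addr += 1
    return addresses


print(dhcp_reserved_address("198.18.0.0/24"))
print(dhcp_reserved_address("2001:db8:abc:123::/64"))
print(tenant_addresses("198.18.0.0/24", 2))
```

With a fixed dhcp address, test_port would get the same addresses whether it is created before or after the dhcp port.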

Brian D. Burns (iosctr) wrote :

What's with Patch #6? Now we're back to no device_owner...

This works:

- create network
- create/delete port # trigger network creation
- create subnet
- create port

But this does not:

- create network
- create subnet
- create port

The port being created is attached to device_id, but no device_owner is set.

Sean McCully (sean-mccully) wrote :

Hi Brian, it might make more sense to comment on the changes at review.openstack.org.

So gate-tempest-devstack-vm-neutron is failing when I set the device_owner. I have to look into tempest and find out why, I'll let you know.

Brian D. Burns (iosctr) wrote :

OK. I'd been keeping it here, since I started this here.
I'm watching the code review, so I'll comment there if it's change specific.
Wish I was more familiar with the source (and python), I could probably form better questions :)

Changed in neutron:
importance: Undecided → Critical
importance: Critical → Medium
Changed in neutron:
assignee: Sean McCully (sean-mccully) → nobody
mandar (mandar-sherikar) on 2014-06-04
Changed in neutron:
assignee: nobody → mandar (mandar-sherikar)
Don Bowman (donbowman) wrote :

Attached is a heat stack which will reproduce this all the time.

$ heat stack-create -f 1219795.yaml -P num=30 d1

and then look in instances. You will see some instances have been assigned an IP, and some not.
What has happened is that neutron has allocated an IP to dnsmasq, and then tried to allocate that IP to an instance.

e.g.: the dhcp range is 172.16.0.10-172.16.0.254.

 172.16.0.11 -> i1
 172.16.0.12 -> i2
 172.16.0.13 -> dhcp/dnsmasq
 172.16.0.13 -> i3. FAIL
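
One way a duplicate like this could arise is a stale read: two allocations each pick the lowest free address before either records its choice. The Python sketch below is a toy model of that idea only. It is not Neutron code and not a description of any particular fix; the class and function names are hypothetical, and the owner of 172.16.0.10 is not stated in this report. It also shows how a uniqueness check at commit time plus a retry would avoid the clash.

```python
class DuplicateAddress(Exception):
    """Raised when a host number is committed twice."""


class Allocations:
    """Toy allocation table for the last octet of 172.16.0.x."""

    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.taken = {}  # host number -> owner

    def pick_free(self):
        """Read step: return the lowest host number not yet committed."""
        for host in range(self.first, self.last + 1):
            if host not in self.taken:
                return host
        raise RuntimeError("pool exhausted")

    def commit(self, host, owner):
        """Write step: refuse a host number that is already taken."""
        if host in self.taken:
            raise DuplicateAddress(host)
        self.taken[host] = owner


def allocate_with_retry(allocs, owner):
    """Pick and commit, picking again if the commit is rejected."""
    while True:
        host = allocs.pick_free()
        try:
            allocs.commit(host, owner)
            return host
        except DuplicateAddress:
            continue


allocs = Allocations(10, 254)
for host, owner in [(10, "unknown"), (11, "i1"), (12, "i2")]:
    allocs.commit(host, owner)

# Both allocations read before either writes, so both see 13 as free.
dhcp_pick = allocs.pick_free()
i3_pick = allocs.pick_free()

allocs.commit(dhcp_pick, "dhcp/dnsmasq")
try:
    allocs.commit(i3_pick, "i3")
    conflict = False
    i3_host = i3_pick
except DuplicateAddress:
    conflict = True
    i3_host = allocate_with_retry(allocs, "i3")

print(dhcp_pick, i3_pick, conflict, i3_host)
```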

I think this can be closed, as it was fixed in:

https://review.openstack.org/#/c/163672/
