[DHCP] Race condition during port processing events in DHCP agent
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | Medium | Rodolfo Alonso | | |
Bug Description
In the DHCP agent, network, subnet and port events are stored in a "ResourceProces
Since [1], the Neutron server sends a high priority cast to one random DHCP agent to speed up the port creation process. That means this event, regardless of the timestamp, will be processed first. That improves the VM creation process.
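The effect of the high priority cast on processing order can be illustrated with a minimal sketch. It is not the Neutron implementation: the priority constants, the tuple layout and the `processing_order` helper are assumptions made for illustration only. Events are ordered by priority first and by timestamp second, so a later high priority event overtakes an earlier normal one.

```python
import heapq

# Hypothetical priority values for this sketch; the lower value is served first.
PRIORITY_HIGH = 0
PRIORITY_NORMAL = 1


def processing_order(events):
    """Return event names in the order a (priority, timestamp) heap yields them."""
    heap = list(events)
    heapq.heapify(heap)
    order = []
    while heap:
        _priority, _timestamp, name = heapq.heappop(heap)
        order.append(name)
    return order


events = [
    (PRIORITY_NORMAL, 1.0, "port_delete"),  # arrived first
    (PRIORITY_HIGH, 2.0, "port_create"),    # arrived later, but high priority
]
print(processing_order(events))  # ['port_create', 'port_delete']
```

With an idle queue this reordering is harmless, because each event is consumed as soon as it arrives; it only matters once several events are waiting at the same time.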
This bug exposes a problem that appears when (1) a single DHCP agent is handling those events and (2) the queue is not processed fast enough to keep up with the events as they arrive.
For example (seen in a customer deployment with thousands of ports assigned to the same DHCP agent):
- A port is created and processed. The port ID is stored in the DHCP agent cache.
- The port is deleted and the event arrives at the DHCP agent (*but* the event is not yet processed).
- Another port with the same IP address is created. The Neutron server will allow this operation because the IP address is no longer in use. This port creation event arrives at the DHCP agent.
- The event queue is processed. At this point, the queue holds both events: one deleting a port with IP address x.x.x.x and another creating a port with the same IP address. Because the port creation event is processed first (it has higher priority), a resync occurs [2].
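The sequence above can be replayed against a toy cache to see why the order matters. This is a sketch under stated assumptions, not the agent's code: the `replay` function, the IP-to-port dictionary, the port names and the rule "a create for an IP still owned by another cached port requests a resync" are all hypothetical stand-ins for the behaviour described in this report.

```python
def replay(events_in_processing_order):
    """Apply events to a toy IP cache; return True if a resync would be requested."""
    # Hypothetical starting state: port-a was created and cached earlier.
    ip_to_port = {"10.0.0.5": "port-a"}
    resync_needed = False
    for action, port_id, ip in events_in_processing_order:
        if action == "create":
            owner = ip_to_port.get(ip)
            # Assumption: an IP still cached for a different port is a conflict.
            if owner is not None and owner != port_id:
                resync_needed = True
            ip_to_port[ip] = port_id
        elif action == "delete":
            # Only drop the entry if it still belongs to the deleted port.
            if ip_to_port.get(ip) == port_id:
                del ip_to_port[ip]
    return resync_needed


delete_a = ("delete", "port-a", "10.0.0.5")
create_b = ("create", "port-b", "10.0.0.5")

print(replay([delete_a, create_b]))  # False: order matches what the server did
print(replay([create_b, delete_a]))  # True: creation overtook the deletion
```

The second call corresponds to the situation in this bug: the deletion is already in the queue, but the creation is handled before it.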
There is an easy way to reproduce this error with a small trick: add a time.sleep(10) at the beginning of [3]. We then need to finish the currently running processing thread by sending a trivial operation, e.g. creating another port. The process loop [4] will then spawn a new thread that blocks in the sleep call; at that point we send both events, the deletion and the creation:
- Add the time.sleep(10) in the resource processing queue and restart the DHCP agent.
- Add a port with an IP. The DHCP agent will process this event correctly.
$ openstack port create --fixed-ip ip-address=
- The following steps should be run back to back, in a single command line:
a) Create another port to consume the waiting thread.
b) Wait a short time to let the thread finish and allow the process_loop to start another one. This new thread will block in the sleep call.
c) Delete the port.
d) Create a port with the same IP --> the thread will have both events in the queue. As noted above, because the queue processes the creation event first, this triggers the unwanted resync:
$ openstack port create --network private port_trivial; sleep 3; openstack port delete $port; openstack port create --fixed-ip ip-address=
Log: http://
[1]https:/
[2]https:/
[3]https:/
[4]https:/
Changed in neutron: | |
importance: | Undecided → Medium |
assignee: | nobody → Rodolfo Alonso (rodolfo-alonso-hernandez) |
description: | updated |
Changed in neutron: | |
status: | New → Fix Released |
This issue was fixed in the openstack/neutron 15.3.3 release.