ipavailabilityranges race condition when allocating from same range on multiple neutron-servers

Bug #1214115 reported by Michael H Wilson
This bug affects 5 people
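+---------+--------------+------------+-------------+-----------+
| Affects | Status       | Importance | Assigned to | Milestone |
+---------+--------------+------------+-------------+-----------+
| neutron | Fix Released | High       | Hua Zhang   |           |
+---------+--------------+------------+-------------+-----------+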

Bug Description

Let's say that we start with an allocation pool that looks like this:

+--------------------------------------+----------------+----------------+
| allocation_pool_id | first_ip | last_ip |
+--------------------------------------+----------------+----------------+
| 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.10 | 192.168.1.254 |
+--------------------------------------+----------------+----------------+

We then allocate a few of those IPs, let's say 10-20. Our pool now looks like this:

+--------------------------------------+----------------+----------------+
| allocation_pool_id | first_ip | last_ip |
+--------------------------------------+----------------+----------------+
| 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.20 | 192.168.1.254 |
+--------------------------------------+----------------+----------------+

Now we free a couple of those IPs, let's say 16, 17 and 18. We now have this in the DB:

+--------------------------------------+----------------+----------------+
| allocation_pool_id | first_ip | last_ip |
+--------------------------------------+----------------+----------------+
| 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.16 | 192.168.1.18 |
| 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.20 | 192.168.1.254 |
+--------------------------------------+----------------+----------------+

The race condition I'm about to describe will probably hamper the above operation, but that's okay. Let's just pretend for the sake of illustration. Now let's suppose that I have 2 neutron-servers running. One gets a request to allocate 192.168.1.16 and the other gets a request to free 192.168.1.15. Both servers are going to generate UPDATEs to the DB, and they will look something like this:

SERVER 1: UPDATE ipavailabilityranges SET first_ip = '192.168.1.17' WHERE first_ip = '192.168.1.16'
SERVER 2: UPDATE ipavailabilityranges SET first_ip = '192.168.1.15' WHERE first_ip = '192.168.1.16'

Depending on ordering, on how busy your neutron-servers are, and on how busy your database is, one of the above statements is going to fail. That's okay, the failure is reported up through the API. The issue we see is that retries also tend to fail, since usually only one operation affecting a given row in the table ever succeeds. If you have a very active neutron API with lots of free and allocate requests, you end up in an unusable state where active periods for the API are full of errors, and requests get bogged down and fail until activity stops.
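To make the failure concrete, here is a minimal sketch of the two UPDATEs above, replayed one after the other against an in-memory SQLite table. This is not neutron's code: the helper name `update_first_ip`, the use of sqlite3, and the idea of detecting the failure by checking the affected-row count are all assumptions made for illustration. Only the table name, column names and IP values come from the description above.

```python
import sqlite3

# Assumption: a stand-in table with the three columns shown in this report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipavailabilityranges "
    "(allocation_pool_id TEXT, first_ip TEXT, last_ip TEXT)"
)
conn.execute(
    "INSERT INTO ipavailabilityranges VALUES (?, ?, ?)",
    ("0f175416-a378-463b-9a84-18528f396e6f", "192.168.1.16", "192.168.1.18"),
)


def update_first_ip(conn, seen_first_ip, new_first_ip):
    """Hypothetical helper: issue the UPDATE and return the affected-row count."""
    cur = conn.execute(
        "UPDATE ipavailabilityranges SET first_ip = ? WHERE first_ip = ?",
        (new_first_ip, seen_first_ip),
    )
    return cur.rowcount


# Both servers read the row while first_ip was still 192.168.1.16.
seen_by_server_1 = "192.168.1.16"
seen_by_server_2 = "192.168.1.16"

# SERVER 1 (allocate .16) happens to reach the database first and matches the row.
rows_1 = update_first_ip(conn, seen_by_server_1, "192.168.1.17")

# SERVER 2 (free .15) still filters on the old value, so it matches nothing.
rows_2 = update_first_ip(conn, seen_by_server_2, "192.168.1.15")

# Retrying with the same stale value cannot succeed either.
retry_2 = update_first_ip(conn, seen_by_server_2, "192.168.1.15")

final_first_ip = conn.execute(
    "SELECT first_ip FROM ipavailabilityranges"
).fetchone()[0]
```

In this sketch the losing statement is not rejected by the database. It simply matches zero rows, and any retry that reuses the value read earlier keeps matching zero rows.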

This is one example of the race condition. There are obviously other ways to trigger it if you sit down and look at the applicable piece of code. Some kind of concurrency management is probably in order, but I'm not sure what the best way to solve this would be.
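One possible form of concurrency management, sketched here purely as an illustration, is an optimistic retry that re-reads the row before every attempt instead of reusing a stale first_ip. This is not claimed to be what the eventual fix does. The helper `allocate_first_ip`, the `after_read` hook (which exists only to simulate another server) and the use of sqlite3 are assumptions, and the sketch ignores the case where a range is used up and its row has to be deleted.

```python
import ipaddress
import sqlite3

POOL_ID = "0f175416-a378-463b-9a84-18528f396e6f"


def allocate_first_ip(conn, pool_id, max_attempts=5, after_read=None):
    """Hypothetical helper: hand out the range's first_ip and bump it by one.

    after_read is a demonstration-only hook, called between the SELECT and
    the UPDATE so that a competing writer can be simulated.
    """
    for _ in range(max_attempts):
        # Re-read on every attempt so the WHERE clause never uses a stale value.
        row = conn.execute(
            "SELECT first_ip FROM ipavailabilityranges "
            "WHERE allocation_pool_id = ? LIMIT 1",
            (pool_id,),
        ).fetchone()
        if row is None:
            return None  # no free range left in this pool
        seen = row[0]
        if after_read is not None:
            after_read()
        bumped = str(ipaddress.ip_address(seen) + 1)
        cur = conn.execute(
            "UPDATE ipavailabilityranges SET first_ip = ? "
            "WHERE allocation_pool_id = ? AND first_ip = ?",
            (bumped, pool_id, seen),
        )
        if cur.rowcount == 1:
            return seen  # our UPDATE won, so the address is ours
        # Zero rows matched: someone else moved first_ip, so loop and re-read.
    raise RuntimeError("gave up after repeated conflicts")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipavailabilityranges "
    "(allocation_pool_id TEXT, first_ip TEXT, last_ip TEXT)"
)
conn.execute(
    "INSERT INTO ipavailabilityranges VALUES (?, ?, ?)",
    (POOL_ID, "192.168.1.16", "192.168.1.254"),
)

pending = [True]


def other_server():
    # Simulate another neutron-server taking 192.168.1.16 first, exactly once.
    if pending:
        pending.pop()
        conn.execute(
            "UPDATE ipavailabilityranges SET first_ip = '192.168.1.17' "
            "WHERE first_ip = '192.168.1.16'"
        )


allocated = allocate_first_ip(conn, POOL_ID, after_read=other_server)
remaining_first_ip = conn.execute(
    "SELECT first_ip FROM ipavailabilityranges"
).fetchone()[0]
```

Here the first attempt loses to the simulated competitor, and the second attempt sees the new first_ip and succeeds. Under heavy contention a loop like this can still run out of attempts, so it reduces the failures described above rather than removing the contention itself.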

Changed in neutron:
importance: Undecided → High
status: New → Confirmed
Hua Zhang (zhhuabj)
Changed in neutron:
assignee: nobody → Hua Zhang (zhhuabj)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/43275

Changed in neutron:
status: Confirmed → In Progress
Michael H Wilson (geekinutah) wrote :

https://review.openstack.org/#/c/58017/ effectively fixes this issue. Marking as Fix Committed.

Changed in neutron:
status: In Progress → Fix Committed
Changed in neutron:
milestone: none → icehouse-3
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
sridhar basam (sri-7) wrote :

Can this fix be backported to havana?

Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → 2014.1