Fixed IP quota not checked at API level

Bug #1161188 reported by Sam Morrison
This bug report is a duplicate of: Bug #1161661: Rescheduling loses reasons.
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: High
Assigned to: Alexander Pugachev

Bug Description

When I create an instance while already using all of my allocated fixed IPs, the instance is created fine, but nova-network then errors out because no fixed IPs are available.

This error message never reaches the user. Shouldn't the API return an error instead, so that the instance can't be created in the first place?

Michael Still (mikal) wrote:

This is a fair point.

The problem is the way quotas work in Grizzly: you reserve a fixed IP "opportunity", then use it, and optionally roll back if something went wrong. So the quota check is done in the code that allocates and uses the fixed IP. I note that floating IPs are handled the same way.
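
The reserve/use/rollback pattern described above can be sketched as follows. This is a minimal, self-contained illustration built on a hypothetical in-memory `QuotaTracker` class. It is not Nova's actual quota engine, whose names and signatures may differ. The point it shows is that the quota error is raised only inside the allocation path, which is far from the API call the user made.

```python
import threading


class QuotaExceeded(Exception):
    """Raised when a reservation would exceed the project's limit."""


class QuotaTracker:
    """Hypothetical in-memory quota tracker (illustration only)."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self.reserved = 0
        self._lock = threading.Lock()

    def reserve(self, count=1):
        # Atomically claim an "opportunity" against the limit.
        with self._lock:
            if self.in_use + self.reserved + count > self.limit:
                raise QuotaExceeded("fixed IP quota exceeded")
            self.reserved += count
            return count  # the count doubles as the reservation token here

    def commit(self, reservation):
        # Turn the reservation into real usage.
        with self._lock:
            self.reserved -= reservation
            self.in_use += reservation

    def rollback(self, reservation):
        # Release the reservation without consuming quota.
        with self._lock:
            self.reserved -= reservation


def allocate_fixed_ip(quota, do_allocation):
    """Reserve quota, attempt the allocation, then commit or roll back."""
    reservation = quota.reserve(1)
    try:
        ip = do_allocation()
    except Exception:
        quota.rollback(reservation)
        raise
    quota.commit(reservation)
    return ip
```

With `QuotaTracker(limit=1)`, the first call to `allocate_fixed_ip` succeeds and the second raises `QuotaExceeded`. A failed allocation leaves both `in_use` and `reserved` unchanged.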

I do think adding a pre-check that determines whether the API call has any chance of success, before bothering to bring up an instance, is a good idea though.
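
A pre-check of this kind might look roughly like the sketch below. Everything in it is hypothetical: `ProjectQuota`, `check_fixed_ip_headroom`, `create_instance` and `FixedIpQuotaExceeded` are illustrative names, not Nova APIs. The sketch also assumes one fixed IP per instance, which may not hold when an instance is attached to several networks. The check is read-only and reserves nothing, so it can only reject requests that cannot possibly succeed. The authoritative check still happens at allocation time.

```python
from dataclasses import dataclass


class FixedIpQuotaExceeded(Exception):
    """Hypothetical API-level error returned to the user."""


@dataclass
class ProjectQuota:
    limit: int   # maximum fixed IPs for the project
    in_use: int  # fixed IPs currently allocated


def check_fixed_ip_headroom(quota, requested=1):
    """Read-only pre-check: fail fast if the request cannot succeed.

    Nothing is reserved, so this is advisory only; the real quota
    reservation still happens when the fixed IP is allocated.
    """
    available = quota.limit - quota.in_use
    if requested > available:
        raise FixedIpQuotaExceeded(
            f"requested {requested} fixed IP(s), only {max(available, 0)} available"
        )


def create_instance(quota, name, num_instances=1):
    """Hypothetical API handler: validate before scheduling anything."""
    # Assumes one fixed IP per instance, for simplicity.
    check_fixed_ip_headroom(quota, requested=num_instances)
    return {"name": name, "status": "scheduling"}
```

Because the error is raised before scheduling, the user gets a descriptive message from the API call itself rather than an instance that later fails in nova-network.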

Changed in nova:
status: New → Triaged
importance: Undecided → High
Joshua Harlow (harlowja) wrote:

Michael, does this mean the reservation code isn't right if this bug can happen? How can a fixed IP or floating IP be reserved, as you stated ("reserve IP opportunity"), but later that reservation is not valid? That would seem to be a problem with the reservation code, no? If we can't reserve quotas and guarantee that the reservation holds (or is retained until it times out from not being completed), then that makes me wonder whether quotas really mean much...

Michael Still (mikal) wrote:

Josh, the problem is that the compute API code doesn't handle the reservation; the networking API does. The descriptive exception isn't being bubbled back. So the quota code is correct, just poorly placed to give the user an informative error.

Fixed IPs are allocated in nova/network/manager.py allocate_fixed_ip()
Which is called by nova/network/manager.py allocate_fixed_ips()
Which is called by nova/network/manager.py allocate_for_instance() via the network RPC API
Which is called by nova/compute/manager.py _allocate_network()
Which is called by nova/compute/manager.py _run_instance()
Which is called by nova/compute/manager.py run_instance()

Which is the call that the user made (via the scheduler).

There are two possible fixes that I can see: either do a quick and dirty check in the compute manager to see whether it's even vaguely possible for the request to succeed (i.e. the quota isn't currently all allocated), or correctly bubble the exception back. There are also limits on how much non-security code is likely to land in Folsom, which complicates things.

Michael Still (mikal) wrote:

I think I misspoke a little. On further investigation, it looks like the scheduler is deliberately eating the exception as part of its retry logic.
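
The duplicate bug, "Rescheduling loses reasons", describes what happens when retry logic discards each failure. The sketch below shows a retry loop that keeps the reasons instead. It is a minimal illustration under assumed names: `schedule_with_retries`, `try_host` and `RetriesExhausted` are hypothetical and do not come from Nova's scheduler. When every attempt fails, the final error carries each host's reason, so a message such as "no fixed IPs available" can still reach the user.

```python
class RetriesExhausted(Exception):
    """Hypothetical error carrying every per-attempt failure reason."""

    def __init__(self, reasons):
        self.reasons = reasons
        super().__init__("all retries failed: " + "; ".join(reasons))


def schedule_with_retries(hosts, try_host, max_attempts=3):
    """Try hosts in order, remembering why each attempt failed.

    A loop that silently swallowed each exception would leave only a
    generic "no valid host" error; recording the reasons lets the final
    error explain what actually went wrong.
    """
    reasons = []
    for host in hosts[:max_attempts]:
        try:
            return try_host(host)
        except Exception as exc:
            reasons.append(f"{host}: {exc}")
    raise RetriesExhausted(reasons)
```

A further refinement would be to stop retrying immediately on errors that no other host can fix. Per-project quota exhaustion is one example, because rescheduling onto a different compute node cannot free up fixed IPs.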

Changed in nova:
assignee: nobody → Alexander Pugachev (apugachev)
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/29555

Changed in nova:
status: Triaged → In Progress
Joe Gordon (jogo) wrote:

This sounds like it's a bug in the retry logic, not the fixed IP logic. Checking earlier on whether any fixed IPs are available is a nice idea, but it can lead to race conditions: with two concurrent tasks, for example, a fixed IP may be made available moments after the check has run. I would rather fix the retry logic (there is another bug for this: https://bugs.launchpad.net/nova/+bug/1161661).
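
The race condition mentioned here is a classic check-then-act problem. The sketch below replays one possible interleaving of two concurrent requests deterministically. It uses a hypothetical `FixedIpPool` class with a racy read-only check and an atomic allocate; neither is a Nova API. Both requests pass the pre-check while one IP remains, but only one of them gets it. The pre-check is therefore only advisory, and the allocation-time check must stay authoritative.

```python
import threading


class FixedIpPool:
    """Hypothetical pool with a racy pre-check and an atomic allocate."""

    def __init__(self, free):
        self.free = free
        self._lock = threading.Lock()

    def has_free(self):
        # Read-only pre-check: may be stale by the time it is acted on.
        return self.free > 0

    def allocate(self):
        # Authoritative check-and-claim, done atomically under the lock.
        with self._lock:
            if self.free == 0:
                return False
            self.free -= 1
            return True


def interleaved_requests(pool):
    """Replay one interleaving: both pre-checks run before either allocates."""
    a_ok = pool.has_free()  # request A passes the pre-check
    b_ok = pool.has_free()  # request B passes too; one IP still looks free
    a_got = pool.allocate() if a_ok else False
    b_got = pool.allocate() if b_ok else False
    return a_ok, b_ok, a_got, b_got
```

With `FixedIpPool(free=1)`, this returns `(True, True, True, False)`: request B passes the pre-check but still fails at allocation. The reverse can also happen. A pre-check might reject a request even though an IP is released moments later.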

Alexander Pugachev (apugachev) wrote:

The fix is proposed for the case where nothing nasty happens: a routine API call simply fails somewhere in nova-network because the instance reservation check is not aware of the fixed IP quota.

It does not make the race condition any worse: currently we can accept an API call and cast it further, and at the same moment something fast can eat the last fixed IP.

Please consider removing the "-1".
