Nova fails on Quantum port quota too late

Bug #1172808 reported by Phil Day
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

Currently Nova only hits a Quantum port quota limit in the compute manager, since that's where the code to create ports lives. The result is that the instance goes into an error state (after it has bounced through three hosts).

Seems to me that for Quantum the ports should be created in the API call (so that the error can be sent back to the user), and the port then passed down to the compute manager.

(Since a user can pass a port into the server create call I'm assuming this would be OK)
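As a rough illustration of the proposal above, here is a minimal Python sketch. It is not Nova or Quantum code: FakeQuantum, PortQuotaExceeded and api_create_server are hypothetical names invented for this example, and the in-memory class only stands in for the real port API. The point is that the quota error surfaces while the request is still in the API layer, and that any ports created so far are cleaned up before the error is returned.

```python
class PortQuotaExceeded(Exception):
    """Raised by the stand-in network service when no ports are left."""


class FakeQuantum:
    """Hypothetical in-memory stand-in for the Quantum port API."""

    def __init__(self, port_quota):
        self.port_quota = port_quota
        self.ports = {}
        self._next_id = 0

    def create_port(self, network_id):
        if len(self.ports) >= self.port_quota:
            raise PortQuotaExceeded("port quota of %d reached" % self.port_quota)
        self._next_id += 1
        port_id = "port-%d" % self._next_id
        self.ports[port_id] = network_id
        return port_id

    def delete_port(self, port_id):
        self.ports.pop(port_id, None)


def api_create_server(quantum, network_ids):
    """Create one port per network before the request is accepted.

    Returns the port ids that would be passed down to the compute manager.
    On a quota failure, ports created so far are deleted and the error
    propagates to the caller (that is, back to the user).
    """
    created = []
    try:
        for network_id in network_ids:
            created.append(quantum.create_port(network_id))
    except PortQuotaExceeded:
        for port_id in created:
            quantum.delete_port(port_id)
        raise
    return created
```

In this sketch a request for two networks against a quota of one fails immediately and leaves no ports behind, instead of failing later on a compute host.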

Mark McClain (markmcclain) wrote :

We'll continue to track this bug in Quantum to discuss the changes, but ultimately we'll have to update Nova.

Changed in quantum:
status: New → Opinion
Armando Migliaccio (armando-migliaccio) wrote :

Phil, what API call are you referring to? The Compute API call where the other quota checks take place? I would avoid creating the port because it may be unnecessary (what if there are no hosts available for scheduling? Then we need to clear the port). Perhaps it's more effective to query Quantum for quota levels and reserve the resource.

Not sure how Nova handles quota limits that are the responsibility of other projects; also, Nova currently handles these quota limits:

+-----------------------------+-------+
| Property                    | Value |
+-----------------------------+-------+
| metadata_items              | 128   |
| injected_file_content_bytes | 10240 |
| ram                         | 51200 |
| floating_ips                | 10    |
| key_pairs                   | 100   |
| instances                   | 10    |
| security_group_rules        | 20    |
| injected_files              | 5     |
| cores                       | 20    |
| fixed_ips                   | -1    |
| injected_file_path_bytes    | 255   |
| security_groups             | 10    |
+-----------------------------+-------+

Maybe we need to add ports too?

Phil Day (philip-day) wrote :

Yep, by "API Call" I mean that the port quota needs to be checked and ports created or reserved before a response is sent back to the user (as with all Nova quotas).

I take your point about what happens if the request can't be scheduled (or fails) - but we already have to clear up resources within Nova, so I don't see why we can't delete the Quantum ports in the same way.

At the moment I don't think there is any case where quotas are checked between projects - introducing a reservation mechanism in the Quantum API for Nova to use seems a bit heavy to me.

Armando Migliaccio (armando-migliaccio) wrote :

To your 2nd point: you are right, but I tend to follow the principle of creating something right before I need it. I do the same with variable initialization ;) Creating a port just for the sake of checking whether we hit the quota limit sounds wrong to me. I grant you that there should be a number of quota checks at the very beginning of the spawning process, and port quota is definitely one of them.

To your 3rd point: it looks like there is no sync of quotas between projects, and I am not proposing anything heavy duty. However, Nova Quotas already has a reservation mechanism; maybe a first step in this direction would be to pull quotas from other projects and cache them in Nova. The reservation could happen on the cached copy (which could be refreshed on request or periodically); not ideal, but it's a start.
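The cached-quota idea described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: CachedPortQuota and its methods are hypothetical names, the two callables stand in for remote calls to Quantum, and a negative limit is treated as "unlimited" in the same way as the fixed_ips value of -1 in the table earlier in the thread. It is not Nova's actual reservation code.

```python
class CachedPortQuota:
    """Hypothetical Nova-side cache of a tenant's Quantum port quota."""

    def __init__(self, fetch_limit, fetch_usage):
        # fetch_limit and fetch_usage are callables standing in for
        # remote calls that return the quota limit and current usage.
        self._fetch_limit = fetch_limit
        self._fetch_usage = fetch_usage
        self.reserved = 0
        self.refresh()

    def refresh(self):
        """Re-read limit and usage (on request or periodically)."""
        self.limit = self._fetch_limit()
        self.in_use = self._fetch_usage()

    def reserve(self, count):
        """Reserve count ports on the cached copy; False if over quota."""
        over = self.in_use + self.reserved + count > self.limit
        if self.limit >= 0 and over:
            return False
        self.reserved += count
        return True

    def commit(self, count):
        """The reserved ports were actually created."""
        self.reserved -= count
        self.in_use += count

    def rollback(self, count):
        """The request failed; give the reservation back."""
        self.reserved -= count
```

The weakness raised in the thread still applies to this sketch: the cache can be stale, so a reservation made against it does not guarantee that the later port creation succeeds.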

Phil Day (philip-day) wrote :

Hi Armando,

I don't see it as creating a port just for the sake of checking the quota - I see it as creating the port earlier at the point where you can still meaningfully return an error to the user. The port will then be kept and passed down with the request just as if the user had created it and passed it into Nova.

Whether Nova holds a port from Quantum through the create request, or a reservation on a port from Quantum, it still comes down to Nova having to hold and manage an object from Quantum for a transaction that spans multiple Nova nodes, so I don't really see the need to introduce a reservation object. Why not just hold the port?

I think centralizing quota management into Nova would be moving the wrong way - there is probably a need for a common external quota service that can be shared by all projects, and that would be better than adding more integration into Nova itself. There was a discussion about this in Portland, but the concerns were how to make it sufficiently robust and scalable. I think that's maybe a whole different topic (which also includes domain-level quotas, etc.) than just addressing this issue.

Armando Migliaccio (armando-migliaccio) wrote :

Hi Phil,

Sorry, maybe I didn't explain myself properly. What I mean is that I feel it is wrong to create a port early in the process just to avoid a potential error down the line. It needlessly leaks implementation details all the way to the API. Creating just a port is something that makes sense in Quantum, but with nova-network still lying around, moving the operation higher up in the stack may require some serious refactoring (if I understand it correctly, the operation takes place when allocating the network). That said, I 100% agree with you that potential error conditions should be checked before hopping onto the compute node. That's where quota validation comes in: quota levels should be checked at the very beginning, and that's when you can respond meaningfully to users who do not meet their requirements. The problem you have is a manifestation of the fact that port quota is not checked at all, unlike other quotas such as cores, ram, etc. To me, addressing the above-mentioned issue by creating the port early does not feel like the right solution, but that's my personal opinion.

I wasn't proposing to centralize quota management into Nova, and by no means was I advocating for (yet another) -aaS project. Adding another potential bottleneck/dependency for all the other projects does not sound very wise to me, and having Nova pull quotas from other projects (since it already initiates the communication with them) seemed like a more reasonable first approach than not checking quota levels at all, and thus risking running into the very issue you ran into.

Thanks,
Armando

Phil Day (philip-day) wrote :

Hi Armando,

I don't agree with your reasoning that it's creating it early just to avoid a later error - it's creating it at the point where we can still report any errors to the user. That's just good practice: validating as much as you can as early as possible.

I don't consider it leaky, as the user can see the ports that are allocated to them in the Quantum API - much more transparent than the internal reservations in Nova.

There should be no need for a major refactor as a user can already pass a port into the API - and the fact that this is only meaningful for Quantum is already handled.

Quota validation as early as possible is only meaningful if it's backed up by reserving the resource in some way - for Quantum as it stands that means allocating the port, for Nova resources it means holding a reservation. Without creating a resource reservation mechanism for Quantum (either in the Quantum API, or as an external service), I still don't understand how you're going to get around this.
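The argument above, that a quota check without a reservation can still fail later, can be shown with a small Python sketch. PortService, check_then_create_later and create_up_front are hypothetical names made up for this illustration; the class is a toy in-memory quota counter, not the Quantum API.

```python
class PortService:
    """Hypothetical stand-in for a port API with a per-tenant quota."""

    def __init__(self, quota):
        self.quota = quota
        self.used = 0

    def has_room(self, count=1):
        return self.used + count <= self.quota

    def create_port(self):
        if not self.has_room():
            raise RuntimeError("port quota exceeded")
        self.used += 1


def check_then_create_later(service, requests):
    """Validate every request first, create the ports afterwards.

    Returns how many requests passed validation but failed at creation,
    i.e. the failures the user only learns about after the API call.
    """
    accepted = sum(1 for _ in range(requests) if service.has_room())
    late_failures = 0
    for _ in range(accepted):
        try:
            service.create_port()
        except RuntimeError:
            late_failures += 1
    return late_failures


def create_up_front(service, requests):
    """Create the port as part of validation.

    Returns how many requests were rejected immediately.
    """
    rejected_early = 0
    for _ in range(requests):
        try:
            service.create_port()
        except RuntimeError:
            rejected_early += 1
    return rejected_early
```

With a quota of one port and three concurrent requests, the check-only approach accepts all three and two of them fail late; creating up front rejects the same two requests while the user is still waiting on the API response.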

Regards,
Phil

Armando Migliaccio (armando-migliaccio) wrote :

If we are talking about the quick fix, then I agree with you: creating the port will work and the user will avoid having a VM in an error state because the limit on allowed ports has been reached. If we are talking about the right fix, then we must agree to disagree.

IMHO, Nova API calls take a lot longer than they should, and that's also down to the fact that they make quite a number of remote calls to external services; sure...let's add another one, that'll speed them up.

Phil Day (philip-day) wrote :

And calling out to Quantum just to check and reserve the quota will be quicker than creating the port? Remember that the port can't be bound to a physical host at this stage (just like any port passed in by the user) - so I don't see why it would be a heavy processing load.

It's still not clear to me what you're proposing as an alternative that doesn't involve creating a central quota reservation service that all services can use.

Michael Still (mikal)
summary: - Nove fails on Quantum port quota too late
+ Nova fails on Quantum port quota too late
Changed in nova:
status: New → Triaged
importance: Undecided → Wishlist
Aaron Rosen (arosen) wrote :

Hi,

Sorry to jump into this late.

Currently, when you launch an instance, nova-api queries Quantum to see if the network uuid is in Quantum. In my opinion, it makes sense for nova-api to do a port-create() rather than a get-networks(), so that we fail early rather than late, as Phil brings up. I also agree that we don't want to make nova-api the bottleneck here. That said, it's doing a get_networks() anyway, which we could change to a create_port(), and that would solve the issue.

My 2 cents.

Gary Kotton (garyk) wrote :

In nova the Quota checks are done here - https://github.com/openstack/nova/blob/master/nova/compute/api.py#L528.
Why not do the Quantum checks for the ports here too?

Personally I think that the port allocation idea raised by Aaron is a nice idea.
Thanks
Gary

Aaron Rosen (arosen) wrote :

The idea was actually raised by Phil (but I'm happy to take credit.. : ) ).

Mark McClain (markmcclain) wrote :

I think we should create the ports up front. Failing fast is more beneficial to the user.

Aaron Rosen (arosen) wrote :

This also allows quantum to make decisions if a port is part of a default security group or not. Currently it's just hardcoded to return default if no security group is passed in.

Aaron Rosen (arosen)
no longer affects: quantum
Phil Day (philip-day) wrote :

Another thing that can fail the same way is Quantum running out of IP addresses on the network - that also only gets picked up in the compute manager at the moment.

So when we create the port up front we should also allocate the IP address.
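To make the IP exhaustion case concrete, here is a small Python sketch in which port creation also allocates an address, so that running out of addresses fails at the same early point as running out of port quota. Subnet, AddressPoolExhausted and create_port_with_ip are hypothetical names for this example; the allocation pool is modelled with the standard library ipaddress module and is only a stand-in for Quantum's IP allocation.

```python
import ipaddress


class AddressPoolExhausted(Exception):
    """Raised when the stand-in subnet has no free addresses left."""


class Subnet:
    """Hypothetical stand-in for a network's IP allocation pool."""

    def __init__(self, cidr):
        # hosts() yields the usable host addresses of the network.
        self._free = list(ipaddress.ip_network(cidr).hosts())

    def allocate(self):
        if not self._free:
            raise AddressPoolExhausted("no free addresses on subnet")
        return str(self._free.pop(0))


def create_port_with_ip(subnet, network_id):
    """Create a port record and allocate its fixed IP in one step."""
    return {"network_id": network_id, "fixed_ip": subnet.allocate()}
```

A /30 network has two usable host addresses, so in this sketch the third call to create_port_with_ip raises AddressPoolExhausted straight away rather than on a compute host.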

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/49455
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1d9a0a620d78ab54f7a3da61b803a97cdbdd01f2
Submitter: Jenkins
Branch: master

commit 1d9a0a620d78ab54f7a3da61b803a97cdbdd01f2
Author: Phil Day <email address hidden>
Date: Wed Oct 2 23:14:35 2013 +0000

    Check Neutron port quota during validate_networks in API

    Unless ports are passed into Nova it will create ports on the
    requested networks as part of the network allocation in
    the compute manager. However if the user exceeds their
    port quota the instance will end up in an Error state, having
    first been re-scheduled a number of times.

    It would be much better if the quota failure was detected as
    part of the network validation in the API server, so that an
    error can be reported to the user and the creation failed.

    A full fix would include reserving or creating the ports at
    this stage, but there is no reservation mechanism in the
    Neutron API, and port creation depends in some cases
    on mac addresses only available on the compute manager.

    Instead this change just validates the quota and adjusts the
    max_count to be consistent with that quota, which doesn't
    guarantee that the create will work, but does catch the
    majority of cases.

    Refs bug: 1172808

    Change-Id: Iaaee059a6746fad68049712f94b2f8cfea6ab8dc
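The adjustment described in the commit message, lowering max_count so it is consistent with the remaining port quota, amounts to simple arithmetic. The Python sketch below illustrates that arithmetic only; instances_allowed_by_port_quota and its parameters are hypothetical names, not the code merged in the change, and treating a negative quota as unlimited is an assumption.

```python
def instances_allowed_by_port_quota(port_quota, ports_in_use,
                                    ports_per_instance, max_count):
    """Return how many of max_count instances the free port quota covers.

    Hypothetical helper: a negative port_quota is assumed to mean
    "unlimited", and ports_per_instance counts only the ports that
    would have to be created (ports passed in by the user already exist).
    """
    if port_quota < 0 or ports_per_instance == 0:
        return max_count
    free_ports = max(port_quota - ports_in_use, 0)
    return min(max_count, free_ports // ports_per_instance)
```

For example, with a quota of 10 ports, 4 in use, and 2 new ports per instance, a request for up to 5 instances would be trimmed to 3. As the commit message notes, a check like this does not reserve anything, so a concurrent request can still consume the free ports before the compute manager creates them.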

haruka tanizawa (h-tanizawa) wrote :

Sync is wrong?
I don't have a permission.
Assigned to -> Phil Day

Changed in nova:
status: Triaged → Fix Committed
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2
Nikola Đipanov (ndipanov) wrote :

A patch that does a partial revert of https://review.openstack.org/49455 (from comment #16) is under discussion at the time of writing, so I am linking it here.

https://review.openstack.org/#/c/175742/

Basically - just checking quotas and not reserving them is a bit of a fool's errand. We should either have a reserve-rollback API in Neutron, or, as has been suggested above, create the port quickly and then update it with additional information once we have it (when the request reaches the compute host).
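The second option above, creating the port early and filling in host-specific details later, might be sketched in Python as follows. PortStore and the field names host and mac_address are hypothetical and chosen only for illustration; this is not the Neutron API.

```python
class PortStore:
    """Hypothetical in-memory stand-in for a port API with a quota."""

    def __init__(self, quota):
        self.quota = quota
        self.ports = {}

    def create_port(self, network_id):
        """Called from the API layer: consumes quota immediately."""
        if len(self.ports) >= self.quota:
            raise RuntimeError("port quota exceeded")
        port_id = "port-%d" % (len(self.ports) + 1)
        # Host-specific details are unknown until scheduling is done.
        self.ports[port_id] = {
            "network_id": network_id,
            "host": None,
            "mac_address": None,
        }
        return port_id

    def update_port(self, port_id, **fields):
        """Called from the compute host once the details are known."""
        self.ports[port_id].update(fields)


store = PortStore(quota=1)
early_port = store.create_port("net-a")                # API layer
store.update_port(early_port, host="compute-1",
                  mac_address="fa:16:3e:00:00:01")     # compute host
```

Because the port exists from the moment the API accepts the request, the quota is effectively held for the whole create operation without a separate reservation object.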

Changed in nova:
status: Fix Released → Confirmed
milestone: 2014.2 → none
Markus Zoeller (markus_z) (mzoeller) wrote :

This wishlist bug has been open for a year without any activity. I'm going to move it to "Opinion / Wishlist", which is an easily obtainable queue of older requests that have come in. This bug can be reopened (set back to "New") if someone decides to work on this.

FWIW, I guess this is also part of the bp "get-me-a-network": http://specs.openstack.org/openstack/neutron-specs/specs/liberty/get-me-a-network.html

Changed in nova:
status: Confirmed → Opinion