GET /allocations/{uuid} on a consumer with no allocations provides no generation

Bug #1778591 reported by Chris Dent
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Won't Fix
Medium
Unassigned

Bug Description

If we write some allocations with PUT /allocations/{uuid} at modern microversions, a consumer record is created for {uuid} and a generation is created for that consumer. Each subsequent attempt to PUT /allocations/{uuid} must include a matching consumer generation.

If the allocations for a consumer are cleared (either DELETE, or PUT /allocations/{uuid} with an empty dict of allocations) two things go awry:

* the consumer record, with a generation, stays around
* GET /allocations/{uuid} returns the following:

   {u'allocations': {}}

That is, no generation is provided, and we have no way figure one out other than inspecting the details of the error response.

Some options to address this:

* Return the generation in that response
* When the allocations for a consumer go empty, remove the consumer
* Something else?

Tags: placement
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/578139

Revision history for this message
Chris Dent (cdent) wrote :

After messing with this (in creating the demo test linked in the review above) I'm now pretty unsure on what the right thing to do is.

Is the right fix making sure that if a consumer has no allocations, that consumer record is destroyed, including in the failure shown in the above test? If that's done, then the next effort to set allocations will still accept null.

Or is sending 0 somehow right in that case?

Revision history for this message
Eric Fried (efried) wrote :

I'm starting to convince myself that we should, in a new microversion:

- Make GET /allocations/{nonexistent-consumer} do the sane thing, which is error 404.
- Make real REST endpoints for consumers.
  - GET /consumers/{uuid} responds 404 if no such consumer, or 200 with payload containing proj, user, gen.
  - PUT /consumers/{uuid} to create (or modify, but we could disallow that - can you change a consumer's proj/user?). Payload contains proj, user, gen.
  - How do you delete a consumer? DELETE is probably okay, even though it doesn't allow you to specify the generation. It'll bounce 409 if the consumer has allocations. This is akin to DELETEing a provider - we don't bother with generations there either.
- Require the consumer to be created before it can be allocated. No more of this implicit creation BS. (Older microversions would still do this, of course.)
- No implicit deletion of consumers either. (Ditto.) A consumer with zero allocations is a consumer with zero allocations.

Revision history for this message
Eric Fried (efried) wrote :

BTW, sending 0 is in no way right in that case.

Revision history for this message
Jay Pipes (jaypipes) wrote :

I disagree that the consumer record, with a generation, staying around, is "going awry". The reason is because you aren't deleting the consumer. You are deleting the allocations for that consumer.

For the GET /allocations/{uuid} response, that should be the following:

```yaml
{
  'consumer_generation': <GENERATION>,
  'project_id': <EXTERNAL_PROJECT_ID>,
  'user_id': <EXTERNAL_USER_ID>,
  'allocations': {}
}
```

at least, 1.28+ should return the above. If it doesn't, that's a bug IMHO.

Revision history for this message
Jay Pipes (jaypipes) wrote :

Not sure this should be in Triaged status since we don't yet have an agreement on a) whether there is a bug, b) what that bug actually is, and c) what the proposed solution should be.

Revision history for this message
Jay Pipes (jaypipes) wrote :

Also, I agree that there is an issue (mentioned by you and alex_xu on IRC) that if there is a 409 (or other failure) in the PUT /allocations/{consumer_uuid} that occurs *after* the consumer record has been created for a new consumer, that that creates an issue where the caller will need to pass a consumer_generation because there will be a consumer record created.

Thinking about this, I guess in the situation of the allocations failing to be created for a new consumer, we should delete the created consumer record.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/578139
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=623cd722e02c58356ba985a0bd5a7cfa3701effc
Submitter: Zuul
Branch: master

commit 623cd722e02c58356ba985a0bd5a7cfa3701effc
Author: Chris Dent <email address hidden>
Date: Tue Jun 26 16:54:34 2018 +0100

    [placement] demonstrate part of bug 1778591 with a gabbi test

    The test creates a resource provider with a small amount of VCPU
    inventory. It then tries to allocate a large number VCPU via a new
    consumer (and thus using consumer_generation of null).

    This fails with a 409 because capacity is violated.

    Then it queries to GET allocations on that consumer, to see if consumer
    generation is available. It is not.

    The it tries to allocate a smaller amount and though it might be
    expected that the generation should still be null, it takes a 0
    for the allocation to be accepted.

    Once we decide what the right behavior is here, we can fix it.

    Change-Id: I196834efc2cb6ece4204abbf8544c8f2621aaa36
    Related-Bug: #1778591

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/579654

Changed in nova:
assignee: nobody → Jay Pipes (jaypipes)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Jay Pipes (<email address hidden>) on branch: master
Review: https://review.openstack.org/579654
Reason: We decided to delete consumers when no allocations, so this is no longer a valid patch.

Revision history for this message
Jay Pipes (jaypipes) wrote :

We'll be deleting consumers that have no allocations, so this bug is no longer relevant.

Changed in nova:
status: In Progress → Won't Fix
assignee: Jay Pipes (jaypipes) → nobody
Revision history for this message
Eric Fried (efried) wrote :

This bug is still relevant. Excerpt from https://review.openstack.org/#/c/579163/:

The current behavior

  status: 200
  {
      "allocations": {}
  }

is wrong because the response payload doesn't conform to the expected format, which would contain a consumer_generation, project_id, and user_id. That those fields don't make sense in a context where there's no consumer is another motivator for making this a 4xx failure.

Changed in nova:
status: Won't Fix → Triaged
Changed in nova:
assignee: nobody → Eric Fried (efried)
status: Triaged → In Progress
Revision history for this message
Eric Fried (efried) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/579163

Changed in nova:
assignee: Eric Fried (efried) → Jay Pipes (jaypipes)
Revision history for this message
Eric Fried (efried) wrote :

Assuming we can agree this bug is valid (per reasoning in comment #12 or otherwise), we need to decide whether the error should be 404 or 400. Per third bullet here [1], "If a request contains a reference to a nonexistent resource in the body (not URI), the code should be 400 Bad Request. Do not use 404 NotFound..." In this case the nonexistent resource is in the URI, but it's a consumer, not an allocation. IMO this still warrants 404 rather than 400; that's my vote.

[1] https://specs.openstack.org/openstack/api-wg/guidelines/http/response-codes.html#failure-code-clarifications

Revision history for this message
Chris Dent (cdent) wrote :

I agree it should be 404. If you GET /allocations/{some uuid} and for that uuid there are no allocations, the response is 404.

If we called the second member of the path the {allocations uuid} instead of {consumer uuid} it would probably be more immediately clear: There are no allocations.

Where 400 has come up as option on this URI is in write operations where the write can be a create (as with PUT /allocations/{some id}). If the id is malformed a 400 makes more sense than a 404 because in a create the thing doesn't exist and we want to be able to say "you can change your request to use a different id and it may work".

Revision history for this message
Eric Fried (efried) wrote :

Good point, Chris. To be clear:

 GET /allocations/{properly-formed UUID with no corresponding consumer record} => 404
 PUT /allocations/{thing that isn't a UUID} => 400

The latter will be covered by https://review.openstack.org/#/c/580373/

Revision history for this message
Eric Fried (efried) wrote :

But what about GET /allocations/{thing that isn't a UUID} ?

Revision history for this message
Jay Pipes (jaypipes) wrote :

We don't have a `GET /allocations/{thing that isn't a UUID}`.

That said, it should probably have always been:

`GET /consumers/{consumer_uuid}/allocations`

to match the reciprocal:

`GET /resource_providers/{rp_uuid}/allocations`

that we already have.

Revision history for this message
Chris Dent (cdent) wrote :

I was doing a review of bugs, and stumbled on this one as being old and sort of undecided.

I suspect a lot of the confusion we're experiencing on issues like this can be cleared up if we're able to write up a few things and thereby identify some of inconsistencies between what we want and what we've got:

* What's a "consumer"?
* Why does that, whatever it is, need to show up in placement itself?
* What is the lifetime or lifecycle of that thing?

I think this is important because my brain (bless it's broken little heart) is still struggling to comprehend consumers as something relevant in placement itself. To me, there are allocations associated with an _external_ thing, by uuid, which have a lifecycle directly tied to the existence of that external thing. For the duration of the lump of allocations there is a known project and user associated.

If we are reifying the concept of "consumer" in placement it is presumably something different from "the lump of allocations". What is it? What is it for?

If we can get that written up in an email or spec or something I'll likely stop coming to a confused halt every time I cross this topic. My apparent resistance isn't simply "let's not have more things" it also (and more) "I don't know what this thing is."

Jay Pipes (jaypipes)
Changed in nova:
assignee: Jay Pipes (jaypipes) → nobody
Revision history for this message
Jay Pipes (jaypipes) wrote :
Download full text (3.8 KiB)

@cdent: A consumer is the abstract thing that has one or more resources allocated to it. While currently only representing a Nova instance, a consumer may in the future represent a Nova instance, a (detached) Cinder volume, a (detached) Neutron port or any other thing that can have allocations associated with it.

We could probably make the same argument that you are making about the concept of a resource provider... "it's just a UUID that has inventory (and traits) associated with it." Why do we need to have a GET /resource_providers endpoint at all? Why not just use GET /inventories/{uuid} where {uuid} is just the "external identifier" for the thing that has inventory of resources?

And my answer to that is because conceptually it makes sense to name the thing that has some other relation associated with it. That's the basis of entity-relationship modeling that I'm familiar with, at least :)

From a less-abstract-less-Jay's-personal-need-for-congruence perspective, I've been thinking about this issue with regards to how we tackle the pile of technical debt that is called "instance groups" in Nova. In order to solve various affinity and anti-affinity problems -- which are inherently a placement/scheduling domain -- the relationship between two consumers needs to be modeled.

We currently model this in Nova-land with the instance_groups and instance_group_member (sic) and instance_group_policy (sic) tables that live in the nova_api database [1].

I personally believe it would be easier to solve distance and affinity constraints in the placement service if the placement service were the thing to understand the relationship between different consumers (i.e. the groups the consumers belong to). The reason I believe this is that placement is really the only service (other than Keystone) that has a truly global view [2]. If we keep the table that stores instance group membership in Nova, then we limit the ability of Nova to group instances across multiple Nova deployments (this is going to be critical for edge deployments, BTW).

Similarly, if we keep the instance group membership in Nova, we have no ability to group dissimilar types of consumers together (e.g. group a Nova instance, a (detached) Neutron network and a (detached) Cinder volume together...)

In order to have instance^Wconsumer group information in placement, we'll need some way of associating a consumer with another consumer. We could do that explicitly with some sort of "consumer group" concept. Or we could do it implicitly using a generic "tags" concept (which was my original idea for implementing instance groups in Nova after I saw the proposed implementation). Regardless of whether we went with an explicit groups or implicit tags model, we would still want *some* entity other than "an external identifier for a set of allocations" to be available in our API for associating one consumer to another :)

We would most naturally do that using something like:

 PUT /consumer_groups/group1
 {
   "members": [
      "consumer1",
      "consumer2",
      ...
   ]
 }

or:

 PUT /consumers/consumer1/tags
 {
   "tags": [
     "group1"
   ]
 }
 PUT /consumers/consumer2/tags
 {
   "tags": [
   ...

Read more...

Revision history for this message
Ed Leafe (ed-leafe) wrote :

Adding this here so that my 2 cents from prior discussions elsewhere is recorded in this discussion.

In my mind, all that a consumer is to placement is a way to represent a batch of allocations created by a PUT (or POST in the multi case). That's it. It is a way for external clients of placement to determine who (in their system) is using what (quotas?), and to easily remove allocations when the consumer no longer needs them. When you delete the allocations for a consumer, the consumer is now meaningless, and should be deleted. Any GET referring to that consumer should return a 404.

In the case where a client's entity creates allocations, deletes them, and then creates new allocations, placement would create the consumer record, delete it, and re-create it again. Having a consumer record that doesn't refer to anything is meaningless, and messy.

Returning an empty dict for allocations is wrong. Just return 404.

Revision history for this message
Chris Dent (cdent) wrote :

For sake of discussion and not because I'm a horrible person:

What about a group or generic-tag-or-metadata field on the allocations lump?

How would that be better or worse than a reified consumer concept?

Is it, in part, because the group/groupings need to survive the lifecycle of an allocations lump? If so, why do they need to survive?

Again, I'm trying hard not to present as recalcitrant. Think of me as a curious 5 year old.

Revision history for this message
Chris Dent (cdent) wrote :

Since there's been a lot of churn since this conversation started, and because we're over on storyboard now, and other issues have taken greater priority, I'm going to mark this as a wontfix. Not because we won't fix the conceptual situation, but that the concept isn't really caught by the bug.

We'll get to it when it matters.

Changed in nova:
status: In Progress → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Eric Fried (<email address hidden>) on branch: master
Review: https://review.opendev.org/579163
Reason: I'm abandoning this because placement code is being deleted from nova and the nova bug is marked wontfix. If the issue still exists in placement, there'll be a story for it, and the fix should be proposed there.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.