nova-compute resource tracker ignores 'reserved' while reporting 'max_unit'

Bug #1749404 reported by Bence Romsics
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The following inventory was reported after a fresh devstack build:

curl --silent \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --header "OpenStack-API-Version: placement latest" \
    --header "X-Auth-Token: ${TOKEN:?}" \
    -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp
{
   "resource_provider_generation" : 1,
   "inventories" : {
      "DISK_GB" : {
         "max_unit" : 19,
         "min_unit" : 1,
         "allocation_ratio" : 1,
         "step_size" : 1,
         "reserved" : 0,
         "total" : 19
      },
      "MEMORY_MB" : {
         "allocation_ratio" : 1.5,
         "max_unit" : 5967,
         "min_unit" : 1,
         "reserved" : 512,
         "step_size" : 1,
         "total" : 5967
      },
      "VCPU" : {
         "allocation_ratio" : 16,
         "min_unit" : 1,
         "max_unit" : 2,
         "reserved" : 0,
         "step_size" : 1,
         "total" : 2
      }
   }
}

In my opinion the correct max_unit for the MEMORY_MB resource would be (total - reserved), i.e. 5967 - 512 = 5455. Today, however, it equals total (5967).
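
The mismatch can be checked mechanically by comparing max_unit with (total - reserved) for each resource class. The Python sketch below hard-codes only the relevant fields of the inventory shown above instead of calling the API. The helper names `usable` and `oversized_max_units` are hypothetical, and allocation_ratio is deliberately ignored, since the question here is only about reserved.

```python
import json

# Relevant fields copied from the GET .../inventories response above.
INVENTORY_JSON = """
{
  "DISK_GB":   {"max_unit": 19,   "reserved": 0,   "total": 19},
  "MEMORY_MB": {"max_unit": 5967, "reserved": 512, "total": 5967},
  "VCPU":      {"max_unit": 2,    "reserved": 0,   "total": 2}
}
"""


def usable(inv):
    """Units left once the host's reservation is subtracted (ignores allocation_ratio)."""
    return inv["total"] - inv["reserved"]


def oversized_max_units(inventories):
    """Map resource class -> (max_unit, total - reserved) where max_unit is larger."""
    return {
        rc: (inv["max_unit"], usable(inv))
        for rc, inv in inventories.items()
        if inv["max_unit"] > usable(inv)
    }


inventories = json.loads(INVENTORY_JSON)
print(oversized_max_units(inventories))  # {'MEMORY_MB': (5967, 5455)}
```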

nova commit: 9e9b3e1
devstack commit: fbdefac
devstack config: ENABLED_SERVICES+=,placement-api,placement-client

description: updated
Sylvain Bauza (sylvain-bauza) wrote :

It is understandable that max_unit should be limited to what you can actually consume, so I tend to agree with you, but fixing that would imply two things:
 - it would be a huge upgrade problem, because every compute node would need to modify its inventory
 - it would introduce a dependency between max_unit and reserved, which I feel we shouldn't have

For that reason, and since we already check the reserved amount, I think this is a minor detail that doesn't need to change.

Changed in nova:
status: New → Won't Fix
Chris Dent (cdent) wrote :

I disagree with Sylvain on this one, so I'm going to re-open it, but it is low-ish priority because the impact isn't significant: if max_unit is greater than (total - reserved) and allocation_ratio is 1, a request for max_unit units will fail in an expected way that does not involve max_unit, because total - reserved < requested amount.
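
The failure mode described above can be illustrated with a simplified, placement-style check for a single resource class. This is a sketch under assumptions, not placement's actual implementation: `rejection_reason` and the memory inventory values are hypothetical, and capacity is computed as (total - reserved) * allocation_ratio. With allocation_ratio 1.0 and max_unit equal to total, a request for max_unit passes the min/max check but is rejected on capacity.

```python
def rejection_reason(inv, used, requested):
    """Simplified sketch of a placement-style check for one resource class.

    Returns None if the request fits, otherwise a short reason string.
    """
    if requested < inv["min_unit"] or requested > inv["max_unit"]:
        return "outside min_unit/max_unit"
    if requested % inv["step_size"] != 0:
        return "not a multiple of step_size"
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    if used + requested > capacity:
        return "exceeds capacity"
    return None


# Hypothetical memory inventory reported the way the bug describes: max_unit == total.
mem = {"total": 8192, "reserved": 512, "min_unit": 1, "max_unit": 8192,
       "step_size": 1, "allocation_ratio": 1.0}

print(rejection_reason(mem, used=0, requested=8192))  # exceeds capacity
print(rejection_reason(mem, used=0, requested=7680))  # None
```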

Ideally we shouldn't have that conflict, so I think this is worth fixing (setting max_unit to total-reserved). It's not that much of an upgrade problem because compute nodes are constantly verifying and updating their inventories anyway. If a bunch of compute nodes are restarted at the same time there will be a lot of inventory writes, but this probably won't be an issue[1].

[1] https://anticdent.org/placement-scale-fun.html

Changed in nova:
status: Won't Fix → Triaged
importance: Undecided → Low
Ed Leafe (ed-leafe) wrote :

I thought that max_unit was simply a way of limiting the size of a request to some sort of "sane" amount for the resource. If max_unit is 50 and there is an available inventory of 100, it prevents you from requesting all 100 in a single request. However, if resources begin to be consumed and the available inventory drops to 25, you wouldn't expect max_unit to change, would you? It seems that you are asking for max_unit to change to reflect available inventory, and that isn't right. I would classify this as Won't Fix, but I'll wait for others to respond.
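
Ed's example can be expressed with the same kind of simplified check. The sketch assumes a hypothetical inventory of total=100, reserved=0 and allocation_ratio=1.0; `check` is an illustrative helper, not a placement API. max_unit stays at 50 throughout, while the amount that can still be allocated shrinks as usage grows.

```python
def check(inv, used, requested):
    """Return which check (if any) rejects the request; simplified and hypothetical."""
    if requested > inv["max_unit"]:
        return "max_unit"
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    if used + requested > capacity:
        return "capacity"
    return "ok"


# Ed's example: 100 units available, operator caps single requests at 50.
inv = {"total": 100, "reserved": 0, "allocation_ratio": 1.0, "max_unit": 50}

print(check(inv, used=0, requested=100))  # max_unit
print(check(inv, used=75, requested=50))  # capacity (max_unit is still 50)
print(check(inv, used=75, requested=25))  # ok
```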

Bence Romsics (bence-romsics) wrote :

@Ed: I don't think max_unit should reflect the available inventory. That would unnecessarily limit what can be done with the transactional POST /allocations. Consider this: for some resource class a provider has total=4, reserved=1 and a single allocation of, say, 2 (and max_unit is big enough). I think a transaction that replaces the current allocation of 2 with a single allocation of 3 (of either the same or another consumer) is valid. Therefore max_unit should not be artificially limited below (total-reserved).
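
Bence's transaction can be simulated as an atomic replacement of one allocation with another. The sketch assumes allocation_ratio=1.0 and max_unit=4 for the provider (the comment only says max_unit is "big enough"); `replace_allocation` and the consumer names are hypothetical. It shows that the swap is valid when max_unit is at least total - reserved = 3, and that it would be rejected if max_unit were shrunk to the currently free amount (4 - 1 - 2 = 1).

```python
def replace_allocation(inv, allocations, old_consumer, new_consumer, amount):
    """Sketch of an atomic swap: drop one consumer's allocation and add another's.

    Returns the proposed allocations, or raises ValueError if the result is invalid.
    """
    if amount > inv["max_unit"]:
        raise ValueError("amount exceeds max_unit")
    proposed = {c: a for c, a in allocations.items() if c != old_consumer}
    proposed[new_consumer] = proposed.get(new_consumer, 0) + amount
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    if sum(proposed.values()) > capacity:
        raise ValueError("exceeds capacity")
    return proposed


inv = {"total": 4, "reserved": 1, "allocation_ratio": 1.0, "max_unit": 4}
print(replace_allocation(inv, {"vm-a": 2}, "vm-a", "vm-b", 3))  # {'vm-b': 3}

# If max_unit tracked the currently free amount, the same valid swap would be refused.
try:
    replace_allocation(dict(inv, max_unit=1), {"vm-a": 2}, "vm-a", "vm-b", 3)
except ValueError as exc:
    print(exc)  # amount exceeds max_unit
```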

On the other hand, I agree that this bug has low importance. I don't think it will be a problem until somebody has a use case that needs more of a resource than a single host can provide and wants to automate splitting the request into the fewest, largest chunks possible. That is not a use case I have.
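
The hypothetical splitting use case can be sketched as a greedy allocator that covers a demand with the fewest, largest chunks, taking at most one chunk per provider. Under that one-chunk-per-provider assumption, taking the largest per-provider limits first minimizes the number of chunks. Everything here is hypothetical (host names, `split_demand`, step_size=1, allocation_ratio=1.0). Each chunk is capped at min(max_unit, total - reserved - used): with max_unit reported as total, a splitter that trusted max_unit alone would propose chunks that the capacity check then rejects.

```python
def split_demand(demand, providers):
    """Greedy sketch: cover `demand` with the fewest chunks, one chunk per provider.

    Assumes allocation_ratio=1.0 and step_size=1. Returns a list of
    (provider_name, amount), or None if the providers cannot cover the demand.
    """
    limits = sorted(
        ((name, min(p["max_unit"], p["total"] - p["reserved"] - p["used"]))
         for name, p in providers.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    chunks, remaining = [], demand
    for name, limit in limits:
        if remaining <= 0 or limit <= 0:
            break  # demand covered, or no provider left with free units
        take = min(limit, remaining)
        chunks.append((name, take))
        remaining -= take
    return chunks if remaining <= 0 else None


hosts = {
    "host1": {"total": 5967, "reserved": 512, "max_unit": 5967, "used": 0},
    "host2": {"total": 3000, "reserved": 512, "max_unit": 3000, "used": 0},
}
print(split_demand(7000, hosts))  # [('host1', 5455), ('host2', 1545)]
```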

Ed Leafe (ed-leafe) wrote :

@Bence - if that were the case, then max_unit would be redundant. We already check for available resources (total-reserved-allocated). The whole point of min/max_unit (and step_size) is to allow operators to set the allowable "chunk" size of a request for a given inventory of resource. They have nothing to do with the total amount of available resources.
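
Ed's distinction between the shape of a request and the room left on a provider can be written as two independent checks. This is a simplified sketch, not placement's code; `chunk_ok`, `room_left` and the 256 MB step are hypothetical values chosen for illustration. A request would need to pass both checks to be accepted.

```python
def chunk_ok(amount, min_unit, max_unit, step_size):
    """Operator-defined shape of a single request; independent of current usage."""
    return min_unit <= amount <= max_unit and amount % step_size == 0


def room_left(total, reserved, allocated, allocation_ratio=1.0):
    """How much more can still be allocated; depends on current usage."""
    return (total - reserved) * allocation_ratio - allocated


# Hypothetical inventory: requests must be 256 MB multiples between 256 and 4096 MB.
print(chunk_ok(1024, min_unit=256, max_unit=4096, step_size=256))  # True
print(chunk_ok(1000, min_unit=256, max_unit=4096, step_size=256))  # False (step_size)
print(room_left(total=8192, reserved=512, allocated=6144))          # 1536.0
```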

Jay Pipes (jaypipes) wrote :

max_unit is intended, as @edleafe states, to limit the requested amount of a resource to a sane value. It is there to protect against requests for an amount of resources that would be unrealistic when allocation_ratio > 1.0.

For example, consider that the default CPU allocation ratio is 16.0. Now consider a host that has 32 physical CPUs, 2 of which are reserved for the host via the reserved_host_cpus CONF option. The max_unit would be 32, because that is the theoretical "sane" limit of vCPUs that a single instance can consume on the host. Remember that the host doesn't have 2 *physical* CPUs reserved for itself. It has 2 VCPUs reserved for itself.
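
Plugging Jay's numbers into the capacity formula (total - reserved) * allocation_ratio shows that reserved reduces the aggregate amount a host can hand out, while max_unit only caps a single request. This is an illustrative calculation under those assumptions, not nova's code.

```python
# Jay's example: 32 physical CPUs, default VCPU allocation_ratio 16.0,
# 2 VCPUs reserved for the host (reserved_host_cpus = 2).
total, reserved, allocation_ratio = 32, 2, 16.0
max_unit = total  # per-instance "sane" limit: one VCPU per physical CPU

capacity = (total - reserved) * allocation_ratio
print(capacity)                          # 480.0 VCPUs can be allocated in aggregate
print(max_unit)                          # 32: no single instance may ask for more
print(32 <= max_unit and 32 <= capacity)  # True: a 32-VCPU instance still fits
```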

I'm going to close this as Won't Fix because this is the same behaviour that has existed in Nova since the very beginning (these things were constructed and passed as the "limits" parameter to the resource claim).

Bence, I believe that your use case is more properly addressed with the cpu_dedicated_set and cpu_shared_set work that is currently underway and described partly in this blueprint:

https://review.openstack.org/#/c/555081/

In the case of dedicated CPUs, the allocation ratio would be 1.0 and the (total - reserved) value would indeed be the limiting factor in calculating whether an instance would be able to consume that amount of resources.
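
Continuing the same example, the sketch below compares the overcommitted case with a dedicated-CPU case where allocation_ratio is 1.0. The `fits` helper is hypothetical and simplified; it only combines the per-request max_unit cap with the aggregate capacity check. With allocation_ratio=1.0, (total - reserved) becomes the binding limit even though max_unit is still 32.

```python
def fits(requested, total, reserved, allocation_ratio, max_unit, used=0):
    """Simplified check: per-request max_unit cap plus aggregate capacity."""
    capacity = (total - reserved) * allocation_ratio
    return requested <= max_unit and used + requested <= capacity


# Same 32-CPU host, 2 reserved, max_unit left at 32.
print(fits(32, total=32, reserved=2, allocation_ratio=16.0, max_unit=32))  # True
print(fits(32, total=32, reserved=2, allocation_ratio=1.0, max_unit=32))   # False
print(fits(30, total=32, reserved=2, allocation_ratio=1.0, max_unit=32))   # True
```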

Changed in nova:
status: Triaged → Won't Fix