Allocation data should be taken from a placement API

Bug #1938078 reported by Dariusz Smigiel
This bug affects 2 people
Affects: Prometheus Openstack Exporter Charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

When we configured ram-allocation-ratio and cpu-allocation-ratio as 0 for nova-cloud-controller and updated prometheus-openstack-exporter with the corresponding values, POE reports negative values such as "-32".

https://github.com/CanonicalLtd/prometheus-openstack-exporter/blob/747a1475a718cd4811fb945d2acd7e3b2e019db7/prometheus-openstack-exporter#L379
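A minimal sketch of the arithmetic behind the negative values (function and parameter names here are hypothetical, not the exporter's actual code): capacity is computed as physical vCPUs times a single cloud-wide ratio, so a ratio of 0 collapses capacity to zero and anything already in use pushes the result negative.

```python
def schedulable_vcpus(host_vcpus, used_vcpus, cpu_allocation_ratio):
    # Capacity = physical vCPUs scaled by one cloud-wide ratio, minus usage.
    return host_vcpus * cpu_allocation_ratio - used_vcpus

# With the ratio configured as 0, capacity is zero and the result goes
# negative by exactly the number of vCPUs in use:
print(schedulable_vcpus(32, 32, 0))  # -32
```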

Adam Dyess (addyess)
Changed in charm-prometheus-openstack-exporter:
status: New → Confirmed
Revision history for this message
Adam Dyess (addyess) wrote (last edit):

In fact, starting with the OpenStack Train release and beyond, all of these allocation ratios (vCPU/memory/disk) are governed by the placement API -- not nova-cloud-controller's relation data.

This is fundamentally broken

According to the placement API, each hypervisor can be scheduled with a different ratio for each resource class: disk, vCPU, and memory.

You can have
 4 Hypervisors which are scheduled 1:1 and 16 vCPUs
 4 Hypervisors which are scheduled 1:4 and 16 vCPUs
 4 Hypervisors which are scheduled 1:8 and 16 vCPUs
 4 Hypervisors which are scheduled 1:16 and 16 vCPUs

If each one of these machines had a single VM that took 16 vCPUs
* the first group would have all of its vCPUs consumed
* the second group would have 1/4th of its vCPUs consumed
* the third group would have 1/8th of its vCPUs consumed
* the fourth group would have 1/16th of its vCPUs consumed
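The per-host arithmetic in the example above can be checked with a few lines of Python (free vCPUs per hypervisor = physical vCPUs * ratio - vCPUs used; numbers taken directly from the example):

```python
# Each group: 4 hypervisors, 16 physical vCPUs each, one 16-vCPU VM per host.
ratios = {"1:1": 1, "1:4": 4, "1:8": 8, "1:16": 16}

for label, ratio in ratios.items():
    free = 16 * ratio - 16  # schedulable vCPUs remaining on one hypervisor
    print(f"{label}: {free} vCPUs free per hypervisor")
```

This prints 0, 48, 112, and 240 free vCPUs for the four groups respectively.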

Imagine the nova-cloud-controller relation said the CPU allocation ratio cloud-wide was 1:2.

p-o-e would take the number of CPUs on each host and multiply by 2 to determine the "max vCPUs available" (nope, this is already wrong).

All the graphs in Grafana would look like there were
16 vCPUs left to schedule per host in the first group - really 0
16 vCPUs left to schedule per host in the second group - really 48
16 vCPUs left to schedule per host in the third group - really 112
16 vCPUs left to schedule per host in the fourth group - really 240

This charm must use its relation to the keystone identity service to query the placement API for EACH resource provider (hypervisor in this case) and read the allocation ratio from each one, using that as the correct per-hypervisor multiplier rather than some explicit

config['openstack_allocation_ratio_vcpu']

the same is true for all `allocation_ratios`
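A sketch of what that per-hypervisor computation could look like, assuming the inventory shape returned by the placement API's GET /resource_providers/{uuid}/inventories call (keystone session setup and the HTTP request itself are elided; this is not the charm's implementation):

```python
def effective_capacity(inventory):
    """Schedulable units for one resource class on one resource provider.

    `inventory` is the per-class dict from the placement API, e.g.
    {"total": 16, "reserved": 0, "allocation_ratio": 4.0, ...} for VCPU.
    """
    return (inventory["total"] - inventory["reserved"]) * inventory["allocation_ratio"]

# Example: a hypervisor with 16 physical vCPUs overcommitted 1:4
vcpu_inventory = {"total": 16, "reserved": 0, "allocation_ratio": 4.0}
print(effective_capacity(vcpu_inventory))  # 64.0
```

The key point is that `allocation_ratio` comes from each resource provider's own inventory record, so the multiplier is correct per hypervisor instead of assumed cloud-wide.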

summary: - Incorrect schedulable_instances value when allocation is set to 0
+ Allocation data should be taken from a placement API
