RT overrides default allocation_ratios for ram cpu and disk

Bug #1742747 reported by Maciej Jozefczyk
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Maciej Jozefczyk

Bug Description

Description
===========

Resource tracker overrides the default allocation ratio values with values from the configuration files, without checking whether those values are valid.

Allocation ratio values are taken directly from the configuration files. This is a good approach, unless the allocation ratios in the configuration file are set to 0.0 - and here comes the problem: the default configuration sets those ratios to 0.0:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
So if an allocation ratio is set to 0.0 (or not set at all, because 0.0 is the default value), we have issues when the RT update sends this ratio to placement.
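
For reference, the option definitions look roughly like this (an abridged paraphrase of nova/conf/compute.py; the real options carry much longer help text):

    from oslo_config import cfg

    # abridged paraphrase of the options; defaults really are 0.0
    allocation_ratio_opts = [
        cfg.FloatOpt('cpu_allocation_ratio', default=0.0,
                     help='Virtual CPU to physical CPU allocation ratio.'),
        cfg.FloatOpt('ram_allocation_ratio', default=0.0,
                     help='Virtual RAM to physical RAM allocation ratio.'),
        cfg.FloatOpt('disk_allocation_ratio', default=0.0,
                     help='Virtual disk to physical disk allocation ratio.'),
    ]
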
*BUT here comes the solution*:
https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L198

When we read a ComputeNode object from the DB we also check whether the ratios are 0.0; if so, we override them with defaults (CPU: 16x, RAM: 1.5x, DISK: 1x).
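
A minimal sketch of that override (the helper name here is mine, not Nova's; the real logic lives in the compute_node.py code linked above):

    # hypothetical helper illustrating the facade-default behaviour
    def apply_default_allocation_ratios(compute_node):
        if not compute_node.cpu_allocation_ratio:   # None or 0.0
            compute_node.cpu_allocation_ratio = 16.0
        if not compute_node.ram_allocation_ratio:
            compute_node.ram_allocation_ratio = 1.5
        if not compute_node.disk_allocation_ratio:
            compute_node.disk_allocation_ratio = 1.0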

But just after initialization of the ComputeNode object here:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L539
we copy the actual resources into it (via _copy_resources).

We override the allocation ratios on the ComputeNode with those taken from the configuration file - yes, that's OK. If operators want to change the ratios, they do so in the conf file and then restart the service.
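
For example, an operator would pin the ratios explicitly with something like this in nova.conf on the compute host (the values here are illustrative):

    [DEFAULT]
    cpu_allocation_ratio = 4.0
    ram_allocation_ratio = 1.0
    disk_allocation_ratio = 1.0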

But what if they leave those parameters untouched in the config? Here comes the problem!
Those params are then always 0.0 - the placement API doesn't accept that and raises:
InvalidInventoryCapacity: Invalid inventory for 'VCPU' on resource provider '52559824-5fb1-424b-a4cf-79da9199447d'. The reserved value is greater than or equal to total.
The exception is raised here:
https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L228
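
A minimal sketch of why 0.0 trips that check (paraphrased; the real placement code is structured differently):

    class InvalidInventoryCapacity(Exception):
        pass

    def check_capacity(total, reserved, allocation_ratio):
        # usable capacity = (total - reserved) * allocation_ratio; with
        # allocation_ratio == 0.0 this is always 0, so the check fails
        # even though reserved is actually smaller than total
        if int((total - reserved) * allocation_ratio) <= 0:
            raise InvalidInventoryCapacity(
                'The reserved value is greater than or equal to total.')

    check_capacity(total=8, reserved=0, allocation_ratio=0.0)  # raises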

Some code around the problem:
> /opt/stack/nova/nova/compute/resource_tracker.py(610)
 602     def _copy_resources(self, compute_node, resources):
 603         """Copy resource values to supplied compute_node."""
 604         # purge old stats and init with anything passed in by the driver
 605         self.stats.clear()
 606         self.stats.digest_stats(resources.get('stats'))
 607         compute_node.stats = copy.deepcopy(self.stats)
 608
 609         # update the allocation ratios for the related ComputeNode object
 610  ->     compute_node.ram_allocation_ratio = self.ram_allocation_ratio
 611         compute_node.cpu_allocation_ratio = self.cpu_allocation_ratio
 612         compute_node.disk_allocation_ratio = self.disk_allocation_ratio
 613
 614         # now copy rest to compute_node
 615         compute_node.update_from_virt_driver(resources)
(Pdb++) self.cpu_allocation_ratio
0.0

self.cpu_allocation_ratio comes directly from config:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L148
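
A hedged paraphrase of that initialization (class and arguments simplified; the real constructor takes more parameters):

    from oslo_config import cfg

    CONF = cfg.CONF

    class ResourceTracker(object):
        def __init__(self):
            # read straight from CONF; no sanity check for the 0.0 default
            self.cpu_allocation_ratio = CONF.cpu_allocation_ratio
            self.ram_allocation_ratio = CONF.ram_allocation_ratio
            self.disk_allocation_ratio = CONF.disk_allocation_ratio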

Environment
===========
Latest master

How to reproduce
================
1. Spawn devstack
2. Leave configuration files untouched
3. Observe the overrides in
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L611
4. Watch how the RT sends the ratios to placement and placement responds with 400 Bad Request.

Changed in nova:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/532924

Changed in nova:
status: New → In Progress
Revision history for this message
Jay Pipes (jaypipes) wrote :

The problem is actually more complicated now due to the issue detailed in this thread:

http://lists.openstack.org/pipermail/openstack-operators/2018-January/014748.html

I believe what we will need to do is change the behaviour of the RT: when the nova-compute's nova.conf CONF.cpu_allocation_ratio is 0.0, we should look up the first host aggregate the compute node belongs to, look for a cpu_allocation_ratio metadata key and, if found, use that.

For the cases when no host aggregates exist, or no cpu_allocation_ratio metadata key exists for a host aggregate associated with the compute node, the defaults should follow the policy that is already documented in the configuration option:

https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L416-L421
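
A minimal sketch of this fallback (the function name, the ordering and the 16.0 default are illustrative assumptions based on the documented policy, not Nova code):

    def effective_cpu_allocation_ratio(conf_ratio, aggregate_metadatas):
        """aggregate_metadatas: metadata dicts of the node's aggregates."""
        if conf_ratio:  # operator explicitly set a non-zero ratio; it wins
            return conf_ratio
        for metadata in aggregate_metadatas:  # first aggregate with the key wins
            if 'cpu_allocation_ratio' in metadata:
                return float(metadata['cpu_allocation_ratio'])
        return 16.0  # documented default policy for VCPU

    # e.g. effective_cpu_allocation_ratio(0.0, [{}]) -> 16.0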

Revision history for this message
Maciej Jozefczyk (maciejjozefczyk) wrote :

I'm OK with the change to check the aggregate metadata first and then fall back to the policy.

Waiting for Melanie's and mgagne's opinions on that.

Revision history for this message
Maciej Jozefczyk (maciejjozefczyk) wrote :

OK, after some conversation on IRC - I don't want to touch the if-logic for now.

The only thing that would be helpful for me in https://review.openstack.org/#/c/520024 is to set default values instead of zeros.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Maciej Jozefczyk (<email address hidden>) on branch: master
Review: https://review.openstack.org/532924
