RT overrides default allocation_ratios for ram cpu and disk

Bug #1742747 reported by Maciej Jozefczyk
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Maciej Jozefczyk

Bug Description

Description
===========

Resource tracker overrides the default allocation ratio values with values from the configuration files, without checking whether those values are valid.

Allocation ratio values are taken directly from the configuration files. This is a good approach, unless the allocation ratios in the configuration file are set to 0.0 - and here comes the problem: the default configuration sets those ratios to 0.0:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
So if an allocation ratio is set to 0.0 (or not set at all, because 0.0 is the default value), we have issues when the RT update sends this ratio to placement.
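
For reference, the option definitions look roughly like this (an abridged paraphrase of nova/conf/compute.py; the real options carry much longer help text):

    from oslo_config import cfg

    # abridged paraphrase of the options; defaults really are 0.0
    allocation_ratio_opts = [
        cfg.FloatOpt('cpu_allocation_ratio', default=0.0,
                     help='Virtual CPU to physical CPU allocation ratio.'),
        cfg.FloatOpt('ram_allocation_ratio', default=0.0,
                     help='Virtual RAM to physical RAM allocation ratio.'),
        cfg.FloatOpt('disk_allocation_ratio', default=0.0,
                     help='Virtual disk to physical disk allocation ratio.'),
    ]
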
*BUT here comes the solution*:
https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L198

When we read a ComputeNode object from the DB we also check whether the ratios are 0.0; if so, we override them with defaults (CPU: 16x, RAM: 1.5x, DISK: 1x).
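
A minimal sketch of that override (the helper name here is mine, not Nova's; the real logic lives in the compute_node.py code linked above):

    # hypothetical helper illustrating the facade-default behaviour
    def apply_default_allocation_ratios(compute_node):
        if not compute_node.cpu_allocation_ratio:   # None or 0.0
            compute_node.cpu_allocation_ratio = 16.0
        if not compute_node.ram_allocation_ratio:
            compute_node.ram_allocation_ratio = 1.5
        if not compute_node.disk_allocation_ratio:
            compute_node.disk_allocation_ratio = 1.0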

But just after initialization of the ComputeNode object here:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L539
we copy the actual resources into it (via _copy_resources).

We override the allocation ratios on the ComputeNode with those taken from the configuration file - yes, that's OK. If operators want to change the ratios, they do so in the conf file and then restart the service.
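
For example, an operator would pin the ratios explicitly with something like this in nova.conf on the compute host (the values here are illustrative):

    [DEFAULT]
    cpu_allocation_ratio = 4.0
    ram_allocation_ratio = 1.0
    disk_allocation_ratio = 1.0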

But what if they leave those parameters untouched in the config? Here comes the problem!
Those params are then always 0.0 - the placement API doesn't accept that and raises:
InvalidInventoryCapacity: Invalid inventory for 'VCPU' on resource provider '52559824-5fb1-424b-a4cf-79da9199447d'. The reserved value is greater than or equal to total.
The exception is raised here:
https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L228
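
A minimal sketch of why 0.0 trips that check (paraphrased; the real placement code is structured differently):

    class InvalidInventoryCapacity(Exception):
        pass

    def check_capacity(total, reserved, allocation_ratio):
        # usable capacity = (total - reserved) * allocation_ratio; with
        # allocation_ratio == 0.0 this is always 0, so the check fails
        # even though reserved is actually smaller than total
        if int((total - reserved) * allocation_ratio) <= 0:
            raise InvalidInventoryCapacity(
                'The reserved value is greater than or equal to total.')

    check_capacity(total=8, reserved=0, allocation_ratio=0.0)  # raises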

Some code around the problem:
> /opt/stack/nova/nova/compute/resource_tracker.py(610)
 602     def _copy_resources(self, compute_node, resources):
 603         """Copy resource values to supplied compute_node."""
 604         # purge old stats and init with anything passed in by the driver
 605         self.stats.clear()
 606         self.stats.digest_stats(resources.get('stats'))
 607         compute_node.stats = copy.deepcopy(self.stats)
 608
 609         # update the allocation ratios for the related ComputeNode object
 610  ->     compute_node.ram_allocation_ratio = self.ram_allocation_ratio
 611         compute_node.cpu_allocation_ratio = self.cpu_allocation_ratio
 612         compute_node.disk_allocation_ratio = self.disk_allocation_ratio
 613
 614         # now copy rest to compute_node
 615         compute_node.update_from_virt_driver(resources)
(Pdb++) self.cpu_allocation_ratio
0.0

self.cpu_allocation_ratio comes directly from config:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L148
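
A hedged paraphrase of that initialization (class and arguments simplified; the real constructor takes more parameters):

    from oslo_config import cfg

    CONF = cfg.CONF

    class ResourceTracker(object):
        def __init__(self):
            # read straight from CONF; no sanity check for the 0.0 default
            self.cpu_allocation_ratio = CONF.cpu_allocation_ratio
            self.ram_allocation_ratio = CONF.ram_allocation_ratio
            self.disk_allocation_ratio = CONF.disk_allocation_ratio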

Environment
===========
Latest master

How to reproduce
================
1. Spawn devstack
2. Leave configuration files untouched
3. Observe the overrides in
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L611
4. Watch how the RT sends the ratios to placement and placement responds with 400 Bad Request.

Changed in nova:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/532924

Changed in nova:
status: New → In Progress
Revision history for this message
Jay Pipes (jaypipes) wrote :

The problem is actually more complicated now due to the issue detailed in this thread:

http://lists.openstack.org/pipermail/openstack-operators/2018-January/014748.html

I believe what we will need to do is change the behaviour of the RT: when the nova-compute's nova.conf CONF.cpu_allocation_ratio is 0.0, we should look up the first host aggregate the compute node belongs to, look for a cpu_allocation_ratio metadata key and, if found, use that.

For the cases when no host aggregates exist, or no cpu_allocation_ratio metadata key exists for a host aggregate associated with the compute node, the defaults should follow the policy that is already documented in the configuration option:

https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L416-L421
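
A minimal sketch of this fallback (the function name, the ordering and the 16.0 default are illustrative assumptions based on the documented policy, not Nova code):

    def effective_cpu_allocation_ratio(conf_ratio, aggregate_metadatas):
        """aggregate_metadatas: metadata dicts of the node's aggregates."""
        if conf_ratio:  # operator explicitly set a non-zero ratio; it wins
            return conf_ratio
        for metadata in aggregate_metadatas:  # first aggregate with the key wins
            if 'cpu_allocation_ratio' in metadata:
                return float(metadata['cpu_allocation_ratio'])
        return 16.0  # documented default policy for VCPU

    # e.g. effective_cpu_allocation_ratio(0.0, [{}]) -> 16.0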

Revision history for this message
Maciej Jozefczyk (maciejjozefczyk) wrote :

I'm OK with the change to check the aggregate metadata first and then fall back to the policy.

Waiting for Melanie's and mgagne's opinions on that.

Revision history for this message
Maciej Jozefczyk (maciejjozefczyk) wrote :

OK, after some conversation on IRC - I don't want to touch the if-logic for now.

The only thing that would be helpful for me in https://review.openstack.org/#/c/520024 is to set default values instead of zeros.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Maciej Jozefczyk (<email address hidden>) on branch: master
Review: https://review.openstack.org/532924
