CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers

Bug #1799727 reported by Andrew Waugaman on 2018-10-24
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Matt Riedemann
Rocky
High
Matt Riedemann

Bug Description

(OpenStack 14)

After changing the value of cpu_allocation_ratio in nova.conf from 16 to 1 and restarting nova containers, the ProviderTree still uses the old value

(A patch with extra debugging is applied to the system for ProviderTree information: https://review.openstack.org/#/c/597560/ )

Nova.conf setting:

cpu_allocation_ratio=1

[root@compute-0 ~]# docker restart nova_compute nova_libvirt
nova_compute
nova_libvirt

Inside Nova-compute.log

2018-10-23 19:19:49.217 1 DEBUG oslo_service.service [req-9539f623-b342-4c5d-ab93-6ffacdbd8358 - - - - -] cpu_allocation_ratio = 1.0 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3023

2018-10-23 19:19:51.990 1 DEBUG nova.scheduler.client.report [req-9490c7f4-2157-44ef-a81c-ea3e6bf9be21 - - - - -] Updating ProviderTree inventory for provider ca60934d-074d-4628-ae61-3c3bbc9e5543 from _refresh_and_get_inventory using data: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 6, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 6}, u'MEMORY_MB': {u'allocation_ratio': 1.0, u'total': 6143, u'reserved': 4096, u'step_size': 1, u'min_unit': 1, u'max_unit': 6143}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 19, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 19}} _refresh_and_get_inventory /usr/lib/python2.7/site-packages/nova/scheduler/client/report.py:754
2018-10-23 19:19:51.990 1 DEBUG nova.compute.provider_tree [req-9490c7f4-2157-44ef-a81c-ea3e6bf9be21 - - - - -] Updating inventory in ProviderTree for provider ca60934d-074d-4628-ae61-3c3bbc9e5543 with inventory: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 6, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 6}, u'MEMORY_MB': {u'allocation_ratio': 1.0, u'total': 6143, u'reserved': 4096, u'step_size': 1, u'min_unit': 1, u'max_unit': 6143}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 19, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 19}} update_inventory /usr/lib/python2.7/site-packages/nova/compute/provider_tree.py:172

description: updated
Matt Riedemann (mriedem) wrote :

"OpenStack 14" is a RHOSP vendor product version - what is the actual upstream release you're testing against? 18.0.0 (Rocky GA)? 18.0.2 (latest stable/rocky release)?

tags: added: compute resou
tags: added: placement resource-tracker
removed: resou
description: updated
Andrew Waugaman (awaugama) wrote :

Sorry, I'm using 18.0.1

Matt Riedemann (mriedem) wrote :

I have a recreate in a functional test, will push it up shortly.

Changed in nova:
status: New → Confirmed
importance: Undecided → High
Matt Riedemann (mriedem) wrote :

I believe the bug is right here:

https://github.com/openstack/nova/blob/835faf3f40af6b0e07c585690982a997d6a2ac47/nova/compute/provider_tree.py#L128

That is just comparing the keys in the dict, not the values, so:

>>> old
{'a': 1, 'b': 2}
>>> new
{'a': 1, 'b': 2}
>>> new['b'] = 3
>>> old
{'a': 1, 'b': 2}
>>> new
{'a': 1, 'b': 3}
>>> old != new
True
>>> set(old) == set(new)
True
>>> set(old)
set(['a', 'b'])
>>> set(new)
set(['a', 'b'])
>>>

Which means it's saying inventory hasn't changed, which is true for the keys but not the values.

That was added in change: https://review.openstack.org/#/c/470575/ (queens)

Changed in nova:
status: Confirmed → Triaged
Matt Riedemann (mriedem) wrote :

I guess this shouldn't be a problem in queens because the resource tracker didn't use that provider tree to report back the inventory changes:

https://github.com/openstack/nova/blob/stable/queens/nova/compute/resource_tracker.py#L883

That was done in Rocky:

https://review.openstack.org/#/c/520246/

no longer affects: nova/queens
Matt Riedemann (mriedem) wrote :

I was looking at https://github.com/openstack/nova/blob/835faf3f40af6b0e07c585690982a997d6a2ac47/nova/compute/provider_tree.py#L128 incorrectly, that will return True from has_inventory_changed if the keys are different, meaning we don't need to check the values, which is correct.

Changed in nova:
status: Triaged → Confirmed
Matt Riedemann (mriedem) wrote :

OK I think this is the regression:

https://review.openstack.org/#/c/569132/

With that change, once we set the allocation ratios in the inventory data in the provider tree, that's what gets passed to _normalize_inventory_from_cn_obj, and _normalize_inventory_from_cn_obj won't set the allocation ratios from the compute node object if they are already in the inventory dict pass to _normalize_inventory_from_cn_obj - which once set, never gets unset...and that's the bug.

Fix proposed to branch: master
Review: https://review.openstack.org/613126

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Matt Riedemann (mriedem) on 2018-10-24
tags: added: libvirt

Reviewed: https://review.openstack.org/613115
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=45f36cebabdbaad10e69c1a99f5ad4f091885f82
Submitter: Zuul
Branch: master

commit 45f36cebabdbaad10e69c1a99f5ad4f091885f82
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 24 13:08:33 2018 -0400

    Add functional recreate test for bug 1799727

    This adds a functional test which recreates the
    bug where config-driven reserved and allocation ratio
    overrides are not being reflected in resource provider
    inventory once initially set.

    The reserved and allocation_ratio values set in the
    FakeDriver.update_provider_tree method, added in change
    I69d760aaf931d46f011cfd229b88f400837662e8, are removed
    here otherwise they hard-code the values which get sent
    to placement and ResourceTracker._normalize_inventory_from_cn_obj
    won't update the reserved / ratios based on config. The
    fake virt driver shouldn't really need to hard-code these
    values since the RT will provide those based on config.

    Change-Id: Ie66d6f4c83a7d6fc64a64dbd752e427cee1356d0
    Related-Bug: #1799727

Reviewed: https://review.openstack.org/613126
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ca279c68a54b054cb54d45ee9d2eed8ade9a6db5
Submitter: Zuul
Branch: master

commit ca279c68a54b054cb54d45ee9d2eed8ade9a6db5
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 24 13:48:06 2018 -0400

    Provide allocation_ratio/reserved amounts from update_provider_tree()

    The purpose of the RT._normalize_inventory_from_cn_obj method is
    to set allocation_ratio and reserved amounts on standard resource
    class inventory records that get sent to placement if the virt driver
    did not specifically set a ratio or reserved value (which none but
    the ironic driver do).

    If the allocation_ratio or reserved amount is in the inventory
    data dict from the virt driver, then the normalize method ignores
    it and lets the virt driver take priority.

    However, with change I6a706ec5966cdc85f97223617662fe15d3e6dc08,
    any virt driver that implements the update_provider_tree() interface
    is storing the inventory data on the ProviderTree object which gets
    cached and re-used, meaning once allocation_ratio/reserved is set
    from RT._normalize_inventory_from_cn_obj, it doesn't get unset and
    the normalize method always assumes the driver provided a value which
    should not be changed, even if the configuration value changes.

    We can make the config option changes take effect by changing
    the semantics between _normalize_inventory_from_cn_obj and
    drivers that implement the update_provider_tree interface, like
    for the libvirt driver. Effectively with this change, when a driver
    implements update_provider_tree(), they now control setting the
    allocation_ratio and reserved resource amounts for inventory they
    report. The libvirt driver will use the same configuration option
    values that _normalize_inventory_from_cn_obj used. The only difference
    is in update_provider_tree we don't have the ComputeNode facade to
    get the "real" default values when the allocation_ratio is 0.0, so
    we handle that like "CONF.cpu_allocation_ratio or 16.0". Eventually
    that will get cleaned up with blueprint initial-allocation-ratios.

    Change-Id: I72c83a95dabd581998470edb9543079acb6536a5
    Closes-Bug: #1799727

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/613991
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4e3616b562647ea140f6f67d1bf888f562b4b6d7
Submitter: Zuul
Branch: master

commit 4e3616b562647ea140f6f67d1bf888f562b4b6d7
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 29 12:55:03 2018 -0400

    No longer call _normalize_inventory_from_cn_obj from upt flow

    With change I72c83a95dabd581998470edb9543079acb6536a5 we no
    longer have a need to call _normalize_inventory_from_cn_obj for
    in-tree drivers that implement the update_provider_tree() interface.
    That change also documented the expectation on the upt interface
    that virt drivers should report allocation ratios and reserved
    amounts if necessary, like the libvirt driver does with config
    option values.

    This change removes the call to _normalize_inventory_from_cn_obj
    from the ResourceTracker when the virt driver implements the
    update_provider_tree() method.

    Change-Id: Ib3591f583453d98245382be0f7a04b6195d67106
    Related-Bug: #1799727

Reviewed: https://review.openstack.org/614563
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=57ca9288ce33550b73deee22ca902cc35f2b4819
Submitter: Zuul
Branch: stable/rocky

commit 57ca9288ce33550b73deee22ca902cc35f2b4819
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 24 13:08:33 2018 -0400

    Add functional recreate test for bug 1799727

    This adds a functional test which recreates the
    bug where config-driven reserved and allocation ratio
    overrides are not being reflected in resource provider
    inventory once initially set.

    The reserved and allocation_ratio values set in the
    FakeDriver.update_provider_tree method, added in change
    I69d760aaf931d46f011cfd229b88f400837662e8, are removed
    here otherwise they hard-code the values which get sent
    to placement and ResourceTracker._normalize_inventory_from_cn_obj
    won't update the reserved / ratios based on config. The
    fake virt driver shouldn't really need to hard-code these
    values since the RT will provide those based on config.

    Conflicts:
          nova/virt/fake.py

    NTOE(mriedem): The conflict is due to not having change
    I69d760aaf931d46f011cfd229b88f400837662e8 in Rocky. Rather
    than backport that change, this backport simply adds the
    update_provider_tree method to the fake virt driver. The
    other noticeable difference is the "allocations" kwarg is
    ommitted since that was added in Stein with change
    Ic062446e5c620c89aec3065b34bcdc6bf5966275.

    Change-Id: Ie66d6f4c83a7d6fc64a64dbd752e427cee1356d0
    Related-Bug: #1799727
    (cherry picked from commit 45f36cebabdbaad10e69c1a99f5ad4f091885f82)

tags: added: in-stable-rocky

Reviewed: https://review.openstack.org/614564
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26c5ee100fb7d75721355878ade9167ff88a70e6
Submitter: Zuul
Branch: stable/rocky

commit 26c5ee100fb7d75721355878ade9167ff88a70e6
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 24 13:48:06 2018 -0400

    Provide allocation_ratio/reserved amounts from update_provider_tree()

    The purpose of the RT._normalize_inventory_from_cn_obj method is
    to set allocation_ratio and reserved amounts on standard resource
    class inventory records that get sent to placement if the virt driver
    did not specifically set a ratio or reserved value (which none but
    the ironic driver do).

    If the allocation_ratio or reserved amount is in the inventory
    data dict from the virt driver, then the normalize method ignores
    it and lets the virt driver take priority.

    However, with change I6a706ec5966cdc85f97223617662fe15d3e6dc08,
    any virt driver that implements the update_provider_tree() interface
    is storing the inventory data on the ProviderTree object which gets
    cached and re-used, meaning once allocation_ratio/reserved is set
    from RT._normalize_inventory_from_cn_obj, it doesn't get unset and
    the normalize method always assumes the driver provided a value which
    should not be changed, even if the configuration value changes.

    We can make the config option changes take effect by changing
    the semantics between _normalize_inventory_from_cn_obj and
    drivers that implement the update_provider_tree interface, like
    for the libvirt driver. Effectively with this change, when a driver
    implements update_provider_tree(), they now control setting the
    allocation_ratio and reserved resource amounts for inventory they
    report. The libvirt driver will use the same configuration option
    values that _normalize_inventory_from_cn_obj used. The only difference
    is in update_provider_tree we don't have the ComputeNode facade to
    get the "real" default values when the allocation_ratio is 0.0, so
    we handle that like "CONF.cpu_allocation_ratio or 16.0". Eventually
    that will get cleaned up with blueprint initial-allocation-ratios.

    Conflicts:
          nova/virt/driver.py

    NOTE(mriedem): The conflict is due to not having change
    Ic062446e5c620c89aec3065b34bcdc6bf5966275 in Rocky which
    changed the update_provider_tree() signature and docstring.

    Change-Id: I72c83a95dabd581998470edb9543079acb6536a5
    Closes-Bug: #1799727
    (cherry picked from commit ca279c68a54b054cb54d45ee9d2eed8ade9a6db5)

This issue was fixed in the openstack/nova 18.1.0 release.

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers