placement allocation_ratio initialized with 0.0

Bug #1789654 reported by Thomas Goirand
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Matt Riedemann
Pike                      In Progress   High        Tony Breeds
Queens                    In Progress   High        Tony Breeds
Rocky                     Fix Released  High        Matt Riedemann

Bug Description

After I just finished packaging Rocky, I wanted to test it with puppet-openstack. Then I couldn't boot VMs after the puppet run, because allocation_ratio in placement is set to 0.0 by default:

# openstack resource provider list
+--------------------------------------+-------------------+------------+
| uuid                                 | name              | generation |
+--------------------------------------+-------------------+------------+
| f9716941-356f-4a2e-b5ea-31c3c1630892 | poi.infomaniak.ch | 2          |
+--------------------------------------+-------------------+------------+
# openstack resource provider show f9716941-356f-4a2e-b5ea-31c3c1630892
+------------+--------------------------------------+
| Field      | Value                                |
+------------+--------------------------------------+
| uuid       | f9716941-356f-4a2e-b5ea-31c3c1630892 |
| name       | poi.infomaniak.ch                    |
| generation | 2                                    |
+------------+--------------------------------------+
# openstack resource provider inventory list f9716941-356f-4a2e-b5ea-31c3c1630892
+----------------+----------+-----------+------------------+-------+----------+----------+
| resource_class | reserved | step_size | allocation_ratio | total | min_unit | max_unit |
+----------------+----------+-----------+------------------+-------+----------+----------+
| VCPU           | 0        | 1         | 0.0              | 4     | 1        | 4        |
| DISK_GB        | 0        | 1         | 0.0              | 19    | 1        | 19       |
| MEMORY_MB      | 512      | 1         | 0.0              | 7987  | 1        | 7987     |
+----------------+----------+-----------+------------------+-------+----------+----------+

Later on, setting a correct allocation_ratio fixed the problem:
# openstack resource provider inventory class set --allocation_ratio 16.0 --total 4 f9716941-356f-4a2e-b5ea-31c3c1630892 VCPU
+------------------+------------+
| Field            | Value      |
+------------------+------------+
| max_unit         | 2147483647 |
| min_unit         | 1          |
| step_size        | 1          |
| reserved         | 0          |
| allocation_ratio | 16.0       |
| total            | 4          |
+------------------+------------+
# openstack resource provider inventory list f9716941-356f-4a2e-b5ea-31c3c1630892
+----------------+------------------+----------+------------+-----------+----------+-------+
| resource_class | allocation_ratio | reserved | max_unit   | step_size | min_unit | total |
+----------------+------------------+----------+------------+-----------+----------+-------+
| DISK_GB        | 0.0              | 0        | 19         | 1         | 1        | 19    |
| MEMORY_MB      | 0.0              | 512      | 7987       | 1         | 1        | 7987  |
| VCPU           | 16.0             | 0        | 2147483647 | 1         | 1        | 4     |
+----------------+------------------+----------+------------+-----------+----------+-------+

So, after this, I could boot VMs normally. Clearly, though, allocation_ratio should not be zero by default.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Logs for the failed debian rocky run:

http://logs.openstack.org/75/597175/1/check/puppet-openstack-integration-4-scenario001-tempest-debian-stable-luminous/fd38fcf/logs/

This was also reported by the xenserver CI:

http://lists.openstack.org/pipermail/openstack-dev/2018-August/133896.html

My guess is that the local ProviderTree cache of inventory thinks nothing has changed when we set allocation_ratio on the provider inventory, so it never actually updates the inventory remotely and we're left with the initial values of 0.0.
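
For readers unfamiliar with that cache, here is a minimal, hypothetical sketch of the failure mode being described; the names and structure are illustrative only and do not match nova's real ProviderTree / SchedulerReportClient API:

class CachedInventoryReporter:
    def __init__(self):
        self._cache = {}  # our local view of what placement already has

    def set_inventory(self, provider_uuid, inventory):
        if self._cache.get(provider_uuid) == inventory:
            # The cache says nothing changed, so no PUT is sent to
            # placement. If the cache is wrong (stale), the values stored
            # in placement -- e.g. the initial 0.0 ratios -- never get fixed.
            return False
        print('PUT inventory for %s: %s' % (provider_uuid, inventory))
        self._cache[provider_uuid] = inventory
        return True


reporter = CachedInventoryReporter()
reporter._cache['rp-uuid'] = {'VCPU': {'allocation_ratio': 16.0}}  # stale local view
# Placement actually still has allocation_ratio=0.0, but this call is a no-op:
reporter.set_inventory('rp-uuid', {'VCPU': {'allocation_ratio': 16.0}})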

Changed in nova:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/597553

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/597560

Revision history for this message
Matt Riedemann (mriedem) wrote :

https://review.openstack.org/#/c/597613/ should get the nova debug log patch into the xen CI testing.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Got some debug logs from a failed xen CI run:

http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/13/597613/2/check/dsvm-tempest-neutron-network/cc81140/logs/screen-n-cpu.txt.gz

We see that the inventory is updated properly on startup with the correct allocation ratios:

Aug 29 16:56:06.926641 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Using cpu_allocation_ratio 16.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07

Aug 29 16:56:07.057208 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: DEBUG nova.compute.provider_tree [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Updating resource provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 generation from 0 to 1 during operation: update_inventory {{(pid=24436) _update_generation /opt/stack/new/nova/nova/compute/provider_tree.py:161}}

Aug 29 16:56:07.057499 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.provider_tree [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Updating inventory in ProviderTree for provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 with inventory: {'VCPU': {'allocation_ratio': 16.0, 'total': 8, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 8}, 'MEMORY_MB': {'allocation_ratio': 1.5, 'total': 12795, 'reserved': 512, 'step_size': 1, 'min_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 1.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}}

But then a bit later the cpu_allocation_ratio is set to 0.0 in the _normalize_inventory_from_cn_obj method:

Aug 29 16:58:05.483508 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Using cpu_allocation_ratio 0.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07

Aug 29 16:58:05.614421 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: DEBUG nova.scheduler.client.report [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Updated inventory for 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 at generation 2: {'VCPU': {'allocation_ratio': 0.0, 'total': 8, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 8}, 'MEMORY_MB': {'allocation_ratio': 0.0, 'total': 12795, 'reserved': 512, 'step_size': 1, 'min_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 0.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}} {{(pid=24436) _update_inventory_attempt /opt/stack/new/nova/nova/scheduler/client/report.py:965}}

So it looks like at some random point the ComputeNode.cpu_allocation_ratio is being set to 0.0, but I don't see any of my debug logging for that in the logs:

https://review.openstack.org/#/c/597560/3/nova/objects/compute_node.py

From the startup logs when it dumps nova-cpu.conf, we see the cpu_allocation_ratio option is not set (default is 0.0):

Aug 29 16:56:03.687759 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: DEBUG oslo_service.service [None req-e4bc9238-0584-459a-aa2c-dc5d425d198a None None] cpu_allocation_ratio = 0.0 {...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

My working theory is that the ResourceTracker._copy_resources method is setting the ComputeNode.cpu_allocation_ratio (and other alloc ratios) to 0.0 based on config here:

https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/compute/resource_tracker.py#L622

And because of https://review.openstack.org/#/c/520024/ we are no longer calling ComputeNode.save() via ResourceTracker._update (from ResourceTracker._init_compute_node). Without that save(), ComputeNode._from_db_object never runs, and that is what would otherwise fix the 0.0 allocation ratio values up to the hard-coded defaults:

https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/objects/compute_node.py#L188
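
As a simplified, illustrative sketch of the ComputeNode "facade" referred to above (not nova's actual code, just the shape of the behavior): when a record is loaded, as _from_db_object does after create() or save(), 0.0 allocation ratios are replaced with hard-coded defaults.

HARDCODED_DEFAULTS = {
    'cpu_allocation_ratio': 16.0,
    'ram_allocation_ratio': 1.5,
    'disk_allocation_ratio': 1.0,
}

def apply_ratio_facade(node):
    for field, default in HARDCODED_DEFAULTS.items():
        if not node.get(field):  # 0.0 (or unset) means "use the default"
            node[field] = default
    return node

# Without the save() that the periodic path no longer makes, this fix-up
# never runs, so the 0.0 values copied from config stay on the in-memory
# object and end up in the inventory reported to placement.
print(apply_ratio_facade({'cpu_allocation_ratio': 0.0,
                          'ram_allocation_ratio': 0.0,
                          'disk_allocation_ratio': 0.0}))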

I've added more debug logging to the debug patch and will get another xenserver CI run.

Having said this, I'm not sure why this wouldn't be breaking us in the "normal" gate, i.e. tempest-full job.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/598176

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

I also wonder if https://review.openstack.org/#/c/518294/ is somehow contributing to the problem, but that was in Queens. Maybe that, combined with https://review.openstack.org/#/c/520024/, is causing a side effect.

Revision history for this message
Matt Riedemann (mriedem) wrote :

We have a new xen CI run with the more detailed debug logs now, and it basically confirms what I think the problem is; apparently this is a race, though, because this time the xen CI job passed.

Starting with the logs here:

http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/13/597613/2/check/dsvm-tempest-neutron-network/621833d/logs/screen-n-cpu.txt.gz

This is the initial inventory update for the newly created compute node and resource provider on start of nova-compute:

Aug 30 08:02:12.144029 dsvm-devstack-citrix-lon-nodepool-1379396 nova-compute[24292]: DEBUG nova.scheduler.client.report [None req-f4b08178-4b7a-4fba-b57a-91612721f970 None None] Updated inventory for 9c58942c-d183-455a-a760-991e4430e816 at generation 1: {'VCPU': {'allocation_ratio': 16.0, 'total': 4, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 4}, 'MEMORY_MB': {'allocation_ratio': 1.5, 'total': 12795, 'reserved': 512, 'step_size': 1, 'min_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 1.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}} {{(pid=24292) _update_inventory_attempt /opt/stack/new/nova/nova/scheduler/client/report.py:967}}

Then when the update_available_resource periodic task runs, we see that _copy_resources updates the in-memory ComputeNode.*_allocation_ratio values to 0.0 from the config:

Aug 30 08:03:09.458344 dsvm-devstack-citrix-lon-nodepool-1379396 nova-compute[24292]: INFO nova.compute.resource_tracker [None req-9b1b9924-b89e-4a03-9a69-c9fff17594e3 None None] ComputeNode.cpu_allocation_ratio changed from 16.0 to 0.0 in _copy_resources.

And the _resource_change method, called from RT._update, confirms the change:

Aug 30 08:03:09.549234 dsvm-devstack-citrix-lon-nodepool-1379396 nova-compute[24292]: INFO nova.compute.resource_tracker [None req-9b1b9924-b89e-4a03-9a69-c9fff17594e3 None None] Compute node resources have changed.
Aug 30 08:03:09.549407 dsvm-devstack-citrix-lon-nodepool-1379396 nova-compute[24292]: Old: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"model": "Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz", "vendor": "GenuineIntel", "features": ["fpu", "de", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mca", "cmov", "pat", "clflush", "mmx", "fxsr", "sse", "sse2", "syscall", "nx", "lm", "constant_tsc", "rep_good", "nopl", "pni", "pclmulqdq", "ssse3", "cx16", "sse4_1", "sse4_2", "movbe", "popcnt", "aes", "rdrand", "hypervisor", "lahf_lm", "abm", "fsgsbase", "bmi1", "bmi2", "erms"], "topology": {"cores": 1, "threads": 1, "sockets": 4}}',created_at=2018-08-30T15:02:11Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=27,free_disk_gb=47,free_ram_mb=12283,host='localhost',host_ip=192.168.33.1,hypervisor_hostname='localhost',hypervisor_type='XenServer',hypervisor_version=7001000,id=1,local_gb=47,local_gb_used=0,mapped=0,memory_mb=12795,memory_mb_used=512,metrics='[]',numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.5,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec],updated_at=None,uuid=9c58942c-d183-455a-a760-991e4430e816,vcpus=4,vcpus_used=0)

...

Read more...
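
For context on the "Compute node resources have changed." message above, here is a simplified, self-contained sketch (illustrative only, not nova's _resource_change implementation) of how a field-by-field comparison turns the 0.0 copy into a ComputeNode.save():

TRACKED_FIELDS = ('cpu_allocation_ratio', 'ram_allocation_ratio',
                  'disk_allocation_ratio', 'vcpus', 'memory_mb')


def resource_change(old, new):
    """Return True if any tracked field differs between the cached and the
    freshly copied compute node."""
    return any(old.get(f) != new.get(f) for f in TRACKED_FIELDS)


cached = {'cpu_allocation_ratio': 16.0, 'ram_allocation_ratio': 1.5,
          'disk_allocation_ratio': 1.0, 'vcpus': 4, 'memory_mb': 12795}
copied = dict(cached, cpu_allocation_ratio=0.0, ram_allocation_ratio=0.0,
              disk_allocation_ratio=0.0)  # what _copy_resources produced

if resource_change(cached, copied):
    # In nova this is where ComputeNode.save() runs, persisting the 0.0
    # ratios even though the object facade fixes them up again in memory.
    print('Compute node resources have changed.')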

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/598365

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/598176
Reason: Let's hope this works:

https://review.openstack.org/#/c/598365/

Revision history for this message
Eric Fried (efried) wrote :

Anecdotal reports say we were failing PowerVM CI 90% of the time. I'm going to run https://review.openstack.org/#/c/598365/ through a couple of times, but initial results look positive.

Revision history for this message
Matt Riedemann (mriedem) wrote :

According to Rado in https://review.openstack.org/#/c/270116/ this also impacts the vmware CI.

Revision history for this message
Matt Riedemann (mriedem) wrote :

BTW, if the xenserver, powervm and vmware CIs are all failing on this, it's not only intermittent but also seems to be isolated to virt drivers that don't implement the update_provider_tree method, for whatever reason. It must be something in that alternative flow through the scheduler report client for reporting inventory updates to placement. Likely because in the case of the libvirt driver, which does implement update_provider_tree, we're passing the same provider tree that the driver worked on to the scheduler report client, whereas in the non-libvirt case we're passing an inventory dict to the scheduler report client, and the scheduler report client is updating its own view of the provider tree, which might be stale somehow.
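
To make the two flows contrasted above a bit more concrete, here is a rough, hypothetical sketch; the helper names and signatures are assumptions, not nova's exact API:

def report_inventory(driver, report_client, provider_tree, nodename):
    if hasattr(driver, 'update_provider_tree'):
        # libvirt-style path: the driver mutates the very ProviderTree
        # instance the report client will flush, so both sides share a
        # single, consistent view.
        driver.update_provider_tree(provider_tree, nodename)
        report_client.flush_provider_tree(provider_tree)
    else:
        # xenserver/powervm/vmware-style path: the driver hands back a
        # plain inventory dict and the report client reconciles it against
        # its own cached view of the provider tree, which may be stale.
        inventory = driver.get_inventory(nodename)
        report_client.set_inventory(nodename, inventory)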

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/599670

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599672

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599673

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/598365
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2588af87c862cfd02d860f6b860381e907b279ff
Submitter: Zuul
Branch: master

commit 2588af87c862cfd02d860f6b860381e907b279ff
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 30 17:57:24 2018 -0400

    Don't persist zero allocation ratios in ResourceTracker

    The ComputeNode object itself has a facade which provides the
    actual default values for cpu, disk and ram allocation ratios
    when the config option values are left at the default (0.0).

    When we initially create a ComputeNode in the ResourceTracker,
    the *_allocation_ratio values in the object are going to be
    unset, and then _copy_resources, called from _init_compute_node,
    will set the values to the config option values, again, defaulted
    to 0.0, but the ComputeNode object, after calling create() or
    save(), will change those *on the object* to the non-0.0 we
    actually know and love (16.0 for cpu, 1.5 for ram, and 1.0 for disk).

    During the update_available_resource periodic, we'll again go
    through _init_compute_node and _copy_resources in the ResourceTracker
    which will set the configured values (default of 0.0) onto the
    ComputeNode object, which makes the _resource_change method, called
    from _update, return True and trigger a ComputeNode.save() call
    from the _update method. At that point we're *persisting* the 0.0
    allocation ratios in the database, even though ComputeNode.save
    will change them to their non-0.0 default values *on the object*
    because of the _from_db_object call at the end of ComputeNode.save.

    So even if the ComputeNode object allocation ratio values are the
    non-0.0 defaults, we'll *always* update the database on every
    periodic even if nothing else changed in inventory.

    This change modifies the _copy_resource method to only update the
    ComputeNode fields if the configured ratios are not the 0.0 default.

    Change-Id: I43a23a3290db0c835fed01b8d6a38962dc61adce
    Related-Bug: #1789654
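
A condensed sketch of the guard this commit describes (illustrative only, not the literal nova diff): a configured ratio is copied onto the ComputeNode only when it is non-zero, so the 0.0 sentinel never overwrites the defaults the object facade already applied.

import dataclasses


@dataclasses.dataclass
class FakeComputeNode:
    cpu_allocation_ratio: float = 16.0
    ram_allocation_ratio: float = 1.5
    disk_allocation_ratio: float = 1.0


CONF_RATIOS = {  # 0.0 is the config default, meaning "not set"
    'cpu_allocation_ratio': 0.0,
    'ram_allocation_ratio': 0.0,
    'disk_allocation_ratio': 0.0,
}


def copy_resources(node):
    for field, configured in CONF_RATIOS.items():
        if configured != 0.0:  # the new guard
            setattr(node, field, configured)
        # else: leave the existing (defaulted) value alone, so nothing
        # changes and no spurious ComputeNode.save() is triggered.


node = FakeComputeNode()
copy_resources(node)
print(node)  # ratios stay 16.0 / 1.5 / 1.0 instead of being reset to 0.0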

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/599670
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c45adaca5dd241408f1e29b657fe6ed42c908b8b
Submitter: Zuul
Branch: master

commit c45adaca5dd241408f1e29b657fe6ed42c908b8b
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 12:11:57 2018 -0400

    Document unset/reset wrinkle for *_allocation_ratio options

    This is a follow up to I43a23a3290db0c835fed01b8d6a38962dc61adce
    which makes the cpu/disk/ram_allocation_ratio config "sticky" in
    that once set to a non-default value, it is not possible to reset
    back to the default behavior (when config is 0.0) on an existing
    compute node record by unsetting the option from nova.conf. To
    reset back to the defaults, the non-0.0 default would have to be
    explicitly put into config, so cpu_allocation_ratio=16.0 for example.

    Alternatively operators could delete the nova-compute service
    record via the DELETE /os-services/{service_id} REST API and
    restart the nova-compute service to get a new compute_nodes record,
    but that workaround is messy and left undocumented in config.

    Change-Id: I908615d82ead0f70f8e6d2d78d5dcaed8431084d
    Related-Bug: #1789654
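
A tiny, self-contained sketch of the "sticky" behavior this change documents (illustrative only, not nova's actual code): with the guard in place, a 0.0 config value never overrides whatever ratio is already persisted, so getting back to the defaults requires setting them explicitly in config.

def effective_ratio(persisted_value, configured_value):
    # 0.0 in config means "not set", so keep whatever is already persisted.
    return configured_value if configured_value != 0.0 else persisted_value

print(effective_ratio(persisted_value=32.0, configured_value=0.0))   # 32.0 (sticky)
print(effective_ratio(persisted_value=32.0, configured_value=16.0))  # explicit reset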

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/597560

Revision history for this message
Eric Fried (efried) wrote :
Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/599672
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=01265b98c4cd2b1377e891a06ce748fc6f8f3425
Submitter: Zuul
Branch: stable/rocky

commit 01265b98c4cd2b1377e891a06ce748fc6f8f3425
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 30 17:57:24 2018 -0400

    Don't persist zero allocation ratios in ResourceTracker

    The ComputeNode object itself has a facade which provides the
    actual default values for cpu, disk and ram allocation ratios
    when the config option values are left at the default (0.0).

    When we initially create a ComputeNode in the ResourceTracker,
    the *_allocation_ratio values in the object are going to be
    unset, and then _copy_resources, called from _init_compute_node,
    will set the values to the config option values, again, defaulted
    to 0.0, but the ComputeNode object, after calling create() or
    save(), will change those *on the object* to the non-0.0 we
    actually know and love (16.0 for cpu, 1.5 for ram, and 1.0 for disk).

    During the update_available_resource periodic, we'll again go
    through _init_compute_node and _copy_resources in the ResourceTracker
    which will set the configured values (default of 0.0) onto the
    ComputeNode object, which makes the _resource_change method, called
    from _update, return True and trigger a ComputeNode.save() call
    from the _update method. At that point we're *persisting* the 0.0
    allocation ratios in the database, even though ComputeNode.save
    will change them to their non-0.0 default values *on the object*
    because of the _from_db_object call at the end of ComputeNode.save.

    So even if the ComputeNode object allocation ratio values are the
    non-0.0 defaults, we'll *always* update the database on every
    periodic even if nothing else changed in inventory.

    This change modifies the _copy_resource method to only update the
    ComputeNode fields if the configured ratios are not the 0.0 default.

    Change-Id: I43a23a3290db0c835fed01b8d6a38962dc61adce
    Related-Bug: #1789654
    (cherry picked from commit 2588af87c862cfd02d860f6b860381e907b279ff)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/599673
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a039f8397702d15718ebcec0fdb9cfeb6155f6a1
Submitter: Zuul
Branch: stable/rocky

commit a039f8397702d15718ebcec0fdb9cfeb6155f6a1
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 12:11:57 2018 -0400

    Document unset/reset wrinkle for *_allocation_ratio options

    This is a follow up to I43a23a3290db0c835fed01b8d6a38962dc61adce
    which makes the cpu/disk/ram_allocation_ratio config "sticky" in
    that once set to a non-default value, it is not possible to reset
    back to the default behavior (when config is 0.0) on an existing
    compute node record by unsetting the option from nova.conf. To
    reset back to the defaults, the non-0.0 default would have to be
    explicitly put into config, so cpu_allocation_ratio=16.0 for example.

    Alternatively operators could delete the nova-compute service
    record via the DELETE /os-services/{service_id} REST API and
    restart the nova-compute service to get a new compute_nodes record,
    but that workaround is messy and left undocumented in config.

    Change-Id: I908615d82ead0f70f8e6d2d78d5dcaed8431084d
    Related-Bug: #1789654
    (cherry picked from commit c45adaca5dd241408f1e29b657fe6ed42c908b8b)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/597553
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9c842d1aa68d67383114f783c946492282832e5b
Submitter: Zuul
Branch: master

commit 9c842d1aa68d67383114f783c946492282832e5b
Author: Matt Riedemann <email address hidden>
Date: Wed Aug 29 10:41:25 2018 -0400

    Log the operation when updating generation in ProviderTree

    Today when a provider generation is updated we get a mostly
    unhelpful message in the logs like:

      Updating resource provider bc0f4c8e-96e2-4e68-a06e-f7e0ac9aac6b
      generation from 0 to 1

    What we really want to know with that is in what context did the
    generation change, i.e. did inventory or traits change?

    This adds the actual operation to the log message when generation
    changes.

    Change-Id: I9b61f1dfb8db06e02ff79e19c055c673094a4ed2
    Related-Bug: #1789654

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/597560
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ade240b392563914c1bddc228a3eb00585177d4c
Submitter: Zuul
Branch: master

commit ade240b392563914c1bddc228a3eb00585177d4c
Author: Matt Riedemann <email address hidden>
Date: Wed Aug 29 10:56:09 2018 -0400

    Add debug logs for when provider inventory changes

    Because of the ProviderTree caching involved in both
    the ResourceTracker and SchedulerReportClient, it can
    be hard to know if inventory changes are being correctly
    reported to the placement service as expected, especially
    with things like configurable allocation ratios. This
    adds debug logs to the SchedulerReportClient and
    ProviderTree to determine if inventory has changed and
    when it's flushed back to placement.

    Change-Id: Ia6ab3ab4dea08b479bb6b794f408fd2e6f678c50
    Related-Bug: #1789654

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/613263

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/613271

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/647291

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/647292

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/647291
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7a3b072442db74418fa7fe626c637627d219c71f
Submitter: Zuul
Branch: stable/queens

commit 7a3b072442db74418fa7fe626c637627d219c71f
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 12:11:57 2018 -0400

    Document unset/reset wrinkle for *_allocation_ratio options

    This is a follow up to I43a23a3290db0c835fed01b8d6a38962dc61adce
    which makes the cpu/disk/ram_allocation_ratio config "sticky" in
    that once set to a non-default value, it is not possible to reset
    back to the default behavior (when config is 0.0) on an existing
    compute node record by unsetting the option from nova.conf. To
    reset back to the defaults, the non-0.0 default would have to be
    explicitly put into config, so cpu_allocation_ratio=16.0 for example.

    Alternatively operators could delete the nova-compute service
    record via the DELETE /os-services/{service_id} REST API and
    restart the nova-compute service to get a new compute_nodes record,
    but that workaround is messy and left undocumented in config.

    Change-Id: I908615d82ead0f70f8e6d2d78d5dcaed8431084d
    Related-Bug: #1789654
    (cherry picked from commit c45adaca5dd241408f1e29b657fe6ed42c908b8b)
    (cherry picked from commit a039f8397702d15718ebcec0fdb9cfeb6155f6a1)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/647292
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f8c0c4671c0b4967d9ccf9d84df575475fe3dddc
Submitter: Zuul
Branch: stable/pike

commit f8c0c4671c0b4967d9ccf9d84df575475fe3dddc
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 4 12:11:57 2018 -0400

    Document unset/reset wrinkle for *_allocation_ratio options

    This is a follow up to I43a23a3290db0c835fed01b8d6a38962dc61adce
    which makes the cpu/disk/ram_allocation_ratio config "sticky" in
    that once set to a non-default value, it is not possible to reset
    back to the default behavior (when config is 0.0) on an existing
    compute node record by unsetting the option from nova.conf. To
    reset back to the defaults, the non-0.0 default would have to be
    explicitly put into config, so cpu_allocation_ratio=16.0 for example.

    Alternatively operators could delete the nova-compute service
    record via the DELETE /os-services/{service_id} REST API and
    restart the nova-compute service to get a new compute_nodes record,
    but that workaround is messy and left undocumented in config.

    Change-Id: I908615d82ead0f70f8e6d2d78d5dcaed8431084d
    Related-Bug: #1789654
    (cherry picked from commit c45adaca5dd241408f1e29b657fe6ed42c908b8b)
    (cherry picked from commit a039f8397702d15718ebcec0fdb9cfeb6155f6a1)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/613271
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=705284e18c3a9f2c93ecb15bd1bfee5d7710b684
Submitter: Zuul
Branch: stable/queens

commit 705284e18c3a9f2c93ecb15bd1bfee5d7710b684
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 30 17:57:24 2018 -0400

    Don't persist zero allocation ratios in ResourceTracker

    The ComputeNode object itself has a facade which provides the
    actual default values for cpu, disk and ram allocation ratios
    when the config option values are left at the default (0.0).

    When we initially create a ComputeNode in the ResourceTracker,
    the *_allocation_ratio values in the object are going to be
    unset, and then _copy_resources, called from _init_compute_node,
    will set the values to the config option values, again, defaulted
    to 0.0, but the ComputeNode object, after calling create() or
    save(), will change those *on the object* to the non-0.0 we
    actually know and love (16.0 for cpu, 1.5 for ram, and 1.0 for disk).

    During the update_available_resource periodic, we'll again go
    through _init_compute_node and _copy_resources in the ResourceTracker
    which will set the configured values (default of 0.0) onto the
    ComputeNode object, which makes the _resource_change method, called
    from _update, return True and trigger a ComputeNode.save() call
    from the _update method. At that point we're *persisting* the 0.0
    allocation ratios in the database, even though ComputeNode.save
    will change them to their non-0.0 default values *on the object*
    because of the _from_db_object call at the end of ComputeNode.save.

    So even if the ComputeNode object allocation ratio values are the
    non-0.0 defaults, we'll *always* update the database on every
    periodic even if nothing else changed in inventory.

    This change modifies the _copy_resource method to only update the
    ComputeNode fields if the configured ratios are not the 0.0 default.

    Change-Id: I43a23a3290db0c835fed01b8d6a38962dc61adce
    Related-Bug: #1789654
    (cherry picked from commit 2588af87c862cfd02d860f6b860381e907b279ff)
    (cherry picked from commit 01265b98c4cd2b1377e891a06ce748fc6f8f3425)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by melanie witt (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/613263
Reason: Abandoning per rado.
