Setting the VCPU allocation_ratio via the placement CLI leaves the old cpu_allocation_ratio in the compute_nodes table

Bug #2017114 reported by lijie.xie
This bug affects 2 people
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Expired  Undecided   Unassigned

Bug Description

Hello, I ran into a problem when using cpu_allocation_ratio and hope to get some help from the community.

Old nova.conf:
cpu_allocation_ratio=10.0

New nova.conf:
cpu_allocation_ratio=None
initial_cpu_allocation_ratio=4.0
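The config change above explains why compute_nodes keeps 10. Below is a simplified sketch of that resolution logic. It assumes the compute service overwrites a node's stored ratio only when cpu_allocation_ratio is explicitly set, and falls back to initial_cpu_allocation_ratio only when nothing is stored yet (None or 0.0). The function `effective_cpu_ratio` is hypothetical and is not Nova's actual code.

```python
from typing import Optional


def effective_cpu_ratio(conf_ratio: Optional[float],
                        initial_ratio: float,
                        stored_ratio: Optional[float]) -> float:
    """Hypothetical model (assumption, not Nova's code) of how a compute
    node's cpu_allocation_ratio might be resolved on restart.

    - An explicitly configured cpu_allocation_ratio always wins.
    - Otherwise the value already stored in compute_nodes is kept.
    - initial_cpu_allocation_ratio is used only when nothing is stored.
    """
    if conf_ratio not in (None, 0.0):
        return conf_ratio
    if stored_ratio not in (None, 0.0):
        return stored_ratio
    return initial_ratio


# Values from this report: the config moved from 10.0 to None/4.0,
# but compute_nodes already holds 10.0 from the old configuration.
print(effective_cpu_ratio(None, 4.0, stored_ratio=10.0))  # 10.0
# A brand-new node with no stored value would get the initial ratio.
print(effective_cpu_ratio(None, 4.0, stored_ratio=None))  # 4.0
```

Nothing in this model reads the placement inventory. That matches the report's observation that the ratio set with `openstack resource provider inventory set` never reaches compute_nodes.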

After restarting the compute service, I ran the placement CLI:
()[root@nova-maintenance-7f9cc4b8b8-mbpsq /]# openstack resource provider inventory set 959339b3-6d23-4780-8052-a51067c00659 --resource VCPU:allocation_ratio=5 --amend
+----------------+------------------+----------+----------+----------+-----------+-------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total |
+----------------+------------------+----------+----------+----------+-----------+-------+
| VCPU           | 5.0              | 1        | 6        | 0        | 1         | 6     |
| MEMORY_MB      | 1.0              | 1        | 64116    | 52224    | 1         | 64116 |
| DISK_GB        | 1.0              | 1        | 1637     | 0        | 1         | 1637  |
+----------------+------------------+----------+----------+----------+-----------+-------+

The compute_nodes table record still holds the old value:
MariaDB [nova]> select host,cpu_allocation_ratio,uuid from compute_nodes;
+-------------------+----------------------+--------------------------------------+
| host              | cpu_allocation_ratio | uuid                                 |
+-------------------+----------------------+--------------------------------------+
| node-2.domain.tld | 10                   | 959339b3-6d23-4780-8052-a51067c00659 |
+-------------------+----------------------+--------------------------------------+

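NUMATopologyFilter uses the compute node's cpu_allocation_ratio (the compute_nodes value) in host_passes() for NUMA instances, not the VCPU allocation_ratio from the placement inventory. The scheduler log below still shows cpu_allocation_ratio=10.0 for node-2 (uuid 959339b3-...). This leads to incorrect scheduling decisions.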
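The effect of the mismatch can be shown with a simplified oversubscription check. The sketch below assumes the NUMA fit check caps a cell's vCPU usage at `len(cpuset) * cpu_allocation_ratio`. `HostCell`, `cell_fits`, and the existing usage of 28 vCPUs are hypothetical illustrations, not Nova's implementation. The six-CPU cell (cpuset 10-15) and the two ratios (10.0 and 5.0) are taken from this report.

```python
from dataclasses import dataclass


@dataclass
class HostCell:
    """Hypothetical, simplified stand-in for a host NUMA cell."""
    cpuset: frozenset   # host CPUs belonging to the cell
    cpu_usage: int      # vCPUs already consumed in the cell


def cell_fits(cell: HostCell, requested_vcpus: int, cpu_ratio: float) -> bool:
    """Simplified overcommit check in the spirit of the NUMA fit logic
    (assumption, not Nova's actual code)."""
    cpu_limit = len(cell.cpuset) * cpu_ratio
    return cell.cpu_usage + requested_vcpus <= cpu_limit


# node-2 has one NUMA cell with six CPUs (cpuset 10-15 in the scheduler log).
# The existing usage of 28 vCPUs is a made-up value for illustration.
cell = HostCell(cpuset=frozenset(range(10, 16)), cpu_usage=28)

print(cell_fits(cell, 4, cpu_ratio=10.0))  # True: limit 60 (compute_nodes ratio)
print(cell_fits(cell, 4, cpu_ratio=5.0))   # False: limit 30 (placement ratio)
```

For the same host and request, the filter and placement compute different capacity limits. They can therefore disagree about whether the host is eligible.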

Scheduler log:
2023-04-20 16:31:46.878 14 INFO nova.scheduler.manager [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Starting to schedule for instances: ['042f39ce-6887-418f-869b-80cac9987ab9']
2023-04-20 16:31:46.988 14 WARNING nova.scheduler.utils [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Info cache for instance 042f39ce-6887-418f-869b-80cac9987ab9 could not be found.: nova.exception.InstanceInfoCacheNotFound: Info cache for instance 042f39ce-6887-418f-869b-80cac9987ab9 could not be found.
2023-04-20 16:31:46.998 14 INFO nova.scheduler.host_manager [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Update host state from compute node: ComputeNode(cpu_allocation_ratio=10.0,cpu_info='{"arch": "x86_64", "model": "Broadwell-IBRS", "vendor": "Intel", "topology": {"cells": 1, "sockets": 16, "cores": 1, "threads": 1}, "features": ["pat", "pclmuldq", "avx2", "ibrs-all", "rtm", "arat", "lahf_lm", "arch-capabilities", "avx512vbmi", "wbnoinvd", "ssbd", "sse4.1", "abm", "md-clear", "sse4.2", "avx512vnni", "umip", "rdseed", "fma", "mtrr", "stibp", "avx512bw", "xsaves", "lm", "rdtscp", "sha-ni", "clflushopt", "hypervisor", "movbe", "vpclmulqdq", "bmi1", "fpu", "avx512vbmi2", "erms", "f16c", "vme", "sep", "vmx", "clflush", "ss", "tsx-ctrl", "smap", "pcid", "3dnowprefetch", "tsc", "skip-l1dfl-vmentry", "pni", "mce", "avx512vl", "tsc-deadline", "cx16", "avx512dq", "pge", "smep", "avx512-vpopcntdq", "syscall", "avx512f", "apic", "xsavec", "avx", "fsgsbase", "avx512cd", "mds-no", "nx", "avx512ifma", "xsave", "aes", "msr", "vaes", "pku", "spec-ctrl", "cmov", "de", "ssse3", "la57", "pae", "adx", "fxsr", "rdctl-no", "pse36", "clwb", "tsc_adjust", "cx8", "mca", "sse", "bmi2", "rdrand", "hle", "pse", "sse2", "gfni", "xgetbv1", "taa-no", "xsaveopt", "avx512bitalg", "pdpe1gb", "x2apic", "invpcid", "mmx", "pschange-mc-no", "popcnt"]}',created_at=2022-12-07T03:30:37Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=103,free_disk_gb=1637,free_ram_mb=10868,host='node-3.domain.tld',host_ip=192.168.10.5,hypervisor_hostname='node-3.domain.tld',hypervisor_type='QEMU',hypervisor_version=4002000,id=8,local_gb=1637,local_gb_used=0,mapped=1,memory_mb=64116,memory_mb_used=53248,metrics='[]',numa_topology='{"nova_object.name": "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", 
"nova_object.namespace": "nova", "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [10, 11, 12, 13, 14, 15], "pcpuset": [10, 11, 12, 13, 14, 15], "memory": 64116, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[15], [10], [11], [12], [13], [14]], "mempages": [{"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": 14316589, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "reserved", "total", "used"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, "total": 4096, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "reserved", "total", "used"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "reserved", "total", "used"]}], "network_metadata": {"nova_object.name": "NetworkMetadata", "nova_object.namespace": "nova", "nova_object.version": "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, "nova_object.changes": ["socket", "cpu_usage", "pinned_cpus", "id", "network_metadata", "pcpuset", "memory_usage", "mempages", "siblings", "cpuset", "memory"]}]}, "nova_object.changes": ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=1,service_id=None,stats={failed_builds='0',io_workload='0',num_instances='1',num_os_type_None='1',num_proj_54ebca1a767a4b7bbf2ef98c91cdf4e0='1',num_task_None='1',num_vm_active='1'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-04-20T08:28:33Z,uuid=53907eb2-1d14-447d-87da-c23c16f7df7a,vcpus=6,vcpus_used=2)
2023-04-20 16:31:47.000 14 INFO nova.scheduler.host_manager [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Update host state from compute node: ComputeNode(cpu_allocation_ratio=10.0,cpu_info='{"arch": "x86_64", "model": "Broadwell-IBRS", "vendor": "Intel", "topology": {"cells": 1, "sockets": 16, "cores": 1, "threads": 1}, "features": ["arch-capabilities", "avx512vnni", "smep", "3dnowprefetch", "umip", "md-clear", "pat", "avx2", "hle", "taa-no", "pdpe1gb", "pse36", "ss", "pschange-mc-no", "avx512vl", "sse4.1", "fxsr", "avx512bw", "gfni", "vme", "avx512f", "avx512bitalg", "aes", "msr", "sse4.2", "fsgsbase", "movbe", "mds-no", "clflushopt", "smap", "clwb", "xsaves", "fpu", "apic", "avx512cd", "sha-ni", "abm", "ssbd", "bmi1", "avx", "mmx", "spec-ctrl", "avx512dq", "mce", "lahf_lm", "pni", "xsave", "rdseed", "ibrs-all", "rdrand", "vpclmulqdq", "avx512vbmi2", "pge", "mtrr", "tsc-deadline", "ssse3", "nx", "arat", "cmov", "tsc_adjust", "vaes", "sse2", "tsc", "wbnoinvd", "x2apic", "cx8", "rdtscp", "lm", "clflush", "fma", "avx512ifma", "vmx", "mca", "xsaveopt", "rdctl-no", "avx512vbmi", "xsavec", "de", "tsx-ctrl", "popcnt", "syscall", "f16c", "invpcid", "hypervisor", "erms", "pae", "pse", "pku", "sse", "rtm", "pcid", "adx", "cx16", "stibp", "pclmuldq", "skip-l1dfl-vmentry", "bmi2", "la57", "xgetbv1", "sep", "avx512-vpopcntdq"]}',created_at=2022-12-07T03:30:39Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=103,free_disk_gb=1637,free_ram_mb=11892,host='node-2.domain.tld',host_ip=192.168.10.4,hypervisor_hostname='node-2.domain.tld',hypervisor_type='QEMU',hypervisor_version=4002000,id=11,local_gb=1637,local_gb_used=0,mapped=1,memory_mb=64116,memory_mb_used=52224,metrics='[]',numa_topology='{"nova_object.name": "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", 
"nova_object.namespace": "nova", "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [10, 11, 12, 13, 14, 15], "pcpuset": [10, 11, 12, 13, 14, 15], "memory": 64116, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[15], [10], [11], [12], [13], [14]], "mempages": [{"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": 13792305, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, "total": 5120, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["size_kb", "used", "total", "reserved"]}], "network_metadata": {"nova_object.name": "NetworkMetadata", "nova_object.namespace": "nova", "nova_object.version": "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, "nova_object.changes": ["tunneled", "physnets"]}, "socket": null}, "nova_object.changes": ["socket", "cpu_usage", "mempages", "memory", "memory_usage", "pcpuset", "id", "siblings", "cpuset", "pinned_cpus", "network_metadata"]}]}, "nova_object.changes": ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=0,service_id=None,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-04-20T08:00:04Z,uuid=959339b3-6d23-4780-8052-a51067c00659,vcpus=6,vcpus_used=0)
2023-04-20 16:31:47.001 14 INFO nova.filters [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Starting with 2 host(s)
2023-04-20 16:31:47.001 14 INFO nova.filters [req-dce98e42-b992-4787-a1e1-7e7bfc3b2df7 68bf65903042427ba614d71ae1202221 54ebca1a767a4b7bbf2ef98c91cdf4e0 - default default] Filter AvailabilityZoneFilter returned 2 host(s)

Uggla (rene-ribaud) wrote:

Hello rune32bit,

Thanks for submitting this bug. Can you please add the version you are using to the ticket?

Changed in nova:
status: New → Incomplete
lijie.xie (none2021) wrote:

Hello Uggla,
I am using OpenStack Wallaby. I looked at the latest master branch, and this code does not seem to have changed.

Launchpad Janitor (janitor) wrote:

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Haidong Pang (haidong-pang) wrote:

I've encountered a similar issue in my OpenStack cluster.

After I set the VCPU allocation_ratio on the resource provider inventory in placement, the change does not seem to affect nova-scheduler.

The nova-scheduler always reads cpu_allocation_ratio from the compute_nodes table.

I'm wondering whether we could patch some fields of the HostState instance, such as cpu_allocation_ratio, by calling the placement API when nova-scheduler initializes the HostState instance.

Because per-host placement calls could cause performance problems in large-scale clusters, adding a new placement handler for bulk-querying resource provider inventories seems like a good solution. By consolidating the per-provider queries into a single transaction, we could potentially address those performance concerns.
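Below is a minimal sketch of this proposal. It assumes the proposed bulk handler returns a mapping of provider UUID to per-resource-class inventory, and that the scheduler copies each allocation_ratio onto the matching HostState. The response shape, `RATIO_FIELDS`, `overlay_placement_ratios`, and the simplified `HostState` class are all hypothetical. The ratio field names mirror those in the ComputeNode dump in the scheduler log, and the sample values come from the inventory shown earlier in this report.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical response shape of the proposed bulk-inventory handler:
# provider UUID -> resource class -> inventory fields.
Inventories = Dict[str, Dict[str, Dict[str, float]]]

# Hypothetical mapping from placement resource classes to HostState fields.
RATIO_FIELDS = {
    "VCPU": "cpu_allocation_ratio",
    "MEMORY_MB": "ram_allocation_ratio",
    "DISK_GB": "disk_allocation_ratio",
}


@dataclass
class HostState:
    """Simplified stand-in for the scheduler's HostState (not Nova's class)."""
    uuid: str
    cpu_allocation_ratio: float
    ram_allocation_ratio: float
    disk_allocation_ratio: float


def overlay_placement_ratios(hosts: List[HostState],
                             inventories: Inventories) -> None:
    """Overwrite each host's ratios with the values held in placement,
    using one bulk result instead of one placement call per host.
    Hosts or resource classes missing from the result are left unchanged."""
    for host in hosts:
        provider_inv = inventories.get(host.uuid, {})
        for rc, field in RATIO_FIELDS.items():
            ratio = provider_inv.get(rc, {}).get("allocation_ratio")
            if ratio is not None:
                setattr(host, field, ratio)


# compute_nodes still says 10.0 for node-2.
hosts = [HostState("959339b3-6d23-4780-8052-a51067c00659", 10.0, 1.0, 1.0)]
# Result of a single (hypothetical) bulk query for all candidate providers.
bulk: Inventories = {
    "959339b3-6d23-4780-8052-a51067c00659": {
        "VCPU": {"allocation_ratio": 5.0},
        "MEMORY_MB": {"allocation_ratio": 1.0},
        "DISK_GB": {"allocation_ratio": 1.0},
    },
}
overlay_placement_ratios(hosts, bulk)
print(hosts[0].cpu_allocation_ratio)  # 5.0
```

With a single bulk result, the overlay costs one placement round trip per scheduling request instead of one per candidate host. That is the trade-off the comment argues for.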
