No logs if scheduling fails due to pages requirements

Bug #1947396 reported by Facundo Ciccioli
This bug affects 5 people
Affects                    Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)   Confirmed  Low         Unassigned

Bug Description

On a customer's environment with the NUMATopologyFilter enabled, when trying to create an instance using a flavor which has these properties:

aggregate_instance_extra_specs:cloud_metadata='true'
aggregate_instance_extra_specs:cpu_allocation_ratio='1.0'
hw:cpu_max_sockets='1'
hw:cpu_policy='dedicated'
hw:cpu_sockets='1'
hw:cpu_thread_policy='require'
hw:emulator_threads_policy='isolate'
hw:mem_page_size='2MB'
hw:numa_nodes='1'
hw:pmu='False'

we eventually get a "No valid hosts found" fault in the openstack server show output. Leaving aside why this is happening, I'd like to report an improvement that could be made to the NUMATopologyFilter's logging.

With debug logging enabled, this is a piece of the scheduler's log:

2021-10-14 15:23:08.047 34021 DEBUG nova.virt.hardware [...] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='dedicated',cpu_thread_policy='require',cpu_topology=<?>,cpuset=set([...]),cpuset_reserved=None,id=0,memory=16384,pagesize=2048) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=0,memory=96415,memory_usage=32768,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([...]),pinned_cpus=set([...]),siblings=[...]) _numa_fit_instance_cell /usr/lib/python3/dist-packages/nova/virt/hardware.py:1078
2021-10-14 15:23:08.048 34021 DEBUG nova.virt.hardware [...] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='dedicated',cpu_thread_policy='require',cpu_topology=<?>,cpuset=set([...]),cpuset_reserved=None,id=0,memory=16384,pagesize=2048) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=1,memory=96733,memory_usage=22528,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([...]),pinned_cpus=set([...]),siblings=[...]) _numa_fit_instance_cell /usr/lib/python3/dist-packages/nova/virt/hardware.py:1078
2021-10-14 15:23:08.048 34021 DEBUG nova.scheduler.filters.numa_topology_filter [...] [instance: ...] ..., redacted fails NUMA topology requirements. The instance does not fit on this host. host_passes /usr/lib/python3/dist-packages/nova/scheduler/filters/numa_topology_filter.py:110

I've redacted some parts for privacy and others for clarity. These messages are repeated for each compute host tested.

The issue is that there's no indication of why the VM doesn't fit on the host.

Looking at the code, I narrowed it down to the numa_fit_instance_to_host function in nova/virt/hardware.py. When the _numa_cell_supports_pagesize_request function raises exception.MemoryPageSizeNotSupported, nothing is logged.

I think it would be useful to write this information to the logs to ease debugging of the filter's behavior, as is already done for other reasons an instance can fail the filter.
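
To illustrate the kind of change being requested, here is a minimal, self-contained sketch. It is not nova's code: the exception class is a stand-in that only shares its name with nova's, and cell_supports_pagesize, fit_instance_cell, and the free_pages_by_size_kb mapping are hypothetical names made up for this example. It only shows the pattern of emitting a debug message with the reason when the page size check fails, instead of swallowing the exception silently.

```python
import logging

LOG = logging.getLogger("numa_fit_sketch")


class MemoryPageSizeNotSupported(Exception):
    """Stand-in for the nova exception of the same name (not the real class)."""

    def __init__(self, pagesize_kb):
        super().__init__("Page size %d KiB cannot be satisfied" % pagesize_kb)
        self.pagesize_kb = pagesize_kb


def cell_supports_pagesize(free_pages_by_size_kb, memory_mb, pagesize_kb):
    """Hypothetical check: raise if the cell lacks enough free pages."""
    # Number of pages of the requested size needed to back the instance memory.
    needed = memory_mb * 1024 // pagesize_kb
    free = free_pages_by_size_kb.get(pagesize_kb, 0)
    if free < needed:
        raise MemoryPageSizeNotSupported(pagesize_kb)
    return pagesize_kb


def fit_instance_cell(cell_id, free_pages_by_size_kb, memory_mb, pagesize_kb):
    """Hypothetical caller: log the reason before reporting 'does not fit'."""
    try:
        return cell_supports_pagesize(
            free_pages_by_size_kb, memory_mb, pagesize_kb)
    except MemoryPageSizeNotSupported as exc:
        # This is the message that is missing today: it names the host cell
        # and the reason the instance cell could not be placed on it.
        LOG.debug("Host cell %(cell)s rejected: %(reason)s",
                  {"cell": cell_id, "reason": exc})
        return None
```

With the values from the log above (memory=16384, pagesize=2048), the instance cell needs 8192 pages of 2 MiB. A host cell with fewer free 2 MiB pages would then produce one debug line explaining the rejection, in addition to the existing "fails NUMA topology requirements" message.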

tags: added: logging scheduler
Changed in nova:
status: New → Confirmed
importance: Undecided → Low