hide_hypervisor_id extra_specs in nova flavor cannot pass AggregateInstanceExtraSpecsFilter

Bug #1841932 reported by yao ning
This bug affects 3 people
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Low         Stephen Finucane
Train                     Fix Released  Undecided   Unassigned
Ussuri                    Fix Released  Undecided   Unassigned

Bug Description

Description
===========
When the nova AggregateInstanceExtraSpecsFilter is enabled and we need to pass through an NVIDIA GPU, we have to set hide_hypervisor_id in the nova flavor extra specs. A flavor with hide_hypervisor_id cannot pass the AggregateInstanceExtraSpecsFilter, because the filter treats every key that is either unscoped or in the aggregate_instance_extra_specs scope as a requirement on the host aggregate metadata.

See the code below:
            # Either not scope format, or aggregate_instance_extra_specs scope
            scope = key.split(':', 1)
            if len(scope) > 1:
                if scope[0] != _SCOPE:
                    continue
                else:
                    del scope[0]
            key = scope[0]
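
The effect of that key handling can be sketched in isolation. This is a simplified model, not nova's code: the function name host_passes is reused for readability, the metadata is a plain dict of sets, and values are compared by exact string match (an assumption; the real filter has its own matching rules).

```python
_SCOPE = "aggregate_instance_extra_specs"


def host_passes(extra_specs, aggregate_metadata):
    """Simplified model of the filter's key handling.

    extra_specs: flavor extra specs, key -> required value.
    aggregate_metadata: metadata key -> set of values found on the
        aggregates the host belongs to.
    """
    for key, required in extra_specs.items():
        scope = key.split(":", 1)
        if len(scope) > 1:
            if scope[0] != _SCOPE:
                # Scoped for something else (for example "hw:"): ignored.
                continue
            del scope[0]
        key = scope[0]
        values = aggregate_metadata.get(key)
        if not values:
            # Unscoped keys land here when no aggregate carries them.
            return False
        # Assumption: exact string comparison stands in for real matching.
        if required not in values:
            return False
    return True


metadata = {"ssd": {"true"}}
unscoped = host_passes({"hide_hypervisor_id": "true"}, metadata)
namespaced = host_passes({"hw:hide_hypervisor_id": "true"}, metadata)
print(unscoped, namespaced)  # False True
```

In this model the unscoped hide_hypervisor_id key is looked up in the aggregate metadata and rejected, while a key with a foreign prefix is skipped.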

Steps to reproduce
==================
in nova.conf
[filter_scheduler]
enabled_filters = ....,AggregateInstanceExtraSpecsFilter,...

Create a flavor such as "g3.8xlarge" and set the extra spec "hide_hypervisor_id":

nova flavor-key g3.8xlarge set hide_hypervisor_id=true

Then create an instance with flavor g3.8xlarge. The nova scheduler log will report "Filter AggregateInstanceExtraSpecsFilter returned 0 hosts".

Environment
===========
(nova-scheduler)[nova@control1 /]$ rpm -qa | grep nova
openstack-nova-common-18.2.1-0.1.el7.noarch
openstack-nova-scheduler-18.2.1-0.1.el7.noarch
python-nova-18.2.1-0.1.el7.noarch
python2-novaclient-11.0.0-1.el7.noarch

I think this is a bug in AggregateInstanceExtraSpecsFilter. May I suggest removing the "not scope format" support from AggregateInstanceExtraSpecsFilter, or adding an explicit scope for "hide_hypervisor_id"? Otherwise, I cannot use AggregateInstanceExtraSpecsFilter and hide_hypervisor_id at the same time.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah, I think the problem is that the hide_hypervisor_id extra spec doesn't have something like the "os:" prefix. I don't think removing the scope checking in the filter is an option, since that would break behavior for existing flavors that are properly scoped. So we'd probably have to either (1) add compat for os:hide_hypervisor_id and reference that in docs and code, or (2) just hard-code a check in the filter itself for that extra spec, which is easy but gross.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually, wait, what were the metadata values on the host aggregate you're trying to use? Looking at the code, it handles unscoped extra specs and should be looking for an aggregate metadata key of "hide_hypervisor_id" that matches the flavor extra spec key.

Can you enable debugging in the scheduler service and recreate this and provide the logs that show the filter rejecting the instance using that flavor?

Changed in nova:
status: New → Incomplete
Revision history for this message
yao ning (mslovy11022) wrote :

2019-09-02 21:58:46.460 24 DEBUG nova.scheduler.filters.pci_passthrough_filter [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] (compute4, compute4) ram: 24530MB disk: 7481344MB io_ops: 0 instances: 1 doesn't have the required PCI devices (InstancePCIRequests(instance_uuid=<?>,requests=[InstancePCIRequest])) host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/pci_passthrough_filter.py:54
2019-09-02 21:58:46.461 24 DEBUG nova.scheduler.filters.pci_passthrough_filter [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] (compute1, compute1) ram: 193490MB disk: 7481344MB io_ops: 0 instances: 108 doesn't have the required PCI devices (InstancePCIRequests(instance_uuid=<?>,requests=[InstancePCIRequest])) host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/pci_passthrough_filter.py:54
2019-09-02 21:58:46.462 24 DEBUG nova.scheduler.filters.pci_passthrough_filter [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] (compute2, compute2) ram: 215506MB disk: 7481344MB io_ops: 0 instances: 14 doesn't have the required PCI devices (InstancePCIRequests(instance_uuid=<?>,requests=[InstancePCIRequest])) host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/pci_passthrough_filter.py:54
2019-09-02 21:58:46.463 24 DEBUG nova.scheduler.filters.pci_passthrough_filter [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] (compute3, compute3) ram: 251346MB disk: 7481344MB io_ops: 0 instances: 2 doesn't have the required PCI devices (InstancePCIRequests(instance_uuid=<?>,requests=[InstancePCIRequest])) host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/pci_passthrough_filter.py:54
2019-09-02 21:58:46.463 24 DEBUG nova.filters [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] Filter PciPassthroughFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2019-09-02 21:58:46.464 24 DEBUG nova.scheduler.filters.aggregate_instance_extra_specs [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] (compute5, compute5) ram: 122269MB disk: 7481344MB io_ops: 0 instances: 1 fails instance_type extra_specs requirements. Extra_spec hide_hypervisor_id is not in aggregate. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/aggregate_instance_extra_specs.py:67
2019-09-02 21:58:46.465 24 INFO nova.filters [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts
2019-09-02 21:58:46.466 24 DEBUG nova.filters [req-25bf5f0d-8aa3-4a27-b59f-279c7ad85fcf 16c5e33fdb17417eabc1bf6b41cb7d06 f4f2d26bb0034d5f8e6bcdbc901211df - default default] Filtering removed all hosts for the request with instance ID '8e198...


Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Revision history for this message
sean mooney (sean-k-mooney) wrote :

This happens because when the extra spec was introduced by https://blueprints.launchpad.net/nova/+spec/hide-hypervisor-id-flavor-extra-spec it failed to namespace the extra spec properly.

It should have been added to the hw: namespace. If it had been, the AggregateInstanceExtraSpecsFilter would work with it.

The only way to fix this is to deprecate the use of hide_hypervisor_id and replace it with the namespaced version hw:hide_hypervisor_id, or to alter the AggregateInstanceExtraSpecsFilter to specifically ignore that value.

But we also need the metadata keys to confirm.

Note that we also discussed some alterations to this filter and to extra specs in general as part of the flavor extra spec validation work, such as requiring the use of "aggregate_instance_extra_specs:..." going forward and removing support for unscoped extra specs.

Changed in nova:
status: Expired → New
Revision history for this message
sean mooney (sean-k-mooney) wrote :

Note that a workaround for this is to create two new host aggregates, one with hide_hypervisor_id=true and the other with hide_hypervisor_id=false, then add all hosts to both aggregates.

This will prevent scheduling based on the hide_hypervisor_id extra spec, as all hosts will support all allowed values for hide_hypervisor_id, but it will enable you to schedule VMs in this case.

The long-term fix is to namespace hide_hypervisor_id and/or remove support for unnamespaced extra specs from the AggregateInstanceExtraSpecsFilter.
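
The workaround can be illustrated with a small model of the unscoped lookup. This is a sketch under assumptions: unscoped_spec_matches is a hypothetical helper, each aggregate is reduced to a dict of its metadata, and values are compared by exact string match.

```python
def unscoped_spec_matches(required, host_aggregates, key="hide_hypervisor_id"):
    """Return True if any aggregate of the host carries key=required.

    host_aggregates: list of metadata dicts, one per aggregate the
        host belongs to (hypothetical, simplified representation).
    """
    values = {agg[key] for agg in host_aggregates if key in agg}
    return required in values


# Host in one aggregate without the key: the flavor is rejected.
plain_host = [{}]

# Host added to both workaround aggregates: either value is accepted.
workaround_host = [
    {"hide_hypervisor_id": "true"},
    {"hide_hypervisor_id": "false"},
]

print(unscoped_spec_matches("true", plain_host))        # False
print(unscoped_spec_matches("true", workaround_host))   # True
print(unscoped_spec_matches("false", workaround_host))  # True
```

Because every host advertises both values, the key no longer discriminates between hosts, which is why this only unblocks scheduling and does not place instances based on hide_hypervisor_id.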

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

I'm capturing my thoughts here since we discussed it over IRC.

Tl;dr: the issue is not about the filter itself. It is more the fact that the filter requires you to explicitly create aggregates whose metadata keys match the extra specs you are going to provide.

If you don't want to have such a dependency, you need to prefix this extra spec with a namespace in https://review.opendev.org/#/c/555861/10/nova/virt/libvirt/driver.py so that it skips this filter.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Based on the above recent discussion I mark this bug Confirmed

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/722187

Changed in nova:
assignee: nobody → Jie Li (ramboman)
status: Confirmed → In Progress
Changed in nova:
assignee: Jie Li (ramboman) → Stephen Finucane (stephenfinucane)
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Jie Li (ramboman)
Changed in nova:
assignee: Jie Li (ramboman) → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/722187
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bf488a8630702160021b5848bf6e86fbb8015205
Submitter: Zuul
Branch: master

commit bf488a8630702160021b5848bf6e86fbb8015205
Author: ramboman <email address hidden>
Date: Wed Apr 22 21:33:22 2020 +0800

    replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id"

    When we use the flavor extra_specs "hide_hypervisor_id" in
    AggregateInstanceExtraSpecsFilter, then will retrun False.
    So we need correct the extra_specs.

    Change-Id: I9d8d8c3a30cf6da7e8fb48374347e069ab075df2
    Closes-Bug: 1841932
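
One way a consumer of the flavor could read the renamed key is sketched below. This is not the code from the commit: hide_hypervisor_id_requested and _to_bool are hypothetical helpers, and the fallback to the legacy unscoped key is an assumption made so that existing flavors keep working.

```python
def _to_bool(value):
    """Hypothetical helper: interpret common truthy strings."""
    return str(value).strip().lower() in ("1", "true", "yes", "on", "t", "y")


def hide_hypervisor_id_requested(extra_specs):
    """Check the namespaced key first, then the legacy unscoped key.

    Assumption: the legacy key is still honored for compatibility.
    """
    return (
        _to_bool(extra_specs.get("hw:hide_hypervisor_id", False))
        or _to_bool(extra_specs.get("hide_hypervisor_id", False))
    )


print(hide_hypervisor_id_requested({"hw:hide_hypervisor_id": "true"}))  # True
print(hide_hypervisor_id_requested({"hide_hypervisor_id": "True"}))     # True
print(hide_hypervisor_id_requested({"hw:cpu_policy": "dedicated"}))     # False
```

With the hw: prefix the key no longer looks unscoped to AggregateInstanceExtraSpecsFilter, so no matching aggregate metadata is needed.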

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/747189

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/747189
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9d28d7ec808469ec129b66c69b9e63cd9537a63f
Submitter: Zuul
Branch: stable/ussuri

commit 9d28d7ec808469ec129b66c69b9e63cd9537a63f
Author: ramboman <email address hidden>
Date: Wed Apr 22 21:33:22 2020 +0800

    replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id"

    When we use the flavor extra_specs "hide_hypervisor_id" in
    AggregateInstanceExtraSpecsFilter, then will retrun False.
    So we need correct the extra_specs.

    Change-Id: I9d8d8c3a30cf6da7e8fb48374347e069ab075df2
    Closes-Bug: 1841932
    (cherry picked from commit bf488a8630702160021b5848bf6e86fbb8015205)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 21.1.2

This issue was fixed in the openstack/nova 21.1.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.6.1

This issue was fixed in the openstack/nova 20.6.1 release.
