Fail to createVM with extra_spec using ComputeCapabilitiesFilter

Bug #1279719 reported by wingwj
This bug affects 6 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Dimitri Mazmanov

Bug Description

Creating a VM with extra_specs fails when ComputeCapabilitiesFilter is enabled: the scheduler always fails to find a suitable host.

------------

Here are the test steps:

1. Create an aggregate, and set its metadata, like ssd=True.

2. Add one host to this aggregate.

3. Create a new flavor, and set its extra_specs, like ssd=True.

4. Create a new VM using this flavor.

5. Creation fails because no valid host is found.

-------------
Let's look at the code:
ComputeCapabilitiesFilter matches each host's capabilities against the flavor's extra_specs.

In Grizzly, a periodic task named '_report_driver_status()' reported the hosts' capabilities.
In Havana that task was removed, so the capabilities are never updated and their value is always 'None'.

So, if you boot a VM with extra_specs, those hosts are filtered out
and an exception is raised.
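
The failure mode described above can be illustrated with a simplified model. This is not nova's actual filter code: `host_passes`, the dictionary layout, and the host name are hypothetical, chosen only for illustration. The sketch shows that when a host never reports a given capability, the lookup yields None and any requirement on that key rejects the host.

```python
def host_passes(host_capabilities, extra_specs):
    """Return True only if every extra spec matches a reported capability.

    Hypothetical helper: models the matching idea, not nova's implementation.
    """
    for key, required in extra_specs.items():
        reported = host_capabilities.get(key)  # None when never reported
        if reported is None or str(reported) != str(required):
            return False
    return True


# The host belongs to an aggregate with ssd=True, but aggregate metadata
# is not part of the capabilities the host itself reports.
host_capabilities = {"hypervisor_type": "QEMU"}
flavor_extra_specs = {"ssd": "True"}

hosts = {"compute-1": host_capabilities}
valid = [name for name, caps in hosts.items()
         if host_passes(caps, flavor_extra_specs)]
print(valid)  # empty list: no valid host
```

With no extra_specs at all, `host_passes` returns True for every host, which matches the observation that flavors without extra_specs boot normally.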

-----------------

Some observations about this filter:
1- Only first-level properties can be used without the 'capabilities' scope.
This will work:
    hypervisor_type = QEMU
This will fail:
    cpu_info:features <in> aes
    cpu_info:vendor = Intel

From the docs:

ComputeCapabilitiesFilter

Matches properties defined in an instance type's extra specs against compute capabilities.

If an extra specs key contains a colon ":", anything before the colon is treated as a namespace, and anything after the colon is treated as the key to be matched. If a namespace is present and is not 'capabilities', it is ignored by this filter.
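
The namespace rule quoted above can be sketched as a small key parser. This is a minimal illustration of the documented behaviour, not the filter's real code; `capability_key` is a hypothetical name.

```python
def capability_key(extra_spec_key, scope="capabilities"):
    """Return the key path this filter should match, or None to ignore it.

    Hypothetical helper modelling the documented rule, not nova's code.
    """
    parts = extra_spec_key.split(":")
    if len(parts) == 1:
        return parts  # unscoped key: matched as-is
    if parts[0] != scope:
        return None  # foreign namespace: ignored by this filter
    return parts[1:]  # drop the 'capabilities' namespace


for key in ("hypervisor_type",
            "capabilities:cpu_info:vendor",
            "aggregate_instance_extra_specs:ssd",
            "cpu_info:vendor"):
    print(key, "->", capability_key(key))
```

Under this rule, `cpu_info:vendor` is read as key `vendor` in namespace `cpu_info`, so nested properties need the explicit `capabilities:` prefix.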

2- If you use both filters, ComputeCapabilitiesFilter and AggregateInstanceExtraSpecsFilter, you can't use unscoped extra_specs. Those decisions were made here:
          https://bugs.launchpad.net/nova/+bug/1037503
          https://bugs.launchpad.net/nova/+bug/1039386
          http://www.gossamer-threads.com/lists/openstack/dev/18355

3- cpu_info data is loaded as a unicode string in HostState, so ComputeCapabilitiesFilter fails to get attributes from this property.
This will fail:
    capabilities:cpu_info:features <in> aes

This last one is addressed here:
     https://bugs.launchpad.net/nova/+bug/1331176

Tags: scheduler
Leandro Ignacio Costantino (leandro-i-costantino) wrote :

I can reproduce this too.
But using AggregateInstanceExtraSpecsFilter seems to work OK.

wingwj (wingwj) wrote :

So, should we add a note about this to ComputeCapabilitiesFilter in the Nova code?

Like this in compute_capabilities_filter.py:
---------
class ComputeCapabilitiesFilter(filters.BaseHostFilter):
    """HostFilter hard-coded to work with InstanceType records.
         Note: This filter will not work with aggregate/extra_spec in flavor yet."""

Changed in nova:
assignee: nobody → Juan Manuel Ollé (juan-m-olle)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/74071

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Juan Manuel Ollé (juan-m-olle) wrote :

According to this bug, if the need is to filter on aggregate extra_specs, AggregateInstanceExtraSpecsFilter is available for that.

After a blueprint, _report_driver_status() was removed.

Is it correct to give a host the aggregate's capabilities only because that host was added to the aggregate?
If that is the expected behaviour, the host capabilities could be updated in the "Add one host to aggregate" step.

The other approach is to duplicate AggregateInstanceExtraSpecsFilter to extract the capabilities from the aggregate that the host belongs to, but I don't think this is a good idea.

Changed in nova:
assignee: Juan Manuel Ollé (juan-m-olle) → Facundo Maldonado (facundo-n-maldonado)
Facundo Maldonado (facundo-n-maldonado) wrote :

There are two problems here. One is that the capabilities are not updated because the periodic task has been removed. That is not a problem if the capabilities needed are those exposed by the hypervisor. But if something has to be added manually, it will never be updated in nova.
On the other hand, there is a problem with the host state when it is updated from compute_node. The cpu_info is loaded as a unicode string, and that breaks the filter because it expects a dict.

With that fixed, these types of extra_specs work:
- (capabilities:cpu_info:vendor, Intel)
- (capabilities:cpu_info:topology:cores, 2)
- (capabilities:cpu_info:features, <in> rdtscp)
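
The cpu_info problem described in this comment can be sketched as follows. The example assumes cpu_info arrives as a JSON-encoded string; the `lookup` helper and the sample values are hypothetical and do not reproduce the actual patch. It shows why a nested lookup has to decode the string into a dict before descending into it.

```python
import json


def lookup(capabilities, path):
    """Walk a key path such as ['cpu_info', 'topology', 'cores'].

    Hypothetical helper; returns None if any step is missing.
    """
    current = capabilities
    for key in path:
        if isinstance(current, str):
            try:
                current = json.loads(current)  # decode string-encoded data
            except ValueError:
                return None
        if not isinstance(current, dict):
            return None
        current = current.get(key)
        if current is None:
            return None
    return current


host_state = {
    # Made-up sample data, stored as a string rather than a dict.
    "cpu_info": json.dumps({
        "vendor": "Intel",
        "topology": {"cores": 2},
        "features": ["aes", "rdtscp"],
    }),
}

print(lookup(host_state, ["cpu_info", "vendor"]))
print(lookup(host_state, ["cpu_info", "topology", "cores"]))
print("rdtscp" in lookup(host_state, ["cpu_info", "features"]))
```

Without the decoding step, the first lookup below `cpu_info` would be attempted on a plain string and could not find any attribute.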

description: updated
description: updated
Openstack Gerrit (openstack-gerrit) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/89844

Shuangtai Tian (shuangtai-tian) wrote :

I also hit this bug. I think the fix should be merged as soon as possible.

description: updated
description: updated
Facundo Maldonado (facundo-n-maldonado) wrote :

Separated the encoding issue into another bug, based on review feedback.
https://bugs.launchpad.net/nova/+bug/1331176

jiang, yunhong (yunhong-jiang) wrote :

There is a documentation bug at https://bugs.launchpad.net/openstack-manuals/+bug/1330962 and a patch submitted for the doc: https://review.openstack.org/101640

Also, I think that for unscoped extra_specs we should enhance the API so that if a user tries to create an unscoped extra_spec, the request fails (of course, only with an extension).

Thanks
--jyh

Sean Dague (sdague) wrote :

No longer in progress. Unclear if the other patches that landed have fixed this. Please confirm that this is still something we need to address.

Changed in nova:
assignee: Facundo Maldonado (facundo-n-maldonado) → nobody
status: In Progress → Incomplete
Darren Carpenter (wdarrenc) wrote :

I can confirm this issue still exists in '2014.1.3'

I created a custom flavor with no extra_specs set, and builds with the --availability-zone <zone> flag progressed without issue. After adding extra_specs, nova boot failed immediately, with ComputeCapabilitiesFilter narrowing the candidates down to 0 hosts.

Alexander Bozhenko (alexbozhenko) wrote :

I reproduced this bug in Juno.
And I tried to make specs specific to the filter, like:
aggregate_instance_extra_specs:ssd
aggregate_instance_extra_specs:ceph

But it is not clear to me why it still fails with the same error:
"Filter ComputeCapabilitiesFilter returned 0 hosts "

Do I understand correctly that compute_capabilities_filter.py (https://github.com/openstack/nova/blob/stable/juno/nova/scheduler/filters/compute_capabilities_filter.py) should ignore all specs not in the "capabilities" scope and return all hosts, as if the specs were empty?

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Alexander Bozhenko (alexbozhenko) wrote :

This bug is still valid, please reopen.

Changed in nova:
status: Expired → Confirmed
Changed in nova:
assignee: nobody → Satyanarayana Patibandla (satya-patibandla)
Dimitri Mazmanov (sorantis) wrote :

I've been looking at this issue and perhaps I'll repeat the obvious, but still, here it is:

Both ComputeCapabilitiesFilter and AggregateInstanceExtraSpecsFilter handle the case where no scope is defined in extra_specs in the same way. That's why, if both filters are enabled, the scheduler will fail to pick a host: what's right for an aggregate can be wrong for a compute host.

One proposal could be to leave it as it is and then update the docs stating that the two filters must not be used together.
Another approach would be to force a certain scope format on one of the filters, for example apply ComputeCapabilitiesFilter only in case extra_specs has been written like capabilities:key:value. This will result in a small change in the compute_capabilities_filter.py file.
Since there was no update on this bug for almost a month I can start working on it.
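
The conflict described in this comment, and the scope-forcing proposal, can be sketched with a simplified model. None of this is nova's code: the function names, the `require_scope` flag, and the sample data are hypothetical, and the model does flat key lookups only.

```python
def matches(data, key, required):
    """True if data reports exactly the required value for key."""
    return str(data.get(key)) == str(required)


def capabilities_filter(host_caps, extra_specs, require_scope=False):
    """Model of a capabilities filter (hypothetical, not nova's code)."""
    for key, required in extra_specs.items():
        scope, _, rest = key.partition(":")
        if rest:
            if scope != "capabilities":
                continue  # another filter's namespace
            key = rest
        elif require_scope:
            continue  # proposed behaviour: skip unscoped keys
        if not matches(host_caps, key, required):
            return False
    return True


def aggregate_filter(aggregate_metadata, extra_specs):
    """Model of an aggregate filter that also accepts unscoped keys."""
    for key, required in extra_specs.items():
        scope, _, rest = key.partition(":")
        if rest:
            if scope != "aggregate_instance_extra_specs":
                continue  # another filter's namespace
            key = rest
        if not matches(aggregate_metadata, key, required):
            return False
    return True


host_caps = {"hypervisor_type": "QEMU"}
aggregate_metadata = {"ssd": "True"}
extra_specs = {"ssd": "True"}  # unscoped

print(aggregate_filter(aggregate_metadata, extra_specs))
print(capabilities_filter(host_caps, extra_specs))
print(capabilities_filter(host_caps, extra_specs, require_scope=True))
```

In this model the unscoped `ssd` key passes the aggregate check but is rejected by the capabilities check, so a host has to satisfy both to be scheduled. With `require_scope=True` the capabilities check skips unscoped keys and the host passes.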

Changed in nova:
assignee: Satyanarayana Patibandla (satya-patibandla) → Dimitri Mazmanov (sorantis)
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/177824

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/177824
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=193ca209e79383942858a30aed704370375951fe
Submitter: Jenkins
Branch: master

commit 193ca209e79383942858a30aed704370375951fe
Author: Dimitri Mazmanov <email address hidden>
Date: Mon Apr 27 17:32:12 2015 +0200

    Fix documentation for scheduling filters

    ComputeCapabilitiesFilter will fail to pick a host if used
    along with AggregateInstanceExtraSpecsFilter. This fix will at least
    warn the users that they shouldn't use the two filters together.

    Change-Id: I98b1eef6484bffc4305ff84e8badbde7992132ed
    Fixes-Bug: #1279719

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0