TrustedFilter issue with ComputeCapabilitiesFilter and AggregateInstanceExtraSpecsFilter

Bug #1039386 reported by Mark McLoughlin
22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Joe Gordon

Bug Description

It appears impossible to use TrustedFilter while ComputeCapabilitiesFilter or AggregateInstanceExtraSpecsFilter is configured.

The issue is that if you set trusted_host=trusted in extra_specs then the other extra_specs based filters will reject all hosts because they do not have trusted_host set in their capabilities dict.

I guess there are a few ways to fix this - (a) have the other filters ignore trusted_host explicitly or (b) always set trusted_host=trusted in the host capabilities dict or (c) have the scheduler set trusted_host in the capabilities dict directly and remove the extra filter.

Revision history for this message
Don Dugger (n0ano) wrote :

I don't really like b or c, having the scheduler implicitly set an attribute that it doesn't know about (and will probably be an incorrect value) just doesn't seem right.

I would prefer a and I'll create a patch for it but I do have a more radical idea I'd like to propose. How about we add a new colum to the `extra_specs' table that can be used to specify the scope of the entry? We put entries in this colum like `trust' to indicate that the row is only used by the trusted filter, `compute' for the ComputeCapabilitiesFilter, `*' to indicate the row is global and used by all. I suggest this because I know there are other usages of the `extra_specs' that people are considering so I'm afraid there will be other conflicts that come up.

Revision history for this message
Mark McLoughlin (markmc) wrote :

(a) is my least favorite option - it's a special case dependency between two filters. Ugly. Especially when you think about people adding 3rd party filters which use extra_specs

This "scope" idea - would it be something that is exposed all the way out to the user setting extra specs?

You're basically suggesting extra specs, I guess - e.g. we could say that ':' is a namespace separator and rename trusted_host to trusted:trusted_host or something. If no namespace is specified, then it relates to the hypervisor-reported capabilities

Revision history for this message
Don Dugger (n0ano) wrote :

Yeah, I'm not really happy with a, b or c either. b & c leave you with a problem waiting to happen when 3rd party filters utilize `extra_specs'.

I'm not wedded to a new colum, we could embed the scope into the key with a separator but I'm not sure that that buys you much. All the filters will still have to check for scope, the trust filter to find the right row and the compute and aggregate filters will have to ignore rows that have a scope. Putting the scope in the key would avoid changing user interface tools but there aren't that many tools that would be affected yet (and I know who's working on them and can get them to change the tools).

Bottom line is I think we need to put in some sort of scope, it's just a matter of how we do it.

Revision history for this message
Jinwoo "Joseph" Suh (jsuh) wrote :

I thought about this issue and it may not be actually a bug. For example, if I set "ngpu == 2" in extra_specs, any machine that does not have ngpu=2 must be rejected. Likewise, if I set "trusted_host == trusted" in extra_specs, any machine that does not have "trusted_host == trusted" must be rejected. So, simply setting "trusted_host=trusted" on a trusted machine would solve the problem. A problem at this time is that there is no easy way to add it to capability. But, we may add that function relatively easily if we aim a simple addition. An example is using a flag and the value from the flag is added to capability (that was implemented in early patches in https://review.openstack.org/#/c/8089/).

At the same time, Patrick raised the issue of scope, and that is a good point as Don discussed. So, having a well designed scope is a good solution, too.

Revision history for this message
Don Dugger (n0ano) wrote :

Unfortunately, I think there's still a problem. Imaging you want to only schedule trusted nodes that have `ngpu==2'. If you specify both those constraints in the `extra_specs' the TrustedFilter plugin will correctly select all trused nodes but the ComputeCapabilitiesFilter will reject all nodes (whether or not the ngpu's is correct) as it will fail on the `trusted' specification. I think we do need to provide a scope for the entries in the `extra_speccs' table.

Revision history for this message
Mark McLoughlin (markmc) wrote :

Ok, since adding a scope would be an incompatible API change, I'm raising this issues importance and adding to folsom-rc1

However, unless someone picks it up, it could well be delayed until grizzly and we'll have to support in some hacky compatible way

Including the scope in the key name itself seems to me to be the easiest way to get this resolved in grizzly

Changed in nova:
importance: Low → High
milestone: none → folsom-rc1
Revision history for this message
Don Dugger (n0ano) wrote :

In that case I'll get a patch done as soon as possible to implement `a.b.key' for the key name. I can't promise anything until next week (I'm at XenSummit/Linuxcon this week) but I'll get out something soon.

We do need to work out how detailed we want the scope to be. Do we want something like `Scheduler.ComputeCapabilities' (on the off chance that some somponent other than the scheduler might want to use `extra_specs' and specifying a componet might be useful) or is just `ComputeCapabilities' (effectively an arbitrary string) sufficient?

Revision history for this message
Mark McLoughlin (markmc) wrote :

As I said, I think "trusted:" is fine

The default scope be compute capabilities - i.e. if you're just matching against compute capabilities, you don't need to specify a scope. We need this for backwards compat.

Revision history for this message
Patrick Petit (patrick-michel-petit) wrote :

Sorry if I am catching up with this thread a little late. Either there is something big that I am not getting or there is something wrong in the filter chain design. Basically, I don't see the point of associating a scope like described above with an extra_spec because all filters should play a different role in the chain and so I would argue that if an extra_spec is unknown by filter, then it should not attempt to enforce any about it. That's none of its business sort of speak. Not doing so would result in binding the processing logic of a filter that is a priori left open to the designer of the filter with a prescriptive scope encoded in the extra_spec name that says "you filter are expected to do something about it because the admin said it's in your scope" or conversely "you filter are not expected to do something about it because the admin said it's not in your scope". That's not right in my opinion.

So, in the initial bug description I would just say that ComputeCapabilitiesFilter and AggregateInstanceExtraSpecsFilter should simply ignore trusted_host=trusted if it's not in their capabilities dict to manage.

Still there is a potential issue of name space clashing for the extra_specs, but what think is missing the most a mechanism by which a filter comes in pair with a hook to dynamically populate the capabilities dict that the filter understands. The name space for the extra_specs could then just be a convention with no particular semantic associated with it.

Revision history for this message
Mark McLoughlin (markmc) wrote :

> if an extra_spec is unknown by filter, then it should not attempt to enforce any about it

With regard to compute capabilities, the semantics are that if an extra spec isn't advertised by a compute node, the compute node doesn't pass the filter

i.e. the filter has no way to distinguish between "can possibly advertised as a compute capability, but isn't advertised for this hosts" and "not a valid compute capability"

We're using the scope to allow the filter distinguish between the two cases - anything not in the compute capability scope falls under "not a valid compute capability"

Revision history for this message
Mark McLoughlin (markmc) wrote :

> In that case I'll get a patch done as soon as possible to implement
> `a.b.key' for the key name. I can't promise anything until next week
> (I'm at XenSummit/Linuxcon this week) but I'll get out something soon.

Setting assignee to Don

Changed in nova:
assignee: nobody → Donald Dugger (n0ano-ddd)
assignee: Donald Dugger (n0ano-ddd) → Don Dugger (n0ano)
Changed in nova:
assignee: Don Dugger (n0ano) → Donald Dugger (n0ano-ddd)
status: Triaged → In Progress
Revision history for this message
Mark McLoughlin (markmc) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12383
Committed: http://github.com/openstack/nova/commit/851705db9596a418b0ea3928654e88fe84a23e52
Submitter: Jenkins
Branch: master

commit 851705db9596a418b0ea3928654e88fe84a23e52
Author: Don Dugger <email address hidden>
Date: Tue Sep 4 15:58:57 2012 -0600

    Add scope to extra_specs entries

    Do to conflicts between different scheduler filters it is necessary
    to allow an optional scope for the different keys. This scope is a
    leading string followed by a ':' character, e.g.:

            foo:bar

    is the `bar' entry with scope `foo'.

    The Trusted filter will now use the scope `trust' and the
    AggregateInstanceExtraSpecs and ComputeCapabilities filters will
    check against any unscoped keys in the `extra_specs' table.
    Any new filters that utilize the `extra_specs' table will need to
    use a unique scope string for any keys they require.

    Resolves bug 1039386

    Change-Id: I2466dc3d4de8e9aeb76b294eeda1c939c0413366
    Signed-off-by: Don Dugger <<email address hidden>

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12480

Changed in nova:
assignee: Donald Dugger (n0ano-ddd) → Joe Gordon (joe-gordon0)
status: Fix Committed → In Progress
Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
status: Fix Committed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12480
Committed: http://github.com/openstack/nova/commit/847fb38c5b0411f9f5b9629967d0072d976b2429
Submitter: Jenkins
Branch: master

commit 847fb38c5b0411f9f5b9629967d0072d976b2429
Author: Joe Gordon <email address hidden>
Date: Wed Sep 5 17:28:38 2012 -0700

    Add documentation for scheduler filters scope

    Add documentation for bug 1039386

    Fix other devref typos

    Change-Id: Ife19ee5feb72dd75b91a03c4167d8f4e578a4a04

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.