ComputeCapabilitiesFilter does not play well with baremetal driver

Bug #1129485 reported by devananda on 2013-02-18
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       devananda
devstack                   Undecided    devananda
openstack-manuals          Medium       Tom Fifield

Bug Description

When using the baremetal driver and the default nova scheduler filters, the scheduler will always fail to find a suitable host.

In order to provision baremetal instances, a separate deploy kernel & ramdisk are used. These may be specified on the flavor (instance_type) extra_specs so that an environment may support different deployment processes (eg, for multiple arch support). However, the ComputeCapabilitiesFilter will not match the instance_type to any compute host that is not publishing all of the same extra_specs, and in the baremetal case, none of the compute hosts publish these deploy_kernel / deploy_ramdisk specs (precisely because they are properties of the flavor, not the compute host).
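To illustrate the mismatch described above, here is a minimal sketch of the matching rule. It is a simplified model, not nova's actual filter code: the function name, the plain string comparison, and all values are assumptions made for illustration.

```python
def host_passes_unscoped(capabilities, extra_specs):
    """Simplified model: every extra_spec on the flavor must be
    published by the host with an equal value (assumption: plain
    string equality, no comparison operators)."""
    for key, required in extra_specs.items():
        if key not in capabilities:
            return False
        if str(capabilities[key]) != str(required):
            return False
    return True


# Hypothetical values, for illustration only.
baremetal_host = {"cpu_arch": "x86_64"}
flavor_extra_specs = {
    "cpu_arch": "x86_64",
    "deploy_kernel_id": "kernel-uuid",    # property of the flavor,
    "deploy_ramdisk_id": "ramdisk-uuid",  # never published by a host
}

# The deploy image keys are missing from the host, so it is rejected.
print(host_passes_unscoped(baremetal_host, flavor_extra_specs))

# Without the deploy image keys the same host passes.
print(host_passes_unscoped(baremetal_host, {"cpu_arch": "x86_64"}))
```

Under this model no host can ever satisfy a flavor that carries the deploy image keys, which matches the scheduling failure reported here.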

The simple workaround for this is to disable the ComputeCapabilitiesFilter, while leaving all other default filters enabled.

devananda (devananda) on 2013-02-18
tags: added: baremetal
Russell Bryant (russellb) wrote :

It sounds like the baremetal driver is using instance_type extra_specs differently than the rest of nova and that is the bug here. What are your thoughts on fixing this to avoid the brute force workaround?

Changed in nova:
assignee: nobody → Devananda van der Veen (devananda)
status: New → Triaged
importance: Undecided → Medium
milestone: none → grizzly-rc1
devananda (devananda) wrote :

I believe we will need one or more new scheduler filters for baremetal deployments anyway, eg one that does exact flavor<=>host CPU/ram/disk matching. So, that being said, is there anything wrong with requiring deployers to change the scheduler filters for bare metal? If that's acceptable, then this just needs to be documented.

An alternative might be (if this is even possible) to have the CCF behave slightly differently for different hypervisors, but I don't feel like changing or writing a new scheduler filter is a good thing to do at this point (iow, until Havana opens).

Any other ideas?

What about having it just ignore things in the baremetal case, rather
than requiring operators to implement 'if baremetal change this other
setting' logic en masse.

devananda (devananda) wrote :

A few of us talked today, and I think we all agreed that (at least for the time being) it just needs to be documented clearly somewhere that ComputeCapabilitiesFilter should be disabled when using the baremetal compute driver. Baremetal is, in some ways, a special case already...

I've noted this in the wiki here:
  https://wiki.openstack.org/wiki/GeneralBareMetalProvisioningFramework#Configuration_Changes

Since baremetal doesn't have its own doc book yet, I would suggest that this also be annotated in openstack-manuals, perhaps on (the grizzly equivalent of) this page:
  http://docs.openstack.org/folsom/openstack-compute/admin/content/scheduler-filters.html

Hans Lindgren (hanlind) wrote :

I think adding a scope to the extra spec names would solve the problem.

Ex:
baremetal:deploy_kernel_id
baremetal:deploy_ramdisk_id

Support for this was introduced in the filters to solve a similar problem with the trusts filter, see https://github.com/openstack/nova/commit/851705db9596a418b0ea3928654e88fe84a23e52
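A rough sketch of how scoped keys could sidestep the filter follows. It assumes that a key with a scope prefix other than "capabilities" is skipped by the capabilities check; this is a simplified model rather than the code from the linked commit, and the function name and values are hypothetical.

```python
def host_passes(capabilities, extra_specs):
    """Simplified model of scope-aware matching (assumption):
    keys scoped to anything other than 'capabilities' are ignored."""
    for key, required in extra_specs.items():
        scope = key.split(":")
        if len(scope) > 1:
            if scope[0] != "capabilities":
                continue  # e.g. 'baremetal:...' is not this filter's concern
            scope = scope[1:]  # strip the 'capabilities' prefix
        name = ":".join(scope)
        if name not in capabilities:
            return False
        if str(capabilities[name]) != str(required):
            return False
    return True


# Hypothetical values, for illustration only.
host = {"cpu_arch": "x86_64"}
scoped_specs = {
    "cpu_arch": "x86_64",
    "baremetal:deploy_kernel_id": "kernel-uuid",
    "baremetal:deploy_ramdisk_id": "ramdisk-uuid",
}
unscoped_specs = {
    "cpu_arch": "x86_64",
    "deploy_kernel_id": "kernel-uuid",
}

print(host_passes(host, scoped_specs))    # scoped keys are skipped
print(host_passes(host, unscoped_specs))  # unscoped key must match a capability
```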

Tom Fifield (fifieldt) on 2013-03-01
Changed in openstack-manuals:
status: New → Confirmed
milestone: none → grizzly
importance: Undecided → Medium
devananda (devananda) wrote :

Hans, thanks for pointing that out. I tested it locally, and namespacing the extra_specs does seem to solve this! Cheers :)

devananda (devananda) wrote :

I'm going to re-target this bug to devstack, since devstack populates the extra_specs when it loads the deploy images. I'll post a patch for that, and a two-line patch for Nova to read the now-namespaced extra specs.

I will also note the proper naming of these extra_specs in the baremetal docs in the compute admin guide and wiki.

Fix proposed to branch: master
Review: https://review.openstack.org/23453

Changed in nova:
status: Triaged → In Progress
Changed in devstack:
assignee: nobody → Devananda van der Veen (devananda)
status: New → In Progress

Reviewed: https://review.openstack.org/23454
Committed: http://github.com/openstack-dev/devstack/commit/2920b7decc6769707ea45552c94864701c55988e
Submitter: Jenkins
Branch: master

commit 2920b7decc6769707ea45552c94864701c55988e
Author: Devananda van der Veen <email address hidden>
Date: Mon Mar 4 11:47:14 2013 -0800

    Stash baremetal deploy image IDs in a namespace.

    Baremetal PXE driver should read deploy_kernel_id & deploy_ramdisk_id
    from the 'baremetal:' namespace within instance_type['extra_specs']
    so that it doesn't conflict with ComputeCapabilitiesFilter any more.

    This allows nova-compute to use ComputeCapabilitiesFilter with baremetal
    again. For this filter to properly match the baremetal node's RAM,
    we need to tune two other Nova options for physical hardware.

    Fixes bug 1129485.

    Change-Id: I0aa1b0fef8ac59a06217577af8c747437d2d6bf5

Changed in devstack:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/23453
Committed: http://github.com/openstack/nova/commit/813ed1b61de925f0385527aac096e88ea87c8802
Submitter: Jenkins
Branch: master

commit 813ed1b61de925f0385527aac096e88ea87c8802
Author: Devananda van der Veen <email address hidden>
Date: Mon Mar 4 11:03:40 2013 -0800

    Read baremetal images from extra_specs namespace.

    Baremetal PXE driver should read deploy_kernel_id & deploy_ramdisk_id
    from the 'baremetal:' namespace within instance_type['extra_specs']
    so that it doesn't conflict with ComputeCapabilitiesFilter any more.

    Fixes bug 1129485.

    Change-Id: I84b3acb2ed83dc2b1ff8f1a21ca1d95f7d25751a
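As a rough sketch of what reading the namespaced keys might look like on the driver side: the helper name and sample values below are hypothetical and not taken from the nova patch. Only the 'baremetal:' key names and the instance_type['extra_specs'] location come from the commit message.

```python
def get_deploy_image_ids(instance_type):
    """Hypothetical helper: return (kernel_id, ramdisk_id) read from the
    'baremetal:' namespace of the flavor's extra_specs, or None for
    any key that is absent."""
    extra_specs = instance_type.get("extra_specs", {})
    kernel_id = extra_specs.get("baremetal:deploy_kernel_id")
    ramdisk_id = extra_specs.get("baremetal:deploy_ramdisk_id")
    return kernel_id, ramdisk_id


# Hypothetical flavor, for illustration only.
flavor = {
    "name": "baremetal.small",
    "extra_specs": {
        "cpu_arch": "x86_64",
        "baremetal:deploy_kernel_id": "kernel-uuid",
        "baremetal:deploy_ramdisk_id": "ramdisk-uuid",
    },
}

print(get_deploy_image_ids(flavor))
```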

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-03-20
Changed in nova:
status: Fix Committed → Fix Released
Tom Fifield (fifieldt) on 2013-03-29
Changed in openstack-manuals:
assignee: nobody → Tom Fifield (fifieldt)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/25692
Committed: http://github.com/openstack/openstack-manuals/commit/cec20446b68f7ab1864117613f37910115178fc2
Submitter: Jenkins
Branch: master

commit cec20446b68f7ab1864117613f37910115178fc2
Author: Tom Fifield <email address hidden>
Date: Fri Mar 29 16:11:17 2013 +0800

    note disable ComputeCapabilitiesFilter w baremetal

    As noted in the bug report, using ComputeCapabilitiesFilter
    with baremetal deployments breaks things. This patch adds a note
    requesting that users disable it in this scenario.

    fixes bug 1129485

    Change-Id: I0fb39fc2d2c2255b803502fb83b8d95b7b6188e8

Changed in openstack-manuals:
status: In Progress → Fix Released
Thierry Carrez (ttx) on 2013-04-04
Changed in nova:
milestone: grizzly-rc1 → 2013.1