Ironic: extra_spec requirement 'amd64' does not match 'x86_64'

Bug #1366859 reported by Dan Prince
This bug affects 3 people
Affects                   Status        Importance  Assigned to  Milestone
Ironic                    Won't Fix     Undecided   Dan Prince
OpenStack Compute (nova)  Fix Released  Critical    Dan Prince

Bug Description

Using the latest Nova Ironic compute drivers (either from Ironic or Nova), I'm hitting scheduling errors:

Sep 08 15:26:45 localhost nova-scheduler[29761]: 2014-09-08 15:26:45.620 29761 DEBUG nova.scheduler.filters.compute_capabilities_filter [req-9e34510e-268c-40de-8433-d7b41017b54e None] extra_spec requirement 'amd64' does not match 'x86_64' _satisfies_extra_specs /opt/stack/venvs/nova/lib/python2.7/site-packages/nova/scheduler/filters/compute_capabilities_filter.py:70

I've gone ahead and patched in https://review.openstack.org/#/c/117555/.

The issue seems to be that ComputeCapabilitiesFilter does not itself canonicalize instance_types when comparing them, which breaks existing TripleO baremetal clouds using x86_64 (amd64).
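To make the failure concrete, here is a minimal sketch (not Nova's actual filter code) of a generic extra-specs check that compares the flavor's requirement against the host's reported value as plain strings, with no architecture canonicalization on either side. The function name satisfies_extra_specs and the dict shapes are hypothetical; the printed message only mimics the wording of the DEBUG log above.

```python
def satisfies_extra_specs(host_capabilities, extra_specs):
    """Return True if every flavor extra_spec matches the host capability.

    Hypothetical simplification: values are compared as plain strings,
    so aliases such as 'amd64' and 'x86_64' are treated as different.
    """
    for key, required in extra_specs.items():
        reported = host_capabilities.get(key)
        if reported is None or str(reported) != str(required):
            print("extra_spec requirement %r does not match %r"
                  % (required, reported))
            return False
    return True


# The host capability was canonicalized to 'x86_64', but the
# TripleO flavor still asks for 'amd64'.
host = {"cpu_arch": "x86_64"}
flavor_extra_specs = {"cpu_arch": "amd64"}

print(satisfies_extra_specs(host, flavor_extra_specs))
```

Because the filter is generic, it has no way to know that this particular key holds an architecture name, so it cannot safely apply an alias table itself.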

Tags: ironic
Dan Prince (dan-prince)
Changed in nova:
importance: Undecided → Critical
Dan Prince (dan-prince)
Changed in nova:
assignee: nobody → Dan Prince (dan-prince)
Changed in nova:
status: New → In Progress
Daniel Berrange (berrange) wrote :

Ah, so when returning a dict from get_available_resource, the Ironic driver is reporting the compute host architecture in two places:

 - The 'supported_instances' list ([[<arch>, <hvtype>, <vmmode>]])
 - The 'cpu_arch' extra specs field
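A simplified, hypothetical sketch of how the same architecture value can end up in both places. The 'supported_instances' and 'cpu_arch' key names come from the comment above; the helper names, the 'stats' wrapper, the small alias table and the 'baremetal'/'hvm' values are assumptions for illustration only. This models the behaviour before the fix, where the value is canonicalized in both places.

```python
# Hypothetical alias table; Nova's real canonicalization covers more names.
_ARCH_ALIASES = {"amd64": "x86_64", "x86-64": "x86_64"}


def canonicalize_arch(name):
    """Map an architecture alias to a single canonical spelling."""
    name = name.lower()
    return _ARCH_ALIASES.get(name, name)


def node_resource_before_fix(raw_cpu_arch):
    """Sketch of the pre-fix behaviour: arch canonicalized in BOTH places."""
    arch = canonicalize_arch(raw_cpu_arch)
    return {
        # Consumed by ImagePropertiesFilter: entries are [arch, hv_type, vm_mode].
        "supported_instances": [[arch, "baremetal", "hvm"]],
        # Matched by ComputeCapabilitiesFilter against flavor extra_specs.
        "stats": {"cpu_arch": arch},
    }


resources = node_resource_before_fix("amd64")
print(resources["supported_instances"])  # [['x86_64', 'baremetal', 'hvm']]
print(resources["stats"]["cpu_arch"])    # x86_64
```

A flavor that still carries cpu_arch='amd64' no longer matches the 'stats' value, even though the node itself has not changed.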

The ComputeCapabilitiesFilter then matches on the extra specs field.

The architecture canonicalization and backward-compatibility workarounds I put in place were only targeting the 'supported_instances' list information.

The failure you're reporting occurs because we canonicalize the data put into the extra spec but don't canonicalize the data when checking it in the ComputeCapabilitiesFilter.

I don't know much about Ironic, but I'm curious as to why it is reporting a cpu_arch extra specs field at all, given that we already have a well-specified way to report the architecture via the 'supported_instances' list and the ImagePropertiesFilter, which AFAICT should serve the same purpose.

So I can see 3 possible fixes here, in order of my preference:

 - Remove the cpu_arch extra_specs entirely and just use ImagePropertiesFilter instead
 - Stop canonicalizing the data in the 'cpu_arch' extra specs field (but *still* canonicalize supported_instances)
 - Add a hack to ComputeCapabilitiesFilter to canonicalize match data when looking at the 'cpu_arch' extra_spec
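The first option works because an ImagePropertiesFilter-style check knows it is comparing architectures, so it can canonicalize both sides of the comparison without special-casing a key in a generic filter. A minimal, hypothetical sketch (the function names and alias table are assumptions, not Nova's code), matching an image's architecture property against supported_instances entries of the form [arch, hv_type, vm_mode]:

```python
# Hypothetical alias table for illustration only.
_ARCH_ALIASES = {"amd64": "x86_64", "x86-64": "x86_64"}


def canonicalize_arch(name):
    """Map an architecture alias to a single canonical spelling."""
    name = name.lower()
    return _ARCH_ALIASES.get(name, name)


def image_arch_supported(image_arch, supported_instances):
    """Return True if the image's architecture appears in supported_instances.

    Both the image property and the host-reported entries are
    canonicalized before comparing, so 'amd64' and 'x86_64' match.
    """
    wanted = canonicalize_arch(image_arch)
    return any(canonicalize_arch(arch) == wanted
               for arch, _hv_type, _vm_mode in supported_instances)


host_supported = [["x86_64", "baremetal", "hvm"]]
print(image_arch_supported("amd64", host_supported))  # True
print(image_arch_supported("ppc64", host_supported))  # False
```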

Dan Prince (dan-prince) wrote :

I'm all for using the ImagePropertiesFilter, but we should deal with this in a way that doesn't break existing installations (particularly TripleO) if possible. Given existing DB data formats and such, I was already considering and hacking on option #2 (stop canonicalizing the data in the 'cpu_arch' extra specs field, but *still* canonicalize supported_instances), so if we agree this is doable, I think that might be the best solution.

The hack to the ComputeCapabilitiesFilter seems really dirty, given that it is already a generic extra_specs filter, so it is also my least favorite.

Daniel Berrange (berrange) wrote :

OK, so let's do option 2 for Juno to fix the immediate problem with the least effort. Then we can suggest that use of the cpu_arch extra spec be deprecated in Juno, to be removed in Kilo, to give TripleO time to adapt.

Dan Prince (dan-prince)
summary: - extra_spec requirement 'amd64' does not match 'x86_64'
+ Ironic: extra_spec requirement 'amd64' does not match 'x86_64'
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/120099

Dan Prince (dan-prince)
Changed in ironic:
assignee: nobody → Dan Prince (dan-prince)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/120107

Changed in ironic:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Dan Prince (<email address hidden>) on branch: master
Review: https://review.openstack.org/120107
Reason: The Nova change is already posted here: https://review.openstack.org/#/c/120099/

I had posted this so as not to break CI. After discussing this on IRC, it sounds like the plan is to subclass the new Nova drivers in the old Ironic locations so as not to break things.

Dan Prince (dan-prince)
Changed in ironic:
status: In Progress → New
Dan Prince (dan-prince)
Changed in ironic:
status: New → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/120555

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/120099
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5fd0500e7e262c602dc7bbbff456326598da41bc
Submitter: Jenkins
Branch: master

commit 5fd0500e7e262c602dc7bbbff456326598da41bc
Author: Dan Prince <email address hidden>
Date: Tue Sep 9 09:58:35 2014 -0400

    Ironic: don't canonicalize extra_specs data

    Don't canonicalize the cpu_arch extra_specs field. This is important
    because the scheduler filters which use extra specs don't canonicalize
    things either and as such you'll have mismatched fields causing
    instances not to get scheduled.

    This was recently changed as part of the Ironic -> Nova driver patch
    series. We still want to canonicalize supported_instances though...

    Change-Id: I9342213b5433113816142b1f737119065e9f077f
    Closes-bug: #1366859
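
The approach taken in this commit (option 2 from the discussion) can be sketched as follows. This is a simplified, hypothetical model, not the literal patch: the helper names, the 'stats' wrapper and the 'baremetal'/'hvm' values are assumptions. The raw Ironic cpu_arch value passes through untouched into the extra-specs data, while supported_instances still receives the canonical name.

```python
# Hypothetical alias table for illustration only.
_ARCH_ALIASES = {"amd64": "x86_64", "x86-64": "x86_64"}


def canonicalize_arch(name):
    """Map an architecture alias to a single canonical spelling."""
    name = name.lower()
    return _ARCH_ALIASES.get(name, name)


def node_resource(raw_cpu_arch):
    """Sketch of the post-fix behaviour."""
    return {
        # Still canonicalized, for ImagePropertiesFilter.
        "supported_instances": [
            [canonicalize_arch(raw_cpu_arch), "baremetal", "hvm"]],
        # Left exactly as Ironic reported it, so a flavor extra_spec of
        # cpu_arch='amd64' keeps matching under a plain string comparison.
        "stats": {"cpu_arch": raw_cpu_arch},
    }


resources = node_resource("amd64")
flavor_extra_specs = {"cpu_arch": "amd64"}
print(resources["stats"]["cpu_arch"] == flavor_extra_specs["cpu_arch"])  # True
print(resources["supported_instances"][0][0])  # x86_64
```

Keeping the two fields deliberately different avoids touching the generic filter while existing TripleO flavors continue to schedule.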

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/120555
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9b0b730e598c42c50b4c115fb3a4daa53b0ae61
Submitter: Jenkins
Branch: master

commit f9b0b730e598c42c50b4c115fb3a4daa53b0ae61
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 10 12:21:36 2014 -0700

    Adds a test for raw_cpu_arch in _node_resource

    Commit 5fd0500e7e262c602dc7bbbff456326598da41bc
    makes it such that the cpu_arch is not
    made canonical for scheduling, this adds a unit
    test to cover the change.

    Change-Id: I89080a0178b1098c2a297b7268fad279ece680c2
    Related-Bug: #1366859

aeva black (tenbrae)
tags: added: ironic
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2