Seed VM filtering out compute resources

Bug #1213967 reported by Derek Higgins
This bug affects 3 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Critical    Robert Collins
tripleo                   Fix Released  Critical    Unassigned

Bug Description

At some stage last week, nova started failing to boot baremetal instances. From the looks of it, they are being filtered out by the compute_capabilities_filter.

This workaround resolves the problem:
diff --git a/scripts/setup-baremetal b/scripts/setup-baremetal
index 0a1e46d..132cbc8 100755
--- a/scripts/setup-baremetal
+++ b/scripts/setup-baremetal
@@ -20,6 +20,6 @@ deploy_ramdisk_id=$(glance image-create --name bm-deploy-ramdisk --public \

 nova flavor-delete baremetal || true
 nova flavor-create baremetal auto $2 $3 $1
-nova flavor-key baremetal set "cpu_arch"="$arch" \
+nova flavor-key baremetal set \
     "baremetal:deploy_kernel_id"="$deploy_kernel_id" \
     "baremetal:deploy_ramdisk_id"="$deploy_ramdisk_id"

That said, it's probably not the correct fix.

I think the problem was caused by this merge, https://review.openstack.org/#/c/40994/5,
as cap.cpu_arch doesn't exist.
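
To illustrate why a missing capability can filter out every host, here is a minimal sketch of an extra-specs check. It is a simplified model under stated assumptions, not nova's actual compute_capabilities_filter: the function name `host_passes`, the example dictionaries, and the rule that scoped keys are skipped are all hypothetical choices made for this example.

```python
def host_passes(capabilities, extra_specs):
    """Return True if every unscoped extra spec matches a host capability.

    Simplified model: scoped keys such as "baremetal:deploy_kernel_id"
    are skipped, and values are compared for plain string equality.
    """
    for key, required in extra_specs.items():
        if ":" in key:
            # Scoped key: assumed here not to be a capability requirement.
            continue
        if key not in capabilities:
            # The host does not advertise this capability at all.
            return False
        if str(capabilities[key]) != str(required):
            return False
    return True


# Hypothetical host that does not advertise cpu_arch.
host_caps = {"vcpus": 1, "memory_mb": 512}

with_arch = {"cpu_arch": "i386", "baremetal:deploy_kernel_id": "abc"}
without_arch = {"baremetal:deploy_kernel_id": "abc"}

print(host_passes(host_caps, with_arch))     # False
print(host_passes(host_caps, without_arch))  # True
```

In this model, dropping "cpu_arch" from the flavor keys (as the workaround does) removes the only requirement the host cannot satisfy, which is consistent with the scheduler finding a host again.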

Changed in tripleo:
status: New → Incomplete
status: Incomplete → Triaged
importance: Undecided → Critical
Changed in nova:
status: New → Triaged
importance: Undecided → Critical
tags: added: baremetal regression
Robert Collins (lifeless) wrote :

I've confirmed this is broken but not yet confirmed that reverting that commit fixes it.

Robert Collins (lifeless) wrote :

Reverting that commit did not fix the problem for me.

Derek Higgins (derekh) wrote :

According to my bisect,
a4ad62ac9fdacecfbc5229e688a7d937d177889b is the first bad commit,
which is the one linked above. Maybe simply reverting that single commit isn't enough?

Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-3
Sylvain Bauza (sylvain-bauza) wrote :

I can confirm that the workaround provided in the description allows the scheduler to find a host.

Robert Collins (lifeless) wrote :

Huh, very odd. Can you confirm that reverting that one commit isn't enough to fix it?

Robert Collins (lifeless) wrote :

OK, so resetting to 6403c9df585b4fd897acdf4fbc535c68ac0a2531 fixes the issue; so it may be more than one commit...

Robert Collins (lifeless) wrote :

Resetting to bfebb360f3ba11c63d5a5562b8730689d28f6b4f also fixes it...

Robert Collins (lifeless) wrote :

(Ah, that's the merge of 6403 into trunk.) So we still need to track down what other commits need to be reverted.

Robert Collins (lifeless) wrote :

OK, I must have fluffed it; just reverting that one commit does fix it.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/43235

Changed in nova:
status: Triaged → In Progress
assignee: nobody → Robert Collins (lifeless)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/43235
Committed: http://github.com/openstack/nova/commit/7d8a3b956b97591c8f79327e3bc6583587f53ec5
Submitter: Jenkins
Branch: master

commit 7d8a3b956b97591c8f79327e3bc6583587f53ec5
Author: Robert Collins <email address hidden>
Date: Tue Aug 20 22:33:07 2013 +1200

    Revert "Make compute_capabilities_filter use ..."

    This reverts commit a4ad62ac9fdacecfbc5229e688a7d937d177889b.

    Nova baremetal architecture scheduling was broken by it. While
    workarounds exist it is backwards incompatible, and baremetal is
    supported since Grizzly.

    Fix bug: 1213967

    Change-Id: I319b8a17f3ae7a3b527d388c6ff2954c0bcc0108

Joe Gordon (jogo) wrote :

After looking at this further, it looks like the issue is with 'CONF.baremetal.instance_type_extra_specs', which is used to advertise cpu_arch back to the scheduler. But that config option allows arbitrary key/value pairs to be sent back. Unfortunately, with https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler we want to remove this RPC and use the database instead, so the values from CONF.baremetal.instance_type_extra_specs can be stored in the ComputeNodeStat table and made available in compute_capabilities_filter.
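
As a rough sketch of that idea, the configured pairs could be parsed into a dictionary and merged into whatever per-node stats end up in the database. This assumes each configured entry is a "key:value" string, which the comment above does not state; `parse_extra_specs`, `configured`, and `node_stats` are hypothetical names for illustration and are not nova APIs.

```python
def parse_extra_specs(pairs):
    """Turn ["key:value", ...] entries into a dict, skipping malformed ones."""
    specs = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = pair.partition(":")
        key = key.strip()
        if not sep or not key:
            # No colon or empty key: ignore the entry.
            continue
        specs[key] = value.strip()
    return specs


# Hypothetical configuration value and per-node stats.
configured = ["cpu_arch: x86_64", "rack:r12", "malformed"]
node_stats = {"vcpus": "1"}
node_stats.update(parse_extra_specs(configured))

print(node_stats)  # {'vcpus': '1', 'cpu_arch': 'x86_64', 'rack': 'r12'}
```

Because the keys are arbitrary, a key/value store such as this suits them better than fixed columns would.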

Robert Collins (lifeless) wrote :

So from a compat viewpoint we just need:
- the existing *setting* on bm nodes to keep working for H
- use of that setting to log a deprecation warning
- a new thing, whatever it is, to also work during H

Then for I the existing setting can be removed.
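
The compatibility plan above could look roughly like the following sketch. It is only an illustration under assumptions: `advertised_capabilities`, `legacy_specs`, and `new_specs` are hypothetical names, the "new thing" is modelled as a plain dictionary, and letting the new mechanism win on conflicting keys is a choice made for this example, not something decided in this thread.

```python
import warnings


def advertised_capabilities(legacy_specs, new_specs):
    """Merge capabilities from the old setting and its replacement.

    The old setting keeps working but triggers a deprecation warning;
    values from the new mechanism win when both define the same key.
    """
    caps = dict(new_specs)
    if legacy_specs:
        warnings.warn(
            "the legacy extra specs setting is deprecated and will be "
            "removed in a later release; use the new mechanism instead",
            DeprecationWarning,
            stacklevel=2,
        )
        for key, value in legacy_specs.items():
            # Do not override values supplied by the new mechanism.
            caps.setdefault(key, value)
    return caps


# Both sources are honoured during the transition release.
caps = advertised_capabilities({"cpu_arch": "x86_64"}, {"rack": "r12"})
print(caps)  # {'rack': 'r12', 'cpu_arch': 'x86_64'}
```

Once the old setting is dropped, the `legacy_specs` branch and the warning can be deleted without touching callers that already use the new mechanism.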

Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Changed in tripleo:
status: Triaged → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-3 → 2013.2