Seed VM filtering out compute resources

Bug #1213967 reported by Derek Higgins on 2013-08-19
This bug affects 3 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Critical     Robert Collins
tripleo                    Critical     Unassigned

Bug Description

At some stage last week nova started failing to boot baremetal instances. From the looks of it, they are being filtered out by the compute_capabilities_filter.

This workaround resolves the problem:
diff --git a/scripts/setup-baremetal b/scripts/setup-baremetal
index 0a1e46d..132cbc8 100755
--- a/scripts/setup-baremetal
+++ b/scripts/setup-baremetal
@@ -20,6 +20,6 @@ deploy_ramdisk_id=$(glance image-create --name bm-deploy-ramdisk --public \

 nova flavor-delete baremetal || true
 nova flavor-create baremetal auto $2 $3 $1
-nova flavor-key baremetal set "cpu_arch"="$arch" \
+nova flavor-key baremetal set \
     "baremetal:deploy_kernel_id"="$deploy_kernel_id" \
     "baremetal:deploy_ramdisk_id"="$deploy_ramdisk_id"

Although it's probably not the correct fix.

I think the problem was caused by this merge, https://review.openstack.org/#/c/40994/5 ,
as cap.cpu_arch doesn't exist.
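
To illustrate why a flavor key such as cpu_arch can cause every host to be rejected, here is a minimal sketch of a capabilities-style filter. It is a simplified model, not nova's actual compute_capabilities_filter: the function name host_passes, the dictionaries, and the rule that any key containing ":" is skipped are all assumptions made for illustration.

```python
def host_passes(capabilities, extra_specs):
    """Return True if every unscoped extra spec matches the host.

    Hypothetical model: `capabilities` is what the host reports and
    `extra_specs` is what the flavor requires.
    """
    for key, required in extra_specs.items():
        # Assumption: scoped keys such as "baremetal:deploy_kernel_id"
        # are not capability requirements, so this sketch skips them.
        if ":" in key:
            continue
        # A key the host never reported cannot match, so the host fails.
        if key not in capabilities:
            return False
        if str(capabilities[key]) != str(required):
            return False
    return True


# Host that does not report cpu_arch at all.
host = {"vcpus": 2, "memory_mb": 2048}
flavor = {"cpu_arch": "x86_64", "baremetal:deploy_kernel_id": "abc"}

print(host_passes(host, flavor))   # host is rejected
flavor_without_arch = {k: v for k, v in flavor.items() if k != "cpu_arch"}
print(host_passes(host, flavor_without_arch))   # host is accepted
```

In this model, dropping the cpu_arch key from the flavor (as the workaround does) leaves nothing for the host to fail on, which would explain why the scheduler finds a host again.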

Changed in tripleo:
status: New → Incomplete
status: Incomplete → Triaged
importance: Undecided → Critical
Changed in nova:
status: New → Triaged
importance: Undecided → Critical
tags: added: baremetal regression
Robert Collins (lifeless) wrote :

I've confirmed this is broken but not yet confirmed that reverting that commit fixes it.

Robert Collins (lifeless) wrote :

Reverting that commit did not fix the problem for me.

Derek Higgins (derekh) wrote :

According to my bisect,
a4ad62ac9fdacecfbc5229e688a7d937d177889b is the first bad commit,
which is the one linked above. Maybe simply reverting that single commit isn't enough?

Thierry Carrez (ttx) on 2013-08-20
Changed in nova:
milestone: none → havana-3
Sylvain Bauza (sylvain-bauza) wrote :

I confirm that the workaround provided in the description allows the scheduler to find a host.

Robert Collins (lifeless) wrote :

Huh, very odd. Can you confirm that reverting that one commit isn't enough to fix it?

Robert Collins (lifeless) wrote :

OK, so resetting to 6403c9df585b4fd897acdf4fbc535c68ac0a2531 fixes the issue; so it may be more than one commit...

Robert Collins (lifeless) wrote :

Resetting to bfebb360f3ba11c63d5a5562b8730689d28f6b4f also fixes it...

Robert Collins (lifeless) wrote :

(Ah, that's the merge of 6403 into trunk.) So we still need to track down what other commits need to be reverted.

Robert Collins (lifeless) wrote :

OK, I must have fluffed it; just reverting that one commit does fix it.

Changed in nova:
status: Triaged → In Progress
assignee: nobody → Robert Collins (lifeless)

Reviewed: https://review.openstack.org/43235
Committed: http://github.com/openstack/nova/commit/7d8a3b956b97591c8f79327e3bc6583587f53ec5
Submitter: Jenkins
Branch: master

commit 7d8a3b956b97591c8f79327e3bc6583587f53ec5
Author: Robert Collins <email address hidden>
Date: Tue Aug 20 22:33:07 2013 +1200

    Revert "Make compute_capabilities_filter use ..."

    This reverts commit a4ad62ac9fdacecfbc5229e688a7d937d177889b.

    Nova baremetal architecture scheduling was broken by it. While
    workarounds exist it is backwards incompatible, and baremetal is
    supported since Grizzly.

    Fix bug: 1213967

    Change-Id: I319b8a17f3ae7a3b527d388c6ff2954c0bcc0108

Joe Gordon (jogo) wrote :

After looking at this further, it looks like the issue is with 'CONF.baremetal.instance_type_extra_specs', which is used to advertise cpu_arch back to the scheduler. That config option, however, allows arbitrary key-value pairs to be sent back. Unfortunately, with https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler we want to remove this RPC and use the database, so the values from CONF.baremetal.instance_type_extra_specs can be stored in the ComputeNodeStat table and made available in compute_capabilities_filter.
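
As a rough sketch of that idea, the configured pairs could be parsed once and merged into the per-node stats that the filter later reads. Everything here is an assumption for illustration: the "key:value" string format, the function names parse_extra_specs and build_node_stats, and the plain dict standing in for rows of the ComputeNodeStat table.

```python
def parse_extra_specs(pairs):
    """Turn a list of "key:value" strings into a dict.

    Assumption: the config option holds entries like "cpu_arch:x86_64".
    """
    specs = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep:
            # Entry has no ":" separator; skip it rather than guess.
            continue
        specs[key.strip()] = value.strip()
    return specs


def build_node_stats(base_stats, extra_spec_pairs):
    """Merge configured extra specs into a copy of the node's stats.

    The returned dict stands in for what would be stored per compute node.
    """
    stats = dict(base_stats)
    stats.update(parse_extra_specs(extra_spec_pairs))
    return stats


stats = build_node_stats({"vcpus": 2}, ["cpu_arch:x86_64"])
print(stats)
```

Because the option accepts arbitrary keys, a store built this way has to hold arbitrary key-value pairs rather than a fixed set of columns.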

Robert Collins (lifeless) wrote :

So from a compat viewpoint we just need:
- the existing *setting* on bm nodes to keep working for H
- use of that setting to log a deprecation warning
- a new thing, whatever it is, to also work during H

Then for I the existing setting can be removed.
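
The compatibility plan above can be sketched as follows. This is only an illustration under stated assumptions: the function name resolve_capabilities and both parameter names are hypothetical, the "new thing" is represented by a plain dict, and the choice to let the new setting win on conflicts is mine, not something decided in this bug.

```python
import warnings


def resolve_capabilities(legacy_specs=None, new_specs=None):
    """Combine the existing setting with its (hypothetical) replacement.

    The legacy setting keeps working but emits a deprecation warning;
    the new setting is honoured at the same time.
    """
    merged = {}
    if legacy_specs:
        warnings.warn(
            "the legacy extra-specs setting is deprecated and will be "
            "removed in a later release",
            DeprecationWarning,
            stacklevel=2,
        )
        merged.update(legacy_specs)
    if new_specs:
        # Assumption: values from the new setting override legacy ones.
        merged.update(new_specs)
    return merged


caps = resolve_capabilities(legacy_specs={"cpu_arch": "x86_64"})
print(caps)
```

Once the legacy setting is removed in the following release, the first branch and its warning can simply be deleted.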

Thierry Carrez (ttx) on 2013-09-03
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-09-05
Changed in nova:
status: Fix Committed → Fix Released
Changed in tripleo:
status: Triaged → Fix Released
Thierry Carrez (ttx) on 2013-10-17
Changed in nova:
milestone: havana-3 → 2013.2