no way to audit baremetal node instance allocation for correctness

Bug #1096723 reported by Robert Collins
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Vish Ishaya

Bug Description

The nova baremetal database records the instance UUID assigned to each baremetal node, and this record can get out of sync with the instances actually allocated by nova. There is currently no mechanism to detect or correct this. (See bug 1096722 for an example of this occurring.)
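
A minimal sketch of what such an audit could look like, assuming the baremetal node rows and the set of instance UUIDs nova considers live are already loaded into memory. The function name `audit_allocations` and the dict layout are hypothetical and are not part of nova:

```python
def audit_allocations(nodes, live_instance_uuids):
    """Return ids of nodes whose recorded instance is not a live instance.

    nodes: iterable of dicts with 'id' and 'instance_uuid' keys
           (a hypothetical stand-in for rows of the baremetal node table).
    live_instance_uuids: set of instance UUIDs nova still considers allocated.
    """
    stale = []
    for node in nodes:
        uuid = node.get("instance_uuid")
        # Unallocated nodes (uuid is None) cannot be out of sync.
        if uuid is not None and uuid not in live_instance_uuids:
            stale.append(node["id"])
    return stale


nodes = [
    {"id": 1, "instance_uuid": "aaa"},
    {"id": 2, "instance_uuid": "bbb"},  # nova no longer knows this instance
    {"id": 3, "instance_uuid": None},   # unallocated node
]
print(audit_allocations(nodes, {"aaa"}))  # prints [2]
```

The sketch only detects the mismatch. Correcting it (for example by clearing the stale allocation) would be a separate step.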

Tags: baremetal
aeva black (tenbrae)
Changed in nova:
assignee: nobody → Devananda (devananda)
Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/21564

aeva black (tenbrae) wrote :

I dug into this more and found that ComputeManager._running_deleted_instances is not able to detect running-but-deleted baremetal instances. ComputeManager relies on the hypervisor driver to return a list of the names of all running instances on a host via driver.list_instances(). However, the baremetal hypervisor driver never returns the name of a deleted instance, because it must call out to the virtapi to get the instance name!

I am going to fix this by caching the instance name in the bm_nodes table.
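
To illustrate the difference, here is a rough sketch using in-memory stand-ins rather than the real driver or virtapi. All function names and the dict layout are hypothetical, and the lookup-based variant is modeled as skipping instances it cannot resolve, which may not match how the real driver handled the failure:

```python
class InstanceNotFound(Exception):
    """Stand-in for the error raised when a deleted instance is looked up."""


def lookup_name(instances, uuid):
    # Stand-in for the virtapi call: fails for instances nova has deleted.
    try:
        return instances[uuid]
    except KeyError:
        raise InstanceNotFound(uuid)


def list_instances_via_lookup(nodes, instances):
    """Resolve each allocated node's instance name through the lookup."""
    names = []
    for node in nodes:
        if node["instance_uuid"] is None:
            continue
        try:
            names.append(lookup_name(instances, node["instance_uuid"]))
        except InstanceNotFound:
            pass  # the deleted instance never makes it into the list
    return names


def list_instances_cached(nodes):
    """Use the name cached on the node row; no lookup is needed."""
    return [n["instance_name"] for n in nodes if n["instance_uuid"] is not None]


nodes = [
    {"instance_uuid": "aaa", "instance_name": "instance-00000001"},
    {"instance_uuid": "bbb", "instance_name": "instance-00000002"},
    {"instance_uuid": None, "instance_name": None},
]
instances = {"aaa": "instance-00000001"}  # "bbb" has been deleted in nova

print(list_instances_via_lookup(nodes, instances))
print(list_instances_cached(nodes))
```

With the cached name, the node still holding the deleted instance shows up in the list, so the caller can see that it is running but deleted.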

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/21605

Changed in nova:
assignee: Devananda van der Veen (devananda) → Vish Ishaya (vishvananda)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/21481
Committed: http://github.com/openstack/nova/commit/c20110d15be37948ddd9ef5f38001328aabf5b1d
Submitter: Jenkins
Branch: master

commit c20110d15be37948ddd9ef5f38001328aabf5b1d
Author: Devananda van der Veen <email address hidden>
Date: Tue Feb 19 12:39:55 2013 -0800

    Add better status to baremetal deployments.

    This patch introduces a few new baremetal states, which are used to
    track the deploy process. Now, nova-baremetal-deploy-helper updates the
    bm_nodes record directly when it begins and finishes deploying an image
    to that node.

    The next patch will add a LoopingCall inside driver.spawn() to wait for
    the deploy to complete.

    Also, since there can not be >1 active deployment per node, there
    is no need to have a separate table for storing them. This patch drops
    the table bm_deployments and adds the important information it contained
    to bm_nodes. Since the previous behavior was to mark a deployment as
    deleted once it completed, there is no need to copy any data from
    bm_deployments prior to dropping the table -- assuming that no active
    deployments are in process when the migration is run.

    Since this is the first migration for the baremetal database, it also
    adds a new test class, TestBaremetalMigrations, and refactors the
    test_migrations.py file to allow for multiple test classes.

    partially implements fix for bug 1096723

    Change-Id: Iad30b462d49c88fc19babed43a2fb8540b1fad30
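
As a rough illustration of waiting on the deploy state described in the commit message, the sketch below polls a state getter until the deploy finishes or fails. It is a plain loop, not the LoopingCall the commit mentions, and the state names, `wait_for_deploy`, and its parameters are assumptions made for this example:

```python
import time

# Hypothetical state values; the real names may differ.
DEPLOYING = "deploying"
DEPLOYDONE = "deploy complete"
DEPLOYFAIL = "deploy failed"


def wait_for_deploy(get_state, interval=0.01, max_polls=100):
    """Poll get_state() until the deploy succeeds or fails.

    Returns True on success and False on failure; raises TimeoutError
    if neither terminal state is seen within max_polls polls.
    """
    for _ in range(max_polls):
        state = get_state()
        if state == DEPLOYDONE:
            return True
        if state == DEPLOYFAIL:
            return False
        time.sleep(interval)
    raise TimeoutError("deploy did not finish in time")


# Simulate a node record that the deploy helper updates over time.
states = iter([DEPLOYING, DEPLOYING, DEPLOYDONE])
print(wait_for_deploy(lambda: next(states)))  # prints True
```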

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/21605
Committed: http://github.com/openstack/nova/commit/63285645c72d3a5dcb232b5a129cf99602f2a607
Submitter: Jenkins
Branch: master

commit 63285645c72d3a5dcb232b5a129cf99602f2a607
Author: Devananda van der Veen <email address hidden>
Date: Sun Feb 10 12:49:53 2013 -0800

    Baremetal driver returns accurate list of instance

    Add 'instance_name' to bm_nodes table so that baremetal driver is able
    to return the names of all instances it believes are still running.

    Previously, baremetal.driver.list_instances was fetching all allocated
    instances from baremetal database, then calling VirtAPI to get the
    instance name. This would raise an InstanceNotFound exception for
    deleted instances. This prevented ComputeManager from ever detecting
    a running-but-deleted baremetal instance, and could leave baremetal
    instances in an undeletable state.

    Fixes bug 1096723.

    Change-Id: Ifae532e8e70e97e48c589608cb3c7000bb6a7609

Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1