bare metal node partitioning does not handle errors well

Bug #1088655 reported by Robert Collins
Affects:      OpenStack Compute (nova)
Status:       Fix Released
Importance:   High
Assigned to:  Vish Ishaya
Milestone:    2013.1

Bug Description

When bare metal partitioning fails, operators currently need to look in the log to determine the cause. But failures can be caused by user requests (such as too-large swap sizes), so users should get a meaningful error status set on their instance.

Tags: baremetal
Revision history for this message
aeva black (tenbrae) wrote :

I suspect this happens because of the division between nova-compute and nova-baremetal-deploy-helper. n-cpu calls driver.spawn() and marks the instance as ACTIVE once the machine powers on, while a separate process (nova-baremetal-deploy-helper) does the partitioning and image deployment. Merging these processes would probably resolve this bug.
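
A minimal sketch of the split described above; the helper names are hypothetical stand-ins, not the actual nova baremetal driver code:

    # Pre-fix flow: spawn() returns at power-on, so failures in the
    # separate deploy process cannot reach it. Names are hypothetical.
    def power_on(node):
        """Stand-in for the IPMI power-on call."""
        print("powering on %s" % node)

    def spawn(node):
        """Return as soon as the machine powers on.

        Partitioning and image deployment happen later in the separate
        nova-baremetal-deploy-helper process, so a failure there cannot
        propagate back through this call, and the nova instance has
        already been marked ACTIVE.
        """
        power_on(node)
        # no wait here: deploy-helper runs independently from this point

    spawn("node-1")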

Revision history for this message
aeva black (tenbrae) wrote :

Another approach would be for nova-baremetal-deploy-helper to record its progress in the nova_bm.bm_deployments table.
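
A minimal sketch of what such progress recording might look like, assuming a hypothetical SQLAlchemy model; the actual nova_bm schema, column names, and state values may differ:

    # Hypothetical sketch of progress tracking in nova_bm.bm_deployments;
    # the columns and states are assumptions, not the real schema.
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class BMDeployment(Base):
        """One row per deployment attempt."""
        __tablename__ = 'bm_deployments'
        id = Column(Integer, primary_key=True)
        instance_uuid = Column(String(36))
        state = Column(String(16))   # e.g. 'deploying', 'done', 'failed'
        error = Column(String(255))  # failure reason to surface to the user

    def record_progress(session, deployment_id, state, error=None):
        """Called by the deploy helper; nova-compute polls the same row."""
        row = session.get(BMDeployment, deployment_id)
        row.state = state
        row.error = error
        session.commit()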

Changed in nova:
assignee: nobody → Devananda (devananda)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/21564

Changed in nova:
status: Triaged → In Progress
Changed in nova:
assignee: Devananda van der Veen (devananda) → Vish Ishaya (vishvananda)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/21564
Committed: http://github.com/openstack/nova/commit/48439b98a1a7ac2dded34c8899918773f70667f2
Submitter: Jenkins
Branch: master

commit 48439b98a1a7ac2dded34c8899918773f70667f2
Author: Devananda van der Veen <email address hidden>
Date: Fri Feb 8 20:36:19 2013 -0800

    Wait for baremetal deploy inside driver.spawn

    Previously, baremetal driver.spawn returned as soon as the
    machine power turned on, but before the user-image was deployed to the
    hardware node, and long before the node was available on the network.
    This meant the nova instance was marked as ACTIVE before provisioning
    had actually finished. If the deploy failed and the baremetal node was
    set to an ERROR state, the nova instance could still be left as ACTIVE
    and the user was never informed of the error.

    This patch introduces a LoopingCall to monitor the deployment status in
    the baremetal database. As the deployment is performed by
    nova-baremetal-deploy-helper, the database record is updated. Once the
    deployment is complete, driver.spawn() sets the baremetal node status
    and the nova instance status is also set properly. If an error occurs
    during the deployment, an exception is raised within driver.spawn()
    allowing nova to follow the normal cleanup and notify paths.

    This also allows the baremetal PXE driver to delete cached image files
    when a baremetal deployment fails.

    Fixes bug 1088655.

    Change-Id: I4feefd462fd956c9780995ec8b05b13e78278c8b
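
A minimal sketch of the polling pattern the patch describes, written against the modern oslo.service LoopingCall API (nova at the time carried its own copy of the loopingcall module); _get_deploy_state, the exception class, and the state names are hypothetical stand-ins for the nova_bm.bm_deployments lookup:

    # Sketch only: the real driver reads nova_bm.bm_deployments and
    # uses nova's own exception and state classes.
    from oslo_service import loopingcall

    class InstanceDeployFailure(Exception):
        """Raised inside spawn() so nova runs its cleanup/notify paths."""

    def _get_deploy_state(deployment_id):
        """Stand-in for reading the row nova-baremetal-deploy-helper updates."""
        return 'done'

    def wait_for_deploy(deployment_id):
        def _poll():
            state = _get_deploy_state(deployment_id)
            if state == 'done':
                # Deployment finished; stop the loop so spawn() can
                # return and the instance becomes ACTIVE.
                raise loopingcall.LoopingCallDone()
            if state == 'failed':
                # Propagates out of wait() below, and from there out
                # of driver.spawn().
                raise InstanceDeployFailure('baremetal deploy failed')

        timer = loopingcall.FixedIntervalLoopingCall(_poll)
        timer.start(interval=1).wait()

    wait_for_deploy(42)

Raising out of the polling function is what lets a deploy failure surface through driver.spawn(), so nova's normal cleanup and notification paths run instead of the instance being left ACTIVE.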

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1