instances get stuck in 'BUILDING' sometimes

Bug #1178919 reported by Robert Collins
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        aeva black
tripleo                   Fix Released  High        Unassigned

Bug Description

We booted 10 instances at once and 1 got stuck in BUILD; with 20 at once, 3 got stuck; with 30 at once, 5 got stuck.

Logs for the instance UUID in the original description:
/var/log/upstart/nova-compute.log.1.gz:2013-05-11 06:27:15,349.349 23016 INFO nova.compute.manager [-] [instance: 0a171cbe-0f3c-40d5-ae8d-606f1dde41ce] During sync_power_state the instance has a pending task. Skip.
ubuntu@foo:~$ date
Sat May 11 06:48:44 UTC 2013

We think this is fallout from nodes that were already powered on for some reason when the deploy started. We're going to add a hard power-off to the code path.

Tags: baremetal
Robert Collins (lifeless) wrote :

nova thinks:

| 0a171cbe-0f3c-40d5-ae8d-606f1dde41ce | test-0a171cbe-0f3c-40d5-ae8d-606f1dde41ce | BUILD | spawning | NOSTATE | |
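A row like the one above can be checked mechanically. The following is a minimal sketch, not part of nova: the helper names `parse_row` and `is_stuck_building` are hypothetical, and the column order (ID, Name, Status, Task State, Power State, Networks) is an assumption based on the row shown here.

```python
def parse_row(line):
    """Split one '| a | b | ... |' table row into stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_stuck_building(line):
    """Return True if the row shows BUILD / spawning / NOSTATE.

    Assumed column order: ID, Name, Status, Task State, Power State, Networks.
    """
    cells = parse_row(line)
    if len(cells) < 5:
        return False
    status, task_state, power_state = cells[2], cells[3], cells[4]
    return (status == "BUILD"
            and task_state == "spawning"
            and power_state == "NOSTATE")


row = ("| 0a171cbe-0f3c-40d5-ae8d-606f1dde41ce "
       "| test-0a171cbe-0f3c-40d5-ae8d-606f1dde41ce "
       "| BUILD | spawning | NOSTATE | |")
print(is_stuck_building(row))
```

A single snapshot cannot distinguish a wedged instance from one that is still spawning normally, so a check like this only becomes meaningful when the same state persists across repeated listings.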

Changed in nova:
status: New → Triaged
importance: Undecided → High
Robert Collins (lifeless) wrote :

OK, we think we know the cause of this:

https://github.com/openstack/nova/blob/master/nova/virt/baremetal/ipmi.py#L200

-> we're failing to reboot machines that are already powered on. For example, if a machine fails to deploy for some reason, we leave it wedged forever.

description: updated
aeva black (tenbrae)
Changed in nova:
assignee: nobody → Devananda van der Veen (devananda)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/30650

Changed in nova:
status: Triaged → In Progress
aeva black (tenbrae)
Changed in nova:
milestone: none → havana-2
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/34420

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/34420
Committed: http://github.com/openstack/nova/commit/58fe97eebddc6693c05618b592f6ce961a9eaba1
Submitter: Jenkins
Branch: master

commit 58fe97eebddc6693c05618b592f6ce961a9eaba1
Author: Devananda van der Veen <email address hidden>
Date: Tue Jun 25 08:13:31 2013 -0700

    Baremetal ensures node is off before powering on

    During spawn(), ensure that a node is really off before trying to turn
    it on. This fixes bug 1178919, in which a node that had previously
    gotten stuck in a power-on state (eg, in the BIOS screen) would fail to
    spawn() because the baremetal driver would send a power-on request, the
    BMC would ignore it, and then baremetal driver would wait indefinitely
    for a DHCP request.

    Change-Id: Ie73e6ab488abe99c70ad3d149d702577941056d1
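The behaviour described in the commit message can be illustrated with a small sketch. This is not the merged nova code: `FakeBMC`, `wait_for_status`, and `activate_node` are hypothetical names, and the fake BMC simply models the reported symptom, where a power-on request is ignored if the node is already on.

```python
import time


class FakeBMC:
    """Hypothetical stand-in for a BMC; not the nova baremetal IPMI class."""

    def __init__(self, powered_on):
        self.powered_on = powered_on
        self.boot_count = 0  # number of real off -> on transitions

    def power_status(self):
        return "on" if self.powered_on else "off"

    def power_off(self):
        self.powered_on = False

    def power_on(self):
        # Like the BMC in the bug, ignore the request if already powered on.
        if not self.powered_on:
            self.powered_on = True
            self.boot_count += 1


def wait_for_status(bmc, wanted, retries=5, delay=0.0):
    """Poll the BMC until it reports the wanted power state or retries run out."""
    for _ in range(retries):
        if bmc.power_status() == wanted:
            return True
        time.sleep(delay)
    return False


def activate_node(bmc):
    """Ensure the node is really off, then power it on, so it boots fresh."""
    if bmc.power_status() != "off":
        bmc.power_off()
        if not wait_for_status(bmc, "off"):
            raise RuntimeError("node did not power off")
    bmc.power_on()
    if not wait_for_status(bmc, "on"):
        raise RuntimeError("node did not power on")


stuck = FakeBMC(powered_on=True)  # e.g. a node left sitting in the BIOS screen
activate_node(stuck)
print(stuck.boot_count)
```

Without the power-off step, `power_on()` on the already-on node would be a no-op, the node would never reboot into the deploy environment, and the caller would be left waiting for a DHCP request that never arrives.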

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Changed in tripleo:
status: Triaged → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2