deploy / delete fragility

Bug #1184445 reported by Robert Collins
This bug affects 2 people
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  High        Unassigned
tripleo                   Invalid  High        Unassigned

Bug Description

We have 30 machines we're testing on. Sometimes they fail to deploy or fail to be deleted. Other times they are reported as failed but actually did power off or deploy correctly.
So we have both actual failures and false reports of failure.

Data gathered so far:
machine | fault | instance uuid | notes
freecloud33 | 'active deleting' | f7862b82-268d-4971-b961-a8fe51488b21 |
freecloud35 | 'active deleting' | d3d7d58f-408c-47ff-993a-4b8327f27541 |
freecloud32 | 'build spawning none' | d01059f8-97ab-4f0a-968b-7411b2ab717c |
freecloud12 | 'active deleting'/hung iLO | 28ba32b4-04d1-4aa7-9f7e-283401c5d2a5 | had wrong instance (compute2) - perhaps a prior scheduler retry or something?
freecloud22 | 'build spawning nostate', stuck in deploy ramdisk | ed634e44-5fa3-43a6-baf5-7d5d15ec7cff |
freecloud18 | 'build spawning nostate' | 64a96d33-bf89-44bd-817f-c62053c1eb91 | power reset -> went active
freecloud31 | 'active deleting running' | bb35ffdf-9fae-4e23-8e46-ec76b89c1ce4 | was still running with its IP
freecloud25 | 'build spawning nostate'/stuck on 'Boot failed: press a key to retry, or wait for reset...' | 6d9a3e09-be55-4f7e-8da1-843e85c687df | power reset -> went active
freecloud38 | 'active deleting running' | 091264f9-830b-4279-92e3-20ff56375973 | was active on the IP the instance had
freecloud36 | 'active deleting running' | 30405362-c307-428a-94c5-dbe6284b8f28 | is powered off
freecloud34 | 'active deleting running' | 3f0cdb8f-70ae-43f7-bb98-83c48f5da317 | is powered off
freecloud37 | 'active deleting running' | 54fb06f0-325c-4d98-9a54-2ab4d3ab9794 | is stuck in graphics mode -> probably deploy ramdisk
freecloud26 | 'build spawning nostate' | 8b80ff43-ba81-44c6-a22a-fffd6034579a | stuck on 'Attempting Boot From Hard Drive (C:)' [after boot-from-nic]. Power reset brought it up, but nova still thinks it's build spawning nostate
freecloud30 | 'active deleting running' | cd715548-afd7-4342-8c74-b4d5e5984dd6 | stuck in deploy ramdisk
---nova delete---
cleared all but | ed633e44-5fa3-43a6-baf5-7d5d15ec7cff | compute-test2.NovaCompute0.NovaCompute | BUILD | deleting | NOSTATE | |
powered ed633 on, and the delete was processed.
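
To see how the observations above split between the deploy path and the delete path, they can be tallied by task state. This is a minimal sketch, not part of the original report. It assumes the quoted fault strings read as vm state, task state and (optionally) power state, in that order, matching the BUILD | deleting | NOSTATE columns in the nova output above. The names REPORTED and split_state are made up for illustration.

```python
from collections import Counter

# Machine -> fault string, copied from the observations listed above.
REPORTED = {
    "freecloud33": "active deleting",
    "freecloud35": "active deleting",
    "freecloud32": "build spawning none",
    "freecloud12": "active deleting",
    "freecloud22": "build spawning nostate",
    "freecloud18": "build spawning nostate",
    "freecloud31": "active deleting running",
    "freecloud25": "build spawning nostate",
    "freecloud38": "active deleting running",
    "freecloud36": "active deleting running",
    "freecloud34": "active deleting running",
    "freecloud37": "active deleting running",
    "freecloud26": "build spawning nostate",
    "freecloud30": "active deleting running",
}


def split_state(state):
    """Split 'vm task [power]' into three parts (assumed field order).

    The power state is None when the report only quoted two words.
    """
    parts = state.split()
    vm_state, task_state = parts[0], parts[1]
    power_state = parts[2] if len(parts) > 2 else None
    return vm_state, task_state, power_state


# Count machines per task state (deleting vs spawning) and per full string.
by_task = Counter(split_state(s)[1] for s in REPORTED.values())
by_state = Counter(REPORTED.values())

if __name__ == "__main__":
    for task, count in by_task.most_common():
        print(f"{task}: {count} machine(s)")
```

Grouping by task state only says which operation each machine was stuck in. Telling real failures from false reports still needs the per-machine notes, since nova's view and the hardware's actual state disagree in several rows.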

Tags: baremetal
tags: added: baremetal
aeva black (tenbrae)
Changed in nova:
status: New → Triaged
importance: Undecided → High
aeva black (tenbrae) wrote :
Robert Collins (lifeless) wrote :

Another deploy case: 'active spawning nostate', with the machine hung in the BIOS showing 'Booting from NIC\nBooting from Hard disk (C:)', with no further output and no timeout.
pxe_deploy_timeout is disabled in our config.
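
Why a disabled deploy timeout lets a hung machine sit in 'spawning' indefinitely can be shown with a small wait loop. This is an illustrative sketch only, not nova's code. The function wait_for_deploy and its parameters are hypothetical, and it assumes that a timeout of 0 means "never give up", which is how "disabled" is read here.

```python
import time


def wait_for_deploy(is_done, timeout_s, poll_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or the timeout expires.

    Returns True if the deploy finished and False if it timed out.
    timeout_s == 0 is treated as "no timeout" (an assumption), so a
    machine that never finishes keeps this loop waiting forever.
    """
    deadline = None if timeout_s == 0 else clock() + timeout_s
    while not is_done():
        if deadline is not None and clock() >= deadline:
            return False  # give up so the instance can be marked as failed
        sleep(poll_s)
    return True
```

With a non-zero timeout, a machine hung in the BIOS would make this return False after the deadline. With the timeout disabled, nothing ever moves the instance out of its spawning state.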

Robert Collins (lifeless) wrote :

Two more occurrences of fragile-while-deploy:

Robert Collins (lifeless) wrote :
Robert Collins (lifeless) wrote :

https://bugs.launchpad.net/tripleo/+bug/1183646 log details for the timeout case.

Robert Collins (lifeless) wrote :

http://paste.ubuntu.com/5708254/ the one stuck in scheduling
http://paste.ubuntu.com/5708250/ the one that timed out

Changed in nova:
status: Triaged → Invalid
Changed in tripleo:
status: Triaged → Invalid