PXE'd node may get stuck in DEPLOYING if ramdisk does not POST back

Bug #1270986 reported by aeva black
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
Ironic   Fix Released  Critical    Ghe Rivero

Bug Description

A node being deployed by the PXE driver may get stuck indefinitely in the DEPLOYING state if the ramdisk fails to POST back correctly and trigger the vendor_passthru._continue_deploy() operation.

This is compounded by two gaps: timeouts are not currently managed internally, and there is no way to update a node's state externally while a deploy is in progress, so the node can remain stuck.

aeva black (tenbrae)
Changed in ironic:
status: New → Triaged
importance: Undecided → Critical
Lucas Alvares Gomes (lucasagomes) wrote :

I will give it some thought

Changed in ironic:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
Changed in ironic:
assignee: Lucas Alvares Gomes (lucasagomes) → nobody
aeva black (tenbrae)
Changed in ironic:
milestone: none → icehouse-3
Ghe Rivero (ghe.rivero)
Changed in ironic:
assignee: nobody → Ghe Rivero (ghe.rivero)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/71297

Changed in ironic:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/72395

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/71297
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=d5f30b0be73d57903e9033e21ebd4106fd735185
Submitter: Jenkins
Branch: master

commit d5f30b0be73d57903e9033e21ebd4106fd735185
Author: Ghe Rivero <email address hidden>
Date: Wed Feb 5 12:46:45 2014 +0000

    Allow to tear-down a node waiting to be deployed

    Introducing a new state, WAITDEPLOYING, those nodes which have failed
    to POST and trigger the vendor_passthru._continue_deploy() can be
    tear-down.

    Closes-Bug: #1270986
    Change-Id: Ib98f1ba6a29de46df9ce54941d0558df4f241f40
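
For readers following the thread, the idea in this commit message can be sketched roughly as below. This is an illustration only, not the code from the merged change: the `Node` class, the `start_tear_down` helper, the `TEAR_DOWN_ALLOWED` set, and the string values of the states are all hypothetical. Only the state names DEPLOYING and WAITDEPLOYING come from this report.

```python
# Illustrative sketch only; not the code from the merged change.
# All names and string values here are assumptions made for the example.
DEPLOYING = "deploying"
WAITDEPLOYING = "wait deploying"  # node is waiting for the ramdisk to POST back
ACTIVE = "active"
DELETING = "deleting"

# Hypothetical policy: states from which a tear-down request is accepted.
TEAR_DOWN_ALLOWED = {ACTIVE, WAITDEPLOYING}


class Node:
    """Minimal stand-in for a node record."""

    def __init__(self, provision_state):
        self.provision_state = provision_state


def start_tear_down(node):
    """Move the node to DELETING, or raise if its current state forbids it."""
    if node.provision_state not in TEAR_DOWN_ALLOWED:
        raise RuntimeError(
            "cannot tear down node in state %r" % node.provision_state)
    node.provision_state = DELETING
    return node
```

With a separate waiting state, a node whose ramdisk never calls back is no longer indistinguishable from one that is actively being written to, so a tear-down request can be accepted for it.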

Changed in ironic:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/72395
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=ae79881bbc92fc8a8b1d8867ddd9bb7763d3813e
Submitter: Jenkins
Branch: master

commit ae79881bbc92fc8a8b1d8867ddd9bb7763d3813e
Author: Yuriy Zveryanskyy <email address hidden>
Date: Mon Feb 24 15:51:49 2014 +0200

    Add timeout for waiting callback from deploy ramdisk

    'callback_timeout' options added to conductor, if timeout reached
    node switched to error state. New periodical task created for timeouts
    check.

    Related-Bug: #1270986
    Change-Id: I3084d529baa13d4f848a7d050b8299582a28d7e9
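
A minimal sketch of the kind of periodic check this commit message describes, under stated assumptions: the `Node` class, the `check_deploy_timeouts` function, the `state_updated_at` field, and the convention that a timeout of 0 disables the check are hypothetical and are not taken from the Ironic source. Only the `callback_timeout` option name, the waiting state, and the switch to an error state come from this report.

```python
import time

# Illustrative sketch only; names and string values are assumptions.
WAITDEPLOYING = "wait deploying"
ERROR = "error"


class Node:
    """Minimal stand-in for a node record."""

    def __init__(self, uuid, provision_state, state_updated_at):
        self.uuid = uuid
        self.provision_state = provision_state
        # Unix timestamp of the last provision-state change.
        self.state_updated_at = state_updated_at


def check_deploy_timeouts(nodes, callback_timeout, now=None):
    """Switch nodes that waited longer than callback_timeout seconds to ERROR.

    Returns the UUIDs of the nodes that timed out. A callback_timeout of 0
    disables the check (an assumption made for this sketch).
    """
    if not callback_timeout:
        return []
    now = time.time() if now is None else now
    timed_out = []
    for node in nodes:
        waiting = node.provision_state == WAITDEPLOYING
        if waiting and now - node.state_updated_at > callback_timeout:
            node.provision_state = ERROR
            timed_out.append(node.uuid)
    return timed_out
```

A conductor would call something like this from a periodic task, so that a ramdisk which never POSTs back leaves the node in an error state rather than waiting forever.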

Thierry Carrez (ttx)
Changed in ironic:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ironic:
milestone: icehouse-3 → 2014.1