Ironic: Deleting while spawning can leave orphan ACTIVE nodes in Ironic

Bug #1477490 reported by Lucas Alvares Gomes
This bug affects 2 people
Affects                   Status        Importance  Assigned to          Milestone
Ironic                    Won't Fix     Medium      Lucas Alvares Gomes
OpenStack Compute (nova)  Fix Released  Medium      Unassigned

Bug Description

The Ironic Nova driver won't try to delete the instance in Ironic if the node's provision state is DEPLOYING [1]. Such a request is known to fail with the current Ironic code because a deployment can't be aborted at the DEPLOYING stage.

Instead, the driver just keeps going and tries to clean up the deployment environment without telling Ironic to unprovision the instance, and that fails as well. The cleanup code keeps retrying [3] because a state transition is in progress and the node can't be updated. If the deployment finishes before the retries time out, the destroy() method of the Nova driver succeeds in cleaning up the deployment environment and the Nova instance is deleted, but the node remains marked ACTIVE in Ironic. It is now orphaned, because no Nova instance is associated with it [4].

The good news is that, since Nova cleans up the networking for the instance, the instance won't be accessible.
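To make the race easier to follow, here is a toy model of the sequence described above. It is only a sketch: FakeNode, buggy_destroy, finish_deploy and the Conflict exception are hypothetical names made up for illustration, not Nova or Ironic code, and the limit of 6 attempts is taken from the log excerpt in [3].

```python
# Toy model of the race; none of these names exist in Nova or Ironic.
DEPLOYING, ACTIVE, AVAILABLE = "deploying", "active", "available"


class Conflict(Exception):
    """Stands in for the HTTP 409 Ironic returns during a state transition."""


class FakeNode:
    def __init__(self):
        self.provision_state = DEPLOYING
        self.instance_uuid = "instance-1"
        self.cleanup_calls = 0

    def cleanup_deploy(self):
        # Node updates are rejected while a transition is in progress.
        self.cleanup_calls += 1
        if self.provision_state == DEPLOYING:
            raise Conflict("state transition in progress")
        self.instance_uuid = None

    def unprovision(self):
        self.provision_state = AVAILABLE


def buggy_destroy(node, max_attempts=6, on_retry=lambda node: None):
    """Mimic the old flow: skip unprovision for DEPLOYING, retry cleanup."""
    if node.provision_state != DEPLOYING:
        node.unprovision()
    for _ in range(max_attempts):
        try:
            node.cleanup_deploy()
            return True  # Nova would now delete the instance record
        except Conflict:
            on_retry(node)  # time passes; the deployment may finish
    return False


def finish_deploy(node):
    # Simulates the deployment completing between two retries.
    node.provision_state = ACTIVE


node = FakeNode()
deleted_in_nova = buggy_destroy(node, on_retry=finish_deploy)
print(deleted_in_nova, node.provision_state, node.instance_uuid)
```

In this model the delete is reported as successful on the second cleanup attempt, while the node stays in the active state with no instance attached, which is the orphan case from the report.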

WORKAROUND:

Unprovision the node using the Ironic API directly

$ ironic node-set-provision-state <node uuid> deleted
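The same request can be sent from Python with python-ironicclient. This is a minimal sketch: unprovision_node is a hypothetical helper name, and it assumes you already hold an authenticated client object, since building one depends on the deployment.

```python
def unprovision_node(ironic, node_uuid):
    """Ask Ironic to tear the node down, like the CLI command above.

    `ironic` is assumed to be an already-authenticated python-ironicclient
    client; creating it is deployment-specific and left out here.
    """
    # "deleted" is the same target state the CLI workaround passes.
    ironic.node.set_provision_state(node_uuid, "deleted")
```

As with the CLI, Ironic may still reject the request while the node is in the middle of a state transition.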

PROPOSED FIX:

IMO the Ironic Nova driver should try to tell Ironic to delete the instance even when the provision state of the node is DEPLOYING. If that fails, the nova delete command will fail, saying it cannot delete the instance, which is fine until this gets resolved in Ironic (there's work going on to be able to abort a deployment at any stage).
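A rough sketch of the proposed ordering, under stated assumptions: destroy, request_unprovision, Conflict and InstanceDeleteFailed are hypothetical names for illustration and this is not the actual driver code. The point is only that cleanup runs after Ironic has accepted the unprovision request, and never otherwise.

```python
class Conflict(Exception):
    """Stands in for Ironic refusing an action mid-transition (HTTP 409)."""


class InstanceDeleteFailed(Exception):
    """Hypothetical error that the delete request would surface."""


def destroy(node, request_unprovision, cleanup_deploy):
    """Ask Ironic to unprovision first; clean up only if that succeeds."""
    try:
        request_unprovision(node)
    except Conflict as exc:
        # Keep the Nova instance so the node is never left without an owner.
        raise InstanceDeleteFailed(str(exc))
    cleanup_deploy(node)


def request_unprovision(node):
    # Assumed behaviour: Ironic cannot abort while DEPLOYING.
    if node["provision_state"] == "deploying":
        raise Conflict("cannot abort a deployment in DEPLOYING")
    node["provision_state"] = "available"


cleaned = []
node = {"provision_state": "deploying"}
try:
    destroy(node, request_unprovision, cleaned.append)
    delete_failed = False
except InstanceDeleteFailed:
    delete_failed = True
```

With a node still in DEPLOYING the delete fails and nothing is cleaned up, so the instance stays in Nova and can be deleted again later.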

[1] https://github.com/openstack/nova/blob/6a24bbeecd8a6d6d3135a10f4917b071896d14ee/nova/virt/ironic/driver.py#L865-L868

[2] https://github.com/openstack/nova/blob/6a24bbeecd8a6d6d3135a10f4917b071896d14ee/nova/virt/ironic/driver.py#L871

[3] From the nova-compute logs

{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"Node d240ae0d-1844-48f0-adcf-b70680a1b6ce can not be updated while a state transition is in progress.\"}"}
 from (pid=6672) log_http_response /usr/local/lib/python2.7/dist-packages/ironicclient/common/http.py:260
2015-07-23 11:07:40.358 WARNING ironicclient.common.http [req-24b39fe8-435d-4869-970f-53f64b3512a8 demo demo] Request returned failure status.
2015-07-23 11:07:40.358 WARNING ironicclient.common.http [req-24b39fe8-435d-4869-970f-53f64b3512a8 demo demo] Error contacting Ironic server: Node d240ae0d-1844-48f0-adcf-b70680a1b6ce can not be updated while a state transition is in progress. (HTTP 409). Attempt 3 of 6

[4] http://paste.openstack.org/show/403569/

Tags: ironic
Changed in nova:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
summary: - Ironic: Deleting while spawnming can leave orphan ACTIVE nodes in Ironic
+ Ironic: Deleting while spawning can leave orphan ACTIVE nodes in Ironic
Markus Zoeller (markus_z) (mzoeller) wrote :

@Lucas Alvares Gomes (lucasagomes):

Since you are set as assignee, I'm switching the status to "In Progress".

tags: added: ironic
Changed in nova:
status: New → In Progress
Lucas Alvares Gomes (lucasagomes) wrote :

@Markus,

Ok thanks, I have a spec up in Ironic that will fix this problem[1]

[1] https://review.openstack.org/#/c/204162/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/206614

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/209457

Changed in nova:
assignee: Lucas Alvares Gomes (lucasagomes) → John Garbutt (johngarbutt)
Changed in nova:
assignee: John Garbutt (johngarbutt) → Lucas Alvares Gomes (lucasagomes)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/209457
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bfe52542f5449391b9dc152a90fd79afebcb3ff2
Submitter: Jenkins
Branch: master

commit bfe52542f5449391b9dc152a90fd79afebcb3ff2
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Aug 5 11:51:01 2015 +0100

    Ironic: Call unprovison for nodes in DEPLOYING state

    This patch is making the Nova ironic driver to try to unprovision the node
    even if it's in DEPLOYING state. Current Ironic will not accept aborting
    the deployment when it's in DEPLOYING state but with the retry mechanism
    it may work once the state is moved to ACTIVE or DEPLOYWAIT. Prior to
    this patch the logic was to not even try to unprovision the node if it's
    in DEPLOYING and just go ahead and clean the instance but that behavior
    is dangerous and could leave orphan active instances in Ironic. With
    this patch at least if the unprovision fails in Ironic we can make sure
    that the instance won't be deleted from Nova.

    The tests for the destroy() method were refactored to extend testing
    destroy() being called with all provision state methods in Ironic
    instead of picking certain ones; A helper function was created to avoid
    code duplication on the tests.

    Partial-Bug: #1477490
    Change-Id: I227eac73a9043dc242b7a0908bc27b628b830c3c
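The test refactoring described in the commit message could look roughly like the sketch below: one helper checks destroy() for a given provision state and a single test loops over every state. Everything here is an assumption for illustration: the destroy() stand-in, the helper name _check_destroy, the list of states and the set of states that trigger an unprovision are not copied from the Nova tree.

```python
import unittest
from unittest import mock

# Illustrative subset of provision states; the values are assumptions.
PROVISION_STATES = ["active", "deploying", "wait call-back", "deploy failed",
                    "error", "available", "deleting"]
# Assumed set of states in which the driver asks Ironic to unprovision.
UNPROVISION_STATES = {"active", "deploying", "wait call-back",
                      "deploy failed", "error"}


def destroy(node, unprovision, cleanup_deploy):
    """Stand-in for the driver's destroy(); hypothetical, not Nova code."""
    if node["provision_state"] in UNPROVISION_STATES:
        unprovision(node)
    cleanup_deploy(node)


class DestroyTestCase(unittest.TestCase):
    def _check_destroy(self, state):
        # Shared helper so each state does not need its own test body.
        node = {"provision_state": state}
        unprovision = mock.Mock()
        cleanup = mock.Mock()
        destroy(node, unprovision, cleanup)
        self.assertEqual(state in UNPROVISION_STATES, unprovision.called)
        cleanup.assert_called_once_with(node)

    def test_destroy_all_states(self):
        for state in PROVISION_STATES:
            with self.subTest(state=state):
                self._check_destroy(state)
```

Looping over the full list means a newly added provision state is exercised by default rather than only when someone remembers to write a test for it.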

Michael Davies (mrda)
Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: Lucas Alvares Gomes (lucasagomes) → nobody
Lucas Alvares Gomes (lucasagomes) wrote :

To allow the instance to be destroyed at any stage of the spawn, we need work in Ironic first, so I've unassigned myself until we get things done on the Ironic side.

Changed in nova:
status: In Progress → Confirmed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/206614
Reason: This patch has been idle for a long time, so I am abandoning it to keep the review clean sane. If you're interested in still working on this patch, then please unabandon it and upload a new patchset.

Michael Davies (mrda) wrote :

Assigned to Lucas in the hope he'll fix it :)
  -- John and Michael...

Changed in ironic:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
Dmitry Tantsur (divius) wrote :

Hi all!

It seems that the Nova fix has landed, and now we at least make sure that the deletion does not succeed for DEPLOYING nodes. I think that's enough to mark this bug as fixed. I suggest filing a new RFE for the ability to delete nodes in any state, as the previously filed spec is now abandoned. Thanks!

Changed in nova:
status: Confirmed → Fix Released
Changed in ironic:
status: Confirmed → Won't Fix