Race conditions with builds and deletes in Ironic driver

Bug #1337461 reported by Chris Behrens
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ironic | Invalid | Medium | Unassigned |
OpenStack Compute (nova) | Expired | Undecided | Unassigned |

Bug Description

Although it may not be officially supported by nova, running 2 nova-computes for Ironic makes it possible for a delete to happen at the same time as a build. The Ironic virt driver for nova skips unprovision when the node is not in certain states, and it happens that unprovision is skipped while a node is still deploying. Nova then tries to unset the instance_uuid, fails with a 409, and keeps retrying until Ironic finishes the build. The unset then succeeds, and you end up with a node in Ironic that is 'active' but has no instance_uuid.
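To make the sequence easier to follow, here is a minimal, self-contained simulation of the delete path described above. It is only a sketch and not the actual driver code: the names `Node`, `Conflict`, `clear_instance_uuid`, `destroy`, the set `UNPROVISION_STATES`, and the state strings other than 'active' are all hypothetical, and the 409 is assumed to be returned for as long as the node is still deploying.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative state names (assumptions, not taken from the driver).
DEPLOYING = "deploying"
ACTIVE = "active"

# Hypothetical set of states in which the delete path calls unprovision.
UNPROVISION_STATES = {ACTIVE, "deploy failed", "error"}


@dataclass
class Node:
    provision_state: str
    instance_uuid: Optional[str] = None


class Conflict(Exception):
    """Stands in for the HTTP 409 returned while the node is busy."""


def clear_instance_uuid(node: Node) -> None:
    # Assumption: the update is rejected while a deploy is in progress.
    if node.provision_state == DEPLOYING:
        raise Conflict("node is busy with an in-progress deploy")
    node.instance_uuid = None


def destroy(node: Node, finish_deploy_after: int) -> int:
    """Mimic the delete path; return how many 409 retries were needed."""
    # Unprovision is skipped because 'deploying' is not in the set.
    if node.provision_state in UNPROVISION_STATES:
        node.provision_state = "available"
    retries = 0
    while True:
        try:
            clear_instance_uuid(node)
            return retries
        except Conflict:
            retries += 1
            if retries >= finish_deploy_after:
                # The concurrent build finishes in the meantime.
                node.provision_state = ACTIVE


node = Node(provision_state=DEPLOYING, instance_uuid="instance-1")
retries = destroy(node, finish_deploy_after=3)
print(node, retries)
```

In this sketch the node ends up 'active' with no instance_uuid, which is the orphaned state reported in this bug.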

In addition to this, there's no provision_state checking in the virt driver when reporting resources to the compute manager. It only checks whether instance_uuid is assigned, and if there isn't one, it assumes the node is free. In this case, it turns out the node is not really free, but somewhat orphaned. Scheduling should probably skip such nodes if they occur.
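A rough sketch of the kind of check this suggests, assuming a simplified node record. The helper names `is_free_by_uuid_only`, `is_free`, and `schedulable`, the `FREE_STATES` set, and the 'available' state string are hypothetical and only illustrate the idea; they are not the driver's API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only nodes in these states should be reported as free.
FREE_STATES = {"available"}


@dataclass
class Node:
    uuid: str
    provision_state: str
    instance_uuid: Optional[str] = None


def is_free_by_uuid_only(node: Node) -> bool:
    # The behaviour described above: only instance_uuid is checked.
    return node.instance_uuid is None


def is_free(node: Node) -> bool:
    # Suggested behaviour: also require a provision state that means "free".
    return node.instance_uuid is None and node.provision_state in FREE_STATES


def schedulable(nodes: List[Node]) -> List[str]:
    """Return the UUIDs of nodes that are safe to offer to the scheduler."""
    return [n.uuid for n in nodes if is_free(n)]


nodes = [
    Node("node-a", "available"),
    Node("node-b", "active"),                 # orphaned: active, no instance
    Node("node-c", "active", "instance-1"),   # in use
]
print(schedulable(nodes))
```

With the uuid-only check, the orphaned node-b would be treated as free; with the combined check it is skipped.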

Chris Behrens (cbehrens)
Changed in ironic:
status: New → In Progress
assignee: nobody → Chris Behrens (cbehrens)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/104649

Dmitry Tantsur (divius)
Changed in ironic:
importance: Undecided → Medium
tags: added: nova-driver
Dmitry Tantsur (divius) wrote : Re: race conditions with builds and deletes

Hi Chris, could you give a status update on this bug? The patch seems abandoned and should also be moved to Nova.

summary: - race conditions with builds and deletes
+ Race conditions with builds and deletes in Ironic driver
tags: added: ironic
Changed in ironic:
status: In Progress → Invalid
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/104649
Reason: Thanks for the patch! Unfortunately the change doesn't belong in Ironic anymore; the Ironic Nova driver was finally moved to the Nova tree (yay). Please resubmit it to Nova.

Thanks again!

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135767

Changed in nova:
assignee: nobody → Devananda van der Veen (devananda)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/135767
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Davanum Srinivas (DIMS) (dims-v) wrote :

Removing "In Progress" status and assignee as change is abandoned.

Changed in nova:
status: In Progress → Confirmed
assignee: Devananda van der Veen (devananda) → nobody
Jim Rollenhagen (jim-rollenhagen) wrote :

We should still push this through, even though it's not fully supported. I'm happy to pick this one up.

Michael Davies (mrda) wrote :

Assigned to Jim as per his lying under the bus

Changed in ironic:
assignee: Chris Behrens (cbehrens) → Jim Rollenhagen (jim-rollenhagen)
Changed in ironic:
assignee: Jim Rollenhagen (jim-rollenhagen) → nobody
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired