Nova-driven Ironic deployments may fail to remove VIF

Bug #1743652 reported by Julia Kreger on 2018-01-16
This bug affects 3 people
Affects: Ironic
Status: Triaged
Importance: High
Assigned to: Julia Kreger

Bug Description

In some cases, albeit rare, Ironic may have locked the node, leaving it in a state that prevents the detachment of a VIF. These locked cases occur mid-deployment, when starting cleaning, and so on. Nova eventually gives up if it is unable to delete the VIF and moves on. The VIF record is then orphaned on the node's port record and never cleaned up, so the node may run out of ports available for VIFs. This also presents an issue if the VIF is reused in the deployment, and when the node is redeployed with an excess VIF record.

This is easily encountered if the node has only one port, in which case the next deployment fails with "Not enough ports".
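To make the single-port failure concrete, here is a minimal Python sketch of the bookkeeping involved. It assumes each port can carry at most one VIF record, and the `Node`, `vif_attach`, `vif_detach`, and `NotEnoughPorts` names are hypothetical stand-ins, not ironic's actual classes or API. It shows how one orphaned record is enough to block the next attachment.

```python
class NotEnoughPorts(Exception):
    """Hypothetical: raised when no free port is left for a new VIF."""


class Node:
    """Hypothetical, simplified node: each port holds at most one VIF record."""

    def __init__(self, port_ids):
        # Port id -> attached VIF id, or None when the port is free.
        self.ports = {port_id: None for port_id in port_ids}

    def vif_attach(self, vif_id):
        # Record the VIF on the first free port.
        for port_id, current in self.ports.items():
            if current is None:
                self.ports[port_id] = vif_id
                return port_id
        raise NotEnoughPorts("Not enough ports")

    def vif_detach(self, vif_id):
        # Clear the VIF record from whichever port carries it.
        for port_id, current in self.ports.items():
            if current == vif_id:
                self.ports[port_id] = None
                return port_id
        raise KeyError(vif_id)


if __name__ == "__main__":
    node = Node(["port-0"])
    node.vif_attach("vif-a")  # first deployment
    # Suppose Nova's detach of "vif-a" fails because the node is locked and
    # Nova gives up: vif_detach() is never called, so port-0 stays occupied.
    try:
        node.vif_attach("vif-b")  # next deployment
    except NotEnoughPorts as exc:
        print(f"deployment fails: {exc}")
```

With more than one port the orphaned record only shrinks capacity, which is why the problem surfaces most readily on single-port nodes.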

Changed in ironic:
assignee: nobody → Julia Kreger (juliaashleykreger)
status: New → In Progress
Mark Goddard (mgoddard) wrote:

Thanks for linking this to https://bugs.launchpad.net/nova/+bug/1733861, Julia. FYI, I've proposed a workaround for this issue: https://review.openstack.org/#/c/537626/.

Related fix proposed to branch: master
Review: https://review.openstack.org/537737

Dmitry Tantsur (divius) on 2018-02-05
Changed in ironic:
importance: Undecided → High

Change abandoned by Julia Kreger (<email address hidden>) on branch: master
Review: https://review.openstack.org/537737

Reviewed: https://review.openstack.org/534441
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=4f79cb3932f2518ab3f06b86ceea065cbb399e8c
Submitter: Zuul
Branch: master

commit 4f79cb3932f2518ab3f06b86ceea065cbb399e8c
Author: Julia Kreger <email address hidden>
Date: Tue Jan 16 09:21:37 2018 -0800

    Don't try to lock for vif detach

    Historically, we did not prohibit removing a VIF entry
    stored in the extra field. However, the VIF
    attachment/detachment feature creates a task which, by
    default, attempts to acquire a reservation lock unless
    it is explicitly marked as shared.

    This is problematic because undeploying a node involves
    tasks that hold exclusive locks.

    Presently, if any of those locked tasks run long (for
    example, when a new image is required, or when the BMC
    power request hangs for a few minutes for some unexpected
    reason), the VIF record may be orphaned and never removed,
    because the expectation is that nova deletes the VIF
    record from ironic.

    This change allows the VIF record to be removed when a
    node is no longer in active use and may be subject to a
    lock held for a long period of time, such as when setting
    up for CLEANING.

    Additionally, this patch defers the actual VIF record
    deletion until after the detachment action, in case the
    detachment fails. This keeps the state in ironic
    consistent, rather than removing the record before the
    detachment occurs.

    Change-Id: Ib7544e43a2b26441d4f562b584bbc7fee6a11fea
    Closes-Bug: #1743652
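The two ideas in the commit message above (letting VIF detach proceed without taking an exclusive reservation, and deleting the VIF record only after the detach action succeeds) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: `Node`, `Task`, `NodeLocked`, `vif_detach`, and the `unplug` callback are hypothetical names, not ironic's actual task manager or the code from the commit, and the model ignores concurrency entirely.

```python
class NodeLocked(Exception):
    """Hypothetical: raised when an exclusive lock is requested on a reserved node."""


class Node:
    """Hypothetical node: an exclusive reservation holder plus port -> VIF records."""

    def __init__(self, vif_records=None, reservation=None):
        self.reservation = reservation  # holder of the exclusive lock, if any
        self.vif_records = dict(vif_records or {})


class Task:
    """Hypothetical task context: takes an exclusive reservation unless shared=True."""

    def __init__(self, node, holder, shared=False):
        self.node = node
        self.holder = holder
        self.shared = shared

    def __enter__(self):
        if not self.shared:
            if self.node.reservation is not None:
                raise NodeLocked(f"node reserved by {self.node.reservation}")
            self.node.reservation = self.holder
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release only a reservation this task acquired itself.
        if not self.shared:
            self.node.reservation = None
        return False  # never swallow exceptions


def vif_detach(node, vif_id, unplug, shared=True):
    """Detach a VIF, deleting its record only after unplug() succeeds."""
    with Task(node, "vif-detach", shared=shared):
        for port_id, current in node.vif_records.items():
            if current == vif_id:
                break
        else:
            raise KeyError(vif_id)
        unplug(port_id, vif_id)        # may raise; the record is kept if it does
        del node.vif_records[port_id]  # remove the record last


if __name__ == "__main__":
    node = Node({"port-0": "vif-a"}, reservation="cleaning")
    try:
        vif_detach(node, "vif-a", lambda port, vif: None, shared=False)
    except NodeLocked as exc:
        print(f"exclusive detach blocked: {exc}")
    vif_detach(node, "vif-a", lambda port, vif: None)  # shared: proceeds
    print(node.vif_records)
```

In this model an exclusive detach fails while another task (here a hypothetical "cleaning" holder) owns the reservation, whereas the shared variant proceeds; and because the record is deleted after `unplug`, a failed unplug leaves the record in place rather than silently losing it.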

Changed in ironic:
status: In Progress → Fix Released

This issue was fixed in the openstack/ironic 10.1.0 release.

Dmitry Tantsur (divius) wrote:

We are about to revert this fix due to regressions it causes :( See https://bugs.launchpad.net/ironic/+bug/1750785

Changed in ironic:
status: Fix Released → Triaged