UnexpectedDeletingTaskStateError exception can leave traces of VIFs on host

Bug #1831771 reported by Stephen Finucane
This bug affects 1 person
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Medium      Stephen Finucane
Queens                    Fix Released  Undecided   Unassigned
Rocky                     Fix Released  Undecided   Unassigned
Stein                     Fix Released  Undecided   Unassigned
Train                     Fix Released  Undecided   Unassigned

Bug Description

This was originally reported in Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1668159

The 'UnexpectedDeletingTaskStateError' exception can be raised when, for example, a large Heat stack is aborted and its instances are deleted before they have finished setting up.

https://github.com/openstack/nova/blob/19.0.0/nova/db/sqlalchemy/api.py#L2864

We handle this in the compute manager and, as part of that handling, we clean up the resource tracking of network interfaces.

https://github.com/openstack/nova/blob/19.0.0/nova/compute/manager.py#L2034-L2040

However, we don't unplug these interfaces. This can leave remnants of the plugged VIFs (typically files) on the host.

We should attempt to unplug VIFs as part of this cleanup.
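
The proposed behaviour can be illustrated with a minimal, self-contained sketch. This is not the nova code: 'FakeDriver', 'FakeNetworkAPI', 'cleanup_networks' and 'build_instance' are hypothetical stand-ins, and the exception class is redefined locally so the example runs on its own. It only shows the idea of unplugging VIFs, on a best-effort basis, in the same cleanup path that releases the network resources.

```python
class UnexpectedDeletingTaskStateError(Exception):
    """Local stand-in for the nova exception of the same name."""


class FakeDriver:
    """Hypothetical virt driver that records which VIFs are plugged."""

    def __init__(self):
        self.plugged = set()

    def plug_vifs(self, instance, vifs):
        self.plugged.update(vifs)

    def unplug_vifs(self, instance, vifs):
        self.plugged.difference_update(vifs)


class FakeNetworkAPI:
    """Hypothetical network API that tracks allocated interfaces."""

    def __init__(self):
        self.allocated = set()

    def allocate(self, instance, vifs):
        self.allocated.update(vifs)

    def deallocate(self, instance, vifs):
        self.allocated.difference_update(vifs)


def cleanup_networks(driver, network_api, instance, vifs):
    # Unplug on the host first. A failure here is logged and ignored
    # so that the network-side cleanup below still runs.
    try:
        driver.unplug_vifs(instance, vifs)
    except Exception as exc:
        print(f"ignoring unplug failure for {instance}: {exc}")
    network_api.deallocate(instance, vifs)


def build_instance(driver, network_api, instance, vifs,
                   deleted_during_build=False):
    network_api.allocate(instance, vifs)
    driver.plug_vifs(instance, vifs)
    try:
        if deleted_during_build:
            # Simulates the instance being deleted mid-build.
            raise UnexpectedDeletingTaskStateError(instance)
        return "ACTIVE"
    except UnexpectedDeletingTaskStateError:
        cleanup_networks(driver, network_api, instance, vifs)
        return "DELETED"
```

With this shape, a build that is interrupted by a delete leaves neither allocated interfaces nor plugged VIFs behind, while a successful build keeps its VIFs plugged.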

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/663382

Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
status: New → In Progress
tags: added: compute
Revision history for this message
Matt Riedemann (mriedem) wrote :

OVH reported a similar issue via a different scenario in IRC today:

(9:42:22 AM) amorin: I am currently facing a race condition in my OpenStack deployment. If I delete a port while the instance was booting (before nova plug the interface), then nova plug the interface and left the interface like this, after a while I have a lot of interface staying on the host
(9:47:06 AM) amorin: mriedem: something like this:
(9:47:07 AM) amorin: http://paste.openstack.org/show/753313/

Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Matthew Booth (mbooth-9)
Changed in nova:
assignee: Matthew Booth (mbooth-9) → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/689278
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=10434bd229973b37647741e58aff3ac90b3a0a6c
Submitter: Zuul
Branch: master

commit 10434bd229973b37647741e58aff3ac90b3a0a6c
Author: Matthew Booth <email address hidden>
Date: Thu Oct 17 21:29:46 2019 +0100

    Functional test for UnexpectedDeletingTaskStateError

    Adds a regression-style test for two cleanup bugs when
    'UnexpectedDeletingTaskStateError' is raised during build.

    Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
    Related-Bug: #1848666
    Related-Bug: #1831771

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/663382
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b3e14931d6aac6ee5776ce1e6974c75a5a6b1823
Submitter: Zuul
Branch: master

commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/711210

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/711251

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/711210
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=24600430a2e4c67a4584d1ee466d3376aae96f25
Submitter: Zuul
Branch: stable/train

commit 24600430a2e4c67a4584d1ee466d3376aae96f25
Author: Matthew Booth <email address hidden>
Date: Thu Oct 17 21:29:46 2019 +0100

    Functional test for UnexpectedDeletingTaskStateError

    Adds a regression-style test for two cleanup bugs when
    'UnexpectedDeletingTaskStateError' is raised during build.

    Modified:
     nova/tests/functional/regressions/test_bug_1831771.py
     nova/tests/functional/integrated_helpers.py

    NOTE(stephenfin): The '_build_server' function has to be replaced by
    '_build_minimal_create_server_request' since we don't have change
    I91fa2f73185fef48e9aae9b7f61389c374e06676 here. Similarly, we need to
    make '_wait_until_deleted' handle responses that don't include the
    'status' field, such as the response received when creating a server,
    which was done in change I0c56841d098d3e9d72db65be3143f3c893f0b6ba on
    master. Finally, change I36da36cc5b099174eece0dfba29485fc20b2867b has
    been squashed into this change to avoid the races we saw with this test
    on master.

    Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
    Related-Bug: #1848666
    Related-Bug: #1831771
    (cherry picked from commit 10434bd229973b37647741e58aff3ac90b3a0a6c)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/711251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3e935325a88bf7a0206ec07bc67383e8be846f15
Submitter: Zuul
Branch: stable/train

commit 3e935325a88bf7a0206ec07bc67383e8be846f15
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Conflicts are due to the absence of change
    Ifa9c5c468400261a5e1f66b72c575845173a4f8f ("nova-net: Remove final
    references to nova-network") which we don't want to backport here. In
    addition, we need to modify a mock to reflect the absence of change
    I329f0fd589a4b2e0426485f09f6782f94275cc07 ("nova-net: Remove layer of
    indirection in 'nova.network'").

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/715399

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/715400

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/715403

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/715404

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/715405

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/715406

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/715399
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ff2101654376993f248abca03e123d7233af4562
Submitter: Zuul
Branch: stable/stein

commit ff2101654376993f248abca03e123d7233af4562
Author: Matthew Booth <email address hidden>
Date: Thu Oct 17 21:29:46 2019 +0100

    Functional test for UnexpectedDeletingTaskStateError

    Adds a regression-style test for two cleanup bugs when
    'UnexpectedDeletingTaskStateError' is raised during build.

    Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
    Related-Bug: #1848666
    Related-Bug: #1831771
    (cherry picked from commit 10434bd229973b37647741e58aff3ac90b3a0a6c)
    (cherry picked from commit 24600430a2e4c67a4584d1ee466d3376aae96f25)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/715400
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=265fd4f6bd56c711d7827c2defc993e19a541770
Submitter: Zuul
Branch: stable/stein

commit 265fd4f6bd56c711d7827c2defc993e19a541770
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/715403
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a17dd1e34caa44ce8a1406aea9377d8f04421ed
Submitter: Zuul
Branch: stable/rocky

commit 5a17dd1e34caa44ce8a1406aea9377d8f04421ed
Author: Matthew Booth <email address hidden>
Date: Thu Oct 17 21:29:46 2019 +0100

    Functional test for UnexpectedDeletingTaskStateError

    Adds a regression-style test for two cleanup bugs when
    'UnexpectedDeletingTaskStateError' is raised during build.

    Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
    Related-Bug: #1848666
    Related-Bug: #1831771
    (cherry picked from commit 10434bd229973b37647741e58aff3ac90b3a0a6c)
    (cherry picked from commit 24600430a2e4c67a4584d1ee466d3376aae96f25)
    (cherry picked from commit ff2101654376993f248abca03e123d7233af4562)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/715404
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=85521691a843b9606d4a8aa050f4452ba025eb02
Submitter: Zuul
Branch: stable/rocky

commit 85521691a843b9606d4a8aa050f4452ba025eb02
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Modified:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Changed 'mock.patch.object' to 'mock.patch' for a test
    in 'nova/tests/unit/compute/test_compute_mgr.py' since we haven't
    imported the 'neutronv2' module in this version of the file.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)
    (cherry picked from commit 265fd4f6bd56c711d7827c2defc993e19a541770)
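
The 'mock.patch.object' to 'mock.patch' change noted above can be illustrated with a small sketch using only the standard library. It patches 'json.dumps' purely as a stand-in target (the real test patches something in nova, which is not reproduced here): 'mock.patch' resolves its target from a dotted string, whereas 'mock.patch.object' needs the object itself to already be imported.

```python
from unittest import mock


def patched_by_string():
    # The target is a dotted path, so 'json' does not have to be
    # imported at module level for the patch to be applied.
    with mock.patch("json.dumps", return_value="patched"):
        import json  # imported here only to call the patched function
        return json.dumps({"a": 1})


def patched_by_object():
    # mock.patch.object takes the object and an attribute name, so the
    # module must be imported before the patch can be set up.
    import json
    with mock.patch.object(json, "dumps", return_value="patched"):
        return json.dumps({"a": 1})
```

Both helpers return the patched value while the context manager is active, and the original function is restored on exit.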

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/715405
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e33f4042c1292fc630fae6740331ccfb8c074717
Submitter: Zuul
Branch: stable/queens

commit e33f4042c1292fc630fae6740331ccfb8c074717
Author: Matthew Booth <email address hidden>
Date: Thu Oct 17 21:29:46 2019 +0100

    Functional test for UnexpectedDeletingTaskStateError

    Adds a regression-style test for two cleanup bugs when
    'UnexpectedDeletingTaskStateError' is raised during build.

    Modified:
     nova/tests/functional/regressions/test_bug_1831771.py

    NOTE(stephenfin): Modifications are necessary since we don't have change
    Iea283322124cb35fc0bc6d25f35548621e8c8c2f, which moved the
    'ProviderUsageBaseTestCase' base test class from 'test_servers.py' to
    'integrated_helpers.py'.

    Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
    Related-Bug: #1848666
    Related-Bug: #1831771
    (cherry picked from commit 10434bd229973b37647741e58aff3ac90b3a0a6c)
    (cherry picked from commit 24600430a2e4c67a4584d1ee466d3376aae96f25)
    (cherry picked from commit ff2101654376993f248abca03e123d7233af4562)
    (cherry picked from commit 5a17dd1e34caa44ce8a1406aea9377d8f04421ed)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/715406
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f50bd6e65758c7c3f3b433a1e66bd8c94f6947f8
Submitter: Zuul
Branch: stable/queens

commit f50bd6e65758c7c3f3b433a1e66bd8c94f6947f8
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Conflict is because we're missing change
    Ic5cab99944df9e501ba2032eb96911c36304494d ("Port binding based on events
    during live migration") which we don't want to backport.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)
    (cherry picked from commit 265fd4f6bd56c711d7827c2defc993e19a541770)
    (cherry picked from commit 85521691a843b9606d4a8aa050f4452ba025eb02)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova rocky-eol

This issue was fixed in the openstack/nova rocky-eol release.
