Nova unplug interface race condition when deleting an instance

Bug #1830081 reported by Arnaud Morin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
Arnaud Morin
Queens
Confirmed
Low
Unassigned
Rocky
Confirmed
Low
Unassigned
Stein
Fix Committed
Low
Arnaud Morin

Bug Description

Description
===========
When nova start an instance, it asks neutron to create a port and then update the instance info cache based on information from neutron.
If, in the middle of the spawning, the instance is getting deleted, the terminate_instance function is called with an instance object that DOES NOT contain any network info.
As a result, nova is deleting the instance but is never unplugging the interface.

Step to reproduce
=================
I am booting an instance and immediately deleting it thanks to a command like:
$ openstack server create --key-name fake --image ubuntu1810 --flavor c2-7 --net Ext-Net arnaudubuntu1810-3 ; nova delete arnaudubuntu1810-3

- [1] build_and_run_instance is executed, with a semaphore, thus, locking the instance. When booting, nova will fill the network_info cache, by calling [2] update_instance_cache_with_nw_info.
- [3] terminate_instance is executed few seconds later, but is waiting for the semaphore to be released. At this time, the instance network_info cache may not be filled, depending if the [2] update_instance_cache_with_nw_info has already been executed or not.
- If we follow the code, we end up at _shutdown_instance [4], which is doing a call to [5] get_network_info, which is returning a NetworkInfo object that contains no interface.
- At the end, nova is calling _unplug_vifs [6] which is doing nothing (no vif)

Note that I am running OpenStack Newton release, but the code involved seems identical on master.

[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1837
[2] https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L34
[2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2765
[4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2559
[5] https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1252
[6] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L919

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/660761

Changed in nova:
assignee: nobody → Arnaud Morin (arnaud-morin)
status: New → In Progress
Matt Riedemann (mriedem)
tags: added: compute neutron
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/660761
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4ed0d8b7adc350e8962df033c2da892c95561fe
Submitter: Zuul
Branch: master

commit d4ed0d8b7adc350e8962df033c2da892c95561fe
Author: Arnaud Morin <email address hidden>
Date: Wed May 22 17:34:20 2019 +0200

    Refresh instance network info on deletion

    When deleting an instance, if the network info is empty, we should
    refresh the info because we can't be sure the copy of the cache we
    have when we fetched the instance to delete is up-to-date, i.e. if
    we're racing to delete the server while it's building and the
    network info cache was updated in the database after we started the
    delete operation and got the instance from the DB, then we could
    fail to unplug VIFs.

    Closes-Bug: #1830081

    Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
    Signed-off-by: Arnaud Morin <email address hidden>

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/665143

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/665144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/665145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/665143
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=54ec03a52a7a327021022547c75368db5e157852
Submitter: Zuul
Branch: stable/stein

commit 54ec03a52a7a327021022547c75368db5e157852
Author: Arnaud Morin <email address hidden>
Date: Wed May 22 17:34:20 2019 +0200

    Refresh instance network info on deletion

    When deleting an instance, if the network info is empty, we should
    refresh the info because we can't be sure the copy of the cache we
    have when we fetched the instance to delete is up-to-date, i.e. if
    we're racing to delete the server while it's building and the
    network info cache was updated in the database after we started the
    delete operation and got the instance from the DB, then we could
    fail to unplug VIFs.

    Closes-Bug: #1830081

    Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
    Signed-off-by: Arnaud Morin <email address hidden>
    (cherry picked from commit d4ed0d8b7adc350e8962df033c2da892c95561fe)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.2

This issue was fixed in the openstack/nova 19.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/665144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/665145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/663382
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b3e14931d6aac6ee5776ce1e6974c75a5a6b1823
Submitter: Zuul
Branch: master

commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/711251

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/711251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3e935325a88bf7a0206ec07bc67383e8be846f15
Submitter: Zuul
Branch: stable/train

commit 3e935325a88bf7a0206ec07bc67383e8be846f15
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Conflicts are due to the absence of change
    Ifa9c5c468400261a5e1f66b72c575845173a4f8f ("nova-net: Remove final
    references to nova-network") which we don't want to backport here. In
    addition, we need to modify a mock to reflect the absence of change
    I329f0fd589a4b2e0426485f09f6782f94275cc07 ("nova-net: Remove layer of
    indirection in 'nova.network'").

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/715400

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/715404

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/715406

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/715400
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=265fd4f6bd56c711d7827c2defc993e19a541770
Submitter: Zuul
Branch: stable/stein

commit 265fd4f6bd56c711d7827c2defc993e19a541770
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/715404
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=85521691a843b9606d4a8aa050f4452ba025eb02
Submitter: Zuul
Branch: stable/rocky

commit 85521691a843b9606d4a8aa050f4452ba025eb02
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Modified:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Changed 'mock.patch.object' to 'mock.patch' for a test
    in 'nova/tests/unit/compute/test_compute_mgr.py' since we haven't
    imported the 'neutronv2' module in this version of the file.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)
    (cherry picked from commit 265fd4f6bd56c711d7827c2defc993e19a541770)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/715406
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f50bd6e65758c7c3f3b433a1e66bd8c94f6947f8
Submitter: Zuul
Branch: stable/queens

commit f50bd6e65758c7c3f3b433a1e66bd8c94f6947f8
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 5 16:39:45 2019 +0100

    Unplug VIFs as part of cleanup of networks

    If an instance fails to build, which is possible for a variety of
    reasons, we may end up in a situation where we have remnants of a
    plugged VIF (typically files) left on the host. This is because we
    cleanup from the neutron perspective but don't attempt to unplug the
    VIF, a call which may have many side-effects depending on the VIF
    driver. Resolve this by always attempting to unplug VIFs as part of the
    network cleanup.

    A now invalid note is also removed and a unit test corrected.

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py

    NOTE(stephenfin): Conflict is because we're missing change
    Ic5cab99944df9e501ba2032eb96911c36304494d ("Port binding based on events
    during live migration") which we don't want to backport.

    Closes-Bug: #1831771
    Related-Bug: #1830081
    Signed-off-by: Stephen Finucane <email address hidden>
    Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
    (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
    (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)
    (cherry picked from commit 265fd4f6bd56c711d7827c2defc993e19a541770)
    (cherry picked from commit 85521691a843b9606d4a8aa050f4452ba025eb02)

tags: added: in-stable-queens
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.