Nova unplug interface race condition when deleting an instance

Bug #1830081 reported by Arnaud Morin on 2019-05-22
This bug affects 1 person
Affects                   Status  Importance  Assigned to   Milestone
OpenStack Compute (nova)          Low         Arnaud Morin
Queens                            Low         Arnaud Morin
Rocky                             Low         Arnaud Morin
Stein                             Low         Arnaud Morin

Bug Description

Description
===========
When nova starts an instance, it asks neutron to create a port and then updates the instance info cache based on information from neutron.
If the instance is deleted in the middle of spawning, the terminate_instance function is called with an instance object that DOES NOT contain any network info.
As a result, nova deletes the instance but never unplugs the interface.

Steps to reproduce
==================
I boot an instance and immediately delete it with a command like:
$ openstack server create --key-name fake --image ubuntu1810 --flavor c2-7 --net Ext-Net arnaudubuntu1810-3 ; nova delete arnaudubuntu1810-3

- [1] build_and_run_instance is executed under a semaphore, which locks the instance. While booting, nova fills the network_info cache by calling [2] update_instance_cache_with_nw_info.
- [3] terminate_instance is executed a few seconds later, but waits for the semaphore to be released. At this point, the instance network_info cache may not be filled yet, depending on whether [2] update_instance_cache_with_nw_info has already been executed.
- Following the code, we end up in _shutdown_instance [4], which calls [5] get_network_info, which returns a NetworkInfo object that contains no interface.
- In the end, nova calls _unplug_vifs [6], which does nothing (no VIF).
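
The ordering described in the steps above can be replayed with a small, self-contained simulation. This is only a sketch under stated assumptions: the list, lock, and events are hypothetical stand-ins for the info cache in the database and for the per-instance semaphore, not nova objects, and the events exist solely to force the problematic interleaving deterministically.

```python
import threading


def simulate_race():
    """Replay the racy ordering with hypothetical stand-ins for nova objects."""
    db_info_cache = []                # stands in for the info cache stored in the DB
    instance_lock = threading.Lock()  # stands in for the per-instance semaphore
    build_has_lock = threading.Event()
    delete_has_snapshot = threading.Event()
    unplugged = []

    def build_and_run_instance():
        with instance_lock:
            build_has_lock.set()
            # Force the problematic ordering: the delete request loads its
            # copy of the instance before the cache is written.
            delete_has_snapshot.wait()
            # Stands in for update_instance_cache_with_nw_info.
            db_info_cache.append("vif-1")

    def terminate_instance():
        build_has_lock.wait()
        stale_network_info = list(db_info_cache)  # still empty at this point
        delete_has_snapshot.set()
        with instance_lock:  # blocks until the build releases the lock
            # The shutdown path works from the stale copy, so there is
            # nothing to hand over for unplugging.
            unplugged.extend(stale_network_info)

    threads = [
        threading.Thread(target=build_and_run_instance),
        threading.Thread(target=terminate_instance),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return db_info_cache, unplugged


if __name__ == "__main__":
    cache, unplugged = simulate_race()
    print("cache in DB:", cache)
    print("VIFs unplugged:", unplugged)
```

In this simulation the cache ends up containing the VIF while the delete path unplugs nothing, which mirrors the leaked interface described in this report.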

Note that I am running the OpenStack Newton release, but the code involved seems identical on master.

[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1837
[2] https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L34
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2765
[4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2559
[5] https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1252
[6] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L919

Fix proposed to branch: master
Review: https://review.opendev.org/660761

Changed in nova:
assignee: nobody → Arnaud Morin (arnaud-morin)
status: New → In Progress
Matt Riedemann (mriedem) on 2019-05-22
tags: added: compute neutron
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2019-05-29
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2019-06-07
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)
Changed in nova:
assignee: Arnaud Morin (arnaud-morin) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2019-06-11
Changed in nova:
assignee: Matt Riedemann (mriedem) → Arnaud Morin (arnaud-morin)

Reviewed: https://review.opendev.org/660761
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4ed0d8b7adc350e8962df033c2da892c95561fe
Submitter: Zuul
Branch: master

commit d4ed0d8b7adc350e8962df033c2da892c95561fe
Author: Arnaud Morin <email address hidden>
Date: Wed May 22 17:34:20 2019 +0200

    Refresh instance network info on deletion

    When deleting an instance, if the network info is empty, we should
    refresh the info because we can't be sure the copy of the cache we
    have when we fetched the instance to delete is up-to-date, i.e. if
    we're racing to delete the server while it's building and the
    network info cache was updated in the database after we started the
    delete operation and got the instance from the DB, then we could
    fail to unplug VIFs.

    Closes-Bug: #1830081

    Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
    Signed-off-by: Arnaud Morin <email address hidden>
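
The idea in the commit message above (refresh the network info when the cached copy is empty) can be illustrated with a minimal sketch. It is not the actual patch: `vifs_to_unplug` and the `refresh` callable are hypothetical names introduced here for illustration, with `refresh` standing in for whatever re-reads the current network info for the instance.

```python
def vifs_to_unplug(cached_network_info, refresh):
    """Return the VIFs to unplug, refreshing an empty (possibly stale) cache.

    `refresh` is a hypothetical zero-argument callable that returns the
    current network info; it is an assumption of this sketch, not a nova API.
    """
    if not cached_network_info:
        # An empty cache may simply be out of date, so ask again before
        # concluding that there is nothing to unplug.
        return list(refresh())
    return list(cached_network_info)


if __name__ == "__main__":
    # Stale, empty copy: the refresh finds the VIF written during the build.
    print(vifs_to_unplug([], lambda: ["vif-1"]))
    # Populated copy: used as-is, no refresh needed.
    print(vifs_to_unplug(["vif-2"], lambda: ["unused"]))
```

Refreshing only when the cached copy is empty keeps the common path unchanged and adds the extra lookup only in the case where the race could have left the copy stale.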

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/665143
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=54ec03a52a7a327021022547c75368db5e157852
Submitter: Zuul
Branch: stable/stein

commit 54ec03a52a7a327021022547c75368db5e157852
Author: Arnaud Morin <email address hidden>
Date: Wed May 22 17:34:20 2019 +0200

    Refresh instance network info on deletion

    When deleting an instance, if the network info is empty, we should
    refresh the info because we can't be sure the copy of the cache we
    have when we fetched the instance to delete is up-to-date, i.e. if
    we're racing to delete the server while it's building and the
    network info cache was updated in the database after we started the
    delete operation and got the instance from the DB, then we could
    fail to unplug VIFs.

    Closes-Bug: #1830081

    Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
    Signed-off-by: Arnaud Morin <email address hidden>
    (cherry picked from commit d4ed0d8b7adc350e8962df033c2da892c95561fe)

This issue was fixed in the openstack/nova 19.0.2 release.

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
