In certain cases compute does not clean up neutron ports after an unsuccessful VM spawn

Bug #1423845 reported by Oleg Bondarev
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Oleg Bondarev

Bug Description

When allocating networks for an instance, compute first creates the ports and then fetches them from neutron to build the network info.
Under high load, the request to fetch the instance's ports may time out in neutron or keystone (traceback attached).
In this case the exception is caught and _shutdown_instance() is called with try_deallocate_networks=False, on the assumption that "Network deallocation is already handled in this code path so it should not happen in _shutdown_instance." [1]
The exception is then re-raised, caught in _build_and_run_instance(), and re-raised as RescheduledException [2].
RescheduledException is caught in _do_build_and_run_instance() [3].
In the end, only self.network_api.cleanup_instance_network_on_host() is called and instance rescheduling is initiated.
self.network_api.cleanup_instance_network_on_host() does nothing in the neutron case, so the ports are left orphaned.
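
The failure path described above can be modelled with a small, self-contained Python sketch. This is a toy simulation, not nova code: FakeNeutron and the simplified function signatures are hypothetical, and only the function names, the try_deallocate_networks flag and RescheduledException are taken from the report.

```python
class RescheduledException(Exception):
    """Stand-in for the exception named in the report."""


class FakeNeutron:
    """Hypothetical in-memory stand-in for neutron (not the real client API)."""

    def __init__(self, fail_list=False):
        self.ports = {}          # port id -> instance id
        self.fail_list = fail_list
        self._next_id = 0

    def create_port(self, instance_id):
        self._next_id += 1
        port_id = "port-%d" % self._next_id
        self.ports[port_id] = instance_id
        return port_id

    def list_ports(self, instance_id):
        if self.fail_list:
            # Simulates the neutron/keystone timeout under high load.
            raise TimeoutError("timed out fetching ports")
        return [p for p, i in self.ports.items() if i == instance_id]

    def delete_ports(self, instance_id):
        self.ports = {p: i for p, i in self.ports.items() if i != instance_id}


def allocate_for_instance(neutron, instance_id):
    # Ports are created first, then fetched to build the network info.
    neutron.create_port(instance_id)
    return neutron.list_ports(instance_id)


def shutdown_instance(neutron, instance_id, try_deallocate_networks):
    if try_deallocate_networks:
        neutron.delete_ports(instance_id)


def cleanup_instance_network_on_host(neutron, instance_id):
    # Per the report, this does nothing in the neutron case.
    pass


def build_and_run_instance(neutron, instance_id):
    try:
        allocate_for_instance(neutron, instance_id)
    except Exception as exc:
        # Deallocation is skipped on the assumption it was already handled.
        shutdown_instance(neutron, instance_id, try_deallocate_networks=False)
        raise RescheduledException(str(exc)) from exc


def do_build_and_run_instance(neutron, instance_id):
    try:
        build_and_run_instance(neutron, instance_id)
        return "active"
    except RescheduledException:
        cleanup_instance_network_on_host(neutron, instance_id)
        return "rescheduled"
```

Running do_build_and_run_instance() against FakeNeutron(fail_list=True) returns "rescheduled" while the port created before the timeout is still present in the fake's port table, which is the orphaned-port situation the report describes.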

I see two possible fixes: either deallocate networks in _shutdown_instance(), or implement cleanup_instance_network_on_host() so that it cleans up the ports.

[1] bug 1332198 commit 5120c4f7c2670eaa71898fe6941029bbb0081949
[2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2233
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2089
[4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2113

Tags: network
Oleg Bondarev (obondarev) wrote :
tags: added: network
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/157755

Changed in nova:
status: New → In Progress
Andrew Laski (alaski) wrote :

The ports aren't actually orphaned, right? They're used when the instance is rescheduled to a new host. Or am I missing something here?

Oleg Bondarev (obondarev) wrote :

I'm not seeing how they are used after rescheduling; in fact, new ports are created. That is probably because allocate_for_instance failed, so network allocation was not considered successful.

Changed in nova:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/157755
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=43dfc401abca3ffbd28ee7737fbe172b6ae1b439
Submitter: Jenkins
Branch: master

commit 43dfc401abca3ffbd28ee7737fbe172b6ae1b439
Author: Oleg Bondarev <email address hidden>
Date: Thu Feb 26 19:38:48 2015 +0300

    Fix orphaned ports on build failure

    In certain cases compute does not clean up neutron ports after
    an unsuccessful VM spawn.
    Commit 5120c4f7c2670eaa71898fe6941029bbb0081949 assumes that
    deallocation is already handled in this code path;
    however, that is not always the case (see the bug report for details).
    This patch adds a check for whether network_info is empty at the
    moment the failure occurs. If it is empty, it is better to clean up
    the network to eliminate the chance of orphaned ports in neutron.

    Closes-Bug: #1423845
    Change-Id: I88f535193dbd35253a4444950f6b2812e1a2a407
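
The check described in the commit message might look roughly like the following Python sketch. It is an illustration of the idea only, not the merged patch: FakeNetworkAPI, handle_build_failure and their signatures are hypothetical, and the real nova code paths are considerably more involved.

```python
class FakeNetworkAPI:
    """Hypothetical in-memory stand-in for the compute network API."""

    def __init__(self, ports=None):
        self.ports = dict(ports or {})   # port id -> instance id

    def deallocate_for_instance(self, instance_id):
        # Simplified signature (assumption): drop every port of the instance.
        self.ports = {p: i for p, i in self.ports.items() if i != instance_id}


def handle_build_failure(network_api, instance_id, network_info):
    """Clean up networking when a build fails before network_info was built.

    Returns True if ports were deallocated here, False if cleanup was left
    to the existing code path.
    """
    if not network_info:
        # An empty network_info means the port fetch never completed, so
        # any ports already created would otherwise be orphaned.
        network_api.deallocate_for_instance(instance_id)
        return True
    return False
```

With an empty network_info the helper removes the failed instance's ports; with a populated network_info it leaves them alone, on the assumption that the normal deallocation path will handle them.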

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-rc1 → 2015.1.0