OpenStack Compute (nova)

Instance reschedule failure leaves orphaned neutron ports

Bug #1510979 reported by Boden R on 2015-10-28

This bug affects 1 person

Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Wenzhi Yu
Liberty                   Fix Released  Undecided   jichenjc

Bug Description

During the instance boot (spawn/run) process, neutron ports are allocated for the instance if necessary. If the instance fails to spawn (say, as a result of a compute host failure), the default behavior is to reschedule the instance and leave its networking resources intact for potential reuse on the rescheduled host (as per deallocate_networks_on_reschedule() [1], which returns False for most compute drivers).

All is good if the instance is successfully rescheduled, but if the reschedule fails (say, because no more applicable hosts remain), the allocated ports are left as-is and effectively orphaned.
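
The flow above can be illustrated with a toy model. This is only a sketch: ToyDriver, boot(), NoValidHost and the ports dict are hypothetical stand-ins invented for illustration, not nova or neutron code. The only name taken from the report is deallocate_networks_on_reschedule(), and here it simply mimics the default of returning False.

```python
# Toy model only: every class and function here is a hypothetical stand-in.


class NoValidHost(Exception):
    """Stand-in for the scheduler running out of hosts to retry on."""


class ToyDriver:
    def deallocate_networks_on_reschedule(self, instance):
        # Mimics the default described above: keep ports for the next host.
        return False

    def spawn(self, instance):
        raise RuntimeError("simulated compute host failure")


def boot(instance, hosts, driver, ports):
    """Try each host in turn; `ports` maps instance name -> port ids."""
    for host in hosts:
        # Allocate ports only if a previous attempt did not leave any behind.
        ports.setdefault(instance, ["port-1"])
        try:
            driver.spawn(instance)
            return host
        except RuntimeError:
            if driver.deallocate_networks_on_reschedule(instance):
                ports.pop(instance, None)
            # Otherwise the ports are kept for the rescheduled host.
    # No hosts left: the reschedule itself fails and nothing cleans up.
    raise NoValidHost("No valid host was found.")


ports = {}
try:
    boot("vm-1", ["only-host"], ToyDriver(), ports)
except NoValidHost:
    pass

print(ports)  # the port allocated for vm-1 is still recorded
```

With a single host, the port entry for vm-1 is still present after the failed boot, which is the orphaning described in this report.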

There are some related defects ([2] and [3]), but they don't quite touch on the particular behavior described herein.

There are a number of ways to address this issue, but perhaps the most obvious is for nova to detect the reschedule failure and deallocate any resources that were left intact for the reschedule.

I'm running a devstack all-in-one setup from the openstack master branches.

nova --version
2.32.0
neutron --version
3.1.0

The easiest way to reproduce this is to use an all-in-one devstack (only 1 compute host), simulate a spawn failure by editing the spawn() method of your compute driver to raise an exception at the end of the method, and then try to boot a server. In this setup there's only 1 host, so the reschedule will fail, and you can verify that the port allocated for the instance still exists after the boot attempt.

[1] https://github.com/openstack/nova/blob/master/nova/virt/driver.py#L1273
[2] https://bugs.launchpad.net/nova/+bug/1410739
[3] https://bugs.launchpad.net/nova/+bug/1327124
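
The kind of edit meant in the reproduction steps can be sketched as follows. BaseDriver is a hypothetical stand-in for whichever compute driver the devstack host uses, and the real spawn() takes more arguments than shown; the point is only that the exception is raised after the normal spawn work has finished.

```python
class BaseDriver:
    """Hypothetical stand-in for the compute driver in use."""

    def __init__(self):
        self.spawned = []

    def spawn(self, instance, *args, **kwargs):
        # Pretend the guest was created successfully.
        self.spawned.append(instance)


class FailingSpawnDriver(BaseDriver):
    """Does the normal spawn work, then fails at the very end."""

    def spawn(self, instance, *args, **kwargs):
        super().spawn(instance, *args, **kwargs)
        raise RuntimeError("simulated spawn failure for %s" % instance)


driver = FailingSpawnDriver()
try:
    driver.spawn("vm-1")
    failed = False
except RuntimeError:
    failed = True
```

On a real devstack the same effect comes from adding a raise as the last statement of the driver's spawn() method, as described in the reproduction steps.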

Wenzhi Yu (yuywz) on 2015-10-29

Changed in nova:
assignee:	nobody → Wen Zhi Yu (yuywz)

Wenzhi Yu (yuywz) wrote on 2015-11-03:

For the scenario in the description, if the reschedule fails, a "NoValidHost_Remote(u'No valid host was found. There are not enough hosts available.',)" exception is caught in the nova conductor manager.
At this point, the conductor.build_instances method sets the instance state to vm_states.ERROR, sends the related notification, and returns without cleaning up the allocated network resources; see [1].
I think one way to fix this bug is to add code to the exception handling section that cleans up the allocated network resources (as we do in the compute manager; see [2]).

[1] https://github.com/openstack/nova/blob/12.0.0.0rc3/nova/conductor/manager.py#L740-L746
[2] https://github.com/openstack/nova/blob/12.0.0.0rc3/nova/compute/manager.py#L1968
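
The shape of the proposed fix can be sketched in a self-contained way. Nothing below is the actual conductor code: ToyNetworkAPI, release_ports(), build_instance() and failing_scheduler() are hypothetical names used only to show cleanup happening in the handler that catches the no-valid-host error.

```python
class NoValidHost(Exception):
    """Stand-in for the exception raised when no host can be selected."""


class ToyNetworkAPI:
    """Hypothetical network API that records ports per instance."""

    def __init__(self):
        self.ports = {}

    def allocate(self, instance):
        self.ports.setdefault(instance, []).append("port-1")

    def release_ports(self, instance):
        # Safe to call even if nothing was allocated.
        self.ports.pop(instance, None)


def failing_scheduler(instance):
    raise NoValidHost("No valid host was found.")


def build_instance(instance, select_host, network_api, states):
    """Pick a host; on failure mark the instance ERROR and clean up."""
    try:
        host = select_host(instance)
    except NoValidHost:
        states[instance] = "error"
        # The step missing before the fix: release what earlier attempts
        # left allocated.
        network_api.release_ports(instance)
        return None
    states[instance] = "building"
    return host


net = ToyNetworkAPI()
net.allocate("vm-1")  # left over from the failed first attempt
states = {}
result = build_instance("vm-1", failing_scheduler, net, states)
```

After the call, the toy instance is in the error state and no ports remain recorded for it.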

Changed in nova:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-11-10: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/243477

OpenStack Infra (hudson-openstack) on 2015-12-02

Changed in nova:
assignee:	Wen Zhi Yu (yuywz) → Boden R (boden)

OpenStack Infra (hudson-openstack) on 2015-12-03

Changed in nova:
assignee:	Boden R (boden) → Wen Zhi Yu (yuywz)

Gary Kotton (garyk) on 2015-12-04

Changed in nova:
importance:	Undecided → High

OpenStack Infra (hudson-openstack) wrote on 2016-01-16: Fix merged to nova (master)

Reviewed: https://review.openstack.org/243477
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=08d24b733ee9f4da44bfbb8d6d3914924a41ccdc
Submitter: Jenkins
Branch: master

commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc
Author: Wen Zhi Yu <email address hidden>
Date: Tue Nov 10 17:16:36 2015 +0800

    Clean up network resources when reschedule fails

    During the instance boot (spawn/run) process, neutron ports are
    allocated for the instance if necessary. If the instance fails
    to spawn (say as a result of a compute host failure), the default
    behaviour is to reschedule the instance and leave its networking
    resources in-tact for potential reuse on the rescheduled host.
    All is good if the instance is successfully rescheduled, but if
    the reschedule fails (say no more applicable hosts) the allocated
    ports are left as-is and effectively orphaned.

    This commit add code to clean up allocated network resources
    when the reschedule fails.

    Change-Id: Ic670dd4dc192603c2faecf18e14ef59ebca9e420
    Closes-Bug: #1510979

Changed in nova:
status:	In Progress → Fix Released

Thierry Carrez (ttx) wrote on 2016-01-21: Fix included in openstack/nova 13.0.0.0b2

This issue was fixed in the openstack/nova 13.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote on 2016-09-08: Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/367316
