Instance reschedule failure leaves orphaned neutron ports
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | High | Wenzhi Yu | |
| Liberty | Fix Released | Undecided | jichenjc | |
Bug Description
During the instance boot (spawn/run) process, neutron ports are allocated for the instance if necessary. If the instance fails to spawn (say, as a result of a compute host failure), the default behavior is to reschedule the instance and leave its networking resources intact for potential reuse on the rescheduled host (as per deallocate_
All is good if the instance is successfully rescheduled, but if the reschedule fails (say, there are no more applicable hosts), the allocated ports are left as-is and are effectively orphaned.
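The failure mode can be modelled in a few lines. This is a self-contained simulation, not nova code: `FakeNeutron`, `build_instance`, `failing_spawn`, and the host names are hypothetical stand-ins for the compute manager, the scheduler, and the Neutron port API. It keeps ports across reschedules as described above and shows that they survive once no hosts remain.

```python
class NoValidHost(Exception):
    """Raised when the simulated scheduler has no hosts left to try."""


class FakeNeutron:
    """Hypothetical in-memory stand-in for the Neutron port API."""

    def __init__(self):
        self.ports = {}  # port_id -> owning instance_id
        self._next_id = 0

    def create_port(self, instance_id):
        self._next_id += 1
        port_id = f"port-{self._next_id}"
        self.ports[port_id] = instance_id
        return port_id

    def ports_for(self, instance_id):
        return [p for p, owner in self.ports.items() if owner == instance_id]


def build_instance(neutron, instance_id, hosts, spawn):
    """Try each host in turn, keeping the instance's ports across reschedules."""
    for host in hosts:
        # Allocate only if nothing exists yet: on a reschedule, the ports
        # left intact by the previous attempt are reused.
        if not neutron.ports_for(instance_id):
            neutron.create_port(instance_id)
        try:
            spawn(host)
            return host
        except RuntimeError:
            continue  # reschedule; ports are deliberately left in place
    # Reschedule failed: nothing deallocates the ports kept for the next attempt.
    raise NoValidHost("No valid host was found.")


def failing_spawn(host):
    raise RuntimeError(f"spawn failed on {host}")


demo_neutron = FakeNeutron()
try:
    build_instance(demo_neutron, "vm-1", ["compute1", "compute2"], failing_spawn)
except NoValidHost:
    pass
print(demo_neutron.ports_for("vm-1"))  # ['port-1']
```

Only one port is ever created (it is reused on the second host), and it is still owned by `vm-1` after the boot has given up.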
There are some related defects ([2] and [3]), but they don't quite touch on the particular behavior described herein.
There are a number of ways to address this issue, but perhaps the most obvious is for nova to detect the reschedule failure and deallocate any resources that were left intact for the reschedule.
I'm running an all-in-one devstack setup from the OpenStack master branches.
nova --version
2.32.0
neutron --version
3.1.0
The easiest way to reproduce this is to use an all-in-one devstack (only one compute host): simulate a host spawn failure by editing the spawn() method of your compute driver to raise an exception at the end of the method, then try to boot a server. With only one host, the reschedule will fail, and you can verify that the port allocated for the instance still exists after the boot attempt.
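The fault injection above amounts to "let the normal spawn run, then raise". A minimal sketch of that idea follows; `BaseDriver`, `FailingDriver`, and `boot_on_single_host` are hypothetical stand-ins (a real nova virt driver's `spawn()` takes more arguments than shown), and the port table is a plain dict rather than Neutron.

```python
class BaseDriver:
    """Hypothetical stand-in for a nova virt driver."""

    def __init__(self):
        self.plugged = []  # ports plugged for the guest

    def spawn(self, instance_id, port_ids):
        # A real driver would create the guest and plug its VIFs here.
        self.plugged.extend(port_ids)


class FailingDriver(BaseDriver):
    """Runs the normal spawn, then fails, mimicking a late host failure."""

    def spawn(self, instance_id, port_ids):
        super().spawn(instance_id, port_ids)
        raise RuntimeError(f"simulated spawn failure for {instance_id}")


def boot_on_single_host(driver, ports, instance_id):
    """Boot with exactly one host, so a reschedule has nowhere to go."""
    ports.setdefault(instance_id, [f"{instance_id}-port"])
    try:
        driver.spawn(instance_id, ports[instance_id])
        return "ACTIVE"
    except RuntimeError:
        # No other host exists, so the reschedule fails and the instance
        # ends in ERROR; the port entry is never removed.
        return "ERROR"


driver = FailingDriver()
port_table = {}
final_state = boot_on_single_host(driver, port_table, "vm-1")
print(final_state, port_table)  # ERROR {'vm-1': ['vm-1-port']}
```

Raising after `super().spawn()` matters: the failure happens once networking has already been set up, which is the case where ports are kept for the reschedule.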
[1] https:/
[2] https:/
[3] https:/
Changed in nova:
assignee: nobody → Wen Zhi Yu (yuywz)
Changed in nova:
assignee: Wen Zhi Yu (yuywz) → Boden R (boden)
Changed in nova:
assignee: Boden R (boden) → Wen Zhi Yu (yuywz)
Changed in nova:
importance: Undecided → High
For the scenario in the description, if the reschedule fails, a "NoValidHost_Remote(u'No valid host was found. There are not enough hosts available.',)" exception is caught in the nova conductor manager. The build_instances method sets the instance state to vm_states.ERROR, sends the related notification, and returns without cleaning up the allocated network resources; see [1].
At this point, conductor.
I think one way to fix this bug is to add code to that exception handling section that cleans up the allocated network resources (as we do in the compute manager; see [2]).
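A rough sketch of the proposed shape of the fix, assuming a network API object that can release an instance's ports. `FakeNetworkAPI`, `deallocate_for_instance(instance_id)`, and the simplified `build_instances` signature are hypothetical here; the real conductor method and network API take a request context and more parameters. The point is the ordering: mark the instance as errored, notify, then release the ports that were kept for the reschedule.

```python
class NoValidHost(Exception):
    pass


class FakeNetworkAPI:
    """Hypothetical stand-in for the Neutron-backed network API."""

    def __init__(self, ports):
        self.ports = ports  # instance_id -> list of port ids

    def deallocate_for_instance(self, instance_id):
        self.ports.pop(instance_id, None)


def build_instances(network_api, instances, select_destinations, notify):
    """Conductor-style handler: on NoValidHost, error out AND clean up networks."""
    try:
        return select_destinations(instances)
    except NoValidHost as exc:
        for inst in instances:
            inst["vm_state"] = "error"
            notify(inst, exc)
            # Proposed addition: release ports left intact for the
            # reschedule. A cleanup failure should not mask the original
            # NoValidHost, so it is only reported here.
            try:
                network_api.deallocate_for_instance(inst["uuid"])
            except Exception as cleanup_exc:
                print(f"failed to deallocate network for {inst['uuid']}: {cleanup_exc}")
        return None


def no_hosts(instances):
    raise NoValidHost("No valid host was found. There are not enough hosts available.")


events = []
api = FakeNetworkAPI({"vm-1": ["port-1"]})
vm = {"uuid": "vm-1", "vm_state": "building"}
build_instances(api, [vm], no_hosts, lambda inst, exc: events.append((inst["uuid"], str(exc))))
print(vm["vm_state"], api.ports)  # error {}
```

Swallowing cleanup errors per instance is a design choice in this sketch: one failed deallocation should not stop the remaining instances from being cleaned up.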
[1] https://github.com/openstack/nova/blob/12.0.0.0rc3/nova/conductor/manager.py#L740-L746
[2] https://github.com/openstack/nova/blob/12.0.0.0rc3/nova/compute/manager.py#L1968