Reschedule with libvirt exception leaves dangling neutron ports

Bug #1703540 reported by Gary Kotton
This bug affects 4 people
Affects                  | Status  | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Triaged | High       | Unassigned  |
Ocata                    | Triaged | Undecided  | Unassigned  |

Bug Description

When an instance fails to spawn, for example with the exception:

2017-07-11 04:39:56.942 ERROR nova.compute.manager [req-1e54a66a-6da5-4720-89cc-f65568dea131 ashok ashok] [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Instance failed to spawn
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Traceback (most recent call last):
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/compute/manager.py", line 2124, in _build_resources
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] yield resources
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/compute/manager.py", line 1930, in _build_and_run_instance
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] block_device_info=block_device_info)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2714, in spawn
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] destroy_disks_on_failure=True)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5130, in _create_domain_and_network
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] destroy_disks_on_failure)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] self.force_reraise()
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] six.reraise(self.type_, self.value, self.tb)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5102, in _create_domain_and_network
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] post_xml_callback=post_xml_callback)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5020, in _create_domain
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] guest.launch(pause=pause)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 145, in launch
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] self._encoded_xml, errors='ignore')
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] self.force_reraise()
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] six.reraise(self.type_, self.value, self.tb)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 140, in launch
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] return self._domain.createWithFlags(flags)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] rv = execute(f, *args, **kwargs)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] six.reraise(c, e, tb)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] rv = meth(*args, **kwargs)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1065, in createWithFlags
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] libvirtError: internal error: process exited while connecting to monitor: Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-iscsi.so
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Note: only modules from the same build can be loaded.
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-curl.so
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Note: only modules from the same build can be loaded.
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Note: only modules from the same build can be loaded.
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-dmg.so
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] Note: only modules from the same build can be loaded.
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b] 2017-07-11T04:39:55.787074Z qemu-system-x86_64: -vnc 10.115.78.96:0: Failed to start VNC server: Failed to bind socket: Cannot assign requested address
2017-07-11 04:39:56.942 TRACE nova.compute.manager [instance: d37e6882-8c94-47dc-8c2f-c9052a25b95b]

The reschedule path does not tear down the ports allocated on the first host. If the reschedule to another host succeeds, the instance ends up with two neutron ports assigned.
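One way to spot the leaked ports after such a reschedule is to group Neutron ports by the instance they belong to. The sketch below assumes port dictionaries carrying the Neutron port attributes `id`, `device_id` and `binding:host_id`. The helper name and the `expected_port_counts` mapping (how many ports each instance was booted with) are hypothetical, and fetching the ports from Neutron is left out.

```python
from collections import defaultdict


def find_dangling_ports(ports, expected_port_counts):
    """Return {instance_uuid: sorted port ids} for instances with extra ports.

    ports: iterable of dicts shaped like Neutron port resources, using the
        'id', 'device_id' and 'binding:host_id' attributes.
    expected_port_counts: hypothetical mapping of instance UUID to the
        number of ports that instance was booted with.
    """
    by_instance = defaultdict(list)
    for port in ports:
        by_instance[port["device_id"]].append(port)

    suspicious = {}
    for instance_uuid, instance_ports in by_instance.items():
        expected = expected_port_counts.get(instance_uuid)
        if expected is None:
            continue  # not an instance we were asked to check
        hosts = {p.get("binding:host_id") for p in instance_ports}
        # Too many ports, or ports bound to more than one host, both point
        # at a leftover from an earlier build attempt.
        if len(instance_ports) > expected or len(hosts) > 1:
            suspicious[instance_uuid] = sorted(p["id"] for p in instance_ports)
    return suspicious


ports = [
    {"id": "port-a", "device_id": "fd555446", "binding:host_id": "compute1"},
    {"id": "port-b", "device_id": "fd555446", "binding:host_id": "compute2"},
    {"id": "port-c", "device_id": "other", "binding:host_id": "compute2"},
]
print(find_dangling_ports(ports, {"fd555446": 1, "other": 1}))
# -> {'fd555446': ['port-a', 'port-b']}
```

The host names and shortened instance IDs in the sample data are illustrative only.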

Tags: reschedule
Gary Kotton (garyk) wrote :

This happens with stable/ocata

Changed in nova:
importance: Undecided → High
Gary Kotton (garyk) wrote :

On compute 1:
nicira@kvm-compute-node3:/opt/stack/logs$ grep -r "fd555446-779a-47a3-a1ef-c2ee6ea0369c\] Allocating IP information in the background" *
n-cpu.log:2017-07-10 21:08:34.112 DEBUG nova.compute.manager [req-e8739936-510d-444a-95c3-5056d5e3db01 deepthi_project deepthi_admin] [instance: fd555446-779a-47a3-a1ef-c2ee6ea0369c] Allocating IP information in the background. from (pid=2082) _allocate_network_async /opt/stack/nova/nova/compute/manager.py:1386

On compute 2, for the same request and instance, we have:
n-cpu.log:2017-07-10 21:08:48.739 DEBUG nova.compute.manager [req-e8739936-510d-444a-95c3-5056d5e3db01 deepthi_project deepthi_admin] [instance: fd555446-779a-47a3-a1ef-c2ee6ea0369c] Allocating IP information in the background. from (pid=32024) _allocate_network_async /opt/stack/nova/nova/compute/manager.py:1386
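The two excerpts share the same request ID and instance UUID, which is the evidence that networking was allocated twice. A small sketch that automates this comparison across hosts follows; the regex is written against the log format shown here only, and the function name and the host labels `compute1`/`compute2` are hypothetical.

```python
import re
from collections import defaultdict

# Written against the nova-compute DEBUG lines quoted above; other log
# formats may need a different pattern.
ALLOC_RE = re.compile(
    r"\[req-(?P<req>[0-9a-f-]+) [^\]]*\] "
    r"\[instance: (?P<instance>[0-9a-f-]+)\] "
    r"Allocating IP information in the background"
)


def find_multi_host_allocations(log_lines_by_host):
    """Map (request id, instance uuid) -> hosts, keeping only keys that
    logged a network allocation on more than one host."""
    seen = defaultdict(set)
    for host, lines in log_lines_by_host.items():
        for line in lines:
            match = ALLOC_RE.search(line)
            if match:
                seen[(match.group("req"), match.group("instance"))].add(host)
    return {key: hosts for key, hosts in seen.items() if len(hosts) > 1}


SAMPLE_LOGS = {
    "compute1": [
        "n-cpu.log:2017-07-10 21:08:34.112 DEBUG nova.compute.manager "
        "[req-e8739936-510d-444a-95c3-5056d5e3db01 deepthi_project deepthi_admin] "
        "[instance: fd555446-779a-47a3-a1ef-c2ee6ea0369c] Allocating IP information "
        "in the background. from (pid=2082) _allocate_network_async "
        "/opt/stack/nova/nova/compute/manager.py:1386"
    ],
    "compute2": [
        "n-cpu.log:2017-07-10 21:08:48.739 DEBUG nova.compute.manager "
        "[req-e8739936-510d-444a-95c3-5056d5e3db01 deepthi_project deepthi_admin] "
        "[instance: fd555446-779a-47a3-a1ef-c2ee6ea0369c] Allocating IP information "
        "in the background. from (pid=32024) _allocate_network_async "
        "/opt/stack/nova/nova/compute/manager.py:1386"
    ],
}
print(find_multi_host_allocations(SAMPLE_LOGS))
```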

Matt Riedemann (mriedem) wrote :

With a libvirtError coming up from the driver.spawn method, I think you'd get here:

https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1784

And since you have retries left, you wouldn't call _cleanup_allocated_networks:

https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1790

And since it's not the Ironic driver or an SR-IOV port you don't deallocate here:

https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L1811

So we call self.network_api.cleanup_instance_network_on_host, but that's a no-op for the neutron networking backend code in Nova:

https://github.com/openstack/nova/blob/stable/ocata/nova/network/neutronv2/api.py#L2335

So yeah, we don't clean up the ports anywhere if this happens.
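The path traced above can be condensed into a small decision function. This is a hypothetical, simplified model for the neutron backend only: the function and parameter names are illustrative and do not correspond to nova's code, and it hard-codes the observation that `cleanup_instance_network_on_host` does nothing for neutron.

```python
def neutron_ports_released_on_spawn_failure(retries_left, is_ironic, has_sriov_port):
    """Hypothetical model of the stable/ocata failure path described above.

    Returns True if some step would deallocate the instance's neutron ports.
    """
    if not retries_left:
        # No reschedule possible: _cleanup_allocated_networks is called.
        return True
    if is_ironic or has_sriov_port:
        # These special cases deallocate networking before rescheduling.
        return True
    # Otherwise only network_api.cleanup_instance_network_on_host() runs,
    # which is a no-op for neutron, so the ports are left behind.
    return False


# The libvirt case from this bug: retries left, not Ironic, no SR-IOV port.
print(neutron_ports_released_on_spawn_failure(
    retries_left=True, is_ironic=False, has_sriov_port=False))  # -> False
```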

Changed in nova:
status: New → Triaged
Matt Riedemann (mriedem) wrote :

FWIW I think we have a few duplicate bugs for this same type of issue, where we don't clean up networking information on a reschedule. I'm sure there are patches floating around attempting to fix this.
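As a rough illustration of one direction such a fix could take (not any actual nova patch), the sketch below releases the instance's ports on the failed host before the build is handed to another host. It assumes the caller knows which ports nova created for this attempt and, as a design choice, deletes those while only unbinding ports the user supplied, so they can be bound again on the next host. `release_ports_before_reschedule` and `FakeNeutronClient` are hypothetical; the fake only borrows the method names `delete_port`/`update_port` from python-neutronclient and uses simplified request bodies.

```python
class FakeNeutronClient:
    """In-memory stand-in for a Neutron client, for illustration only."""

    def __init__(self, ports):
        # Copy each port dict so the caller's list is not mutated.
        self.ports = {p["id"]: dict(p) for p in ports}

    def delete_port(self, port_id):
        del self.ports[port_id]

    def update_port(self, port_id, body):
        self.ports[port_id].update(body["port"])


def release_ports_before_reschedule(client, ports, instance_uuid, nova_created_ids):
    """Hypothetical cleanup step run before rescheduling a failed build."""
    for port in ports:
        if port["device_id"] != instance_uuid:
            continue  # belongs to some other instance
        if port["id"] in nova_created_ids:
            # Nova created this port for the failed attempt: remove it.
            client.delete_port(port["id"])
        else:
            # User-supplied port: keep it, but detach it from the failed host.
            client.update_port(port["id"], {"port": {
                "device_id": "",
                "device_owner": "",
                "binding:host_id": None,
            }})


ports = [
    {"id": "port-a", "device_id": "inst-1", "device_owner": "compute:nova",
     "binding:host_id": "compute1"},
    {"id": "port-b", "device_id": "inst-1", "device_owner": "compute:nova",
     "binding:host_id": "compute1"},
    {"id": "port-c", "device_id": "inst-2", "device_owner": "compute:nova",
     "binding:host_id": "compute1"},
]
client = FakeNeutronClient(ports)
release_ports_before_reschedule(client, ports, "inst-1", {"port-a"})
print(sorted(client.ports))  # -> ['port-b', 'port-c']
```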

tags: added: reschedule
Changed in nova:
assignee: nobody → Xuanzhou Perry Dong (oss-xzdong)
Changed in nova:
assignee: Xuanzhou Perry Dong (oss-xzdong) → nobody