Networks are not cleaned up when create fails

Bug #1410739 reported by Paul Murray on 2015-01-14
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Undecided
Paul Murray

Bug Description

While attempting to create a VM in an environment running with Neutron networking a failure was encountered getting the image from glance. At that point the networking has been allocated. The clean up code does not deallocate the networks. This behavior seems to be deliberately introduced in https://review.openstack.org/#/c/101335 which is a fix for https://bugs.launchpad.net/nova/+bug/1332198

The result of the failure was to reschedule the instance to another host. When the instance is created on the second host the ports are already bound to the first so they can not be bound to that host.

The original error from glance was:

2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Traceback (most recent call last):
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2231, in _build_resources
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] yield resources
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2101, in _build_and_run_instance
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] block_device_info=block_device_info)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2616, in spawn
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] admin_pass=admin_password)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3018, in _create_image
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] project_id=instance['project_id'])
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 208, in cache
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] *args, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 451, in create_image
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] prepare_template(target=base, max_size=size, *args, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 272, in inner
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] return f(*args, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 198, in fetch_func_sync
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] fetch_func(target=target, *args, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/libvirt/utils.py", line 452, in fetch_image
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] max_size=max_size)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/images.py", line 79, in fetch_to_raw
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] max_size=max_size)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/images.py", line 73, in fetch
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] IMAGE_API.download(context, image_href, dest_path=path)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/image/api.py", line 178, in download
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] dst_path=dest_path)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/image/glance.py", line 359, in download
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] _reraise_translated_image_exception(image_id)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/image/glance.py", line 357, in download
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] image_chunks = self._client.call(context, 1, 'data', image_id)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/image/glance.py", line 232, in call
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] return getattr(client.images, method)(*args, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/glanceclient/v1/images.py", line 142, in data
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] % urlparse.quote(str(image_id)))
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/glanceclient/common/http.py", line 253, in get
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] return self._request('GET', url, **kwargs)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/glanceclient/common/http.py", line 221, in _request
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] raise exc.from_response(resp, resp.content)
2015-01-08 22:39:14.251 14855 TRACE nova.compute.manager [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] HTTPInternalServerError: HTTPInternalServerError (HTTP 500)

The nova-compute log showing the clean up is as follows (some lines to do with messaging etc have been removed):

2015-01-08 22:39:15 14855 AUDIT nova.compute.manager [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Terminating instance

2015-01-08 22:39:15.361 14855 WARNING nova.virt.libvirt.driver [-] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] During wait destroy, instance disappeared.

2015-01-08 22:39:15 14855 DEBUG nova.objects.instance [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Lazy-loading `system_metadata' on Instance uuid 25263c4a-ade3-41f1-9c38-2a8812c
dbbc8 obj_load_attr /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/objects/instance.py:572

2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.processutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Running cmd (subprocess): mv /mnt/state/var/lib/nova/instances/252
63c4a-ade3-41f1-9c38-2a8812cdbbc8 /mnt/state/var/lib/nova/instances/25263c4a-ade3-41f1-9c38-2a8812cdbbc8_del execute /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/processutils.py:161
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.processutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Result was 0 execute /opt/stack/venvs/nova/local/lib/python2.7/sit
e-packages/nova/openstack/common/processutils.py:195
2015-01-08 22:39:15 14855 INFO nova.virt.libvirt.driver [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Deleting instance files /mnt
/state/var/lib/nova/instances/25263c4a-ade3-41f1-9c38-2a8812cdbbc8_del
2015-01-08 22:39:15 14855 INFO nova.virt.libvirt.driver [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Deletion of /mnt/state/var/l
ib/nova/instances/25263c4a-ade3-41f1-9c38-2a8812cdbbc8_del complete

2015-01-08 22:39:15 14855 DEBUG nova.compute.claims [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Aborting claim: [Claim: 512 MB memory, 2 GB disk] abort /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/claims.py:127
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.lockutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Created new semaphore "compute_resources" internal_lock /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:206
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.lockutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Acquired semaphore "compute_resources" lock /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:229
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.lockutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Got semaphore / lock "abort_instance_claim" inner /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:271

2015-01-08 22:39:15 14855 INFO nova.scheduler.client.report [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Compute_service record updated for ('overcloud-ce-novacompute5-novacompute5-73kkw7k6ose3', 'overcloud-ce-novacompute5-novacompute5-73kkw7k6ose3.novalocal')
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.lockutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Releasing semaphore "compute_resources" lock /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:238
2015-01-08 22:39:15 14855 DEBUG nova.openstack.common.lockutils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] Semaphore / lock released "abort_instance_claim" inner /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/lockutils.py:275
2015-01-08 22:39:15 14855 DEBUG nova.compute.utils [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] HTTPInternalServerError (HTTP 500) notify_about_instance_usage /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/utils.py:307

2015-01-08 22:39:15 14855 DEBUG nova.compute.manager [req-fe4615a4-8aa0-4bc5-a377-6530da5a7468 d3c81059cc6e46a7b53c33c0f1d1bc92 76aac8b81d664000a08cbdec3fa52e5a] [instance: 25263c4a-ade3-41f1-9c38-2a8812cdbbc8] Build of instance 25263c4a-ade3-41f1-9c38-2a8812cdbbc8 was re-scheduled: HTTPInternalServerError (HTTP 500) do_build_and_run_instance /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py:2016

After the clean up the listing the ports shows:

root@my-server:# neutron port-show b4bbd25a-419e-4d0b-87f1-1786b004b441
+-----------------------+-------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | novacompute5-73kkw7k6ose3 |
| binding:profile | {} |
| binding:vif_details | {"port_filter": true, "ovs_hybrid_plug": true} |
| binding:vif_type | ovs |
| binding:vnic_type | normal |
| device_id | 25263c4a-ade3-41f1-9c38-2a8812cdbbc8 |
| device_owner | compute:None |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "f88386ff-3a5d-41be-b84b-3076100cdcf0", "ip_address": "71.181.229.3"} |
| id | b4bbd25a-419e-4d0b-87f1-1786b004b441 |
| mac_address | fa:16:3e:9a:33:d9 |
| name | |
| network_id | 1ec2a8dd-19db-4ac9-926f-79d501546915 |
| security_groups | 973724f5-df43-4b29-a6ef-ef0619a05f49 |
| status | BUILD |
| tenant_id | 76aac8b81d664000a08cbdec3fa52e5a |
+-----------------------+-------------------------------------------------------------------------------------+

Paul Murray (pmurray) wrote :

Seems to be related to this code snippet from _do_build_run_instance() in the compute manager:

            # NOTE(comstud): Deallocate networks if the driver wants
            # us to do so.
            if self.driver.deallocate_networks_on_reschedule(instance):
                self._cleanup_allocated_networks(context, instance,
                        requested_networks)

The default behavior in ComputeDriver is:

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False

Only the Ironic driver over rides this method to return True, so I think this means the networks will not be cleaned up for any other virt driver.

Is this the behavior we want by default? I would have thought the default should be True.

Paul Murray (pmurray) on 2015-01-14
Changed in nova:
assignee: nobody → Paul Murray (pmurray)
Brian Haley (brian-haley) wrote :

Updating with email I sent to -dev list:

I was able to run a test here on a multi-node setup with DVR enabled:

- Booted VM
- Associated floating IP
- Updated binding:host_id (as admin) using the neutron API:

  $ neutron port-update $port -- --binding:host_id=novacompute5

The port was correctly moved to the other compute node and the floating IP
configured. So that showed me the agents all did the right thing as far as I
can tell. I know Paul was looking at the nova code to try and update just this
field, I'll check-in with him regarding that so we can get a patch up soon.

Paul Murray (pmurray) wrote :

This leaves us with three options:

1. always tear down the networks on reschedule

2. tear down the networks if the virt driver wants to OR if the NETWORK wants to

3. don't tear down the networks, rebind the ports if the network already exists

Looking at the code it seems that option 3 is more or less the same as building the networks in the first place. Instead of working out all the ports and creating them we are working out all the ports and updating them.

So to keep the code as simple as possible I propose doing option 2 by adding a method to the network api similar to the virt driver deallocate_networks_on_reschedule() in comment above. This will also be false by default and true for neutron if the port binding extension is being used. This approach fits nicely with the code above and leaves the behavior the same for other network implementations.

Fix proposed to branch: master
Review: https://review.openstack.org/147888

Changed in nova:
status: New → In Progress
Alex Xu (xuhj) wrote :

Looks like similar with this bug https://bugs.launchpad.net/nova/+bug/1327124

Paul Murray (pmurray) wrote :

Thanks @Alex - your fix does cover this one. I will abandon mine and lets try to get yours through quickly.

Change abandoned by Paul Murray (<email address hidden>) on branch: master
Review: https://review.openstack.org/147888
Reason: This fix is covered by https://review.openstack.org/#/c/98340/

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers