deallocating network not updating database

Bug #1285158 reported by moorryan
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Medium      Aaron Rosen
Havana                    Fix Released  Undecided   Unassigned

Bug Description

An instance that fails to spawn on a compute node (for example, because of a corrupt image download) calls 'deallocate_for_instance', but the database is not updated to remove the networking information.

From the compute manager log for the instance that fails to spawn:
AUDIT nova.compute.manager [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Starting instance...
...
AUDIT nova.compute.claims [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Claim successful
...
DEBUG nova.network.neutronv2.api [-] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Successfully created port: 38b34c17-e228-4d79-9248-c642a42959a8 _create_port /usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py:172
...
2014-02-11 13:20:25.257 24500 DEBUG nova.network.api [-] Updating cache with info: [VIF({'ovs_interfaceid': None, 'network': Network({'bridge': None, 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'10.100.0.2'})], 'version': 4, 'meta': {'dhcp_server': u'10.100.0.3'}, 'dns': [], 'routes': [], 'cidr': u'10.100.0.0/16', 'gateway': IP({'meta': {}, 'version': 4, 'type': 'gateway', 'address': u'10.100.0.1'})})], 'meta': {'injected': False, 'tenant_id': u'10540146451709'}, 'id': u'09030eb3-bca3-4df4-a5f9-721b6bd5d599', 'label': u'private'}), 'devname': u'tap38b34c17-e2', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:22:7c:aa', 'type': u'other', 'id': u'38b34c17-e228-4d79-9248-c642a42959a8', 'qbg_params': None})] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:72
...
ERROR nova.compute.manager [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Instance failed to spawn
TRACE nova.compute.manager [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] IOError: [Errno 32] Corrupt image download. Checksum was 65327d2b03e53805a3354233b09aee62 expected 82d98abd651173e8c3e74b02d811f8a1
AUDIT nova.compute.manager [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Terminating instance
DEBUG nova.compute.manager [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Deallocating network for instance _deallocate_network /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1518

When the instance then spawns successfully on another compute node:
AUDIT nova.compute.manager [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Starting instance...
...
AUDIT nova.compute.claims [req-165b21e5-f727-4d2e-98f3-1a5f0039595e 10546644733724 10540146451709] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Claim successful
...
DEBUG nova.network.neutronv2.api [-] [instance: 6b28052e-3488-404d-9626-bb42f51ae98f] Successfully created port: 59f61b81-3314-4bdd-b455-2611af6653c2 _create_port /usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py:172
...
2014-02-11 13:21:20.491 63160 DEBUG nova.network.api [-] Updating cache with info: [VIF({'ovs_interfaceid': None, 'network': Network({'bridge': None, 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': u'fixed', 'floating_ips': [], 'address': u'10.100.0.2'})], 'version': 4, 'meta': {u'dhcp_server': u'10.100.0.3'}, 'dns': [], 'routes': [], 'cidr': u'10.100.0.0/16', 'gateway': IP({'meta': {}, 'version': 4, 'type': u'gateway', 'address': u'10.100.0.1'})})], 'meta': {u'injected': False, u'tenant_id': u'10540146451709'}, 'id': u'09030eb3-bca3-4df4-a5f9-721b6bd5d599', 'label': u'private'}), 'devname': u'tap38b34c17-e2', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:22:7c:aa', 'type': u'other', 'id': u'38b34c17-e228-4d79-9248-c642a42959a8', 'qbg_params': None}), VIF({'ovs_interfaceid': None, 'network': Network({'bridge': None, 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'10.100.0.4'})], 'version': 4, 'meta': {'dhcp_server': u'10.100.0.3'}, 'dns': [], 'routes': [], 'cidr': u'10.100.0.0/16', 'gateway': IP({'meta': {}, 'version': 4, 'type': 'gateway', 'address': u'10.100.0.1'})})], 'meta': {'injected': False, 'tenant_id': u'10540146451709'}, 'id': u'09030eb3-bca3-4df4-a5f9-721b6bd5d599', 'label': u'private'}), 'devname': u'tap59f61b81-33', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:51:5d:bf', 'type': u'other', 'id': u'59f61b81-3314-4bdd-b455-2611af6653c2', 'qbg_params': None})] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:72

What we are seeing is that the fixed IP address (10.100.0.2) allocated when the instance was spawned on the first compute node still exists when the port is allocated on the second compute node, even though the call to deallocate succeeded. This leads to multiple fixed IPs being allocated to the instance.
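
To make the symptom concrete, here is a minimal sketch of how the stale entry could be spotted. It uses plain dicts as a stand-in for the cached VIF objects shown in the log, and it assumes that only the second port still exists on the Neutron side; the helper name find_stale_vifs and the live_port_ids set are hypothetical and not part of nova.

```python
# Simplified stand-in for the cached network info from the second log line.
# The real cache holds VIF objects; plain dicts are used for illustration.
cached_vifs = [
    {"id": "38b34c17-e228-4d79-9248-c642a42959a8", "fixed_ips": ["10.100.0.2"]},
    {"id": "59f61b81-3314-4bdd-b455-2611af6653c2", "fixed_ips": ["10.100.0.4"]},
]

# Assumption: only the port created on the second compute node is still live.
live_port_ids = {"59f61b81-3314-4bdd-b455-2611af6653c2"}


def find_stale_vifs(vifs, live_ids):
    """Return cached VIFs whose port ID is not among the live port IDs."""
    return [vif for vif in vifs if vif["id"] not in live_ids]


stale = find_stale_vifs(cached_vifs, live_port_ids)
stale_ips = [ip for vif in stale for ip in vif["fixed_ips"]]
print(stale_ips)
```

Under these assumptions the only stale address reported is the one from the first, failed spawn.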

As a result, the association of a floating IP to the fixed IP fails with:

WARNING nova.api.openstack.compute.contrib.floating_ips [req-994a5ce9-fc42-4328-9eac-a05096fa25c3 10546644733724 10540146451709] multiple fixed_ips exist, using the first: 10.100.0.2
ERROR nova.api.openstack.compute.contrib.floating_ips [req-994a5ce9-fc42-4328-9eac-a05096fa25c3 10546644733724 10540146451709] Error. Unable to associate floating ip
TRACE nova.api.openstack.compute.contrib.floating_ips Traceback (most recent call last):
TRACE nova.api.openstack.compute.contrib.floating_ips File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/contrib/floating_ips.py", line 255, in _add_floating_ip
TRACE nova.api.openstack.compute.contrib.floating_ips fixed_address=fixed_address)
TRACE nova.api.openstack.compute.contrib.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 50, in wrapper
TRACE nova.api.openstack.compute.contrib.floating_ips res = f(self, context, *args, **kwargs)
nova.api.openstack.compute.contrib.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 649, in associate_floating_ip
nova.api.openstack.compute.contrib.floating_ips fixed_address)
nova.api.openstack.compute.contrib.floating_ips File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 634, in _get_port_id_by_fixed_address
TRACE nova.api.openstack.compute.contrib.floating_ips raise exception.FixedIpNotFoundForAddress(address=address)
TRACE nova.api.openstack.compute.contrib.floating_ips FixedIpNotFoundForAddress: Fixed ip not found for address 10.100.0.2.
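
The failure above can be reproduced in a toy model: the first cached fixed IP is chosen, but no live port owns that address any more. This is only a sketch under that assumption. The exception class is a local stand-in that borrows the name from the traceback, and get_port_id_by_fixed_address is a hypothetical simplification, not the nova function.

```python
class FixedIpNotFoundForAddress(Exception):
    """Local stand-in for the exception named in the traceback."""


def get_port_id_by_fixed_address(live_ports, address):
    """Return the ID of the live port that owns `address`, or raise."""
    for port in live_ports:
        if address in port["fixed_ips"]:
            return port["id"]
    raise FixedIpNotFoundForAddress(
        "Fixed ip not found for address %s." % address)


# The cache lists both addresses; the stale one comes first.
cached_fixed_ips = ["10.100.0.2", "10.100.0.4"]

# Assumption: only the port from the second spawn still exists.
live_ports = [
    {"id": "59f61b81-3314-4bdd-b455-2611af6653c2", "fixed_ips": ["10.100.0.4"]},
]

# "multiple fixed_ips exist, using the first"
chosen = cached_fixed_ips[0]
try:
    get_port_id_by_fixed_address(live_ports, chosen)
    outcome = "associated"
except FixedIpNotFoundForAddress as exc:
    outcome = str(exc)
print(outcome)
```

In this model, picking the second cached address instead would succeed, which is consistent with the stale cache entry being the cause rather than the floating IP itself.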

Looking through the code, it seems that the call to _deallocate_network only removes the network allocation from the neutron side. It does not then update the database to reflect the deallocation.

Tracy Jones (tjones-i)
tags: added: network
Aaron Rosen (arosen)
Changed in nova:
assignee: nobody → Aaron Rosen (arosen)
Aaron Rosen (arosen)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/77802

Changed in nova:
status: New → In Progress
Aaron Rosen (arosen)
tags: added: icehouse-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/77802
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3538a9cd06e1e46ec5698edf4018c8d3075b619b
Submitter: Jenkins
Branch: master

commit 3538a9cd06e1e46ec5698edf4018c8d3075b619b
Author: Aaron Rosen <email address hidden>
Date: Mon Mar 3 22:55:35 2014 -0800

    network_info cache should be cleared before being rescheduled

    If an instance fails to boot due to a non-networking error the instance
    then gets rescheduled and launched on another compute node. In these cases
    deallocate_for_instance() is called which deletes the network ports
    allocated though the info_cache for the instance is never cleared. This patch
    adds a call to update_instance_cache_with_nw_info() which causes the cache
    to get cleared out. Note: the cache is only cleared if the instance hasn't
    been marked for deletion. This is due to how instance_info_cache_update()
    is implemented.

    Change-Id: If967884c9a6276f5949a7a04b597cedcce12ba09
    Closes-bug: #1285158
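
The behaviour described in the commit message can be illustrated with a toy model. This is not the actual patch: FakeInstance, FakeNetworkAPI, the clear_cache flag and the reschedule helper are all hypothetical, and the cache is modelled as a simple list that each allocation appends to.

```python
class FakeInstance:
    """Toy instance with a network info cache and a deleted flag."""

    def __init__(self):
        self.info_cache = []
        self.deleted = False


class FakeNetworkAPI:
    """Toy network API; a simplification, not the nova network API."""

    def __init__(self):
        self.ports = {}

    def allocate_for_instance(self, instance, port_id, address):
        # Create the port and append it to the instance's cached info.
        self.ports[port_id] = address
        instance.info_cache = instance.info_cache + [
            {"id": port_id, "address": address}]

    def deallocate_for_instance(self, instance, clear_cache):
        # Ports are always deleted on the network side.
        self.ports.clear()
        # Modelled fix: also clear the cache, unless the instance is deleted.
        if clear_cache and not instance.deleted:
            instance.info_cache = []


def reschedule(clear_cache):
    """Fail on the first node, deallocate, then allocate on a second node."""
    api = FakeNetworkAPI()
    instance = FakeInstance()
    api.allocate_for_instance(instance, "port-a", "10.100.0.2")
    api.deallocate_for_instance(instance, clear_cache=clear_cache)
    api.allocate_for_instance(instance, "port-b", "10.100.0.4")
    return [vif["address"] for vif in instance.info_cache]


before_fix = reschedule(clear_cache=False)
after_fix = reschedule(clear_cache=True)
print(before_fix)
print(after_fix)
```

Without the cache clear, the model ends up with both addresses in the cache, matching the log in the bug description; with it, only the address from the second allocation remains.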

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-rc1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/84583

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/84583
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4f262692cef46f2a7f78f3491c31bd5bf245de00
Submitter: Jenkins
Branch: stable/havana

commit 4f262692cef46f2a7f78f3491c31bd5bf245de00
Author: Aaron Rosen <email address hidden>
Date: Mon Mar 3 22:55:35 2014 -0800

    network_info cache should be cleared before being rescheduled

    If an instance fails to boot due to a non-networking error the instance
    then gets rescheduled and launched on another compute node. In these cases
    deallocate_for_instance() is called which deletes the network ports
    allocated though the info_cache for the instance is never cleared. This patch
    adds a call to update_instance_cache_with_nw_info() which causes the cache
    to get cleared out. Note: the cache is only cleared if the instance hasn't
    been marked for deletion. This is due to how instance_info_cache_update()
    is implemented.

    Change-Id: If967884c9a6276f5949a7a04b597cedcce12ba09
    Closes-bug: #1285158
    (cherry picked from commit 3538a9cd06e1e46ec5698edf4018c8d3075b619b)

tags: added: in-stable-havana
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-rc1 → 2014.1