OpenStack Compute (Nova)

floating ips are not disassociated from instances on deletion

Reported by Piotr Siwczak on 2012-05-10
This bug affects 10 people
Affects | Importance | Assigned to
OpenStack Compute (nova) | High | Trey Morris
  Essex | Undecided | Unassigned
neutron | High | dan wendlandt
nova (Ubuntu) | Undecided | Unassigned
  Precise | Undecided | Unassigned

Bug Description

The following scenario does not work:

nova add-floating-ip <instance> <floating_ip>

nova delete <instance>

After that, nova floating-ip-list returns an error like this:
"The server has either erred or is incapable of performing the requested operation"

This is because a mapping between the fixed ip of the deleted instance and the floating ip is left behind.
As a workaround, one must currently disassociate the floating ip explicitly before deleting:
nova remove-floating-ip <instance> <floating_ip>
nova delete <instance>
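
The failure mode described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the dictionary layout and the function names (make_state, delete_instance, list_floating_ips, remove_floating_ip) are hypothetical and do not reflect Nova's actual schema or code. It shows how a listing that follows the floating ip to fixed ip to instance chain breaks once the instance is gone but the mapping remains.

```python
def make_state():
    """Build a tiny, hypothetical model of the three related tables."""
    return {
        "instances": {"vm-1": {"id": 12}},
        "fixed_ips": {"10.0.0.2": "vm-1"},             # fixed ip -> instance name
        "floating_ips": {"172.24.4.225": "10.0.0.2"},  # floating ip -> fixed ip
    }


def remove_floating_ip(state, address):
    """Clear the floating ip -> fixed ip mapping (the manual workaround)."""
    state["floating_ips"][address] = None


def delete_instance(state, name):
    """Delete the instance and release its fixed ips.

    Floating ip mappings are deliberately left untouched, which models the bug.
    """
    del state["instances"][name]
    for fixed in list(state["fixed_ips"]):
        if state["fixed_ips"][fixed] == name:
            state["fixed_ips"][fixed] = None


def list_floating_ips(state):
    """Resolve each floating ip to an instance id, as a naive listing might."""
    rows = []
    for address, fixed in state["floating_ips"].items():
        instance_id = None
        if fixed is not None:
            owner = state["fixed_ips"][fixed]
            # Raises KeyError when the fixed ip no longer has a live owner.
            instance_id = state["instances"][owner]["id"]
        rows.append((address, instance_id))
    return rows
```

In this model, deleting the instance first makes list_floating_ips raise, while calling remove_floating_ip before delete_instance leaves a clean, unassociated floating ip.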

Piotr Siwczak (psiwczak) on 2012-05-10
Changed in nova:
assignee: nobody → Piotr Siwczak (psiwczak)
Vish Ishaya (vishvananda) wrote :

This is a nasty bug. I think we need a few changes regarding updating network_info to make this work properly, which also means that deallocate_floating_ip may need to go through compute_api like allocate_floating_ip does.

Changed in nova:
status: New → Triaged
importance: Undecided → High
Piotr Siwczak (psiwczak) wrote :

I already have a "dirty" fix for this in compute/api.py.

It probably deserves to be put into a separate function in compute/api.py (one that disassociates all the floating ips from an instance and takes just instance_id as a parameter).

In the _delete function:

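    def _delete(self, context, instance):
        host = instance['host']

        # Disassociate every floating ip still attached to the instance
        # before deleting it, so that no stale mapping is left behind.
        admin_context = context.elevated()
        nw_info = self.network_api.get_instance_nw_info(admin_context,
                                                        instance)
        for vif in nw_info:
            for subnet in vif['network']['subnets']:
                for fixed_ip in subnet['ips']:
                    for floating_ip in fixed_ip['floating_ips']:
                        fip = floating_ip['address']
                        LOG.debug(_("Disassociating floating ip %s "
                                    "from instance") % fip)
                        self.network_api.disassociate_floating_ip(
                            admin_context, fip)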

Vish Ishaya (vishvananda) wrote :

This actually looks acceptable once it is refactored into a separate function and has a test. I think we can handle the performance penalty of making an actual request for nw_info.
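
A minimal sketch of what such a refactor plus test could look like, assuming the nw_info layout used in the snippet above (a list of entries with network, subnets, ips and floating_ips keys). The helper names floating_addresses and disassociate_all_floating_ips, the FakeNetworkAPI test double and the sample data are hypothetical; only get_instance_nw_info and disassociate_floating_ip are taken from the proposed fix.

```python
def floating_addresses(nw_info):
    """Collect every floating ip address from a network_info-style structure."""
    return [
        floating["address"]
        for vif in nw_info
        for subnet in vif["network"]["subnets"]
        for fixed in subnet["ips"]
        for floating in fixed["floating_ips"]
    ]


def disassociate_all_floating_ips(network_api, context, instance):
    """Disassociate all floating ips of an instance; return the addresses handled."""
    nw_info = network_api.get_instance_nw_info(context, instance)
    addresses = floating_addresses(nw_info)
    for address in addresses:
        network_api.disassociate_floating_ip(context, address)
    return addresses


class FakeNetworkAPI:
    """Hypothetical test double that records calls instead of doing network work."""

    def __init__(self, nw_info):
        self._nw_info = nw_info
        self.disassociated = []

    def get_instance_nw_info(self, context, instance):
        return self._nw_info

    def disassociate_floating_ip(self, context, address):
        self.disassociated.append(address)


# Made-up sample data: one fixed ip with two floating ips, one with none.
SAMPLE_NW_INFO = [
    {"network": {"subnets": [{"ips": [
        {"address": "10.0.0.2",
         "floating_ips": [{"address": "172.24.4.225"},
                          {"address": "172.24.4.226"}]},
        {"address": "10.0.0.3", "floating_ips": []},
    ]}]}},
]
```

Keeping the traversal in its own function means the nested structure can be tested without any network API at all, and the fake makes it easy to assert that each address was disassociated exactly once.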

It would also be nice to handle:

nova floating-ip-list

so that it does not error out as above when the instance has been deleted,

and to allow:

nova remove-floating-ip <instance> <floating_ip>

to work even if the instance is gone. (My initial reading of the code suggests this will work, but I'm not totally sure.)

Piotr Siwczak (psiwczak) wrote :

Vish,
Do you have any way to work around this performance penalty?

As for the others:
nova floating-ip-list will always work with the above patch - the key here is not to have references to non-existent floating ips in the table. So when we disassociate them before VM deletion, everything should be fine here.

The last one needs a review ;-)

Vish Ishaya (vishvananda) wrote :

The performance shouldn't be too bad, so I wouldn't worry about it. I was originally thinking of using the network_info cache so we wouldn't have to make a call, but it's not a big deal.

I agree that the problem should never happen, but since floating ips will be managed by a separate service it shouldn't fail if the instance happens to not exist.
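
The defensive behaviour asked for here might be sketched as follows. Everything in this block is an assumption made for illustration: InstanceNotFound is a hypothetical stand-in for whatever error the lookup raises, and describe_floating_ip and the record layout are not Nova's API. The point is only that a listing entry degrades to "no instance" instead of failing the whole request.

```python
class InstanceNotFound(Exception):
    """Hypothetical stand-in for the error raised when an instance lookup fails."""


def describe_floating_ip(floating_ip, get_instance_by_fixed_ip):
    """Build one listing row, tolerating a missing instance.

    floating_ip is a dict with "address" and "fixed_ip" keys;
    get_instance_by_fixed_ip is a callable that returns an instance dict
    or raises InstanceNotFound.
    """
    fixed = floating_ip.get("fixed_ip")
    instance_id = None
    if fixed is not None:
        try:
            instance_id = get_instance_by_fixed_ip(fixed)["id"]
        except InstanceNotFound:
            # The instance was deleted while the mapping survived:
            # report the row without an instance instead of erroring out.
            instance_id = None
    return {
        "ip": floating_ip["address"],
        "fixed_ip": fixed,
        "instance_id": instance_id,
    }
```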

Piotr Siwczak (psiwczak) on 2012-05-30
Changed in nova:
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
Piotr Siwczak (psiwczak) on 2012-05-30
Changed in nova:
status: Fix Released → Confirmed
Adrian Moya (adrianmoya) wrote :

I've just hit this bug from within Horizon. Is it possible to get the fix via apt-get? (I have Essex installed on Ubuntu 12.04.) What would be the workaround?

Endre Karlson (endre-karlson) wrote :

+1, I can confirm this.

Endre Karlson (endre-karlson) wrote :

There's a funny thing happening with this in Horizon:
1. Assign a floating IP to an instance.
2. Terminate it
3. Create a new instance
4. The same IP is assigned to the new one?!

Fix proposed to branch: master
Review: https://review.openstack.org/7995

Changed in nova:
status: Confirmed → In Progress
Vish Ishaya (vishvananda) wrote :

I can't replicate this through the command line. When I create and associate a floating ip and then delete the instance, it is completely removed from the instance. Perhaps this is a Horizon issue?

Piotr Siwczak (psiwczak) wrote :

Vish,

Seems to be quantum-related. I was able to replicate the bug running the latest devstack + quantum (added "q-agt,q-svc,quantum" to ENABLED_SERVICES in stackrc).

Normally, when you run without quantum, the function "deallocate_for_instance" (nova/network/manager.py) seems to take care of cleaning floating ips from the instance before termination, and this message is written to the log file:

LOG.debug(_("floating IP deallocation for instance |%s|")

e.g. (using FlatDHCPManager; I can see this in the nova-network output):
2012-05-31 01:27:51 DEBUG nova.network.manager [req-627cb4c1-39dd-422d-a8ff-370232a76779 3380b86169804c3581d15cb00fe1f1a5 b02947aca9f3410882b8f6121cfc28a5] floating IP deallocation for instance |12| from (pid=9734) deallocate_for_instance /opt/stack/nova/nova/network/manager.py:358

I did not see this message when using the quantum manager, though.

And here's the bug reproduced using up-to-date devstack + quantum. When the instance is deleted, we can no longer display floating ips:

http://paste.openstack.org/show/18296/

Vish Ishaya (vishvananda) wrote :

If this is a quantum issue, then I suspect the quantum manager is not properly deallocating floating ips in the deallocate_for_instance call. It seems preferable to make that work rather than add an extra call through the api to deallocate the ips.
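
The approach preferred here, cleaning up inside deallocate_for_instance rather than through an extra API call, could look roughly like the sketch below. ToyNetworkManager and its attributes are hypothetical and far simpler than the real managers; only the name deallocate_for_instance comes from the discussion.

```python
class ToyNetworkManager:
    """Hypothetical, in-memory stand-in for a network manager."""

    def __init__(self):
        self.fixed_ips = {}     # fixed address -> instance id
        self.floating_ips = {}  # floating address -> fixed address, or None

    def allocate_for_instance(self, instance_id, fixed):
        self.fixed_ips[fixed] = instance_id

    def associate_floating_ip(self, floating, fixed):
        self.floating_ips[floating] = fixed

    def disassociate_floating_ip(self, floating):
        self.floating_ips[floating] = None

    def deallocate_for_instance(self, instance_id):
        """Release the instance's fixed ips, disassociating floating ips first."""
        owned = [fixed for fixed, owner in self.fixed_ips.items()
                 if owner == instance_id]
        for fixed in owned:
            # Drop floating ip mappings before the fixed ip goes away,
            # so no floating ip is left pointing at a released address.
            for floating in list(self.floating_ips):
                if self.floating_ips[floating] == fixed:
                    self.disassociate_floating_ip(floating)
            del self.fixed_ips[fixed]
```

Doing the cleanup in the manager keeps instance deletion to a single call and means every caller of deallocate_for_instance gets the same behaviour.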

dan wendlandt (danwent) wrote :

Thanks for reporting this, Piotr. The floating IP support with Quantum is only lightly used at this point, though I'm starting to work on a couple of deployments that will be using it, so it should get a lot more testing soon.

If you run into other issues, please do keep reporting them.

Changed in quantum:
importance: Undecided → High
milestone: none → folsom-2
Piotr Siwczak (psiwczak) wrote :

Vish, Dan,

Since this will be fixed in quantum, I abandoned my proposed change.

Regards,
-Piotr

Fix proposed to branch: master
Review: https://review.openstack.org/8072

Changed in nova:
assignee: Piotr Siwczak (psiwczak) → dan wendlandt (danwent)
dan wendlandt (danwent) wrote :

Hi Piotr, please check out that patch and see if it fixes the problem for you. I tested it on my setup and it solved the issue. Thanks.

Changed in quantum:
assignee: nobody → dan wendlandt (danwent)
status: New → In Progress

Reviewed: https://review.openstack.org/8072
Committed: http://github.com/openstack/nova/commit/e6e0bf343f73fb664167f173ef2ae80d39a06540
Submitter: Jenkins
Branch: master

commit e6e0bf343f73fb664167f173ef2ae80d39a06540
Author: Dan Wendlandt <email address hidden>
Date: Fri Jun 1 17:07:13 2012 -0700

    Quantum Manager disassociate floating-ips on instance delete.

    bug #997763

    Change-Id: I4a1e6c63d2a27c361433b9150dd5ad5218578c02

Changed in nova:
status: In Progress → Fix Committed
Piotr Siwczak (psiwczak) wrote :

Dan,

The patch fixed the issue!

Thank you.
-Piotr

dan wendlandt (danwent) on 2012-06-04
Changed in quantum:
status: In Progress → Fix Committed

Fix proposed to branch: master
Review: https://review.openstack.org/8339

Changed in nova:
assignee: dan wendlandt (danwent) → Trey Morris (tr3buchet)
status: Fix Committed → In Progress
Tomoe Sugihara (tomoe) wrote :

Could this be backported to stable/essex?

On 06/14/2012 08:35 AM, Tomoe Sugihara wrote:
> Could this be backported to stable/essex?
>
Yes, this will be done soon.
Thanks
Gary

Reviewed: https://review.openstack.org/8339
Committed: http://github.com/openstack/nova/commit/82599c77346bbefd550ea4ee6c0b13a3df4950af
Submitter: Jenkins
Branch: master

commit 82599c77346bbefd550ea4ee6c0b13a3df4950af
Author: Trey Morris <email address hidden>
Date: Fri Jun 15 16:35:31 2012 -0500

    moved update cache functionality to the network api

    previously the network manager get_instance_nw_info
    was responsible for updating the cache. This is to
    prevent calling that function in a confusing way.

    part 2 of this patch was fixing bug997763
    floating_ip_associate was removed from the compute
    api. network api associate is now called directly.
    network api floating_ip functions now require
    instance as an argument in order to update cache.

    Change-Id: Ie8daa017b99e48769afbac4862696ef0a8eb1067
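
The idea in this commit message, that the network api floating ip functions take the instance so they can refresh its cached network info, can be sketched as follows. This is an illustration under assumptions, not the code from the commit: ToyManager, ToyNetworkAPI, info_cache and _update_cache are hypothetical names, and the cached value is reduced to a sorted list of floating addresses.

```python
class ToyManager:
    """Hypothetical backend holding the authoritative floating ip mappings."""

    def __init__(self):
        self.floating_by_instance = {}  # instance uuid -> set of floating addresses

    def associate(self, uuid, address):
        self.floating_by_instance.setdefault(uuid, set()).add(address)

    def disassociate(self, uuid, address):
        self.floating_by_instance.get(uuid, set()).discard(address)

    def get_instance_nw_info(self, uuid):
        return sorted(self.floating_by_instance.get(uuid, set()))


class ToyNetworkAPI:
    """Hypothetical api layer that owns the per-instance info cache."""

    def __init__(self, manager):
        self.manager = manager
        self.info_cache = {}  # instance uuid -> cached network info

    def _update_cache(self, instance):
        uuid = instance["uuid"]
        self.info_cache[uuid] = self.manager.get_instance_nw_info(uuid)

    def associate_floating_ip(self, instance, address):
        self.manager.associate(instance["uuid"], address)
        # The instance argument exists so the cache can be refreshed here.
        self._update_cache(instance)

    def disassociate_floating_ip(self, instance, address):
        self.manager.disassociate(instance["uuid"], address)
        self._update_cache(instance)
```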

Changed in nova:
status: In Progress → Fix Committed

Is the nova part now in the maintenance release? We do experience exactly this behavior without quantum

Reviewed: https://review.openstack.org/8646
Committed: http://github.com/openstack/nova/commit/9b789bed095e6110d126f8d355e1434a2b0c60f0
Submitter: Jenkins
Branch: stable/essex

commit 9b789bed095e6110d126f8d355e1434a2b0c60f0
Author: Dan Wendlandt <email address hidden>
Date: Fri Jun 1 17:07:13 2012 -0700

    Quantum Manager disassociate floating-ips on instance delete.

    bug #997763

    Change-Id: I4a1e6c63d2a27c361433b9150dd5ad5218578c02

Thierry Carrez (ttx) on 2012-07-04
Changed in nova:
milestone: none → folsom-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-07-04
Changed in quantum:
status: Fix Committed → Fix Released
Dave Walker (davewalker) on 2012-08-24
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Precise):
status: New → Confirmed

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/8339
Stable review: https://review.openstack.org/8646

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...


Changed in nova (Ubuntu Precise):
status: Confirmed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thierry Carrez (ttx) on 2012-09-27
Changed in quantum:
milestone: folsom-2 → 2012.2
Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-2 → 2012.2