OpenStack Compute (Nova)

OpenStack does not free fixed IPs when instances are removed

Reported by David Lawson on 2012-04-04
This bug affects 3 people

Affects: OpenStack Compute (nova)
Importance: Critical
Assigned to: Vish Ishaya

Bug Description

Running the latest version of Essex, we're experiencing a problem where fixed IPs are not freed for reuse when the instance they were associated with is removed. We've resorted to running SQL from cron, like so:

update fixed_ips set instance_id = NULL where reserved = false and allocated = false and leased = false and instance_id is not NULL;

This makes it appear that either the method for finding free fixed IPs is faulty or fixed IPs aren't properly being marked as unassociated when their instance goes away.
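The cron SQL above encodes the rule the network manager is expected to apply itself. A minimal Python sketch of that rule (all names here are illustrative, not Nova's actual schema or API):

```python
# Sketch of the disassociation rule the cron SQL enforces: a fixed IP
# should be returned to the pool once it is neither reserved, allocated,
# nor leased, yet still points at an instance. Illustrative only.

def should_disassociate(fixed_ip):
    """Return True if the fixed IP row can be detached from its instance."""
    return (not fixed_ip["reserved"]
            and not fixed_ip["allocated"]
            and not fixed_ip["leased"]
            and fixed_ip["instance_id"] is not None)

ips = [
    {"address": "10.0.0.2", "reserved": False, "allocated": False,
     "leased": False, "instance_id": 42},   # stale: instance is gone
    {"address": "10.0.0.3", "reserved": False, "allocated": True,
     "leased": True, "instance_id": 43},    # still in use
]

stale = [ip["address"] for ip in ips if should_disassociate(ip)]
print(stale)  # → ['10.0.0.2']
```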

Vish Ishaya (vishvananda) wrote :

Are you using force_dhcp_release, or not?

David Lawson (deej) wrote :

We are, yes.

Vish Ishaya (vishvananda) wrote :

I don't see anything in the current code base that would cause this to happen. When the DHCP release comes in, it should disassociate the IP if it is no longer allocated. Could you check your logs to see if a lease is coming in after the release?
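The release path Vish describes can be sketched as follows (a simplified, hypothetical structure; the real handler lives in nova.network.manager). A lease arriving after the release would re-mark the row as leased and block disassociation, which is why he asks to check the log ordering:

```python
import logging

LOG = logging.getLogger("nova.network.manager")

# Simplified sketch of the release flow: when dnsmasq reports a DHCP
# release, the IP is disassociated only if nothing has re-allocated it
# in the meantime. `db` stands in for the fixed_ips table.

def release_fixed_ip(db, address):
    row = db[address]
    row["leased"] = False
    if not row["allocated"]:
        row["instance_id"] = None  # return the IP to the pool
    else:
        LOG.debug("%s still allocated; not disassociating", address)

db = {"10.0.0.2": {"allocated": False, "leased": True, "instance_id": 42}}
release_fixed_ip(db, "10.0.0.2")
print(db["10.0.0.2"]["instance_id"])  # → None
```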

Vish Ishaya (vishvananda) wrote :

OK, I replicated this. It's due to the retrieval of the vif happening after we've already lost it.

Vish Ishaya (vishvananda) wrote :

2012-04-04 15:38:51 ERROR nova.network.manager [req-5dbb20c4-83db-4ffa-85e7-25307307c63a c3761148189b412cbb4ecd498cf5912e 15bad91537084f24867a7d844a681aad] Unable to release 10.0.0.2 because vif doesn't exist.
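The error illustrates the ordering bug the fix addresses: the fixed IP row was updated (severing its link to the vif) before the vif id was read back, so the release handler could no longer find the vif. A schematic reproduction (illustrative only, not Nova's actual code):

```python
# Schematic of the regression: the buggy path clears the fixed IP's
# association *before* reading the vif id it still needs, so the later
# lookup fails ("Unable to release ... because vif doesn't exist").
# The fix is simply to capture vif_id before the update.

def release_buggy(fixed_ip):
    fixed_ip["virtual_interface_id"] = None    # update first (bug)
    vif_id = fixed_ip["virtual_interface_id"]  # ...then retrieve: gone
    return vif_id

def release_fixed(fixed_ip):
    vif_id = fixed_ip["virtual_interface_id"]  # retrieve first (fix)
    fixed_ip["virtual_interface_id"] = None
    return vif_id

print(release_buggy({"virtual_interface_id": 7}))  # → None
print(release_fixed({"virtual_interface_id": 7}))  # → 7
```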

Thierry Carrez (ttx) on 2012-04-04
tags: added: essex-rc-potential
Changed in nova:
importance: Undecided → Critical
status: New → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/6229

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/6229
Committed: http://github.com/openstack/nova/commit/cabe27b955918cbfc410ad20cf9244d5ed4439bc
Submitter: Jenkins
Branch: master

commit cabe27b955918cbfc410ad20cf9244d5ed4439bc
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Apr 4 16:14:50 2012 +0000

    Fixes regression in release_dhcp

     * regression from c96e75d6804d016da7c6356bf593eb86dcb2f257
     * fixes out of order update and retrieval of vif_id
     * includes failing test
     * fixes bug 973442

    Change-Id: I3bea1c754042ad5960f285fbcdc1d45445079f81

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/6230
Committed: http://github.com/openstack/nova/commit/f5bdaed0d5e0556a16c5d0a22b1a476a263867f9
Submitter: Jenkins
Branch: milestone-proposed

commit f5bdaed0d5e0556a16c5d0a22b1a476a263867f9
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Apr 4 16:14:50 2012 +0000

    Fixes regression in release_dhcp

     * regression from c96e75d6804d016da7c6356bf593eb86dcb2f257
     * fixes out of order update and retrieval of vif_id
     * includes failing test
     * fixes bug 973442

    Change-Id: I3bea1c754042ad5960f285fbcdc1d45445079f81

Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: none → essex-rc4
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc4 → 2012.1
David Kranz (david-kranz) wrote :

I am still seeing this while running the tempest server create/destroy stress test with

ii nova-network 2012.1-0ubuntu2 OpenStack Compute - Network manager

Network log attached.

David Kranz (david-kranz) wrote :
Sina Sadeghi (ssadeghi) wrote :

We seem to be affected by this bug, although the patches shown above exist in our code tree.

Sina Sadeghi (ssadeghi) wrote :

Currently (out of a nova-compute cluster of ~80 nodes with heavy usage) we find at least one or two compute nodes every day with this issue, requiring a 'killall dnsmasq; service nova-network restart' to fix.

Vish Ishaya (vishvananda) wrote :

It shouldn't be a huge issue if some IPs are not released immediately. If you have this fix included, they should be timed out after a while:

https://review.openstack.org/9030/

Also, these fixes should help minimize race conditions on deallocation, so they might fix the issue that you are seeing:

https://review.openstack.org/9041
https://review.openstack.org/10387
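The timeout Vish refers to works roughly like this periodic cleanup (a sketch under assumptions: names are illustrative, and the governing option in nova-network is, to the best of my knowledge, fixed_ip_disassociate_timeout):

```python
import datetime

# Sketch of timeout-based cleanup: fixed IPs still pointing at an
# instance, whose lease has lapsed and whose last update is older than
# the timeout, are disassociated even if the explicit DHCP release was
# lost. Illustrative names, not Nova's actual implementation.

TIMEOUT = datetime.timedelta(seconds=600)

def disassociate_stale(rows, now):
    freed = []
    for row in rows:
        if (row["instance_id"] is not None
                and not row["leased"]
                and now - row["updated_at"] > TIMEOUT):
            row["instance_id"] = None
            freed.append(row["address"])
    return freed

now = datetime.datetime(2012, 7, 27, 9, 0)
rows = [{"address": "10.0.0.2", "instance_id": 42, "leased": False,
         "updated_at": now - datetime.timedelta(hours=1)}]
print(disassociate_stale(rows, now))  # → ['10.0.0.2']
```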

Vish, I have forwarded your email on so the team can discuss it today. Thanks for the response; we were investigating this issue again yesterday without any success.

On 27/07/12 09:02, Vish Ishaya wrote:
> It shouldn't be a huge issue if some ips are not released immediately.
> If you have this fix included they should be timed out after a while:
>
> https://review.openstack.org/9030/
>
> Also, these fixes should help minimize race conditions on deallocation,
> so they might fix the issue that you are seeing:
>
> https://review.openstack.org/9041
> https://review.openstack.org/10387
>

--
Sina Sadeghi
Research Cloud Systems Administrator

Dražen Lučanin (kermit666) wrote :

I am also experiencing this issue in Folsom, even though the patch appears to be part of the code already.

I am running a multinode installation with nova-network instances on all the nodes. While booting an instance on a compute-only node, I get an error:

    {u'message': u'NoValidHost', u'code': 500, u'created': u'2012-12-19T22:24:25Z'}

And the nova-network.log on the node in question shows:

     2012-12-19 23:26:39 ERROR nova.network.manager [req-445d7128-60e7-4b9e-899b-a5eab4f4c297 41f0850b0dfe487cad02c13c8ea45dda 908a8e207bd047a49bc0717f7a4a2477] Unable to release 192.168.100.9 because vif doesn't exist.

Is it maybe relevant that I only created the VM IP pool on the controller node?

An interesting thing is that everything worked while I only had 2 nodes (controller + compute); this error started appearing after I added a 3rd node (compute), and only on the two compute nodes. The network was created with:

     sudo nova-manage network create private --multi_host=T --fixed_range_v4=192.168.100.0/24 --bridge_interface=br100 --num_networks=1 --network_size=256
