NoMoreFloatingIps: Zero floating ips available after repeatedly creating and destroying instances over time

Bug #1017418 reported by Kevin Jackson
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Critical
Vish Ishaya
Essex
Fix Released
Critical
Vish Ishaya
nova (Ubuntu)
Fix Released
Undecided
Unassigned
Precise
Fix Released
Undecided
Unassigned

Bug Description

Problem: NoMoreFloatingIps: Zero floating ips available
Configuration: FlatDHCP, auto_assign_floating_ip
  multi_host

Floating Ranges created using:
nova-manage floating create 172.29.3.0/24
nova-manage floating create 172.29.4.0/24

This gives me 512 floating addresses I can use.
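(Editorial aside, not part of the report: two /24 ranges hold 512 addresses in total, but excluding each range's network and broadcast addresses leaves 508 usable hosts, matching the .1–.254 entries in the listings below. A quick stdlib check:)

```python
# Count the usable host addresses in the two floating ranges the reporter
# created, using Python's stdlib ipaddress module.
import ipaddress

ranges = ["172.29.3.0/24", "172.29.4.0/24"]
usable = sum(len(list(ipaddress.ip_network(r).hosts())) for r in ranges)
print(usable)  # 508 -- .hosts() excludes the network and broadcast addresses
```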

Public: bond0 (eth0 and eth1)
Private: eth2 (unbonded)

OS: Ubuntu 12.04 LTS

ii nova-api 2012.1-0ubuntu2.3 OpenStack Compute - API frontend
ii nova-common 2012.1-0ubuntu2.3 OpenStack Compute - common files
ii nova-compute 2012.1-0ubuntu2.3 OpenStack Compute - compute node
ii nova-compute-kvm 2012.1-0ubuntu2.3 OpenStack Compute - compute node (KVM)
ii nova-network 2012.1-0ubuntu2.3 OpenStack Compute - Network manager
ii python-glance 2012.1-0ubuntu2.1 OpenStack Image Registry and Delivery Service - Python library
ii python-keystone 2012.1-0ubuntu1 OpenStack identity service - Python library
ii python-keystoneclient 2012.1-0ubuntu1 Client libary for Openstack Keystone API
ii python-nova 2012.1-0ubuntu2.3 OpenStack Compute Python libraries
ii python-novaclient 2012.1-0ubuntu1 client library for OpenStack Compute API

I run a script that repeatedly creates and destroys instances and reports how many reached the ACTIVE state and how many were set to ERROR.
After the script has run for a while (it creates instances of the same type, waits 20 seconds, deletes them, waits another 20 seconds, and repeats), instances start going into the ERROR state.
On viewing the logs I see lots of errors relating to floating IPs:

/var/log/nova/nova-network.log:
2012-06-25 10:11:13 ERROR nova.rpc.amqp [req-6814af4e-90ef-425f-9756-11fcf84b0a39 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Exception during message handling
2012-06-25 10:11:13 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-25 10:11:13 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-25 10:11:13 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-25 10:11:13 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:11:13 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:11:13 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 371, in deallocate_for_instance
2012-06-25 10:11:13 TRACE nova.rpc.amqp affect_auto_assigned=True)
2012-06-25 10:11:13 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:11:13 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:11:13 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 520, in disassociate_floating_ip
2012-06-25 10:11:13 TRACE nova.rpc.amqp raise exception.FloatingIpNotAssociated(address=floating_address)
2012-06-25 10:11:13 TRACE nova.rpc.amqp FloatingIpNotAssociated: Floating ip 172.29.4.6 is not associated.
2012-06-25 10:11:13 TRACE nova.rpc.amqp

Run 1, Num Errors: 0, Num Active: 21
Run 2, Num Errors: 0, Num Active: 19
Run 3, Num Errors: 0, Num Active: 19
Run 4, Num Errors: 0, Num Active: 17
Run 5, Num Errors: 2, Num Active: 19
Run 6, Num Errors: 0, Num Active: 18
Run 7, Num Errors: 0, Num Active: 21
Run 8, Num Errors: 1, Num Active: 20
Run 9, Num Errors: 1, Num Active: 19
Run 10, Num Errors: 0, Num Active: 21
Run 11, Num Errors: 0, Num Active: 21
Run 12, Num Errors: 0, Num Active: 21
Run 13, Num Errors: 0, Num Active: 19
Run 14, Num Errors: 0, Num Active: 21
Run 15, Num Errors: 1, Num Active: 17

nova list
+----+------+--------+----------+
| ID | Name | Status | Networks |
+----+------+--------+----------+
+----+------+--------+----------+

nova-manage floating list
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.1 Missing instance 5762 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.2 Missing instance 5765 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.3 Missing instance 5771 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.4 Missing instance 5774 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.5 Missing instance 5768 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.6 Missing instance 5777 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.7 Missing instance 5780 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.8 Missing instance 5786 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.9 Missing instance 5783 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.10 Missing instance 5789 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.11 Missing instance 5792 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.12 Missing instance 5795 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.13 Missing instance 5801 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.14 Missing instance 5798 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.15 Missing instance 5804 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.16 Missing instance 5807 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.17 Missing instance 5813 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.18 Missing instance 5819 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.19 Missing instance 5816 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.20 Missing instance 5822 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.21 Missing instance 5828 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.22 Missing instance 5810 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.23 Missing instance 5825 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.24 Missing instance 5834 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.25 Missing instance 5831 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.26 Missing instance 5837 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.27 Missing instance 5840 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.28 Missing instance 5846 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.29 Missing instance 5843 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.30 Missing instance 5849 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.31 Missing instance 5852 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.32 Missing instance 5858 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.33 Missing instance 5855 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.34 Missing instance 5864 nova bond0

Showing what is not in this "Missing instance" state:

nova-manage floating list | egrep -v "Missing"
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.234 None nova bond0
None 172.29.4.193 None nova bond0
None 172.29.4.194 None nova bond0
None 172.29.4.195 None nova bond0
None 172.29.4.196 None nova bond0
None 172.29.4.197 None nova bond0
None 172.29.4.198 None nova bond0
None 172.29.4.199 None nova bond0
None 172.29.4.200 None nova bond0
None 172.29.4.201 None nova bond0
None 172.29.4.202 None nova bond0
None 172.29.4.203 None nova bond0
None 172.29.4.204 None nova bond0
None 172.29.4.205 None nova bond0
None 172.29.4.206 None nova bond0
None 172.29.4.207 None nova bond0
None 172.29.4.208 None nova bond0
None 172.29.4.209 None nova bond0
None 172.29.4.210 None nova bond0
None 172.29.4.211 None nova bond0
None 172.29.4.212 None nova bond0
None 172.29.4.213 None nova bond0
None 172.29.4.214 None nova bond0
None 172.29.4.215 None nova bond0
None 172.29.4.216 None nova bond0
None 172.29.4.217 None nova bond0
None 172.29.4.218 None nova bond0
None 172.29.4.219 None nova bond0
None 172.29.4.220 None nova bond0
None 172.29.4.221 None nova bond0
None 172.29.4.222 None nova bond0
None 172.29.4.223 None nova bond0
None 172.29.4.224 None nova bond0
None 172.29.4.225 None nova bond0
None 172.29.4.226 None nova bond0
None 172.29.4.227 None nova bond0
None 172.29.4.228 None nova bond0
None 172.29.4.229 None nova bond0
None 172.29.4.230 None nova bond0
None 172.29.4.231 None nova bond0
None 172.29.4.232 None nova bond0
None 172.29.4.233 None nova bond0
None 172.29.4.234 None nova bond0
None 172.29.4.235 None nova bond0
None 172.29.4.236 None nova bond0
None 172.29.4.237 None nova bond0
None 172.29.4.238 None nova bond0
None 172.29.4.239 None nova bond0
None 172.29.4.240 None nova bond0
None 172.29.4.241 None nova bond0
None 172.29.4.242 None nova bond0
None 172.29.4.243 None nova bond0
None 172.29.4.244 None nova bond0
None 172.29.4.245 None nova bond0
None 172.29.4.246 None nova bond0
None 172.29.4.247 None nova bond0
None 172.29.4.248 None nova bond0
None 172.29.4.249 None nova bond0
None 172.29.4.250 None nova bond0
None 172.29.4.251 None nova bond0
None 172.29.4.252 None nova bond0
None 172.29.4.253 None nova bond0
None 172.29.4.254 None nova bond0

Kick off test again:

on compute hosts /var/log/nova/nova-network.log:
2012-06-25 10:49:28 ERROR nova.rpc.amqp [req-9911e91f-2175-4aaf-899c-e6dabcfaaa29 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Exception during message handling
2012-06-25 10:49:28 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-25 10:49:28 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-25 10:49:28 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-25 10:49:28 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:49:28 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:49:28 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 371, in deallocate_for_instance
2012-06-25 10:49:28 TRACE nova.rpc.amqp affect_auto_assigned=True)
2012-06-25 10:49:28 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:49:28 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:49:28 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 520, in disassociate_floating_ip
2012-06-25 10:49:28 TRACE nova.rpc.amqp raise exception.FloatingIpNotAssociated(address=floating_address)
2012-06-25 10:49:28 TRACE nova.rpc.amqp FloatingIpNotAssociated: Floating ip 172.29.4.210 is not associated.
2012-06-25 10:49:28 TRACE nova.rpc.amqp
2012-06-25 10:49:34 ERROR nova.rpc.amqp [req-303bd4ee-5104-4658-be1d-117016841263 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Exception during message handling
2012-06-25 10:49:34 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-25 10:49:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-25 10:49:34 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-25 10:49:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:49:34 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:49:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 371, in deallocate_for_instance
2012-06-25 10:49:34 TRACE nova.rpc.amqp affect_auto_assigned=True)
2012-06-25 10:49:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:49:34 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:49:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 520, in disassociate_floating_ip
2012-06-25 10:49:34 TRACE nova.rpc.amqp raise exception.FloatingIpNotAssociated(address=floating_address)
2012-06-25 10:49:34 TRACE nova.rpc.amqp FloatingIpNotAssociated: Floating ip 172.29.4.221 is not associated.
2012-06-25 10:49:34 TRACE nova.rpc.amqp

Then eventually

Run 1, Num Errors: 0, Num Active: 21
Run 2, Num Errors: 1, Num Active: 19
Run 3, Num Errors: 27, Num Active: 3
Run 4, Num Errors: 30, Num Active: 0

2012-06-25 10:53:31 ERROR nova.rpc.amqp [req-54164eb0-8e71-42bc-b812-7c55ceca6bcf cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Returning exception Zero floating ips available. to caller
2012-06-25 10:53:31 ERROR nova.rpc.amqp [req-54164eb0-8e71-42bc-b812-7c55ceca6bcf cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 324, in allocate_for_instance\n floating_address = self.allocate_floating_ip(context, project_id)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 408, in allocate_floating_ip\n pool)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 246, in floating_ip_allocate_address\n return IMPL.floating_ip_allocate_address(context, project_id, pool)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 642, in floating_ip_allocate_address\n raise exception.NoMoreFloatingIps()\n', 'NoMoreFloatingIps: Zero floating ips available.\n']
2012-06-25 10:53:34 ERROR nova.rpc.amqp [req-326cd1e6-7024-4b58-8b47-43b1cbc493ec cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Exception during message handling
2012-06-25 10:53:34 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-25 10:53:34 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:53:34 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 324, in allocate_for_instance
2012-06-25 10:53:34 TRACE nova.rpc.amqp floating_address = self.allocate_floating_ip(context, project_id)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped
2012-06-25 10:53:34 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 408, in allocate_floating_ip
2012-06-25 10:53:34 TRACE nova.rpc.amqp pool)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 246, in floating_ip_allocate_address
2012-06-25 10:53:34 TRACE nova.rpc.amqp return IMPL.floating_ip_allocate_address(context, project_id, pool)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper
2012-06-25 10:53:34 TRACE nova.rpc.amqp return f(*args, **kwargs)
2012-06-25 10:53:34 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 642, in floating_ip_allocate_address
2012-06-25 10:53:34 TRACE nova.rpc.amqp raise exception.NoMoreFloatingIps()
2012-06-25 10:53:34 TRACE nova.rpc.amqp NoMoreFloatingIps: Zero floating ips available.
2012-06-25 10:53:34 TRACE nova.rpc.amqp
2012-06-25 10:53:34 ERROR nova.rpc.amqp [req-326cd1e6-7024-4b58-8b47-43b1cbc493ec cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Returning exception Zero floating ips available. to caller
2012-06-25 10:53:34 ERROR nova.rpc.amqp [req-326cd1e6-7024-4b58-8b47-43b1cbc493ec cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 324, in allocate_for_instance\n floating_address = self.allocate_floating_ip(context, project_id)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 258, in wrapped\n return func(self, context, *args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/network/manager.py", line 408, in allocate_floating_ip\n pool)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 246, in floating_ip_allocate_address\n return IMPL.floating_ip_allocate_address(context, project_id, pool)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 642, in floating_ip_allocate_address\n raise exception.NoMoreFloatingIps()\n', 'NoMoreFloatingIps: Zero floating ips available.\n']
2012-06-25 10:54:04 ERROR nova.network.manager [req-3beb2739-cd48-41a8-870d-d69c4fe530ae cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.19 because vif doesn't exist.
2012-06-25 10:54:10 ERROR nova.network.manager [req-e8e13ea9-c8e2-46e5-a790-4d40c990f16c cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.21 because vif doesn't exist.
2012-06-25 10:54:12 ERROR nova.network.manager [req-45de33b1-b132-448c-9a67-fe14ed06423d cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.24 because vif doesn't exist.
2012-06-25 10:54:17 ERROR nova.network.manager [req-68e2a7bb-9a20-4565-b880-012be66c9618 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.29 because vif doesn't exist.
2012-06-25 10:54:20 ERROR nova.network.manager [req-9b104583-de0d-4a6c-a8f8-7b750c5f8ea7 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.36 because vif doesn't exist.
2012-06-25 10:54:23 ERROR nova.network.manager [req-c7d11987-7148-41a9-a1c1-a759343347d6 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Unable to release 10.29.4.37 because vif doesn't exist.

Related branches

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Is it possible that your script is deleting instances before they are done creating? I don't see any way for the code path with the error to get hit unless the network deallocate is getting called before the allocation is done somehow. Can you try either waiting until the api says the instance is running or upping the wait to a minute to see if the error persists? That will help us track down if it is a race condition or a logic error somewhere.

Vish
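(Editorial aside: Vish's hypothesis can be sketched with a minimal, deterministic simulation. This is hypothetical illustration code, not nova's implementation: if allocation is a fire-and-forget cast, the address can leave the free pool before it is associated with an instance; a fast delete then hits the FloatingIpNotAssociated check seen in the traceback, and the address is never returned to the pool.)

```python
# Hypothetical model of the suspected race (not nova code): allocation is
# two steps -- remove the address from the free pool, then associate it.
# If deallocation runs between the two steps, the address leaks.

class FloatingIpPool:
    def __init__(self, addresses):
        self.free = set(addresses)
        self.allocated = {}  # address -> instance (None until associated)

    def allocate(self, address):
        # Step 1: the address leaves the free pool immediately...
        self.free.discard(address)
        self.allocated[address] = None

    def associate(self, address, instance):
        # Step 2: ...and is associated with an instance some time later.
        self.allocated[address] = instance

    def deallocate(self, address):
        # Deallocation refuses unassociated addresses, mirroring the
        # FloatingIpNotAssociated error in the logs above.
        if self.allocated.get(address) is None:
            raise RuntimeError("Floating ip %s is not associated." % address)
        del self.allocated[address]
        self.free.add(address)

pool = FloatingIpPool(["172.29.4.6"])
pool.allocate("172.29.4.6")        # allocation cast fired; association pending
try:
    pool.deallocate("172.29.4.6")  # instance deleted before association ran
except RuntimeError as err:
    print(err)

print(pool.free)  # set() -- the address never returns to the pool
```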

Changed in nova:
status: New → Incomplete
Revision history for this message
Kevin Jackson (kevin-linuxservices) wrote :

Hi Vish, you are correct in assuming the script doesn't wait for the machines to finish booting before deleting them. Patience was never a strong point of mine... The test I was performing was as follows - not pretty, but it highlights the issue:

NUM_INSTANCES=30
NUM_ITERATIONS=15

for I in `seq 1 ${NUM_ITERATIONS}`
do
        # Spin up N instances, discarding all output
        for a in `seq 1 ${NUM_INSTANCES}`; do nova boot kevinj${a} --flavor=2 --image 57a173b0-f24d-4e2d-b914-e8f60ed1b881 --key_name kevinj --security_groups default > /dev/null 2>&1; done

        sleep 20

        # Count instances in each state
        NUM_ERRORS=$(nova list | grep -c ERR)
        NUM_ACTIVE=$(nova list | grep -c ACTIVE)

        echo "Run $I, Num Errors: ${NUM_ERRORS}, Num Active: ${NUM_ACTIVE}"

        # Tear them all down again
        for a in `seq 1 ${NUM_INSTANCES}`; do nova delete kevinj${a} > /dev/null 2>&1; done

        sleep 20
done

I'll re-run the test with the --poll option to see if it makes a difference.
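(Editorial aside: --poll makes nova boot block until the build reaches a terminal state. A hand-rolled equivalent of that wait loop might look like the sketch below; the function name and the stubbed status source are illustrative, standing in for repeated `nova show` calls.)

```python
# Illustrative poll loop: wait for an instance to reach a terminal status
# instead of sleeping a fixed 20 seconds and hoping the build finished.
import time

def wait_for_status(get_status, wanted="ACTIVE", timeout=120, interval=2):
    """Poll get_status() until it returns `wanted` or ERROR, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status in (wanted, "ERROR"):
            return status
        time.sleep(interval)
    raise TimeoutError("instance never reached %s" % wanted)

# Example with a stubbed status source in place of `nova show`:
statuses = iter(["BUILD", "BUILD", "ACTIVE"])
print(wait_for_status(lambda: next(statuses), interval=0))  # ACTIVE
```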

Either way, if a race condition can make floating IPs permanently unavailable where --auto_assign_floating_ip is used - and therefore prevent any further instances from spawning - this is a security/DoS issue.

Revision history for this message
Thierry Carrez (ttx) wrote :

Would that be caught by quota limitations?
Also, would the floating IPs be freed after a period of time? Would the consumer of those be billed for the usage? It is not a DoS attack if the attacker is just a user who gets billed for the resources he consumes, and everything returns to normal once he stops.

Changed in nova:
importance: Undecided → Medium
status: Incomplete → Confirmed
Revision history for this message
Kevin Jackson (kevin-linuxservices) wrote :

Ah, Thierry - I forgot to add this little ditty, which is crucial to this. I modified the rate limit as well as the quotas (for internal clouds, the default quotas are too strict when the idea is to let an account spin up as many machines as required - especially for a "production" tenant):

/etc/nova/api-paste.ini:
[filter:ratelimit]
paste.filter_factory = nova.api.openstack.compute.limits:RateLimitingMiddleware.factory
limits =("POST", "*", ".*", 120, MINUTE);("POST", "*/servers", "^/servers", 1000, DAY);("PUT", "*", ".*", 120, MINUTE);("GET", "*changes-since*", ".*changes-since.*", 30, MINUTE);("DELETE", "*", ".*", 100,MINUTE)

In terms of DoS and billing: for internal private clouds this very probably won't involve money changing hands, so the resource is effectively "limitless", enabling a local network-level DoS.

I don't think the rate limit or quotas will help with this race condition.
I've also not seen the floating IPs re-appear after a while - returning to the environment the next day, it is still broken.

Revision history for this message
Thierry Carrez (ttx) wrote :

Awesome, adding VMT to track this

Changed in nova:
importance: Medium → High
Revision history for this message
Kevin Jackson (kevin-linuxservices) wrote :

I re-ran that script above, adding --poll to the end of the nova boot command:

Run 1, Num Errors: 0, Num Active: 30
Run 2, Num Errors: 0, Num Active: 30
Run 3, Num Errors: 0, Num Active: 30
Run 4, Num Errors: 0, Num Active: 30
Run 5, Num Errors: 0, Num Active: 30
Run 6, Num Errors: 0, Num Active: 30
Run 7, Num Errors: 0, Num Active: 30
Run 8, Num Errors: 0, Num Active: 30
Run 9, Num Errors: 0, Num Active: 30
Run 10, Num Errors: 0, Num Active: 30
Run 11, Num Errors: 0, Num Active: 30
Run 12, Num Errors: 0, Num Active: 30
Run 13, Num Errors: 0, Num Active: 30
Run 14, Num Errors: 0, Num Active: 30
Run 15, Num Errors: 0, Num Active: 30

sudo nova-manage floating list
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.1 Missing instance 7469 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.2 Missing instance 7472 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.3 Missing instance 7475 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.4 Missing instance 7478 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.5 Missing instance 7481 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.6 Missing instance 7484 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.7 Missing instance 7487 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.8 Missing instance 7490 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.9 Missing instance 7493 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.10 Missing instance 7496 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.4.11 Missing instance 7499 nova bond0
...
None 172.29.4.244 None nova bond0
None 172.29.4.245 None nova bond0
None 172.29.4.246 None nova bond0
None 172.29.4.247 None nova bond0
None 172.29.4.248 None nova bond0
None 172.29.4.249 None nova bond0
None 172.29.4.250 None nova bond0
None 172.29.4.251 None nova bond0
None 172.29.4.252 None nova bond0
None 172.29.4.253 None nova bond0
None 172.29.4.254 None nova bond0

Re-running again

Run 1, Num Errors: 0, Num Active: 30
Run 2, Num Errors: 2, Num Active: 28
Run 3, Num Errors: 30, Num Active: 0
Run 4, Num Errors: 30, Num Active: 0
Run 5, Num Errors: 30, Num Active: 0
Run 6, Num Errors: 30, Num Active: 0
Run 7, Num Errors: 30, Num Active: 0
Run 8, Num Errors: 30, Num Active: 0

when running

sudo nova-manage floating list

b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.250 Missing instance 8974 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.251 Missing instance 8977 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.252 Missing instance 8980 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.253 Missing instance 8983 nova bond0
b3ecef20bbb842929cee0d7523a0b2e4 172.29.3.254 Missing instance 8986 nova bond0

all now show as Missing.

/var/log/nova/nova-network.log

2012-06-26 12:57:51 ERROR nova.rpc.amqp [req-f4557acc-e19e-4289-8216-a5dc15623ce6 cb704aa5159b4aa887f17f0349a7f7c9 b3ecef20bbb842929cee0d7523a0b2e4] Exception during message handling
2012-06-26 12:57:51 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-26 12:57:51 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-06-26 12:57:51 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-26 ...


Revision history for this message
Vish Ishaya (vishvananda) wrote :

So if I understand the post above, the poll option fixes the issue, so it is a race condition on deleting an instance that has not finished spawning that causes the ip to leak?

Revision history for this message
Ryan Tidwell (ryan-tidwell) wrote :

Seeing incomplete modifications of the floating_ips table when deleting long-lived instances in a multi-host, auto_assign_floating_ip environment.

Here is the record in the floating_ips table for the floating IP in question AFTER terminating the associated instance:
"2012-06-20 21:46:30.253886";"2012-06-26 21:57:29.50243";"";FALSE;"105";"16.125.108.105";;"d7127ed3027044c4942fc2f94549489e";"";TRUE;"nova";"eth0"

The project_id column is not properly updated, nor is the auto_assigned column. The network manager code looks like it should be nulling out project_id, host, and fixed_ip_id, as well as setting auto_assigned to False. fixed_ip_id and host are updated properly.
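(Editorial aside: the set of column updates Ryan describes as expected on deallocation can be sketched as a plain dict update. The column names are assumed from the pasted row, not taken from nova's SQLAlchemy code.)

```python
# Illustrative only: clear every ownership field on deallocation so the
# address returns to the pool. Column names assumed from Ryan's dump.

def deallocate_row(row):
    """Null out all ownership columns of a floating_ips row."""
    row.update({
        "project_id": None,      # observed NOT cleared in the bad row
        "auto_assigned": False,  # observed NOT cleared in the bad row
        "host": None,            # observed cleared correctly
        "fixed_ip_id": None,     # observed cleared correctly
    })
    return row

row = {"address": "16.125.108.105", "project_id": "d7127ed3027044c4942fc2f94549489e",
       "auto_assigned": True, "host": "nova", "fixed_ip_id": 105}
deallocate_row(row)
print(row["project_id"], row["auto_assigned"])  # None False
```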

In addition, I was finally able to locate the related traceback that is logged when this occurs:

2012-06-26 15:57:29 ERROR nova.rpc.amqp [req-d094a9f3-a0b6-4d8e-adf7-6d9e8eb5117f e4a51bc329a74001806ce7820b784313 d7127ed3027044c4942fc2f94549489e] Exception during message handling
2012-06-26 15:57:29 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-26 15:57:29 TRACE nova.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/rpc/amqp.py", line 253, in _process_data
2012-06-26 15:57:29 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-06-26 15:57:29 TRACE nova.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/network/manager.py", line 257, in wrapped
2012-06-26 15:57:29 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-26 15:57:29 TRACE nova.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/network/manager.py", line 374, in deallocate_for_instance
2012-06-26 15:57:29 TRACE nova.rpc.amqp affect_auto_assigned=True)
2012-06-26 15:57:29 TRACE nova.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/network/manager.py", line 257, in wrapped
2012-06-26 15:57:29 TRACE nova.rpc.amqp return func(self, context, *args, **kwargs)
2012-06-26 15:57:29 TRACE nova.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/network/manager.py", line 425, in deallocate_floating_ip
2012-06-26 15:57:29 TRACE nova.rpc.amqp raise exception.FloatingIpAssociated(address=floating_address)
2012-06-26 15:57:29 TRACE nova.rpc.amqp FloatingIpAssociated: Floating ip 16.125.108.105 is associated.

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/9040

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/9041

Revision history for this message
Vish Ishaya (vishvananda) wrote :

I actually think that this issue was fixed in this review:

https://review.openstack.org/#/c/8339/

I've made one additional change that could be causing issues and pulled the relevant changes from the other patch into a backport.

Revision history for this message
Vish Ishaya (vishvananda) wrote :

If you could try applying the change here and verify that it fixes the issue, that would be amazing:

https://review.openstack.org/9041

Revision history for this message
Thierry Carrez (ttx) wrote :

FixedIp variant: bug 1014769

Changed in nova:
milestone: none → folsom-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/9040
Committed: http://github.com/openstack/nova/commit/c689891226baa8877f6fb7b3c468b06972068847
Submitter: Jenkins
Branch: master

commit c689891226baa8877f6fb7b3c468b06972068847
Author: Vishvananda Ishaya <email address hidden>
Date: Tue Jun 26 17:56:21 2012 -0700

    Avoid casts in network manager to prevent races

     * Part of fix for bug 1017418

    Change-Id: I3251f02fc5fcb8305f12b6b8a9644232b9146db2
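(Editorial aside: the fix's approach - replacing asynchronous RPC casts with synchronous calls - can be sketched generically. The Rpc class below is a hypothetical shim, not nova's rpc module: a cast enqueues work and returns immediately, so a subsequent delete can race the allocation; a call blocks until the remote side finishes, serialising the two steps.)

```python
# Generic cast-vs-call sketch (hypothetical API, not nova.rpc): call()
# waits for the worker to finish, so allocate is guaranteed to complete
# before the next operation starts.
import queue
import threading

class Rpc:
    def __init__(self):
        self.q = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            func, done = self.q.get()
            func()
            if done is not None:
                done.set()

    def cast(self, func):
        self.q.put((func, None))   # fire and forget: returns immediately

    def call(self, func):
        done = threading.Event()   # block until the worker has run func
        self.q.put((func, done))
        done.wait()

rpc = Rpc()
state = []
rpc.call(lambda: state.append("allocated"))    # guaranteed complete here
rpc.call(lambda: state.append("deallocated"))  # cannot overtake the allocate
print(state)  # ['allocated', 'deallocated']
```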

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Thierry Carrez (ttx) wrote :

Vish: do you agree that this is a valid attack scenario and that we should therefore consider this a vulnerability? How about the FixedIp variant in bug 1014769?

Revision history for this message
Kevin Jackson (kevin-linuxservices) wrote : Re: [Bug 1017418] Re: NoMoreFloatingIps: Zero floating ips available after repeatedly creating and destroying instances over time

Cheers Vish, I'm just playing catch up here - will look at this today.

Kev

On 27 June 2012 02:05, Vish Ishaya <email address hidden> wrote:

> If you could try applying the change here and verify that it fixes the
> issue, that would be amazing:
>
> https://review.openstack.org/9041
>
>
--
Kevin Jackson
@itarchitectkev

Revision history for this message
Vish Ishaya (vishvananda) wrote :

ttx: I think they are both high priority, but I don't know about calling them a security vulnerability; they both occur only under certain non-default configurations.

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
importance: High → Critical
milestone: folsom-2 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/essex)

Reviewed: https://review.openstack.org/9041
Committed: http://github.com/openstack/nova/commit/80d302670f8db37b297ea0618cc14851aab6fff5
Submitter: Jenkins
Branch: stable/essex

commit 80d302670f8db37b297ea0618cc14851aab6fff5
Author: Vishvananda Ishaya <email address hidden>
Date: Tue Jun 26 17:56:21 2012 -0700

    Avoid casts in network manager to prevent races

     * Fixes bug 1017418
     * Includes a change to turn the floating ip casts
       into calls which is part of commit
       82599c77346bbefd550ea4ee6c0b13a3df4950af
     * These calls are to prevent race conditions that
       were preventing floating ips from being deallocated
       properly

    Change-Id: I3251f02fc5fcb8305f12b6b8a9644232b9146db2
    (cherry picked from commit c689891226baa8877f6fb7b3c468b06972068847)
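The commit above closes the race by turning asynchronous RPC casts into synchronous calls during floating IP deallocation. A toy Python sketch (not nova's actual code; all names here are illustrative) of why a fire-and-forget cast can leak floating IPs while a blocking call cannot:

```python
import threading

class FloatingIPPool:
    """Toy model of the race behind bug 1017418: an asynchronous
    'cast' to deallocate a floating IP may still be in flight when
    instance teardown finishes, so under failure/timing windows the
    IP is never returned and the pool slowly drains."""

    def __init__(self, size):
        self.available = list(range(size))
        self.lock = threading.Lock()

    def allocate(self):
        with self.lock:
            if not self.available:
                raise RuntimeError("NoMoreFloatingIps: Zero floating ips available")
            return self.available.pop()

    def deallocate(self, ip):
        with self.lock:
            self.available.append(ip)

def deallocate_cast(pool, ip):
    # Cast: fire-and-forget. The caller returns immediately and cannot
    # know whether (or when) the deallocation actually ran.
    threading.Thread(target=pool.deallocate, args=(ip,)).start()

def deallocate_call(pool, ip):
    # Call: the caller blocks until deallocation completes, so instance
    # deletion cannot be reported done while the IP is still in limbo.
    pool.deallocate(ip)
```

With the call variant, repeated create/delete cycles return every IP to the pool before the next allocation, which is the behavior the fix restores.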

Dave Walker (davewalker)
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Precise):
status: New → Confirmed
Revision history for this message
Adam Gandelman (gandelman-a) wrote : Verification report.

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests were invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/9040
Stable review: https://review.openstack.org/9041

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...


Changed in nova (Ubuntu Precise):
status: Confirmed → Fix Released
Revision history for this message
Clint Byrum (clint-fewbar) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
