2014-05-30 13:40:59 |
Matthew Gilliard |
bug |
|
|
added bug |
2014-05-30 13:41:32 |
Matthew Gilliard |
bug task added |
|
nova |
|
2014-05-30 21:32:13 |
Eugene Nikanorov |
neutron: status |
New |
Incomplete |
|
2014-05-30 21:41:22 |
Aaron Rosen |
bug task deleted |
neutron |
|
|
2014-05-30 21:41:25 |
Aaron Rosen |
tags |
|
network |
|
2014-06-02 06:06:07 |
haruka tanizawa |
nova: status |
New |
In Progress |
|
2014-06-02 08:00:29 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances. |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported for hpcloud's internal monitoring tools and for the openstack-infra nodepool tenant. There appears to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
+----------------+--------------------------------------+-----+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name| network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+-----+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 27772649216983 | 00021f10-dcd7-4198-ae2b-d18b9410eb44 | | a06ab43e-585f-4c32-8925-c903d79a5aaa | fa:16:3e:66:85:10 | 1 | ACTIVE | 2f4e1412-6aed-4e28-ab0d-543a579a9796 | compute:az3 |
| 44971349524446 | 0027ffa0-97a3-4ad6-af75-36d6d0240ece | | 28200352-b51d-437d-a62b-0a47c1a85326 | fa:16:3e:d1:a8:05 | 1 | ACTIVE | b7163835-3747-4f3b-8c5f-13ec17848743 | compute:az1 |
| 61149338019956 | 003843bf-09f4-4380-a161-719f01ff072c | | 862ed494-2dad-49d8-b2b1-ce5b03da5891 | fa:16:3e:c5:78:3a | 1 | ACTIVE | 0ba0c1ad-5dac-4f0a-b9a7-9ee387a69dc8 | compute:az1 |
| 10578736914191 | 003cfa8d-8fab-4920-a0f0-7909ab789154 | | a8784cc6-ee59-458b-ae39-6bcc081b58a6 | fa:16:3e:94:b0:e0 | 1 | ACTIVE | 9cce27cf-f39c-432e-ae62-bc345cb6ef20 | compute:az2 |
| 60948452113939 | 0046cc65-67da-4606-a396-4de3e0b733aa | | a03b1336-3144-4914-8354-4c1b093bff5c | fa:16:3e:9c:a2:54 | 1 | ACTIVE | 25f9ebf5-cc5e-456f-9c87-221cfb68c88b | compute:az1 |
| 11382141558535 | 004fa94d-c5cc-4350-8d07-96b2b6fe6ed8 | | c2c04cd5-1329-4707-ae23-c22777259702 | fa:16:3e:27:94:64 | 1 | ACTIVE | 595dcdad-c426-474d-b613-5fe11949be27 | compute:az2 |
| 85660834569272 | 004fb569-2bff-4cf4-b1ee-ab1c7e7af3c5 | | d202537d-d436-4226-85f6-70370b1971d1 | fa:16:3e:01:3d:3f | 1 | ACTIVE | ef9e842a-4080-43ba-afd9-73ff3339f98b | compute:az1 |
| 61149338019956 | 005836cd-7bfc-48e7-a18b-17aea020f80f | | 862ed494-2dad-49d8-b2b1-ce5b03da5891 | fa:16:3e:e5:22:89 | 1 | ACTIVE | 7560590b-0173-40a7-ba94-44d25fd6e8de | compute:az2 |
| 22067631723371 | 006686c9-ce05-4f1f-a649-238158e209f3 | | 55171400-916e-422c-9068-1de4ce3ebec2 | fa:16:3e:d6:02:30 | 1 | ACTIVE | e64ef387-bae1-4321-adaa-9faa9db82c8b | compute:az2 |
| 10980441848524 | 006f1be3-f85a-4dfe-91f3-8540cada795b | krak| 85170ed0-a5b1-4e99-a033-56cb05a51d15 | fa:16:3e:e6:17:e5 | 1 | ACTIVE | 2e5174ec-4255-432e-a3f8-d8ab3a21e9fa | compute:az1 |
| 10308385519040 | 0076afd8-a838-481f-9701-9a40d8eab25b | | be9486e2-e179-48c7-bf1b-96e81ff1eb34 | fa:16:3e:a4:81:0a | 1 | ACTIVE | 9a5f081c-a07b-4309-a9b6-715e1a9be859 | compute:az2 |
| 34809866885491 | 007ff0bf-c940-4a80-88ad-617dd46aba95 | | 76a0cdcb-22b2-4678-8466-8c959177c0a8 | fa:16:3e:de:ad:7b | 1 | ACTIVE | e1a592fa-ee3e-4e02-8537-0eacf54b6d32 | compute:az2 |
| 10876451072196 | 00814bf0-1dfe-4c53-9664-cd69cd1a9213 | | 13444eee-2cf7-4051-acff-5347844697f8 | fa:16:3e:22:09:27 | 1 | ACTIVE | 2e027d7e-1bed-4bcd-a347-b348ae4380ca | compute:az2 |
| 10075236746315 | 00910429-3d7b-4423-8b39-062a32d72c56 | | 048fe0df-4ff7-40d5-86f0-2bf9c56550c3 | fa:16:3e:87:69:4c | 1 | ACTIVE | 6ed40a37-f49f-43e2-9935-86d054fec293 | compute:az2 |
| 53278391320577 | 009b9978-d6a8-4163-bfb7-75cb72c62d10 | | 6b98a3a7-60bc-4194-a7f3-32cd337821d3 | fa:16:3e:e1:24:ed | 1 | ACTIVE | bee69220-2579-4c4b-9f45-5b321dd42de7 | compute:az2 |
| 10087549362244 | 00ad592f-577b-4434-ab7e-9970c06fcef1 | | 238df4ac-6140-4ae9-98d2-27db38c9d184 | fa:16:3e:3e:40:28 | 1 | ACTIVE | fdac090e-eb37-4a81-93ac-2019715a6698 | compute:az1 |
| 10409882459003 | 00aeaf29-b825-4d43-99b5-733fe1287931 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:55:bd:f8 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
| 61149338019956 | 00c42ff5-7e0c-4fac-b936-03cdcac46634 | | 862ed494-2dad-49d8-b2b1-ce5b03da5891 | fa:16:3e:94:d4:29 | 1 | ACTIVE | a92aed17-4736-4dc7-81ff-66bed984f1d4 | compute:az2 |
| 63404451948086 | 00cda837-16c4-43ee-ba12-fc61f9395ee8 | | 0f126585-45a8-4f5d-9526-0b842dc122c6 | fa:16:3e:5e:81:9a | 1 | ACTIVE | 21f143c9-84ab-4b3c-a26c-6226c126a423 | compute:az1 |
| 10652852967720 | 00e1c8f7-fc50-4069-a87a-a5578d1a7245 | | 7e239c0b-291a-4fcc-b55d-c4bdce0a62a7 | fa:16:3e:ae:c4:26 | 1 | ACTIVE | e5345d0e-a6f1-4f3a-9b8b-cbff5f9a143 | compute:az1 |
...
+----------------+--------------------------------------+-----+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
6226 rows in set (0.02 sec)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to a running instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
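The cross-check described above can be sketched roughly as follows. This is only an illustration of the matching logic, not the actual tooling or queries used; in practice the inputs come from neutron's `ports` table and nova's `instances` table, and the UUIDs below are shortened examples.

```python
# Sketch: find neutron ports whose device_id matches no live nova instance.
# A port is considered "orphaned" (leaked) if its device_id points at an
# instance that nova has already deleted.

def find_orphan_ports(ports, live_instance_uuids):
    """ports: iterable of (port_id, device_id) pairs from neutron's ports table.
    live_instance_uuids: set of uuids of instances nova has NOT deleted."""
    live = set(live_instance_uuids)
    return [port_id for port_id, device_id in ports if device_id not in live]

# Shortened, illustrative IDs:
ports = [
    ("00021f10", "2f4e1412"),  # instance still alive -> port is in use
    ("916cba73", "2e479806"),  # instance deleted     -> port is leaked
]
print(find_orphan_ports(ports, {"2f4e1412"}))  # ['916cba73']
```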
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we have a case where an instance fails to boot with "Neutron error creating port", yet a port is actually created anyway:
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
But in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
|
2014-06-02 08:04:14 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported for hpcloud's internal monitoring tools and for the openstack-infra nodepool tenant. There appears to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we have a case where an instance fails to boot with "Neutron error creating port", yet a port is actually created anyway:
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
But in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
|
2014-06-02 08:05:50 |
Matthew Gilliard |
bug task added |
|
neutron |
|
2014-06-02 08:15:15 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported for hpcloud's internal monitoring tools and for the openstack-infra nodepool tenant. There appears to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we have a case where an instance fails to boot with "Neutron error creating port", yet a port is actually created anyway:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
But in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
|
2014-06-02 08:19:58 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported for hpcloud's internal monitoring tools and for the openstack-infra nodepool tenant. There appears to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we have a case where an instance fails to boot with "Neutron error creating port", yet a port is actually created anyway:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more after 0.2s longer. Slightly longer log section: http://paste.openstack.org/show/82395/)
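The timing above suggests a race: nova's rechecks for ports tagged with the instance's device_id all complete before neutron finishes committing the port, so every recheck sees an empty list and nothing is cleaned up. A toy timeline makes the window visible; all numbers here are illustrative assumptions, not measurements from the real system.

```python
# Toy model of the suspected race: after a client-side ConnectionFailed,
# nova rechecks neutron a fixed number of times for ports with the
# instance's device_id, but the port only becomes visible later.

PORT_VISIBLE_AT = 0.9        # assumed: neutron commits the port at t=0.9s
RECHECK_TIMES = [0.2, 0.4]   # assumed: nova rechecks ~0.2s apart, then gives up

def ports_visible_at(t):
    """What a GET /ports?device_id=... would return at time t (seconds)."""
    return ["916cba73"] if t >= PORT_VISIBLE_AT else []

seen = [ports_visible_at(t) for t in RECHECK_TIMES]
print(seen)                   # [[], []] -- every recheck misses the port
print(ports_visible_at(5.0))  # ['916cba73'] -- but the port exists later
```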
But eventually, in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
|
2014-06-02 08:24:50 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
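The cross-check described above can be sketched as follows. This is an illustrative reconstruction using sqlite3 in place of the two MySQL databases; the column names (`ports.device_id`, `ports.device_owner`, `instances.uuid`, `instances.deleted`) follow the schemas referenced in this report, and all other names and values are made up.

```python
# Hypothetical sketch of matching neutron's ports.device_id against
# nova's instances.uuid to find ports whose instance has been deleted.
# sqlite3 stands in for the real nova and neutron MySQL databases.
import sqlite3

nova = sqlite3.connect(":memory:")
nova.execute("CREATE TABLE instances (uuid TEXT, deleted INTEGER)")
nova.executemany("INSERT INTO instances VALUES (?, ?)",
                 [("live-uuid", 0), ("gone-uuid", 1)])  # one deleted instance

neutron = sqlite3.connect(":memory:")
neutron.execute("CREATE TABLE ports (id TEXT, device_id TEXT, device_owner TEXT)")
neutron.executemany("INSERT INTO ports VALUES (?, ?, ?)",
                    [("port-1", "live-uuid", "compute:az1"),
                     ("port-2", "gone-uuid", "compute:az1")])  # orphaned port

# Every compute-owned port whose device_id is not a live instance is suspect.
live = {u for (u,) in nova.execute(
    "SELECT uuid FROM instances WHERE deleted = 0")}
leaked = [pid for (pid, dev) in neutron.execute(
    "SELECT id, device_id FROM ports WHERE device_owner LIKE 'comput%'")
          if dev not in live]
print(leaked)  # ['port-2']
```

In the real investigation this join has to be done across two separate databases, which is why it was done by exporting the port table and matching by hand.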
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually, in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
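The suspected sequence of events can be reproduced in miniature. This is a toy simulation, not real nova/neutron code: a background thread stands in for neutron committing the port after a delay, while the main thread plays nova, which believes create_port failed and immediately queries for ports to clean up.

```python
# Toy reproduction of the suspected race: the create call "fails"
# client-side (connection reset), the server finishes the create
# asynchronously, so the immediate cleanup query sees no ports and
# the port that appears later is never deleted.
import threading
import time

ports = []                     # stands in for neutron's ports table
lock = threading.Lock()

def server_create_port(device_id, delay):
    time.sleep(delay)          # slow commit on the neutron side
    with lock:
        ports.append({"device_id": device_id})

def get_ports(device_id):
    with lock:
        return [p for p in ports if p["device_id"] == device_id]

t = threading.Thread(target=server_create_port, args=("instance-1", 0.2))
t.start()
# Nova's connection was reset, so it treats create_port as failed,
# deletes the instance, and looks for ports to clean up right away:
assert get_ports("instance-1") == []       # nothing found (yet)
t.join()
assert len(get_ports("instance-1")) == 1   # port appears afterwards: leaked
```

The two empty `{"ports": []}` responses in the log above match the first assertion; the ACTIVE row in the database matches the second.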
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
|
2014-06-02 08:43:07 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state. |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron alone has enough information to determine which ports are leaked, so a periodic cleanup task in either service is not possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk with this script that instances/ports which are currently being created could be wrongly identified. So care is needed.
Neutron could help by committing ports to the database more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
|
2014-06-02 08:43:44 |
Matthew Gilliard |
summary |
port leak when instance is deleted before ACTIVE |
Neutron port leak when connection is dropped during port create in instance boot. |
|
2014-06-02 08:44:40 |
Matthew Gilliard |
description |
If an instance is deleted after it has a neutron port allocated but before it has reached ACTIVE state, sometimes the port is not deleted but the instance is. These orphan ports count toward the user's quota so if it happens enough times the user will be unable to boot instances.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron alone has enough information to determine which ports are leaked, so a periodic cleanup task in either service is not possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk with this script that instances/ports which are currently being created could be wrongly identified. So care is needed.
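The "care is needed" caveat can be expressed as a guard in the cleanup logic. This is a hedged sketch, not the linked gist: the function names, the port dict shape, and the grace period are all illustrative, and a real script would back the two lookup callables with the nova and neutron APIs.

```python
# Illustrative cleanup pass with a grace period, so that a port whose
# instance is still being created is never misidentified as leaked.
# All names here are hypothetical; only the device_id/instance-uuid
# relationship comes from the report above.
import time

GRACE_SECONDS = 3600  # assumed safety margin: only touch ports older than 1h

def find_leaked_ports(list_compute_ports, instance_exists, now=None):
    """Return ports whose instance is gone AND which are old enough
    that an in-flight boot cannot plausibly still own them."""
    now = time.time() if now is None else now
    leaked = []
    for port in list_compute_ports():
        if instance_exists(port["device_id"]):
            continue                                   # instance is alive
        if now - port["created_at"] < GRACE_SECONDS:
            continue                                   # may still be booting
        leaked.append(port)
    return leaked
```

A caller would then delete only the returned ports, ideally re-checking each instance just before the delete to narrow the race window further.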
Neutron could help by committing ports to the database more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron alone has enough information to determine which ports are leaked, so a periodic cleanup task in either service is not possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk with this script that instances/ports which are currently being created could be wrongly identified. So care is needed.
Neutron could help by committing ports to the database more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
|
2014-06-02 08:45:00 |
Matthew Gilliard |
description |
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table which shows neutron ports and the instance uuids that they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` with `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had been deleted. As far as we know this must be a bug, as there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. However, we also see cases where an instance fails to boot with "Neutron error creating port" and yet a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "error creating port" report is spurious, caused by the neutronclient's connection being dropped: the port is in fact being created, but that takes some time, and during that time neutron reports that there are no ports on the instance, so nothing is cleaned up when the instance is deleted. Then the port details are written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron alone has enough information to determine which ports are leaked, so a periodic cleanup task in either service is not possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk with this script that instances/ports which are currently being created could be wrongly identified. So care is needed.
Neutron could help by committing ports to the database more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
How did we run out of ports? Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table showing neutron ports and the instance uuids they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` against `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had already been deleted. As far as we can tell this must be a bug: there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. More relevant is the case where an instance fails to boot with "Neutron error creating port" but a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "Failed to create port" error is spurious, caused by the neutronclient's connection being dropped. The port is in fact being created, but it takes some time, and during that window neutron reports that the instance has no ports, so nothing is cleaned up when the instance is deleted. The port details are then written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron has enough information on its own to determine which ports are leaked, so a periodic cleanup task in either of those two services would not be possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk that this script wrongly identifies instances/ports which are still in the middle of being created, so care is needed.
Neutron could help by writing ports to the db more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
|
2014-06-02 08:48:17 |
Matthew Gilliard |
description |
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
How did we run out of ports? Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table showing neutron ports and the instance uuids they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` against `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had already been deleted. As far as we can tell this must be a bug: there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. More relevant is the case where an instance fails to boot with "Neutron error creating port" but a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "Failed to create port" error is spurious, caused by the neutronclient's connection being dropped. The port is in fact being created, but it takes some time, and during that window neutron reports that the instance has no ports, so nothing is cleaned up when the instance is deleted. The port details are then written to the db and the port is leaked.
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron has enough information on its own to determine which ports are leaked, so a periodic cleanup task in either of those two services would not be possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk that this script wrongly identifies instances/ports which are still in the middle of being created, so care is needed.
Neutron could help by writing ports to the db more quickly, or by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
Sometimes an instance fails to boot because the call to neutron to allocate a port fails. However, we see cases where this happens but the port is actually created and allocated to the now-deleted instance.
The same problem has been reported regarding hpcloud internal monitoring tools, and the openstack-infra nodepool tenant. There seems to be a port leak.
Evidence
========
Sometimes instances fail to boot with the following error:
2014-05-27 00:09:23 ERROR : [NOV58] NovaServers.add Failed: OverLimit - Maximum number of ports exceeded (HTTP 413) (Request-ID: req-e05525c3-0876-4da4-8a81-8dcc3432b418) args('('SLAM_META_m1_az1_00_09_NC_TEMP', u'8c096c29-a666-4b82-99c4-c77dc70cfb40', u'100', 'metastmkey_m1_az1', 'metastm_m1_az1', u'ee7d6d37-d855-4d30-a67b-0d88a03e72fc', 'az1'),{}')
How did we run out of ports? Investigating further, starting with the neutron database:
mysql> select * from ports where device_owner like 'comput%';
This gives a table showing neutron ports and the instance uuids they are allocated to (example: http://paste.openstack.org/show/82394/)
Matching neutron's `device_id` against `uuid` in nova's instances table, we found that approximately 50% of the ports were allocated to instances that had already been deleted. As far as we can tell this must be a bug: there is no way to create a port without linking it to an instance, and deleting an instance should delete its ports atomically.
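The cross-check described above can be sketched as a few lines of Python. This is illustrative only: `ports` and the set of live instance uuids are hypothetical stand-ins for the rows of neutron's ports table and the non-deleted uuids from nova's instances table.

```python
# Sketch: flag neutron ports whose device_id points at an instance
# that nova no longer considers alive. All names here are illustrative.

def find_leaked_ports(ports, live_instance_uuids):
    """Return ports whose device_id is not a live instance uuid.

    ports: iterable of dicts with at least 'id' and 'device_id'
           (shaped like rows from neutron's ports table).
    live_instance_uuids: set of uuids of non-deleted nova instances.
    """
    return [p for p in ports
            if p['device_id'] not in live_instance_uuids]

ports = [
    {'id': 'port-1', 'device_id': 'instance-a'},
    {'id': 'port-2', 'device_id': 'instance-b'},  # instance-b was deleted
]
leaked = find_leaked_ports(ports, {'instance-a'})
print([p['id'] for p in leaked])  # prints ['port-2']
```

As noted below, a check like this can misfire on instances that are mid-boot, so it should not be applied blindly.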
The effect is that the user has unused ports counting toward their port quota, which will prevent them from booting instances when the quota is fully allocated.
Logs
====
The nova-compute log for an instance that fails to boot because of port starvation is not interesting here. More relevant is the case where an instance fails to boot with "Neutron error creating port" but a port is actually created:
nova-compute.log:
2014-05-28 08:08:53.413 16699 DEBUG neutronclient.client [-] throwing ConnectionFailed : [Errno 104] Connection reset by peer _cs_request /usr/lib/python2.7/dist-packages/neutronclient/client.py:153
2014-05-28 08:08:53.417 16699 ERROR nova.network.neutronv2.api [-] [instance: 2e479806-d13e-4d11-81c1-cc2244a26ef7] Neutron error creating port on network 63657422-b84f-4d2d-b7d2-765ac560546b
(fuller section of log: http://paste.openstack.org/show/82392/)
0.2s later, nova-compute.log:
2014-05-28 08:08:53.664 16699 DEBUG neutronclient.client [-] RESP:{'date': 'Wed, 28 May 2014 08:08:53 GMT', 'status': '200', 'content-length': '13', 'content-type': 'application/json', 'content-location': 'https://region-b.geo-1.network-internal.hpcloudsvc.com/v2.0/ports.json?tenant_id=10409882459003&device_id=2e479806-d13e-4d11-81c1-cc2244a26ef7'} {"ports": []}
(this is repeated once more 0.2s later. Slightly longer log section: http://paste.openstack.org/show/82395/)
But eventually the port is present in the neutron database:
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| tenant_id | id | name | network_id | mac_address | admin_state_up | status | device_id | device_owner |
+----------------+--------------------------------------+--------------------+--------------------------------------+-------------------+----------------+--------+--------------------------------------+--------------+
| 10409882459003 | 916cba73-8925-45a2-80e9-6e9d03e602c8 | | 63657422-b84f-4d2d-b7d2-765ac560546b | fa:16:3e:a8:7d:14 | 1 | ACTIVE | 2e479806-d13e-4d11-81c1-cc2244a26ef7 | compute:az2 |
It looks like this port has been leaked by neutron. Our guess is that the "Failed to create port" error is spurious, caused by the neutronclient's connection being dropped. The port is in fact being created, but it takes some time, and during that window neutron reports that the instance has no ports, so nothing is cleaned up when the instance is deleted. The port details are then written to the db and the port is leaked.
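If the failure really is a spurious connection drop, one mitigation on the nova side would be to retry the port lookup for a short grace period before concluding there is nothing to clean up, covering the window in which neutron has accepted the create but not yet committed the port. A minimal sketch of that idea; `list_ports` here is a hypothetical callable standing in for the neutron client call, not a real API:

```python
import time

def ports_with_retry(list_ports, device_id, attempts=5, delay=0.5):
    """Poll for ports on an instance, tolerating the window in which a
    port create has been accepted but not yet written to the db.

    list_ports: callable(device_id) -> list of port dicts (stand-in for
                the neutron client call); may legitimately return [] at
                first and a non-empty list shortly afterwards.
    """
    for i in range(attempts):
        ports = list_ports(device_id)
        if ports:
            return ports
        if i < attempts - 1:
            time.sleep(delay)  # give neutron time to commit the port
    return []
```

With a grace period like this, the two empty `{"ports": []}` responses seen in the log above would be retried instead of being taken as proof that no cleanup is needed.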
Openstack-infra's nodepool was unable to boot instances recently, and found several hundred ports in this state.
Solutions
=========
Neither nova nor neutron has enough information on its own to determine which ports are leaked, so a periodic cleanup task in either of those two services would not be possible.
A user can free up their ports with a script like https://gist.github.com/moorryan/93fa4be65fc5ea60b3ed - and I think an operator could do the same. But there is a risk that this script wrongly identifies instances/ports which are still in the middle of being created, so care is needed.
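One way to reduce the risk of deleting a port that belongs to an instance which is still building would be to require the port to look leaked on two checks separated by a grace period: an in-flight boot should have attached its instance (or been torn down) in between. A sketch of that guard; the `snapshot_leaked` callable and the grace period are illustrative assumptions, not part of any existing tool:

```python
import time

def safe_leaked_ports(snapshot_leaked, grace_seconds=60):
    """Only treat a port as leaked if it is flagged on two snapshots
    taken grace_seconds apart.

    snapshot_leaked: callable() -> set of port ids currently flagged as
                     leaked (stand-in for the nova/neutron cross-check).
    """
    first = set(snapshot_leaked())
    time.sleep(grace_seconds)
    second = set(snapshot_leaked())
    # Ports flagged both times are very unlikely to be mid-creation.
    return first & second
```

A port that appears in only one snapshot (for example, one belonging to an instance that was still booting during the first check) is left alone.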
Neutron could help by synchronizing get_ports calls with create_port (NB: I don't know the neutron codebase well enough to judge how feasible this is). |
|
2014-06-02 11:53:17 |
Sudipta Biswas |
bug |
|
|
added subscriber Sudipta Biswas |
2014-06-02 15:51:34 |
Brian Haley |
bug |
|
|
added subscriber Brian Haley |
2014-06-02 20:03:05 |
Aaron Rosen |
nova: importance |
Undecided |
High |
|
2014-06-02 20:03:10 |
Aaron Rosen |
nova: assignee |
|
Aaron Rosen (arosen) |
|
2014-06-05 15:45:00 |
Mark McLoughlin |
tags |
network |
icehouse-backport-potential network |
|
2014-06-05 21:58:33 |
Tomoko Inoue |
bug |
|
|
added subscriber Tomoko Inoue |
2014-06-06 02:43:50 |
Tiantian Gao |
bug |
|
|
added subscriber Tiantian Gao |
2014-06-08 12:54:38 |
Eugene Nikanorov |
neutron: status |
New |
Incomplete |
|
2014-06-09 14:43:55 |
OpenStack Infra |
neutron: status |
Incomplete |
In Progress |
|
2014-06-09 14:43:55 |
OpenStack Infra |
neutron: assignee |
|
Matthew Gilliard (matthew-gilliard-u) |
|
2014-06-10 14:30:30 |
OpenStack Infra |
nova: status |
In Progress |
Fix Committed |
|
2014-06-11 13:42:52 |
Thierry Carrez |
nova: status |
Fix Committed |
Fix Released |
|
2014-06-11 13:42:52 |
Thierry Carrez |
nova: milestone |
|
juno-1 |
|
2014-06-17 14:55:47 |
Matt Riedemann |
tags |
icehouse-backport-potential network |
network |
|
2014-07-01 09:47:05 |
Matthew Gilliard |
neutron: status |
In Progress |
Invalid |
|
2014-07-03 02:24:07 |
OpenStack Infra |
tags |
network |
in-stable-icehouse network |
|
2014-07-09 10:21:33 |
Matthew Gilliard |
neutron: assignee |
Matthew Gilliard (matthew-gilliard-u) |
|
|
2014-08-07 12:52:24 |
Chuck Short |
nominated for series |
|
nova/icehouse |
|
2014-08-07 12:52:24 |
Chuck Short |
bug task added |
|
nova/icehouse |
|
2014-08-07 12:52:39 |
Chuck Short |
nova/icehouse: milestone |
|
2014.1.2 |
|
2014-08-07 12:52:46 |
Chuck Short |
nova/icehouse: status |
New |
Fix Committed |
|
2014-10-16 08:28:12 |
Thierry Carrez |
nova: milestone |
juno-1 |
2014.2 |
|
2014-11-18 14:23:32 |
Matthew Gilliard |
nova: assignee |
Aaron Rosen (arosen) |
Matthew Gilliard (matthew-gilliard-u) |
|
2014-11-19 11:29:05 |
Dr. Jens Harbott |
bug |
|
|
added subscriber Dr. Jens Rosenboom |
2016-02-19 20:18:10 |
Sean Dague |
nova/icehouse: status |
Fix Committed |
Fix Released |
|