Launching multiple VMs fails over 63 instances

Bug #1372049 reported by Yair Fried
This bug affects 1 person
Affects                  | Status       | Importance | Assigned to
OpenStack Compute (nova) | Fix Released | High       | Dan Smith
  Icehouse               | Fix Released | High       | Ihar Hrachyshka
  Juno                   | Fix Released | High       | Ihar Hrachyshka
neutron                  | Invalid      | Undecided  | Unassigned
oslo.messaging           | Won't Fix    | Undecided  | Ihar Hrachyshka

Bug Description

RHEL-7.0
Icehouse
All-In-One

Booting 63 VMs at once (with the "num-instances" attribute) works fine.
The setup can support up to 100 VMs when they are booted in batches of about 50.
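Because batches of about 50 boot reliably, one workaround is to split a large request into several smaller `nova boot` calls. The sketch below only builds the command lines and runs nothing; the flavor, image, and name values are copied from the reproduction command in this report, while the helper name `batched_boot_commands` and its default batch size are hypothetical.

```python
def batched_boot_commands(total, batch_size=50, name="test_VM",
                          flavor="42", image="cirros"):
    """Split `total` instances into `nova boot` calls of at most `batch_size`.

    Hypothetical helper: it only builds argv lists and executes nothing.
    """
    if total < 1 or batch_size < 1:
        raise ValueError("total and batch_size must be positive")
    commands = []
    remaining = total
    while remaining > 0:
        count = min(batch_size, remaining)
        commands.append([
            "nova", "boot", "--flavor", flavor, "--image", image,
            "--num-instances", str(count), name,
        ])
        remaining -= count
    return commands


for argv in batched_boot_commands(100):
    print(" ".join(argv))
```

Each argv list can then be run separately (for example with `subprocess.run`), one batch at a time.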

Booting 100 VMs at once without a Neutron network (so the VMs have no network at all) also works fine.

Booting 64 (or more) VMs at once boots only 63 of them successfully. Any VMs beyond 63 end up in ERROR state with the details: VirtualInterfaceCreateException: Virtual Interface creation failed
The failed VM's port is in DOWN state.

Details:
After the initial boot commands go through, all CPU usage drops (no neutron or nova CPU consumption) until nova's vif_plugging_timeout is reached. At that point, 1 VM (= num_instances - 63) is set to ERROR, and the rest of the VMs reach ACTIVE state.

Guess: it seems that neutron goes into some kind of deadlock until part of the load is released by vif_plugging_timeout.
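One way a deadlock could produce a hard limit like this (a hypothesis, not something this report establishes) is a fixed-size worker pool shared by boot handlers, which sit waiting for their network-vif-plugged event, and by the code that delivers those events. The toy model below uses `concurrent.futures.ThreadPoolExecutor` to stand in for such a pool; `simulate_boots` and the pool sizes are hypothetical and do not reflect nova's actual code. With N workers, N-1 concurrent boots leave one worker free to deliver events, while N boots stall until the first timeout, so a limit of 63 would be consistent with a pool of 64 under this model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate_boots(num_boots, pool_size, timeout=2.0):
    """Toy model of a hypothesised worker-pool deadlock (not nova's code).

    Each "boot" task holds a pool worker while it waits for its
    network-vif-plugged event; each "deliver" task, which sets that event,
    needs a worker from the same pool.
    """
    events = [threading.Event() for _ in range(num_boots)]

    def boot(i):
        # Block this worker until the event arrives or the timeout expires.
        return "ACTIVE" if events[i].wait(timeout) else "ERROR"

    def deliver(i):
        events[i].set()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        boots = [pool.submit(boot, i) for i in range(num_boots)]
        # Event deliveries are queued behind the boots in the same pool.
        for i in range(num_boots):
            pool.submit(deliver, i)
        wait(boots)
        return [f.result() for f in boots]


print(simulate_boots(3, pool_size=4))
print(simulate_boots(4, pool_size=4, timeout=0.2))
```

In the second call the first boot to time out always ends in ERROR; boots that are still waiting when that worker frees up may then receive their event in time, which loosely resembles the observation above that only the excess VMs fail once vif_plugging_timeout expires.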

Disabling neutron-nova port notifications allows all VMs to be created.
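Since the failure depends on whether the notification path is enabled, it helps to check both sides before reproducing. The sketch below reads the relevant options from neutron.conf and nova.conf text with Python's `configparser`. The option names (`notify_nova_on_port_status_changes` and `notify_nova_on_port_data_changes` in neutron.conf; `vif_plugging_is_fatal` and `vif_plugging_timeout` in nova.conf) are assumed from Icehouse-era configuration and should be checked against your release; the sample values and the helper name `notification_settings` are illustrative.

```python
import configparser

# Illustrative excerpts; real files contain many more options.
NEUTRON_CONF = """
[DEFAULT]
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
"""

NOVA_CONF = """
[DEFAULT]
vif_plugging_is_fatal = True
vif_plugging_timeout = 300
"""


def notification_settings(neutron_text, nova_text):
    """Return the port-notification options found in the two config texts.

    None means the option is not set in the file, so the service's
    built-in default applies.
    """
    neutron = configparser.ConfigParser(interpolation=None)
    neutron.read_string(neutron_text)
    nova = configparser.ConfigParser(interpolation=None)
    nova.read_string(nova_text)
    return {
        "notify_nova_on_port_status_changes": neutron.getboolean(
            "DEFAULT", "notify_nova_on_port_status_changes", fallback=None),
        "notify_nova_on_port_data_changes": neutron.getboolean(
            "DEFAULT", "notify_nova_on_port_data_changes", fallback=None),
        "vif_plugging_is_fatal": nova.getboolean(
            "DEFAULT", "vif_plugging_is_fatal", fallback=None),
        "vif_plugging_timeout": nova.getint(
            "DEFAULT", "vif_plugging_timeout", fallback=None),
    }


print(notification_settings(NEUTRON_CONF, NOVA_CONF))
```

To inspect a real deployment, pass the contents of the actual files (for example `open("/etc/neutron/neutron.conf").read()`) instead of the samples.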

Note: this also reproduces with multiple compute nodes and with multiple neutron RPC/API workers.

Recreate:
Set the nova and neutron quotas to -1 (unlimited).
Make sure neutron-nova port notifications are ON in both the neutron and nova config files.
Create a network in your tenant.

Boot 64 or more VMs:

nova boot --flavor 42 test_VM --image cirros --num-instances 64

[yfried@yfried-mobl-rh ~(keystone_demo)]$ nova list
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
| 02d7b680-efd8-4291-8d56-78b43c9451cb | test_VM-02d7b680-efd8-4291-8d56-78b43c9451cb | ACTIVE | - | Running | demo_private=10.0.0.156 |
| 05fd6dd2-6b0e-4801-9219-ae4a77a53cfd | test_VM-05fd6dd2-6b0e-4801-9219-ae4a77a53cfd | ACTIVE | - | Running | demo_private=10.0.0.150 |
| 09131f19-5e83-4a40-a900-ffca24a8c775 | test_VM-09131f19-5e83-4a40-a900-ffca24a8c775 | ACTIVE | - | Running | demo_private=10.0.0.160 |
| 0d3be93b-73d3-4995-913c-03a4b80ad37e | test_VM-0d3be93b-73d3-4995-913c-03a4b80ad37e | ACTIVE | - | Running | demo_private=10.0.0.164 |
| 0fcadae4-768c-44a1-9e1c-ac371d1803f9 | test_VM-0fcadae4-768c-44a1-9e1c-ac371d1803f9 | ACTIVE | - | Running | demo_private=10.0.0.202 |
| 11a87db1-5b15-4cad-a749-5d53e2fd8194 | test_VM-11a87db1-5b15-4cad-a749-5d53e2fd8194 | ACTIVE | - | Running | demo_private=10.0.0.201 |
| 147e4a6b-a77c-46ef-b8fd-d65479ccb8ca | test_VM-147e4a6b-a77c-46ef-b8fd-d65479ccb8ca | ACTIVE | - | Running | demo_private=10.0.0.147 |
| 1c5b5f40-d2f3-4cc7-9f80-f5df8de918b9 | test_VM-1c5b5f40-d2f3-4cc7-9f80-f5df8de918b9 | ACTIVE | - | Running | demo_private=10.0.0.187 |
| 1d0b7210-f5a0-4827-b338-2014e8f21341 | test_VM-1d0b7210-f5a0-4827-b338-2014e8f21341 | ACTIVE | - | Running | demo_private=10.0.0.165 |
| 1df564f6-5aac-4ac8-8361-bd44c305332b | test_VM-1df564f6-5aac-4ac8-8361-bd44c305332b | ACTIVE | - | Running | demo_private=10.0.0.145 |
| 2031945f-6305-4cdc-939f-5f02171f82b2 | test_VM-2031945f-6305-4cdc-939f-5f02171f82b2 | ACTIVE | - | Running | demo_private=10.0.0.149 |
| 256ff0ed-0e56-47e3-8b69-68006d658ad6 | test_VM-256ff0ed-0e56-47e3-8b69-68006d658ad6 | ACTIVE | - | Running | demo_private=10.0.0.177 |
| 2b7256a8-c04a-42cf-9c19-5836b585c0f5 | test_VM-2b7256a8-c04a-42cf-9c19-5836b585c0f5 | ACTIVE | - | Running | demo_private=10.0.0.180 |
| 2daac227-e0c9-4259-8e8e-b8a6e93b45e3 | test_VM-2daac227-e0c9-4259-8e8e-b8a6e93b45e3 | ACTIVE | - | Running | demo_private=10.0.0.191 |
| 425c170f-a450-440d-b9ba-0408d7c69b25 | test_VM-425c170f-a450-440d-b9ba-0408d7c69b25 | ACTIVE | - | Running | demo_private=10.0.0.169 |
| 461fcce3-96ae-4462-ab65-fb63f3552703 | test_VM-461fcce3-96ae-4462-ab65-fb63f3552703 | ACTIVE | - | Running | demo_private=10.0.0.179 |
| 46a9965d-6511-44a3-ab71-a87767cda759 | test_VM-46a9965d-6511-44a3-ab71-a87767cda759 | ACTIVE | - | Running | demo_private=10.0.0.199 |
| 4c4ce671-5e84-4ccd-8496-02c0723178ec | test_VM-4c4ce671-5e84-4ccd-8496-02c0723178ec | ACTIVE | - | Running | demo_private=10.0.0.163 |
| 4c941954-e593-4da5-a7c6-37e9490e87ee | test_VM-4c941954-e593-4da5-a7c6-37e9490e87ee | ACTIVE | - | Running | demo_private=10.0.0.158 |
| 4fa89309-cb8f-4bcd-ae13-68ade441ff02 | test_VM-4fa89309-cb8f-4bcd-ae13-68ade441ff02 | ACTIVE | - | Running | demo_private=10.0.0.190 |
| 547d6f97-31db-4718-afb3-4b6123d71076 | test_VM-547d6f97-31db-4718-afb3-4b6123d71076 | ACTIVE | - | Running | demo_private=10.0.0.159 |
| 592d3618-a797-4ec9-8681-fdec498e83fa | test_VM-592d3618-a797-4ec9-8681-fdec498e83fa | ACTIVE | - | Running | demo_private=10.0.0.146 |
| 5e82af64-3676-4ad5-8c67-090de0d0850f | test_VM-5e82af64-3676-4ad5-8c67-090de0d0850f | ACTIVE | - | Running | demo_private=10.0.0.166 |
| 603209df-857f-41fa-8ee4-be65c16bcff2 | test_VM-603209df-857f-41fa-8ee4-be65c16bcff2 | ACTIVE | - | Running | demo_private=10.0.0.198 |
| 648df862-9d09-4371-a1ac-41e7011ce9ed | test_VM-648df862-9d09-4371-a1ac-41e7011ce9ed | ACTIVE | - | Running | demo_private=10.0.0.173 |
| 648ed802-44a9-4115-8a56-d18039a4f01f | test_VM-648ed802-44a9-4115-8a56-d18039a4f01f | ACTIVE | - | Running | demo_private=10.0.0.171 |
| 65fa15f9-d680-4367-9e7a-01f2d8ecf18c | test_VM-65fa15f9-d680-4367-9e7a-01f2d8ecf18c | ACTIVE | - | Running | demo_private=10.0.0.196 |
| 6c47f24b-981e-4ff0-b332-fa4033656905 | test_VM-6c47f24b-981e-4ff0-b332-fa4033656905 | ACTIVE | - | Running | demo_private=10.0.0.194 |
| 6e261e3f-eedc-4fed-acf0-bb2a5b63f848 | test_VM-6e261e3f-eedc-4fed-acf0-bb2a5b63f848 | ACTIVE | - | Running | demo_private=10.0.0.174 |
| 7135b04f-2a9a-4bdd-9c50-1f7161576428 | test_VM-7135b04f-2a9a-4bdd-9c50-1f7161576428 | ACTIVE | - | Running | demo_private=10.0.0.157 |
| 7393c4e6-1cc7-479f-930c-f8b60ed80fb2 | test_VM-7393c4e6-1cc7-479f-930c-f8b60ed80fb2 | ACTIVE | - | Running | demo_private=10.0.0.154 |
| 79103533-afe9-4004-adcb-8b7245e8e463 | test_VM-79103533-afe9-4004-adcb-8b7245e8e463 | ACTIVE | - | Running | demo_private=10.0.0.184 |
| 79efdb4e-d0f2-49bc-ad8f-19db3ef4e8f8 | test_VM-79efdb4e-d0f2-49bc-ad8f-19db3ef4e8f8 | ACTIVE | - | Running | demo_private=10.0.0.144 |
| 7a043e89-d385-4a30-ac4a-265a6ee25492 | test_VM-7a043e89-d385-4a30-ac4a-265a6ee25492 | ACTIVE | - | Running | demo_private=10.0.0.142 |
| 8ad55399-76a1-4ce0-8c3b-e3ce79ece983 | test_VM-8ad55399-76a1-4ce0-8c3b-e3ce79ece983 | ACTIVE | - | Running | demo_private=10.0.0.168 |
| 8ffc713e-d39e-4155-bbdc-330c0e5626ff | test_VM-8ffc713e-d39e-4155-bbdc-330c0e5626ff | ACTIVE | - | Running | demo_private=10.0.0.143 |
| 951443b7-65f7-4822-b323-8321c059c034 | test_VM-951443b7-65f7-4822-b323-8321c059c034 | ACTIVE | - | Running | demo_private=10.0.0.161 |
| 9b6094fe-4529-4b3e-8c7e-ec7c1dc02495 | test_VM-9b6094fe-4529-4b3e-8c7e-ec7c1dc02495 | ACTIVE | - | Running | demo_private=10.0.0.151 |
| a431c16f-bf1e-424c-88f9-b0ba2a683f99 | test_VM-a431c16f-bf1e-424c-88f9-b0ba2a683f99 | ACTIVE | - | Running | demo_private=10.0.0.167 |
| a4e10145-19ad-4f6c-86a2-43af5976b204 | test_VM-a4e10145-19ad-4f6c-86a2-43af5976b204 | ACTIVE | - | Running | demo_private=10.0.0.148 |
| a5815fed-8b03-47bb-8214-e3984330e4d3 | test_VM-a5815fed-8b03-47bb-8214-e3984330e4d3 | ACTIVE | - | Running | demo_private=10.0.0.183 |
| b54e91fb-804a-4044-8a59-37f67737e9a8 | test_VM-b54e91fb-804a-4044-8a59-37f67737e9a8 | ACTIVE | - | Running | demo_private=10.0.0.203 |
| b745e820-892f-46e7-a65f-0cbe5664b0f4 | test_VM-b745e820-892f-46e7-a65f-0cbe5664b0f4 | ACTIVE | - | Running | demo_private=10.0.0.172 |
| be60b489-07d3-4ec1-a6e7-79b459d631c5 | test_VM-be60b489-07d3-4ec1-a6e7-79b459d631c5 | ACTIVE | - | Running | demo_private=10.0.0.162 |
| c1320742-b8af-4623-978a-56e964083f82 | test_VM-c1320742-b8af-4623-978a-56e964083f82 | ACTIVE | - | Running | demo_private=10.0.0.197 |
| c1973e90-927f-48f6-b979-73b300915483 | test_VM-c1973e90-927f-48f6-b979-73b300915483 | ACTIVE | - | Running | demo_private=10.0.0.193 |
| c5ee5ebd-39dd-442f-8fb8-5ea8e10bf86b | test_VM-c5ee5ebd-39dd-442f-8fb8-5ea8e10bf86b | ACTIVE | - | Running | demo_private=10.0.0.195 |
| c98a67f3-06b2-455f-af2a-4325493589ce | test_VM-c98a67f3-06b2-455f-af2a-4325493589ce | ACTIVE | - | Running | demo_private=10.0.0.141 |
| c9a845f0-9b7d-4335-8ce9-15378cb65ba8 | test_VM-c9a845f0-9b7d-4335-8ce9-15378cb65ba8 | ACTIVE | - | Running | demo_private=10.0.0.153 |
| c9cd6390-955f-412c-852d-0b84738e6c9a | test_VM-c9cd6390-955f-412c-852d-0b84738e6c9a | ACTIVE | - | Running | demo_private=10.0.0.189 |
| cf4c577c-5002-4e97-a132-b0d36a0d06ae | test_VM-cf4c577c-5002-4e97-a132-b0d36a0d06ae | ACTIVE | - | Running | demo_private=10.0.0.170 |
| d0e675cb-61d6-46a4-979f-e5ecb10d1aec | test_VM-d0e675cb-61d6-46a4-979f-e5ecb10d1aec | ACTIVE | - | Running | demo_private=10.0.0.178 |
| d32315a7-5522-4701-8e71-a471100c09b8 | test_VM-d32315a7-5522-4701-8e71-a471100c09b8 | ACTIVE | - | Running | demo_private=10.0.0.186 |
| d79c15ff-2f2e-48e5-94ee-5ab4ae7dce3e | test_VM-d79c15ff-2f2e-48e5-94ee-5ab4ae7dce3e | ACTIVE | - | Running | demo_private=10.0.0.155 |
| da1f1f00-0fc8-4d13-ad70-5bfafe5c804d | test_VM-da1f1f00-0fc8-4d13-ad70-5bfafe5c804d | ACTIVE | - | Running | demo_private=10.0.0.176 |
| dabe4aea-b33a-4fb9-852d-2747ab5c5e66 | test_VM-dabe4aea-b33a-4fb9-852d-2747ab5c5e66 | ACTIVE | - | Running | demo_private=10.0.0.181 |
| de2b28ce-c6b1-4720-8dea-6145a0ae9a75 | test_VM-de2b28ce-c6b1-4720-8dea-6145a0ae9a75 | ACTIVE | - | Running | demo_private=10.0.0.175 |
| e2bb90ef-39cb-4c20-8e20-07181dcacd7d | test_VM-e2bb90ef-39cb-4c20-8e20-07181dcacd7d | ACTIVE | - | Running | demo_private=10.0.0.152 |
| e68a48b0-c3e8-41d7-9828-af13304b4d35 | test_VM-e68a48b0-c3e8-41d7-9828-af13304b4d35 | ACTIVE | - | Running | demo_private=10.0.0.200 |
| ea990c15-ff0b-4798-b5f3-782e7af30835 | test_VM-ea990c15-ff0b-4798-b5f3-782e7af30835 | ACTIVE | - | Running | demo_private=10.0.0.185 |
| ec17379c-882f-4507-9347-69812f03922c | test_VM-ec17379c-882f-4507-9347-69812f03922c | ACTIVE | - | Running | demo_private=10.0.0.182 |
| f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 | test_VM-f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 | ERROR | - | NOSTATE | demo_private=10.0.0.140 |
| f754b053-7fe3-4a87-bdf7-bbf9b8216e6a | test_VM-f754b053-7fe3-4a87-bdf7-bbf9b8216e6a | ACTIVE | - | Running | demo_private=10.0.0.188 |
| f757f4ca-7204-432d-8181-5133f73538fb | test_VM-f757f4ca-7204-432d-8181-5133f73538fb | ACTIVE | - | Running | demo_private=10.0.0.192 |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
[yfried@yfried-mobl-rh ~(keystone_demo)]$ nova list | wc -l
68
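`wc -l` counts every line of the table, including the three border lines and the header row, so 68 lines correspond to 64 instances. Counting by the Status column avoids that arithmetic and picks out the failed instance directly. The sketch below parses the pipe-delimited table format shown above (the format is assumed from this output; `summarize` is a hypothetical helper).

```python
from collections import Counter

# Header and three rows copied from the `nova list` output in this report.
NOVA_LIST = """\
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
| 02d7b680-efd8-4291-8d56-78b43c9451cb | test_VM-02d7b680-efd8-4291-8d56-78b43c9451cb | ACTIVE | - | Running | demo_private=10.0.0.156 |
| 05fd6dd2-6b0e-4801-9219-ae4a77a53cfd | test_VM-05fd6dd2-6b0e-4801-9219-ae4a77a53cfd | ACTIVE | - | Running | demo_private=10.0.0.150 |
| f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 | test_VM-f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 | ERROR | - | NOSTATE | demo_private=10.0.0.140 |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------------+
"""


def summarize(table_text):
    """Count instances per Status and list the IDs of ERROR instances."""
    header = None
    rows = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    counts = Counter(row["Status"] for row in rows)
    errors = [row["ID"] for row in rows if row["Status"] == "ERROR"]
    return counts, errors


print(summarize(NOVA_LIST))
```

Feeding it the full `nova list` output instead of the sample gives the per-status totals for the whole run.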
[yfried@yfried-mobl-rh ~(keystone_demo)]$ neutron port-list --device_id f41ec90c-6d10-4ee5-be7f-96590ec7c8c1
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b | | fa:16:3e:f4:c5:a8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.140"} |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
[yfried@yfried-mobl-rh ~(keystone_demo)]$ neutron port-show ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b
+-----------------------+-----------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-----------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:vnic_type | normal |
| device_id | f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 |
| device_owner | compute:None |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.140"} |
| id | ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b |
| mac_address | fa:16:3e:f4:c5:a8 |
| name | |
| network_id | 17bfbf17-8b55-4e11-98c3-264014c31172 |
| security_groups | 45e0761a-eee1-4717-a3b7-359be0ed2788 |
| status | DOWN |
| tenant_id | 3957117d021244b5badcacb08cdc83e2 |
+-----------------------+-----------------------------------------------------------------------------------+
[yfried@yfried-mobl-rh ~(keystone_demo)]$ neutron port-list --device_owner compute:None
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| id | name | mac_address | fixed_ips |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| 01e07410-b27a-41bf-8206-63b94ef2ef97 | | fa:16:3e:e3:88:92 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.188"} |
| 0b5de93f-5935-4d07-b044-1936d00847f1 | | fa:16:3e:d5:1d:e8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.141"} |
| 148bba35-312c-49ff-9fd8-c7448e6033d6 | | fa:16:3e:46:7d:f2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.184"} |
| 2447ef35-6d09-4dcb-82ec-15c8df4501a8 | | fa:16:3e:ab:8e:76 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.170"} |
| 264dbf3b-4f6b-4c55-8996-3626999adddb | | fa:16:3e:97:f4:6c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.172"} |
| 27b2810b-a726-4b4c-b5d4-a98393336ccf | | fa:16:3e:2e:e6:ae | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.149"} |
| 31092e4a-4ce6-4dfb-b9e1-c9d7000e9897 | | fa:16:3e:d1:16:c6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.181"} |
| 313dae93-739e-4046-92fb-bf5a8c136740 | | fa:16:3e:87:d1:de | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.156"} |
| 3689f727-b0d9-4afc-822d-5ce4648a9be4 | | fa:16:3e:2f:93:94 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.203"} |
| 39f1af5b-4ed5-4acb-a412-849a3c69ad25 | | fa:16:3e:44:01:72 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.143"} |
| 3df7a0b2-1e11-4394-a667-aa649fdf1a95 | | fa:16:3e:44:97:d8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.202"} |
| 41a9075b-37d9-460e-8413-f17578a5a677 | | fa:16:3e:81:89:ed | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.165"} |
| 4431e932-91d3-4d29-a8f2-36a634e95cc3 | | fa:16:3e:11:84:29 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.148"} |
| 483b352b-e808-44c5-a160-63a637acb01e | | fa:16:3e:2f:ed:dd | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.182"} |
| 513c4ffe-9988-4ce3-b3e5-1dffe3b3fa03 | | fa:16:3e:1f:9f:46 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.147"} |
| 549de3d5-2b8f-4822-9912-15730b03234f | | fa:16:3e:df:45:2f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.167"} |
| 5a55a171-65f8-4784-9ec2-32fb89ffb60b | | fa:16:3e:f8:f0:f6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.142"} |
| 694b4cdc-d66f-4eda-8444-65077f7d0972 | | fa:16:3e:67:7f:1c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.197"} |
| 6aaeb1e4-6feb-47b1-871f-22aa715373b9 | | fa:16:3e:bc:2f:06 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.183"} |
| 6d6595a1-7daa-41b1-b049-124ed549c0f7 | | fa:16:3e:da:00:e6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.168"} |
| 70b5d20b-0bbc-49bf-8d2c-de78f0057e4d | | fa:16:3e:7f:c0:29 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.161"} |
| 7c3b3872-f711-43a4-a566-669c640c99f4 | | fa:16:3e:71:19:e5 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.164"} |
| 820d8afc-4d74-4f19-a7a5-8c58cd41e898 | | fa:16:3e:67:84:a1 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.180"} |
| 8304220f-fd1b-44c5-b2f2-c2916fd07fe8 | | fa:16:3e:c2:0d:6e | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.193"} |
| 960d974a-5fb0-4df0-9ec9-4878381e0fda | | fa:16:3e:e6:85:52 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.189"} |
| 96f8de53-9b38-4a45-a6ff-8fd489a26a8c | | fa:16:3e:71:65:df | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.158"} |
| 9a7475fa-1ef3-4b66-92a3-c86ec3f5d749 | | fa:16:3e:16:dd:2f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.195"} |
| 9b792d1a-48b0-456c-ae68-f8d25936d98e | | fa:16:3e:80:eb:0c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.146"} |
| 9ba38561-ce0a-4a8e-a860-e01e40e2eea7 | | fa:16:3e:fd:3c:f4 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.171"} |
| 9c8ec990-ef0a-4778-86ea-00e9c5d5c66e | | fa:16:3e:28:a2:29 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.160"} |
| a09f7486-aae6-4b0b-b298-9e0f80b46278 | | fa:16:3e:d3:cc:f5 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.162"} |
| a0feff01-b1ae-4ba5-a58c-02fe8dd2046a | | fa:16:3e:a3:57:1d | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.173"} |
| a448055c-cbdf-4db3-b01a-2ddd4f2da4d6 | | fa:16:3e:06:1d:f2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.166"} |
| a8d13ffd-b1ce-4f0d-bb7b-68be8c69e845 | | fa:16:3e:3f:b7:d2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.145"} |
| ae9cf945-b7a2-4bca-ade7-e664e9ccab71 | | fa:16:3e:f3:11:31 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.144"} |
| af87657d-e36d-4fde-8fa2-36327785bbc4 | | fa:16:3e:4b:01:ed | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.186"} |
| b113ffd7-6440-4c29-bd58-97bbdac2b534 | | fa:16:3e:e3:c4:51 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.191"} |
| b3947ef4-d430-4418-8318-942451c3c032 | | fa:16:3e:e2:71:49 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.151"} |
| b6ddf71e-493c-4dd2-9550-3e4e5b13c4c2 | | fa:16:3e:a2:91:7b | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.200"} |
| ba2aa507-8097-46d2-9d0b-aeaddddb0d03 | | fa:16:3e:fd:f6:f9 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.194"} |
| c1ac28c3-c5b5-44a1-b774-805a66bf69d1 | | fa:16:3e:6c:0e:78 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.155"} |
| c6c51c19-6723-4eb0-83b3-bd3042429c32 | | fa:16:3e:88:47:9d | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.152"} |
| c7639f56-0f96-4bf2-8f8c-98b280ae49db | | fa:16:3e:af:e6:04 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.196"} |
| c829811b-2f29-4a61-96df-61600fe12591 | | fa:16:3e:5e:df:7b | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.169"} |
| c9b9bce7-59c3-4de1-822a-e3cb60730286 | | fa:16:3e:82:ef:df | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.154"} |
| ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b | | fa:16:3e:f4:c5:a8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.140"} |
| cb1247fb-9654-43bc-b149-a209a4c0ef1f | | fa:16:3e:09:b5:4f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.201"} |
| ce16b05b-c84d-4e0b-b54c-54da15635d3c | | fa:16:3e:23:f6:9a | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.176"} |
| d498d2c4-86b2-47d2-8f52-dbab1b78c7ae | | fa:16:3e:0d:63:20 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.174"} |
| d5f5355b-2793-4a35-8914-c488bc98b78a | | fa:16:3e:60:d5:32 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.175"} |
| d69b8962-a109-42ee-910a-2a2f618658d6 | | fa:16:3e:51:86:94 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.185"} |
| d87d6d1b-1e1d-49d5-9a5c-440d58432e55 | | fa:16:3e:00:15:bd | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.179"} |
| ded96519-bbea-45d5-ab34-8321c2320e7f | | fa:16:3e:d9:da:a5 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.177"} |
| e18315e1-3aff-4786-a0c4-196af79b59e8 | | fa:16:3e:48:f9:4f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.199"} |
| e46caeca-14d8-4712-a79d-35a57f62eff2 | | fa:16:3e:0a:f8:07 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.153"} |
| e64d51dd-916c-4306-924a-06ad9ac3712d | | fa:16:3e:3c:b1:f7 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.187"} |
| e6d46c1d-16ef-4592-bc8a-24c1f81f32fd | | fa:16:3e:7b:4e:fc | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.178"} |
| ea02236d-35b3-4b41-bd82-68a4b185e3fe | | fa:16:3e:ec:ab:04 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.163"} |
| f30ffae7-161c-4d0e-98aa-c66f1b90cb28 | | fa:16:3e:46:f7:7c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.190"} |
| f354cd61-7aee-4389-b68c-f079cfc58a96 | | fa:16:3e:08:17:7e | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.150"} |
| f81204e1-cd3b-4aa5-a3bd-de01c6b23afc | | fa:16:3e:4f:e5:a6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.192"} |
| f9b637cd-e439-43d2-833a-202cab28b35a | | fa:16:3e:69:53:c2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.198"} |
| fabd224f-76aa-48ed-8f07-699bc2a8dde2 | | fa:16:3e:10:7c:05 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.157"} |
| ff2bb3bb-bcbd-4bc8-80b4-5c91841048c0 | | fa:16:3e:41:29:a3 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.159"} |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+

[yfried@yfried-mobl-rh ~(keystone_demo)]$ neutron port-list --device_owner compute:None -c status -c id -c fixed_ips
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
| status | id | fixed_ips |
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
| ACTIVE | 01e07410-b27a-41bf-8206-63b94ef2ef97 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.188"} |
| ACTIVE | 0b5de93f-5935-4d07-b044-1936d00847f1 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.141"} |
| ACTIVE | 148bba35-312c-49ff-9fd8-c7448e6033d6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.184"} |
| ACTIVE | 2447ef35-6d09-4dcb-82ec-15c8df4501a8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.170"} |
| ACTIVE | 264dbf3b-4f6b-4c55-8996-3626999adddb | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.172"} |
| ACTIVE | 27b2810b-a726-4b4c-b5d4-a98393336ccf | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.149"} |
| ACTIVE | 31092e4a-4ce6-4dfb-b9e1-c9d7000e9897 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.181"} |
| ACTIVE | 313dae93-739e-4046-92fb-bf5a8c136740 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.156"} |
| ACTIVE | 3689f727-b0d9-4afc-822d-5ce4648a9be4 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.203"} |
| ACTIVE | 39f1af5b-4ed5-4acb-a412-849a3c69ad25 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.143"} |
| ACTIVE | 3df7a0b2-1e11-4394-a667-aa649fdf1a95 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.202"} |
| ACTIVE | 41a9075b-37d9-460e-8413-f17578a5a677 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.165"} |
| ACTIVE | 4431e932-91d3-4d29-a8f2-36a634e95cc3 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.148"} |
| ACTIVE | 483b352b-e808-44c5-a160-63a637acb01e | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.182"} |
| ACTIVE | 513c4ffe-9988-4ce3-b3e5-1dffe3b3fa03 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.147"} |
| ACTIVE | 549de3d5-2b8f-4822-9912-15730b03234f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.167"} |
| ACTIVE | 5a55a171-65f8-4784-9ec2-32fb89ffb60b | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.142"} |
| ACTIVE | 694b4cdc-d66f-4eda-8444-65077f7d0972 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.197"} |
| ACTIVE | 6aaeb1e4-6feb-47b1-871f-22aa715373b9 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.183"} |
| ACTIVE | 6d6595a1-7daa-41b1-b049-124ed549c0f7 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.168"} |
| ACTIVE | 70b5d20b-0bbc-49bf-8d2c-de78f0057e4d | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.161"} |
| ACTIVE | 7c3b3872-f711-43a4-a566-669c640c99f4 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.164"} |
| ACTIVE | 820d8afc-4d74-4f19-a7a5-8c58cd41e898 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.180"} |
| ACTIVE | 8304220f-fd1b-44c5-b2f2-c2916fd07fe8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.193"} |
| ACTIVE | 960d974a-5fb0-4df0-9ec9-4878381e0fda | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.189"} |
| ACTIVE | 96f8de53-9b38-4a45-a6ff-8fd489a26a8c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.158"} |
| ACTIVE | 9a7475fa-1ef3-4b66-92a3-c86ec3f5d749 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.195"} |
| ACTIVE | 9b792d1a-48b0-456c-ae68-f8d25936d98e | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.146"} |
| ACTIVE | 9ba38561-ce0a-4a8e-a860-e01e40e2eea7 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.171"} |
| ACTIVE | 9c8ec990-ef0a-4778-86ea-00e9c5d5c66e | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.160"} |
| ACTIVE | a09f7486-aae6-4b0b-b298-9e0f80b46278 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.162"} |
| ACTIVE | a0feff01-b1ae-4ba5-a58c-02fe8dd2046a | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.173"} |
| ACTIVE | a448055c-cbdf-4db3-b01a-2ddd4f2da4d6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.166"} |
| ACTIVE | a8d13ffd-b1ce-4f0d-bb7b-68be8c69e845 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.145"} |
| ACTIVE | ae9cf945-b7a2-4bca-ade7-e664e9ccab71 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.144"} |
| ACTIVE | af87657d-e36d-4fde-8fa2-36327785bbc4 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.186"} |
| ACTIVE | b113ffd7-6440-4c29-bd58-97bbdac2b534 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.191"} |
| ACTIVE | b3947ef4-d430-4418-8318-942451c3c032 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.151"} |
| ACTIVE | b6ddf71e-493c-4dd2-9550-3e4e5b13c4c2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.200"} |
| ACTIVE | ba2aa507-8097-46d2-9d0b-aeaddddb0d03 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.194"} |
| ACTIVE | c1ac28c3-c5b5-44a1-b774-805a66bf69d1 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.155"} |
| ACTIVE | c6c51c19-6723-4eb0-83b3-bd3042429c32 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.152"} |
| ACTIVE | c7639f56-0f96-4bf2-8f8c-98b280ae49db | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.196"} |
| ACTIVE | c829811b-2f29-4a61-96df-61600fe12591 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.169"} |
| ACTIVE | c9b9bce7-59c3-4de1-822a-e3cb60730286 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.154"} |
| DOWN | ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.140"} |
| ACTIVE | cb1247fb-9654-43bc-b149-a209a4c0ef1f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.201"} |
| ACTIVE | ce16b05b-c84d-4e0b-b54c-54da15635d3c | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.176"} |
| ACTIVE | d498d2c4-86b2-47d2-8f52-dbab1b78c7ae | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.174"} |
| ACTIVE | d5f5355b-2793-4a35-8914-c488bc98b78a | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.175"} |
| ACTIVE | d69b8962-a109-42ee-910a-2a2f618658d6 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.185"} |
| ACTIVE | d87d6d1b-1e1d-49d5-9a5c-440d58432e55 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.179"} |
| ACTIVE | ded96519-bbea-45d5-ab34-8321c2320e7f | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.177"} |
| ACTIVE | e18315e1-3aff-4786-a0c4-196af79b59e8 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.199"} |
| ACTIVE | e46caeca-14d8-4712-a79d-35a57f62eff2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.153"} |
| ACTIVE | e64d51dd-916c-4306-924a-06ad9ac3712d | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.187"} |
| ACTIVE | e6d46c1d-16ef-4592-bc8a-24c1f81f32fd | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.178"} |
| ACTIVE | ea02236d-35b3-4b41-bd82-68a4b185e3fe | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.163"} |
| ACTIVE | f30ffae7-161c-4d0e-98aa-c66f1b90cb28 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.190"} |
| ACTIVE | f354cd61-7aee-4389-b68c-f079cfc58a96 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.150"} |
| ACTIVE | f81204e1-cd3b-4aa5-a3bd-de01c6b23afc | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.192"} |
| ACTIVE | f9b637cd-e439-43d2-833a-202cab28b35a | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.198"} |
| ACTIVE | fabd224f-76aa-48ed-8f07-699bc2a8dde2 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.157"} |
| ACTIVE | ff2bb3bb-bcbd-4bc8-80b4-5c91841048c0 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.159"} |
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
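The only DOWN port (10.0.0.140) is the one belonging to the ERROR instance f41ec90c in the `nova list` output. The sketch below pulls non-ACTIVE ports and their IP out of this kind of table by decoding the JSON in the fixed_ips column. It assumes one fixed IP per port, as in this output; `non_active_ports` is a hypothetical helper.

```python
import json

# Header and two rows copied from the `neutron port-list` output above.
PORT_LIST = """\
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
| status | id | fixed_ips |
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
| ACTIVE | 01e07410-b27a-41bf-8206-63b94ef2ef97 | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.188"} |
| DOWN | ca09537e-3cb8-4c81-a2ca-db8ac1f9e95b | {"subnet_id": "a117329e-a613-4779-9f26-f54f928c8f39", "ip_address": "10.0.0.140"} |
+--------+--------------------------------------+-----------------------------------------------------------------------------------+
"""


def non_active_ports(table_text):
    """Return (port_id, ip_address) for every port whose status is not ACTIVE."""
    header = None
    result = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if header is None:
            header = cells  # first pipe row holds the column names
            continue
        row = dict(zip(header, cells))
        if not row["id"]:
            continue  # continuation line of a multi-line cell; not handled here
        if row["status"] != "ACTIVE":
            ip = json.loads(row["fixed_ips"])["ip_address"]
            result.append((row["id"], ip))
    return result


print(non_active_ports(PORT_LIST))
```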

Yair Fried (yfried) wrote :
Kashyap Chamarthy (kashyapc) wrote :

Can you also post your sanitized nova.conf (and any other relevant configs)? It might help in debugging this further.

tags: added: nova
tags: added: network
removed: nova
tags: added: compute
Yair Fried (yfried) wrote :

Correction: when using 2 compute nodes, you can boot ~120 VMs.
This looks like an issue with nova-compute missing notifications from nova-api:
https://bugzilla.redhat.com/show_bug.cgi?id=1141518

Yair Fried (yfried) wrote :
Yair Fried (yfried) wrote :
Sean Dague (sdague) wrote :

Can you prune out the relevant pieces of the log files and put the snippets into the bug? It's better if bugs from community members come with second-level debugging already done.

Changed in nova:
status: New → Incomplete
importance: Undecided → Low
Yair Fried (yfried) wrote :

Well, all I've been able to find so far is that nova-compute seems to be ignoring neutron port notifications:

Taking 2 instances:
f41ec90c-6d10-4ee5-be7f-96590ec7c8c1 - ERROR - VirtualInterfaceCreateException: Virtual Interface creation failed
be60b489-07d3-4ec1-a6e7-79b459d631c5 - ACTIVE

For server be60b489-07d3-4ec1-a6e7-79b459d631c5, Neutron sends port notifications:
[yfried-mobl-rh ~/workspace/Hosts/scale_debug] # grep -rn be60b489-07d3-4ec1-a6e7-79b459d631c5 | grep notifiers
var/log/neutron/server.log:9482:2014-09-21 11:32:02.358 57216 DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'7c3b3872-f711-43a4-a566-669c640c99f4', 'name': 'network-vif-plugged', 'server_uuid': u'0d3be93b-73d3-4995-913c-03a4b80ad37e'}, {'status': 'completed', 'tag': u'a09f7486-aae6-4b0b-b298-9e0f80b46278', 'name': 'network-vif-plugged', 'server_uuid': u'be60b489-07d3-4ec1-a6e7-79b459d631c5'}, {'status': 'completed', 'tag': u'ff2bb3bb-bcbd-4bc8-80b4-5c91841048c0', 'name': 'network-vif-plugged', 'server_uuid': u'547d6f97-31db-4718-afb3-4b6123d71076'}, {'status': 'completed', 'tag': u'549de3d5-2b8f-4822-9912-15730b03234f', 'name': 'network-vif-plugged', 'server_uuid': u'a431c16f-bf1e-424c-88f9-b0ba2a683f99'}, {'status': 'completed', 'tag': u'820d8afc-4d74-4f19-a7a5-8c58cd41e898', 'name': 'network-vif-plugged', 'server_uuid': u'2b7256a8-c04a-42cf-9c19-5836b585c0f5'}] send_events /usr/lib/python2.7/site-packages/neutron/notifiers/nova.py:218
var/log/neutron/server.log:9507:2014-09-21 11:32:03.618 57216 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'a09f7486-aae6-4b0b-b298-9e0f80b46278', u'name': u'network-vif-plugged', u'server_uuid': u'be60b489-07d3-4ec1-a6e7-79b459d631c5', u'code': 200}

Nova receives notifications:
[yfried-mobl-rh ~/workspace/Hosts/scale_debug] # grep -rn be60b489-07d3-4ec1-a6e7-79b459d631c5 var/log/nova | grep "event network-vif-plugged"
var/log/nova/nova-api.log:3059:2014-09-21 11:32:03.486 57411 AUDIT nova.api.openstack.compute.contrib.server_external_events [req-50beaf5f-5db9-4fd3-b1fd-26b5a7d39b11 940afd62ccde4376915abee8356a28d9 9426944e683b41d5aa4d7f8e298fd4d8] Creating event network-vif-plugged:a09f7486-aae6-4b0b-b298-9e0f80b46278 for instance be60b489-07d3-4ec1-a6e7-79b459d631c5
var/log/nova/nova-compute.log:8370:2014-09-21 11:31:46.012 57535 DEBUG nova.compute.manager [req-10ffb1e2-6592-492f-b465-00a0658a6251 16480061596d47ea819c7def6d7eef1f 3957117d021244b5badcacb08cdc83e2] [instance: be60b489-07d3-4ec1-a6e7-79b459d631c5] Preparing to wait for external event network-vif-plugged-a09f7486-aae6-4b0b-b298-9e0f80b46278 prepare_for_instance_event /usr/lib/python2.7/site-packages/nova/compute/manager.py:447
var/log/nova/nova-compute.log:14539:2014-09-21 11:36:24.837 57535 DEBUG nova.compute.manager [req-50beaf5f-5db9-4fd3-b1fd-26b5a7d39b11 940afd62ccde4376915abee8356a28d9 9426944e683b41d5aa4d7f8e298fd4d8] [instance: be60b489-07d3-4ec1-a6e7-79b459d631c5] Processing event network-vif-plugged-a09f7486-aae6-4b0b-b298-9e0f80b46278 _process_instance_event /usr/lib/python2.7/site-packages/nova/compute/manager.py:5665
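The timestamps in this excerpt give a rough measure of the lag. The quick Python check below (standard library only; the three timestamps are copied from the nova-compute and nova-api lines above) shows the event sat for more than four minutes between nova-api creating it and nova-compute processing it. Comparing the total wait with vif_plugging_timeout assumes nova's default of 300 seconds; the report does not say which value was in use.

```python
from datetime import datetime

# Timestamp format used in the nova log lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

prepared = datetime.strptime("2014-09-21 11:31:46.012", FMT)     # n-cpu starts waiting
api_created = datetime.strptime("2014-09-21 11:32:03.486", FMT)  # n-api creates the event
processed = datetime.strptime("2014-09-21 11:36:24.837", FMT)    # n-cpu processes it

queue_delay = (processed - api_created).total_seconds()
total_wait = (processed - prepared).total_seconds()

print(f"event waited {queue_delay:.3f}s between n-api and n-cpu")  # 261.351s
print(f"instance waited {total_wait:.3f}s in total")                # 278.825s
```

A total wait of about 279 s lands just under a 300 s timeout, which fits the description that the remaining VMs only become ACTIVE around the time vif_plugging_timeout fires (if the default was in use).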

But on server f41ec90c-6d10-4ee5-be7f-96590ec7c8c1:
Neut...

Oleg Bondarev (obondarev) wrote :

Seems like a duplicate of bug 1357476

Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)
Changed in nova:
assignee: nobody → Oleg Bondarev (obondarev)
Eugene Nikanorov (enikanorov) wrote :

It's not clear that Neutron is to blame here, as it sends notifications as expected.

tags: added: loadimpact
Changed in neutron:
status: New → Incomplete
Oleg Bondarev (obondarev) wrote :

Able to reproduce on devstack.
Inspecting logs (n-cpu, n-api, q-svc) shows that this is not a neutron issue: q-svc sends the vif_plugged event in time (no timeout on n-cpu), and n-api accepts the event and sends it to n-cpu over RPC. For some reason, n-cpu starts to handle this event only after the timeout has already occurred (several minutes after the event was sent from n-api). Increasing vif_plugging_timeout does not solve the issue, so obviously there is some race in n-cpu. Will dig more into it.

Changed in neutron:
status: Incomplete → Invalid
Changed in nova:
importance: Low → High
Changed in neutron:
assignee: Oleg Bondarev (obondarev) → nobody
Oleg Bondarev (obondarev) wrote :

So the result of my analysis is the following: the root cause is ultimately in https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py
In particular, rpc_thread_pool_size, which is 64 by default.
When spawning >= 64 instances at the same time, n-cpu has 64 blocked threads waiting for network-vif-plugged events.
Then, when the network-vif-plugged events arrive at n-cpu from n-api over RPC (neutron -> n-api -> n-cpu), there are no available threads in the thread pool to handle them.
After instances start to fail with timeouts, threads become available and start handling the network-vif-plugged events, so the rest of the instances become active (right before their own timeouts occur).

So we have a 1:1 relationship between rpc_thread_pool_size and the number of instances that can be spawned simultaneously.
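To make the starvation mechanism concrete, here is a minimal Python sketch that uses the standard library's ThreadPoolExecutor as a stand-in for the oslo.messaging eventlet executor. That substitution is an assumption: the real executor uses an eventlet GreenPool, not OS threads, and simulate_builds and its parameters are hypothetical names. Each simulated build holds a pool worker while it waits for its event, and the handler that delivers the event has to come from the same pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_builds(num_builds, pool_size=4, timeout=1.0):
    """Toy model of concurrent boots sharing one bounded RPC worker pool.

    pool_size stands in for rpc_thread_pool_size and timeout (seconds) for
    vif_plugging_timeout, both scaled down so the demo runs quickly.
    Returns a tuple (active, errored).
    """
    plugged = [threading.Event() for _ in range(num_builds)]

    def build(i):
        # Holds its pool worker for the entire wait, like a build handler
        # blocked on network-vif-plugged before the fix.
        return "ACTIVE" if plugged[i].wait(timeout) else "ERROR"

    def deliver_event(i):
        # Delivering the event needs a worker from the *same* pool.
        plugged[i].set()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        builds = [pool.submit(build, i) for i in range(num_builds)]
        for i in range(num_builds):
            pool.submit(deliver_event, i)
        results = [f.result() for f in builds]
    return results.count("ACTIVE"), results.count("ERROR")


if __name__ == "__main__":
    print(simulate_builds(3))  # fewer builds than workers
    print(simulate_builds(4))  # every worker is held by a waiting build
```

With 3 builds and 4 workers, the spare worker delivers every event and all builds go ACTIVE. With 4 builds and 4 workers, at least one build must time out, because no worker is free to deliver an event until some build gives up; how many of the others are rescued in time depends on thread scheduling, which roughly mirrors the report's observation that only the overflow instances fail.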

One possible fix I can think of is to set a priority on RPC messages and keep a set of "reserved" threads that can be used only for high-priority messages (network-vif-plugged, for example).
Another (maybe a bit simpler) way is to monitor the number of available threads in the pool and provide extra threads when the pool becomes empty.
Not sure how this can be fixed in Nova.
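The first idea, reserving a few workers for high-priority messages, could look roughly like the admission-control sketch below. Everything here is hypothetical: ReservedSlotPool and its methods are not part of oslo.messaging, and the sketch only decides whether a message may take a slot; it does not run the work or classify messages by priority.

```python
import threading


class ReservedSlotPool:
    """Hypothetical admission control for a bounded worker pool.

    Normal messages may use at most size - reserved slots; high-priority
    messages (e.g. network-vif-plugged) may also use the reserved slots.
    """

    def __init__(self, size, reserved):
        if not 0 < reserved < size:
            raise ValueError("need 0 < reserved < size")
        self._shared = threading.BoundedSemaphore(size - reserved)
        self._reserved = threading.BoundedSemaphore(reserved)

    def try_acquire(self, high_priority=False):
        """Return a slot token (pass it to release) or None if none is free."""
        if self._shared.acquire(blocking=False):
            return self._shared
        if high_priority and self._reserved.acquire(blocking=False):
            return self._reserved
        return None

    @staticmethod
    def release(token):
        token.release()


if __name__ == "__main__":
    pool = ReservedSlotPool(size=4, reserved=1)
    builds = [pool.try_acquire() for _ in range(4)]
    print(sum(t is not None for t in builds))  # 3: one slot is held back
    event_slot = pool.try_acquire(high_priority=True)
    print(event_slot is not None)  # True: the event still gets a worker
```

Because the total stays bounded at size, this avoids unbounded parallelism while still guaranteeing event handlers a slot even after long-running builds have taken every shared one.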

Changed in nova:
status: Incomplete → Opinion
Yair Fried (yfried) wrote :

I think the status was changed to Opinion by accident. It should be "Confirmed".

Changed in nova:
status: Opinion → Confirmed
Changed in oslo.messaging:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
status: New → Confirmed
Attila Fazekas (afazekas) wrote :

The default GreenPool size is 1000: http://eventlet.net/doc/modules/greenpool.html
A maximum of 64 threads sounds very small to me. I would recommend increasing the default size to 512.

Attila Fazekas (afazekas) wrote :

512 might be too much if the threads need to use more file descriptors and we are using the default 1024 limit.
rpc_thread_pool_size=256 might be better if we expect a worker to call a long-running external command with 3 pipe fds.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/130278

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/130601

Changed in nova:
assignee: Oleg Bondarev (obondarev) → Dan Smith (danms)
status: Confirmed → In Progress
Dan Smith (danms) wrote :

Can someone try my patch to nova for this issue to see if it helps?

It's worth noting that using an all-in-one deployment for any sort of scale testing eliminates most of what nova uses to scale horizontally. I would even say that this particular issue is rather synthetic as it would only impact clouds with enough parallel activity to hit 64 concurrent builds on a single compute node.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/130601
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1d8eddb2614de8daaddddd64ad1a8de4c215fe7a
Submitter: Jenkins
Branch: master

commit 1d8eddb2614de8daaddddd64ad1a8de4c215fe7a
Author: Dan Smith <email address hidden>
Date: Thu Oct 23 10:10:48 2014 -0700

    Run build_and_run_instance in a separate greenthread

    If we're doing a lot of build operations, we are using a large portion
    of the limited rpc worker pool for long periods of time. Since we may wait
    on external services (like neutron or glance) during those times, we could
    fully deplete that pool.

    This patch makes us spawn a new greenthread for that task and return the
    rpc worker to the pool. Due to some funkiness with the stack of decorators,
    this breaks the inner function out to an object method, which is probably
    good anyway, given its size. This also moves the wrap_instance_event
    decorator to the inner function so that the start and stop events properly
    demarcate the actual task and not just the (now very quick) RPC call.

    Change-Id: Ife712c43c5a61424bc68b2f5ab47cefdb46ac168
    Closes-Bug: #1372049
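The approach in this commit, returning the RPC worker to the pool and doing the long wait elsewhere, can be illustrated with a toy model. This is not nova's code: OS threads from the standard library stand in for eventlet greenthreads, and build_rpc_handler, do_build and the other names are hypothetical. Because each handler only starts a thread and returns, the pool stays free to deliver network-vif-plugged events even when there are far more builds than workers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_builds_offloaded(num_builds, pool_size=4, timeout=1.0):
    """Toy model: a bounded RPC pool whose build handler offloads the wait.

    Returns a tuple (active, errored).
    """
    plugged = [threading.Event() for _ in range(num_builds)]
    results = [None] * num_builds

    def do_build(i):
        # The long-running part: wait for network-vif-plugged.
        results[i] = "ACTIVE" if plugged[i].wait(timeout) else "ERROR"

    def build_rpc_handler(i):
        # Spawn the real work in its own thread and free this pool worker.
        worker = threading.Thread(target=do_build, args=(i,))
        worker.start()
        return worker

    def deliver_event(i):
        plugged[i].set()

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        handlers = [pool.submit(build_rpc_handler, i) for i in range(num_builds)]
        events = [pool.submit(deliver_event, i) for i in range(num_builds)]
        workers = [f.result() for f in handlers]
        for f in events:
            f.result()
    for w in workers:
        w.join()  # wait for every build to finish before counting
    return results.count("ACTIVE"), results.count("ERROR")


if __name__ == "__main__":
    print(simulate_builds_offloaded(64, pool_size=4))
```

Under normal scheduling, 64 builds on a 4-worker pool should all end ACTIVE. The trade-off is that concurrent builds are no longer capped by the RPC pool size, so any limit on build concurrency has to come from somewhere else.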

Changed in nova:
status: In Progress → Fix Committed
Yair Fried (yfried) wrote :

I still see this problem even when I'm using the patch.
I fear it might have gotten even worse, since once I pass the 63 limit (and delete all), even a single VM fails to boot.

Changed in neutron:
status: Invalid → Confirmed
Changed in nova:
status: Fix Committed → Confirmed
status: Confirmed → Fix Committed
Yair Fried (yfried) wrote :

Please ignore my previous comment; I have verified that I am able to boot 75 VMs at once.
I have no idea why I ran into the problem I mentioned earlier. It could be that the timeout was too small.

Changed in neutron:
status: Confirmed → Invalid
Yair Fried (yfried) wrote :

Sorry for messing up the bug's status. Connection trouble...

Yair Fried (yfried) wrote :

Can we backport this fix to Juno and Icehouse?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/132202

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/132218

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Automatic expansion of the thread pool in oslo.messaging is not an option, since we should have some limit applied to avoid other problems caused by too much parallelism. If we ever consider expanding the pool beyond the hardcoded value from the configuration file, we'll need to apply some non-obvious heuristics to determine whether higher parallelism will benefit the whole system.

We could change the default value for the oslo.messaging eventlet executor, though that would affect all the services that use the library, not just this specific case, and it's not obvious that it wouldn't introduce other issues.

The safest option is to backport the Nova fix to stable branches.

I've requested juno and icehouse backports for the Nova patch:
- https://review.openstack.org/132202 (Juno)
- https://review.openstack.org/132218 (Icehouse)

Changed in oslo.messaging:
status: Confirmed → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.messaging (master)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/130278
Reason: Another fix with better isolation for the bug was implemented for Nova.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/132202
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9be9467a30aaa0540867e2818e81b5b024af527
Submitter: Jenkins
Branch: stable/juno

commit f9be9467a30aaa0540867e2818e81b5b024af527
Author: Dan Smith <email address hidden>
Date: Thu Oct 23 10:10:48 2014 -0700

    Run build_and_run_instance in a separate greenthread

    If we're doing a lot of build operations, we are using a large portion
    of the limited rpc worker pool for long periods of time. Since we may wait
    on external services (like neutron or glance) during those times, we could
    fully deplete that pool.

    This patch makes us spawn a new greenthread for that task and return the
    rpc worker to the pool. Due to some funkiness with the stack of decorators,
    this breaks the inner function out to an object method, which is probably
    good anyway, given its size. This also moves the wrap_instance_event
    decorator to the inner function so that the start and stop events properly
    demarcate the actual task and not just the (now very quick) RPC call.

    Change-Id: Ife712c43c5a61424bc68b2f5ab47cefdb46ac168
    Closes-Bug: #1372049
    (cherry picked from commit 1d8eddb2614de8daaddddd64ad1a8de4c215fe7a)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Alan Pevec (apevec)
tags: removed: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/132218
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8812672b909f861e6d800b7ba99e330684f0c63e
Submitter: Jenkins
Branch: stable/icehouse

commit 8812672b909f861e6d800b7ba99e330684f0c63e
Author: Dan Smith <email address hidden>
Date: Thu Oct 23 10:10:48 2014 -0700

    Run build_and_run_instance in a separate greenthread

    If we're doing a lot of build operations, we are using a large portion
    of the limited rpc worker pool for long periods of time. Since we may wait
    on external services (like neutron or glance) during those times, we could
    fully deplete that pool.

    This patch makes us spawn a new greenthread for that task and return the
    rpc worker to the pool. Due to some funkiness with the stack of decorators,
    this breaks the inner function out to an object method, which is probably
    good anyway, given its size. This also moves the wrap_instance_event
    decorator to the inner function so that the start and stop events properly
    demarcate the actual task and not just the (now very quick) RPC call.

    Conflicts:
     nova/compute/manager.py
     nova/tests/compute/test_compute_mgr.py

    Icehouse changes:
    - some patched unit tests were not present or had different names.
    - minor change in unit tests to pass pep8 checks.

    Change-Id: Ife712c43c5a61424bc68b2f5ab47cefdb46ac168
    Closes-Bug: #1372049
    (cherry picked from commit 1d8eddb2614de8daaddddd64ad1a8de4c215fe7a)

Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0