nova list as admin is slow (no vms)

Bug #1176446 reported by Jacob Cherkas
This bug affects 13 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

Running nova 2013.1 from Ubuntu packages.

I saw bug 1160487 and thought this might be a duplicate, but I don't believe so: running nova list as an admin who owns no VMs at all takes about 25-30 seconds to return.

We also applied commit e653938ff7bc6b9b3e97e784bb07516576305b3e to nova, which significantly improved nova list, but only for non-admin tenants.

nova --debug list

REQ: curl -i http://10.34.104.187:35357/v2.0/tokens -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "nicira", "passwordCredentials": {"username": "admin", "password": "xxxxx!"}}}'

INFO (connectionpool:191) Starting new HTTP connection (1): 10.34.104.187
DEBUG (connectionpool:283) "POST /v2.0/tokens HTTP/1.1" 200 2416
RESP: [200] {'date': 'Sat, 04 May 2013 23:32:29 GMT', 'content-type': 'application/json', 'content-length': '2416', 'vary': 'X-Auth-Token'}
RESP BODY: {"access": {"token": {"issued_at": "2013-05-04T23:32:29.848568", "expires": "2013-05-05T23:32:29Z", "id": "166650472b6e4bc0bd0ec3c1ab82a2e2", "tenant": {"description": "Default Tenant - Admin", "enabled": true, "id": "fc9ba4c1d32d48679b5c3e9b2c004b9b", "name": "nicira"}}, "serviceCatalog": [{"endpoints": [{"adminURL": "http://10.34.104.185:8774/v2/fc9ba4c1d32d48679b5c3e9b2c004b9b", "region": "PA", "internalURL": "http://10.34.104.185:8774/v2/fc9ba4c1d32d48679b5c3e9b2c004b9b", "id": "280c800402da47d393e4e0890a5a830e", "publicURL": "http://10.34.104.185:8774/v2/fc9ba4c1d32d48679b5c3e9b2c004b9b"}], "endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints": [{"adminURL": "http://10.34.104.188:9696", "region": "PA", "internalURL": "http://10.34.104.188:9696", "id": "2b188ab59755429c94324088bb2fa9a2", "publicURL": "http://10.34.104.188:9696"}], "endpoints_links": [], "type": "network", "name": "quantum"}, {"endpoints": [{"adminURL": "http://10.34.104.185:9292", "region": "PA", "internalURL": "http://10.34.104.185:9292", "id": "be1d2f2449ac448299c1258913b16474", "publicURL": "http://10.34.104.185:9292"}], "endpoints_links": [], "type": "image", "name": "glance"}, {"endpoints": [{"adminURL": "http://10.34.104.190:8776/v1/fc9ba4c1d32d48679b5c3e9b2c004b9b", "region": "PA", "internalURL": "http://10.34.104.190:8776/v1/fc9ba4c1d32d48679b5c3e9b2c004b9b", "id": "9ae35a87f24040038851ce9c9e20147d", "publicURL": "http://10.34.104.190:8776/v1/fc9ba4c1d32d48679b5c3e9b2c004b9b"}], "endpoints_links": [], "type": "volume", "name": "cinder"}, {"endpoints": [{"adminURL": "http://10.34.104.185:8773/service/Cloud", "region": "PA", "internalURL": "http://10.34.104.185:8773/service/Cloud", "id": "0ae37a0217d6445e8adbb5ce08146c0b", "publicURL": "http://10.34.104.185:8773/service/Cloud"}], "endpoints_links": [], "type": "ec2", "name": "ec2"}, {"endpoints": [{"adminURL": "http://10.34.104.187:35357/v2.0", "region": "PA", "internalURL": "http://10.34.104.187:5000/v2.0", "id": "37b3aa6fade24ced8d6dae8fdaac8449", "publicURL": "http://10.34.104.187:5000/v2.0"}], "endpoints_links": [], "type": "identity", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id": "5e363b8f0665443d89ca9d9787a19a81", "roles": [{"name": "admin"}, {"name": "_member_"}], "name": "admin"}, "metadata": {"is_admin": 0, "roles": ["b04ac30a90f64c3692d54c73e924e2ae", "9fe2ff9ee4384b1894a90878d3e92bab"]}}}

REQ: curl -i http://10.34.104.185:8774/v2/fc9ba4c1d32d48679b5c3e9b2c004b9b/servers/detail -X GET -H "X-Auth-Project-Id: nicira" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 166650472b6e4bc0bd0ec3c1ab82a2e2"

INFO (connectionpool:191) Starting new HTTP connection (1): 10.34.104.185
DEBUG (connectionpool:283) "GET /v2/fc9ba4c1d32d48679b5c3e9b2c004b9b/servers/detail HTTP/1.1" 200 15
RESP: [200] {'date': 'Sat, 04 May 2013 23:33:06 GMT', 'x-compute-request-id': 'req-32739176-1998-4b1e-8fa6-c2f7b029b6a7', 'content-type': 'application/json', 'content-length': '15'}
RESP BODY: {"servers": []}

nova-api logs in debug mode:

2013-05-04 16:32:40.958 8633 INFO nova.osapi_compute.wsgi.server [-] (8633) accepted ('10.34.104.185', 58359)

2013-05-04 16:32:41.080 DEBUG nova.api.openstack.wsgi [req-32739176-1998-4b1e-8fa6-c2f7b029b6a7 5e363b8f0665443d89ca9d9787a19a81 fc9ba4c1d32d48679b5c3e9b2c004b9b] No Content-Type provided in request get_body /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:791
2013-05-04 16:32:41.080 DEBUG nova.api.openstack.wsgi [req-32739176-1998-4b1e-8fa6-c2f7b029b6a7 5e363b8f0665443d89ca9d9787a19a81 fc9ba4c1d32d48679b5c3e9b2c004b9b] Calling method <bound method Controller.detail of <nova.api.openstack.compute.servers.Controller object at 0x2ed9f90>> _process_stack /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:911
2013-05-04 16:32:41.081 DEBUG nova.compute.api [req-32739176-1998-4b1e-8fa6-c2f7b029b6a7 5e363b8f0665443d89ca9d9787a19a81 fc9ba4c1d32d48679b5c3e9b2c004b9b] Searching by: {'deleted': False, 'project_id': u'fc9ba4c1d32d48679b5c3e9b2c004b9b'} get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:1372

2013-05-04 16:33:06.487 INFO nova.osapi_compute.wsgi.server [req-32739176-1998-4b1e-8fa6-c2f7b029b6a7 5e363b8f0665443d89ca9d9787a19a81 fc9ba4c1d32d48679b5c3e9b2c004b9b] 10.34.104.185 "GET /v2/fc9ba4c1d32d48679b5c3e9b2c004b9b/servers/detail HTTP/1.1" status: 200 len: 187 time: 25.5273259

Clearly no instances are returned because there are none, yet the request took 26 seconds (16:32:40 to 16:33:06).

The same slowness can be observed if you run nova show on any instance.

Tags: db
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Do you have a large database? I'm guessing that we lost our index on deleted.
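
To check, something like this should show whether any index still covers deleted (a sketch assuming MySQL and that the nova database is named "nova"):

  USE nova;
  SHOW INDEX FROM instances;

If no row lists deleted under Column_name, the index is indeed missing.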

Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Yes, it looks like the deleted index is gone.

tags: added: grizzly-backport-potential
Revision history for this message
Jacob Cherkas (jcherkas) wrote :

Thanks Vish.

Can we re-index on deleted?

Can you provide the SQL command so I don't make it worse?

Thanks.

Revision history for this message
Jacob Cherkas (jcherkas) wrote :

I am under the assumption you are referring to the instances table:
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| instances | 0 | PRIMARY | 1 | id | A | 2762 | NULL | NULL | | BTREE | | |
| instances | 0 | uuid | 1 | uuid | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | project_id | 1 | project_id | A | 251 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_deleted_idx | 1 | host | A | 69 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_reservation_id_idx | 1 | reservation_id | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_terminated_at_launched_at_idx | 1 | terminated_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_terminated_at_launched_at_idx | 2 | launched_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_uuid_deleted_idx | 1 | uuid | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_task_state_updated_at_idx | 1 | task_state | A | 2 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_task_state_updated_at_idx | 2 | updated_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_node_deleted_idx | 1 | host | A | 49 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_node_deleted_idx | 2 | node | A | 98 | NULL | NULL | YES | BTREE | | |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
12 rows in set (0.01 sec)
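
None of these indexes cover deleted together with project_id, which is what the get_all filter above searches on. For illustration only, re-adding such an index might look like the sketch below (the index name and column order are guesses, not the official migration; confirm against the Grizzly schema and test on a copy of the database first):

  CREATE INDEX instances_project_id_deleted_idx
      ON instances (project_id, deleted);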

Revision history for this message
aeva black (tenbrae) wrote :

Jacob,

Could you attach a sample of the SQL queries generated by the slow "nova list" and "nova show" commands? You can get this by temporarily setting MySQL's long_query_time to 0 (assuming you have the slow log enabled), e.g.:

  SET GLOBAL long_query_time=0;
  # run nova list
  SET GLOBAL long_query_time=2; # or whatever your environment's default is
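
If the slow log isn't enabled yet, it can also be switched on at runtime (a sketch for MySQL 5.1+; the log file path is only an example):

  SET GLOBAL slow_query_log = 'ON';
  SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';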

I did this with a local test deployment from trunk, and captured some odd things here, but would like to compare to your environment.

  http://paste.openstack.org/show/37026/

Thanks,
Devananda

Revision history for this message
Jacob Cherkas (jcherkas) wrote :

Slow query log from MySQL that you requested.

Let me know if there is anything else that you need.

Thanks.

Revision history for this message
Hrushikesh (hrushikesh-gangur) wrote :

My Postgres memory settings look dandy! However, what I am noticing here is that if I de-provision all the launched instances, navigation through the various tabs is quick. If I keep launching VMs (I am at 100 at this point), the response time grows to between 40 and 60 seconds. The same behavior is seen with the nova list command. So it looks like a factor of the number of instances or objects in OpenStack.

nova list
2013-05-30 10:07:46.063 37206 INFO nova.osapi_compute.wsgi.server [-] (37206) accepted ('10.1.56.12', 40240)
2013-05-30 10:08:26.249 INFO nova.osapi_compute.wsgi.server [req-fa0cf5a9-270a-4a03-92fe-e0d124919e16 2efad4b253f64b4dae65a28f45438d93 4f342d62fff843fab63dc03316d34251] 10.1.56.12 "GET /v2/4f342d62fff843fab63dc03316d34251/servers/detail HTTP/1.1" status: 200 len: 153050 time: 40.1845958

Observations on nova list:
1. The overall response time to enumerate 100 launched instances is 40 seconds.
2. Throughout those 40 seconds, keystone-all is at 70% CPU.
3. I took a count of keystone.token rows before and after executing 'nova list'. The count went up from 46,671 to 46,884 (around 200 rows). Does that mean it is going to Keystone for every instance enumerated? (The count can be reproduced as sketched below.)
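
To reproduce that count, I queried the token table before and after (a sketch assuming Keystone's SQL token backend; connect to the keystone database first):

  -- count tokens, run `nova list`, then count again
  SELECT COUNT(*) FROM token;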

This SQL query, if executed manually, takes 4321 ms.

SELECT anon_1.instances_created_at AS anon_1_instances_created_at, anon_1.instances_updated_at AS anon_1_instances_updated_at, anon_1.instances_deleted_at AS anon_1_instances_deleted_at, anon_1.instances_deleted AS anon_1_instances_deleted, anon_1.instances_id AS anon_1_instances_id, anon_1.instances_user_id AS anon_1_instances_user_id, anon_1.instances_project_id AS anon_1_instances_project_id, anon_1.instances_image_ref AS anon_1_instances_image_ref, anon_1.instances_kernel_id AS anon_1_instances_kernel_id, anon_1.instances_ramdisk_id AS anon_1_instances_ramdisk_id, anon_1.instances_hostname AS anon_1_instances_hostname, anon_1.instances_launch_index AS anon_1_instances_launch_index, anon_1.instances_key_name AS anon_1_instances_key_name, anon_1.instances_key_data AS anon_1_instances_key_data, anon_1.instances_power_state AS anon_1_instances_power_state, anon_1.instances_vm_state AS anon_1_instances_vm_state, anon_1.instances_task_state AS anon_1_instances_task_state, anon_1.instances_memory_mb AS anon_1_instances_memory_mb, anon_1.instances_vcpus AS anon_1_instances_vcpus, anon_1.instances_root_gb AS anon_1_instances_root_gb, anon_1.instances_ephemeral_gb AS anon_1_instances_ephemeral_gb, anon_1.instances_host AS anon_1_instances_host, anon_1.instances_node AS anon_1_instances_node, anon_1.instances_instance_type_id AS anon_1_instances_instance_type_id, anon_1.instances_user_data AS anon_1_instances_user_data, anon_1.instances_reservation_id AS anon_1_instances_reservation_id, anon_1.instances_scheduled_at AS anon_1_instances_scheduled_at, anon_1.instances_launched_at AS anon_1_instances_launched_at, anon_1.instances_terminated_at AS anon_1_instances_terminated_at, anon_1.instances_availability_zone AS anon_1_instances_availability_zone, anon_1.instances_display_name AS anon_1_instances_display_name, anon_1.instances_display_description AS anon_1_instances_display_description, anon_1.instances_launched_on AS ano...
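
To check whether that query uses an index at all, a trimmed-down version of the same filter can be explained (illustrative only; the real query is the long SQLAlchemy-generated SELECT above, and PostgreSQL is assumed, as in this environment):

  EXPLAIN ANALYZE
  SELECT id, uuid, display_name
    FROM instances
   WHERE deleted = 0   -- or deleted = false, depending on the column type
     AND project_id = '4f342d62fff843fab63dc03316d34251';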

Revision history for this message
Hrushikesh (hrushikesh-gangur) wrote :

I am sure this has something to do with an inefficiency in nova (in code or configuration) that is causing the overall response time to become sluggish, and it scales with the number of active VM instances within a project. The 40-second response time was with 100 VM instances. Now I have launched around 500, and the response time has jumped to 190 seconds:

2013-06-03 13:57:07.481 DEBUG nova.api.openstack.wsgi [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] No Content-Type provided in request get_body /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:791
2013-06-03 13:57:07.482 DEBUG nova.api.openstack.wsgi [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] Calling method <bound method Controller.detail of <nova.api.openstack.compute.servers.Controller object at 0x4132850>> _process_stack /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:911
2013-06-03 13:57:07.483 DEBUG nova.compute.api [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] Searching by: {'deleted': False, u'project_id': u'5a19956a849542869ce710b9e51439e0'} get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:1373
2013-06-03 14:00:15.336 INFO nova.osapi_compute.wsgi.server [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] 10.1.56.12 "GET /v2/5a19956a849542869ce710b9e51439e0/servers/detail?project_id=5a19956a849542869ce710b9e51439e0 HTTP/1.1" status: 200 len: 536479 time: 187.857266

This was not the case in Essex. If I can get someone to help me profile what it does during this API call, we can reach some conclusion, as this can only get worse once I have 10,00 instances.

Revision history for this message
Ben Nemec (bnemec) wrote :

FWIW, I can't reproduce this on recent devstack with the fake virt driver. I booted 100 instances and this was the time for nova list:

real 0m2.386s
user 0m0.508s
sys 0m0.232s

I know there were a number of database optimization patches that went in at the beginning of Havana, so maybe one of those addresses this issue? More investigation is needed, but I wanted to post what I have found so far.

Revision history for this message
Phil Day (philip-day) wrote :

Are you running with Neutron and the Nova Security Group API extension?

There are some performance issues in the way that extension (which is called as a side effect of server list) calls Neutron: it will fetch a list of all ports and all security groups if the server list contains more than one server.

If this is a match, take a look at this change, which improves this in some cases: https://review.openstack.org/#/c/47651/

David Ripton (dripton)
tags: added: db
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Revision history for this message
Joe Gordon (jogo) wrote :

Marking as Incomplete: this bug is old and appears to be resolved.

Changed in nova:
status: Triaged → Incomplete
Revision history for this message
Sean Dague (sdague) wrote :

I believe this is fixed; as an old Incomplete bug, it should be marked Invalid.

Changed in nova:
status: Incomplete → Invalid