Bug #1176446 “nova list as admin is slow (no vms)” : Bugs : OpenStack Compute (nova)

Revision history for this message

Vish Ishaya (vishvananda) wrote on 2013-05-08:

#1

Do you have a large database? I'm guessing that we lost our index on deleted.

Changed in nova:
status:	New → Triaged
importance:	Undecided → High

Revision history for this message

Vish Ishaya (vishvananda) wrote on 2013-05-08:

#2

yes it looks like the deleted index is gone.

tags:

added: grizzly-backport-potential

Revision history for this message

Jacob Cherkas (jcherkas) wrote on 2013-05-08:

#3

Thanks Vish.

Can we re-index on deleted?

Can you provide the sql command so I dont make it worse?

Thanks.

Revision history for this message

Jacob Cherkas (jcherkas) wrote on 2013-05-08:

#4

I am under the assumption you are referring to the instances table:
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| instances | 0 | PRIMARY | 1 | id | A | 2762 | NULL | NULL | | BTREE | | |
| instances | 0 | uuid | 1 | uuid | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | project_id | 1 | project_id | A | 251 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_deleted_idx | 1 | host | A | 69 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_reservation_id_idx | 1 | reservation_id | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_terminated_at_launched_at_idx | 1 | terminated_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_terminated_at_launched_at_idx | 2 | launched_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_uuid_deleted_idx | 1 | uuid | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_task_state_updated_at_idx | 1 | task_state | A | 2 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_task_state_updated_at_idx | 2 | updated_at | A | 2762 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_node_deleted_idx | 1 | host | A | 49 | NULL | NULL | YES | BTREE | | |
| instances | 1 | instances_host_node_deleted_idx | 2 | node | A | 98 | NULL | NULL | YES | BTREE | | |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
12 rows in set (0.01 sec)

I am under the assumption you are referring to the instances table:
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name                                | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| instances |          0 | PRIMARY                                 |            1 | id             | A         |        2762 |     NULL | NULL   |      | BTREE      |         |               |
| instances |          0 | uuid                                    |            1 | uuid           | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | project_id                              |            1 | project_id     | A         |         251 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_host_deleted_idx              |            1 | host           | A         |          69 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_reservation_id_idx            |            1 | reservation_id | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_terminated_at_launched_at_idx |            1 | terminated_at  | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_terminated_at_launched_at_idx |            2 | launched_at    | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_uuid_deleted_idx              |            1 | uuid           | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_task_state_updated_at_idx     |            1 | task_state     | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_task_state_updated_at_idx     |            2 | updated_at     | A         |        2762 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_host_node_deleted_idx         |            1 | host           | A         |          49 |     NULL | NULL   | YES  | BTREE      |         |               |
| instances |          1 | instances_host_node_deleted_idx         |            2 | node           | A         |          98 |     NULL | NULL   | YES  | BTREE      |         |               |
+-----------+------------+-----------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
12 rows in set (0.01 sec)

Revision history for this message

aeva black (tenbrae) wrote on 2013-05-09:

#5

Jacob,

Could you attach a sample of the SQL queries generated by the slow "nova list" and "nova show" commands? You can get this by temporarily setting mysql's long_query_time to 0 (assuming you have slow log enabled), eg.

  SET GLOBAL long_query_time=0;
  # run nova list
  SET GLOBAL long_query_time=2; # or what ever your environment's default is

I did this with a local test deployment from trunk, and captured some odd things here, but would like to compare to your environment.

http://paste.openstack.org/show/37026/

Thanks,
Devananda

Revision history for this message

Jacob Cherkas (jcherkas) wrote on 2013-05-09:

#6

mysql-slow-query-log Edit (10.3 MiB, text/plain)

Slow Query log from mysql that you requested.

Let me know if there is anything else that you need.

Thanks.

Revision history for this message

Hrushikesh (hrushikesh-gangur) wrote on 2013-05-31:

#7

Download full text (9.8 KiB)

My postgres mem setting looks dandy! However, what am noticing here is if I de-provision all the launched instances, the navigation through various tabs are quick. If I keep launching VMs and at this point am on 100, the response time goes between 40-60 seconds. Same behavior is seen with nova list command. So, it looks a factor of instances or objects in the openstack.

nova list
2013-05-30 10:07:46.063 37206 INFO nova.osapi_compute.wsgi.server [-] (37206) accepted ('10.1.56.12', 40240)
2013-05-30 10:08:26.249 INFO nova.osapi_compute.wsgi.server [req-fa0cf5a9-270a-4a03-92fe-e0d124919e16 2efad4b253f64b4dae65a28f45438d93 4f342d62fff843fab63dc03316d34251] 10.1.56.12 "GET /v2/4f342d62fff843fab63dc03316d34251/servers/detail HTTP/1.1" status: 200 len: 153050 time: 40.1845958

Observation on nova list:
1. Overall response time to enumerate 100 launched instance takes 40 seconds
2. Throughout this 40 seconds duration, keystone-all is at 70% cpu
3. I took a count of keystone.token rows before and after I executed the command ‘nova list’. The counted bumped up from 46,671 to 46,884 (around 200 rows). Does it mean it is going to keystone for every instance enumeration.

This SQL query if manually executed takes 4321 ms.

SELECT anon_1.instances_created_at AS anon_1_instances_created_at, anon_1.instances_updated_at AS anon_1_instances_updated_at, anon_1.instances_deleted_at AS anon_1_instances_deleted_at, anon_1.instances_deleted AS anon_1_instances_deleted, anon_1.instances_id AS anon_1_instances_id, anon_1.instances_user_id AS anon_1_instances_user_id, anon_1.instances_project_id AS anon_1_instances_project_id, anon_1.instances_image_ref AS anon_1_instances_image_ref, anon_1.instances_kernel_id AS anon_1_instances_kernel_id, anon_1.instances_ramdisk_id AS anon_1_instances_ramdisk_id, anon_1.instances_hostname AS anon_1_instances_hostname, anon_1.instances_launch_index AS anon_1_instances_launch_index, anon_1.instances_key_name AS anon_1_instances_key_name, anon_1.instances_key_data AS anon_1_instances_key_data, anon_1.instances_power_state AS anon_1_instances_power_state, anon_1.instances_vm_state AS anon_1_instances_vm_state, anon_1.instances_task_state AS anon_1_instances_task_state, anon_1.instances_memory_mb AS anon_1_instances_memory_mb, anon_1.instances_vcpus AS anon_1_instances_vcpus, anon_1.instances_root_gb AS anon_1_instances_root_gb, anon_1.instances_ephemeral_gb AS anon_1_instances_ephemeral_gb, anon_1.instances_host AS anon_1_instances_host, anon_1.instances_node AS anon_1_instances_node, anon_1.instances_instance_type_id AS anon_1_instances_instance_type_id, anon_1.instances_user_data AS anon_1_instances_user_data, anon_1.instances_reservation_id AS anon_1_instances_reservation_id, anon_1.instances_scheduled_at AS anon_1_instances_scheduled_at, anon_1.instances_launched_at AS anon_1_instances_launched_at, anon_1.instances_terminated_at AS anon_1_instances_terminated_at, anon_1.instances_availability_zone AS anon_1_instances_availability_zone, anon_1.instances_display_name AS anon_1_instances_display_name, anon_1.instances_display_description AS anon_1_instances_display_description, anon_1.instances_launched_on AS ano...

My postgres mem setting looks dandy! However, what am noticing here is if I de-provision all the launched instances, the navigation through various tabs are quick. If I keep launching VMs and at this point am on 100, the response time goes between 40-60 seconds. Same behavior is seen with nova list command. So, it looks a factor of instances or objects in the openstack.

nova list
2013-05-30 10:07:46.063 37206 INFO nova.osapi_compute.wsgi.server [-] (37206) accepted ('10.1.56.12', 40240)
2013-05-30 10:08:26.249 INFO nova.osapi_compute.wsgi.server [req-fa0cf5a9-270a-4a03-92fe-e0d124919e16 2efad4b253f64b4dae65a28f45438d93 4f342d62fff843fab63dc03316d34251] 10.1.56.12 "GET /v2/4f342d62fff843fab63dc03316d34251/servers/detail HTTP/1.1" status: 200 len: 153050 time: 40.1845958

Observation on nova list:
1.       Overall response time to enumerate 100 launched instance takes 40 seconds
2.       Throughout this 40 seconds duration, keystone-all is at 70% cpu
3.       I took a count of keystone.token rows before and after I executed the command ‘nova list’. The counted bumped up from 46,671 to 46,884 (around 200 rows). Does it mean it is going to keystone for every instance enumeration.

This SQL query if manually executed takes 4321 ms.

SELECT anon_1.instances_created_at AS anon_1_instances_created_at, anon_1.instances_updated_at AS anon_1_instances_updated_at, anon_1.instances_deleted_at AS anon_1_instances_deleted_at, anon_1.instances_deleted AS anon_1_instances_deleted, anon_1.instances_id AS anon_1_instances_id, anon_1.instances_user_id AS anon_1_instances_user_id, anon_1.instances_project_id AS anon_1_instances_project_id, anon_1.instances_image_ref AS anon_1_instances_image_ref, anon_1.instances_kernel_id AS anon_1_instances_kernel_id, anon_1.instances_ramdisk_id AS anon_1_instances_ramdisk_id, anon_1.instances_hostname AS anon_1_instances_hostname, anon_1.instances_launch_index AS anon_1_instances_launch_index, anon_1.instances_key_name AS anon_1_instances_key_name, anon_1.instances_key_data AS anon_1_instances_key_data, anon_1.instances_power_state AS anon_1_instances_power_state, anon_1.instances_vm_state AS anon_1_instances_vm_state, anon_1.instances_task_state AS anon_1_instances_task_state, anon_1.instances_memory_mb AS anon_1_instances_memory_mb, anon_1.instances_vcpus AS anon_1_instances_vcpus, anon_1.instances_root_gb AS anon_1_instances_root_gb, anon_1.instances_ephemeral_gb AS anon_1_instances_ephemeral_gb, anon_1.instances_host AS anon_1_instances_host, anon_1.instances_node AS anon_1_instances_node, anon_1.instances_instance_type_id AS anon_1_instances_instance_type_id, anon_1.instances_user_data AS anon_1_instances_user_data, anon_1.instances_reservation_id AS anon_1_instances_reservation_id, anon_1.instances_scheduled_at AS anon_1_instances_scheduled_at, anon_1.instances_launched_at AS anon_1_instances_launched_at, anon_1.instances_terminated_at AS anon_1_instances_terminated_at, anon_1.instances_availability_zone AS anon_1_instances_availability_zone, anon_1.instances_display_name AS anon_1_instances_display_name, anon_1.instances_display_description AS anon_1_instances_display_description, anon_1.instances_launched_on AS anon_1_instances_launched_on, anon_1.instances_locked AS anon_1_instances_locked, anon_1.instances_os_type AS anon_1_instances_os_type, anon_1.instances_architecture AS anon_1_instances_architecture, anon_1.instances_vm_mode AS anon_1_instances_vm_mode, anon_1.instances_uuid AS anon_1_instances_uuid, anon_1.instances_root_device_name AS anon_1_instances_root_device_name, anon_1.instances_default_ephemeral_device AS anon_1_instances_default_ephemeral_device, anon_1.instances_default_swap_device AS anon_1_instances_default_swap_device, anon_1.instances_config_drive AS anon_1_instances_config_drive, anon_1.instances_access_ip_v4 AS anon_1_instances_access_ip_v4, anon_1.instances_access_ip_v6 AS anon_1_instances_access_ip_v6, anon_1.instances_auto_disk_config AS anon_1_instances_auto_disk_config, anon_1.instances_progress AS anon_1_instances_progress, anon_1.instances_shutdown_terminate AS anon_1_instances_shutdown_terminate, anon_1.instances_disable_terminate AS anon_1_instances_disable_terminate, anon_1.instances_cell_name AS anon_1_instances_cell_name, instance_metadata_1.created_at AS instance_metadata_1_created_at, instance_metadata_1.updated_at AS instance_metadata_1_updated_at, instance_metadata_1.deleted_at AS instance_metadata_1_deleted_at, instance_metadata_1.deleted AS instance_metadata_1_deleted, instance_metadata_1.id AS instance_metadata_1_id, instance_metadata_1.key AS instance_metadata_1_key, instance_metadata_1.value AS instance_metadata_1_value, instance_metadata_1.instance_uuid AS instance_metadata_1_instance_uuid, security_groups_1.created_at AS security_groups_1_created_at, security_groups_1.updated_at AS security_groups_1_updated_at, security_groups_1.deleted_at AS security_groups_1_deleted_at, security_groups_1.deleted AS security_groups_1_deleted, security_groups_1.id AS security_groups_1_id, security_groups_1.name AS security_groups_1_name, security_groups_1.description AS security_groups_1_description, security_groups_1.user_id AS security_groups_1_user_id, security_groups_1.project_id AS security_groups_1_project_id, instance_system_metadata_1.created_at AS instance_system_metadata_1_created_at, instance_system_metadata_1.updated_at AS instance_system_metadata_1_updated_at, instance_system_metadata_1.deleted_at AS instance_system_metadata_1_deleted_at, instance_system_metadata_1.deleted AS instance_system_metadata_1_deleted, instance_system_metadata_1.id AS instance_system_metadata_1_id, instance_system_metadata_1.key AS instance_system_metadata_1_key, instance_system_metadata_1.value AS instance_system_metadata_1_value, instance_system_metadata_1.instance_uuid AS instance_system_metadata_1_instance_uuid, instance_info_caches_1.created_at AS instance_info_caches_1_created_at, instance_info_caches_1.updated_at AS instance_info_caches_1_updated_at, instance_info_caches_1.deleted_at AS instance_info_caches_1_deleted_at, instance_info_caches_1.deleted AS instance_info_caches_1_deleted, instance_info_caches_1.id AS instance_info_caches_1_id, instance_info_caches_1.network_info AS instance_info_caches_1_network_info, instance_info_caches_1.instance_uuid AS instance_info_caches_1_instance_uuid
        FROM (SELECT instances.created_at AS instances_created_at, instances.updated_at AS instances_updated_at, instances.deleted_at AS instances_deleted_at, instances.deleted AS instances_deleted, instances.id AS instances_id, instances.user_id AS instances_user_id, instances.project_id AS instances_project_id, instances.image_ref AS instances_image_ref, instances.kernel_id AS instances_kernel_id, instances.ramdisk_id AS instances_ramdisk_id, instances.hostname AS instances_hostname, instances.launch_index AS instances_launch_index, instances.key_name AS instances_key_name, instances.key_data AS instances_key_data, instances.power_state AS instances_power_state, instances.vm_state AS instances_vm_state, instances.task_state AS instances_task_state, instances.memory_mb AS instances_memory_mb, instances.vcpus AS instances_vcpus, instances.root_gb AS instances_root_gb, instances.ephemeral_gb AS instances_ephemeral_gb, instances.host AS instances_host, instances.node AS instances_node, instances.instance_type_id AS instances_instance_type_id, instances.user_data AS instances_user_data, instances.reservation_id AS instances_reservation_id, instances.scheduled_at AS instances_scheduled_at, instances.launched_at AS instances_launched_at, instances.terminated_at AS instances_terminated_at, instances.availability_zone AS instances_availability_zone, instances.display_name AS instances_display_name, instances.display_description AS instances_display_description, instances.launched_on AS instances_launched_on, instances.locked AS instances_locked, instances.os_type AS instances_os_type, instances.architecture AS instances_architecture, instances.vm_mode AS instances_vm_mode, instances.uuid AS instances_uuid, instances.root_device_name AS instances_root_device_name, instances.default_ephemeral_device AS instances_default_ephemeral_device, instances.default_swap_device AS instances_default_swap_device, instances.config_drive AS instances_config_drive, instances.access_ip_v4 AS instances_access_ip_v4, instances.access_ip_v6 AS instances_access_ip_v6, instances.auto_disk_config AS instances_auto_disk_config, instances.progress AS instances_progress, instances.shutdown_terminate AS instances_shutdown_terminate, instances.disable_terminate AS instances_disable_terminate, instances.cell_name AS instances_cell_name
        FROM instances
        WHERE instances.deleted = 0 AND instances.vm_state != 'soft-delete' AND instances.project_id = '4f342d62fff843fab63dc03316d34251' ORDER BY instances.created_at DESC, instances.created_at DESC, instances.created_at DESC, instances.id DESC
         LIMIT 1000) AS anon_1 LEFT OUTER JOIN instance_metadata AS instance_metadata_1 ON instance_metadata_1.instance_uuid = anon_1.instances_uuid AND instance_metadata_1.deleted = 0 LEFT OUTER JOIN security_group_instance_association AS security_group_instance_association_1 ON security_group_instance_association_1.instance_uuid = anon_1.instances_uuid AND anon_1.instances_deleted = 0 LEFT OUTER JOIN security_groups AS security_groups_1 ON security_groups_1.id = security_group_instance_association_1.security_group_id AND security_group_instance_association_1.deleted = 0 AND security_groups_1.deleted = 0 LEFT OUTER JOIN instance_system_metadata AS instance_system_metadata_1 ON instance_system_metadata_1.instance_uuid = anon_1.instances_uuid AND instance_system_metadata_1.deleted = 0 LEFT OUTER JOIN instance_info_caches AS instance_info_caches_1 ON instance_info_caches_1.instance_uuid = anon_1.instances_uuid ORDER BY anon_1.instances_created_at DESC, anon_1.instances_created_at DESC, anon_1.instances_created_at DESC, anon_1.instances_id DESC

Revision history for this message

Hrushikesh (hrushikesh-gangur) wrote on 2013-06-04:

#8

Am sure this has something to do with inefficiency in nova (due to code or configuration) that is causing the overall response time getting sluggish. And, this being factor of no. of active VM instances within a project. 40 seconds response time was with 100 VM instances. Now, I launched around 500, and the reponse time bumped up to 190 seconds:

2013-06-03 13:57:07.481 DEBUG nova.api.openstack.wsgi [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] No Content-Type provided in request get_body /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:791
2013-06-03 13:57:07.482 DEBUG nova.api.openstack.wsgi [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] Calling method <bound method Controller.detail of <nova.api.openstack.compute.servers.Controller object at 0x4132850>> _process_stack /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:911
2013-06-03 13:57:07.483 DEBUG nova.compute.api [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] Searching by: {'deleted': False, u'project_id': u'5a19956a849542869ce710b9e51439e0'} get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:1373
2013-06-03 14:00:15.336 INFO nova.osapi_compute.wsgi.server [req-3da658aa-6a39-4995-8e6f-2d5c7912549e ac5e4da2c17e4f669f8d3e82d7b751dd 5a19956a849542869ce710b9e51439e0] 10.1.56.12 "GET /v2/5a19956a849542869ce710b9e51439e0/servers/detail?project_id=5a19956a849542869ce710b9e51439e0 HTTP/1.1" status: 200 len: 536479 time: 187.857266

And, this was not the case in Essex. If I can get someone to help me out on profiling what it does during this API call, we can get to some conclusion. As this can go worst after I have 10,00 instances.

Revision history for this message

Ben Nemec (bnemec) wrote on 2013-07-09:

#9

FWIW, I can't reproduce this on recent devstack with the fake virt driver. I booted 100 instances and this was the time for nova list:

real 0m2.386s
user 0m0.508s
sys 0m0.232s

I know there were a number of database optimization patches that went in at the beginning of Havana, so maybe one of those addresses this issue? More investigation is needed, but I wanted to post what I have found so far.

Revision history for this message

Phil Day (philip-day) wrote on 2013-09-25:

#10

Are you running with Neutron and the Nova Security Group API extension ?

There are some performance issues in the way that extension (which is called as a side effect of server list) calls Neutron - it will get a list of all ports and security groups if the server list is >1.

If this is a match take a look at this change which Improves this in some cases : https://review.openstack.org/#/c/47651/

David Ripton (dripton) on 2013-11-06

tags:

added: db

Alan Pevec (apevec) on 2014-03-30

tags:

removed: grizzly-backport-potential

Revision history for this message

Joe Gordon (jogo) wrote on 2014-06-19:

#11

marking as incomplete this bug is old and it appears to be resolved.

Changed in nova:
status:	Triaged → Incomplete

Revision history for this message

Sean Dague (sdague) wrote on 2014-09-04:

#12

Believe this is fixed old incomplete bug should be invalid

Changed in nova:
status:	Incomplete → Invalid

OpenStack Compute (nova)

nova list as admin is slow (no vms)

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches