"nova usage" taking too much time with many VMs in database

Bug #1481262 reported by Antonio Messina
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Guillaume Espanel

Bug Description

Issue found on Kilo 2015.1.0 on Ubuntu Trusty (1:2015.1.0-0ubuntu1.1~cloud0) from http://ubuntu-cloud.archive.canonical.com/ubuntu

When running "nova usage" on a tenant that started many instances (on the order of 100k) during the current month, the following happens:

* nova-api is stuck at 100% CPU for a long time
* as a consequence, the nova CLI returns "ERROR (ConnectionRefused):
Unable to establish connection to ..."
* the MySQL slow query log shows a query like:

SELECT instance_system_metadata.created_at AS
instance_system_metadata_created_at,
instance_system_metadata.updated_at AS
instance_system_metadata_updated_at,
instance_system_metadata.deleted_at AS
instance_system_metadata_deleted_at, instance_system_metadata.deleted
AS instance_system_metadata_deleted, instance_system_metadata.id AS
instance_system_metadata_id, instance_system_metadata.`key` AS
instance_system_metadata_key, instance_system_metadata.value AS
instance_system_metadata_value, instance_system_metadata.instance_uuid
AS instance_system_metadata_instance_uuid
FROM instance_system_metadata
WHERE instance_system_metadata.deleted = 0 AND
 instance_system_metadata.instance_uuid IN (<list of ~100k UUID>)

which took 1.8 seconds.
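
The shape of that query can be reproduced on a small scale. The following is a minimal sketch, assuming a heavily simplified schema in an in-memory SQLite database rather than nova's real MySQL tables; the helper names (`seed`, `metadata_via_in_list`, `metadata_via_join`) are hypothetical and not part of nova. It builds the same kind of `IN (...)` list and, for comparison, fetches the same rows with a join on the tenant, which avoids sending one parameter per instance. It makes no claim about which form is faster on MySQL.

```python
import sqlite3

# Simplified stand-ins for nova's tables (assumption: only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (
        uuid TEXT PRIMARY KEY, project_id TEXT, deleted INTEGER DEFAULT 0);
    CREATE TABLE instance_system_metadata (
        id INTEGER PRIMARY KEY, instance_uuid TEXT, key TEXT, value TEXT,
        deleted INTEGER DEFAULT 0);
    CREATE INDEX ism_instance_uuid_idx
        ON instance_system_metadata (instance_uuid);
""")


def seed(conn, project_id, count):
    """Insert `count` instances for a tenant, one metadata row each."""
    for i in range(count):
        uuid = f"{project_id}-{i}"
        conn.execute(
            "INSERT INTO instances (uuid, project_id) VALUES (?, ?)",
            (uuid, project_id))
        conn.execute(
            "INSERT INTO instance_system_metadata (instance_uuid, key, value) "
            "VALUES (?, 'image_min_ram', '0')", (uuid,))


def metadata_via_in_list(conn, uuids):
    """Same shape as the slow query: one bound parameter per instance UUID."""
    placeholders = ",".join("?" for _ in uuids)
    sql = ("SELECT instance_uuid, key, value FROM instance_system_metadata "
           f"WHERE deleted = 0 AND instance_uuid IN ({placeholders})")
    return conn.execute(sql, list(uuids)).fetchall()


def metadata_via_join(conn, project_id):
    """Same rows, selected by tenant through a join instead of a UUID list."""
    sql = ("SELECT m.instance_uuid, m.key, m.value "
           "FROM instance_system_metadata AS m "
           "JOIN instances AS i ON i.uuid = m.instance_uuid "
           "WHERE m.deleted = 0 AND i.project_id = ?")
    return conn.execute(sql, (project_id,)).fetchall()


seed(conn, "tenant-a", 500)
seed(conn, "tenant-b", 5)
uuids = [row[0] for row in conn.execute(
    "SELECT uuid FROM instances WHERE project_id = ?", ("tenant-a",))]
in_rows = metadata_via_in_list(conn, uuids)
join_rows = metadata_via_join(conn, "tenant-a")
```

With 100k instances the first form carries 100k parameters in the statement text, which matches what shows up in the slow query log above.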

Also, when logging in from Horizon, login is very slow, and I get an error
"Error: Unable to retrieve usage information.".

Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
tags: added: db
tags: added: performance
Cale Rath (ctrath) wrote :

Have prior instances been "deleted"? When this occurs, the actual data is not removed from the DB, but is soft-deleted. There's a patch here that hasn't landed yet to purge soft-deleted instance data: https://review.openstack.org/#/c/203751/
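
To illustrate what soft deletion means for table size, here is a minimal sketch using an in-memory SQLite table. The schema is a simplified assumption, and `soft_delete` and `purge_soft_deleted` are hypothetical helpers, not the functions from the patch linked above. A soft delete only marks the row, so the table keeps growing until a separate purge removes the marked rows.

```python
import sqlite3
from datetime import datetime, timezone

# Simplified stand-in for an instances table (assumption, not nova's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "id INTEGER PRIMARY KEY, uuid TEXT, "
    "deleted INTEGER DEFAULT 0, deleted_at TEXT)")


def soft_delete(conn, uuid):
    """Mark a row as deleted instead of removing it."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE instances SET deleted = id, deleted_at = ? "
        "WHERE uuid = ? AND deleted = 0", (now, uuid))


def purge_soft_deleted(conn):
    """Physically remove rows that were soft-deleted; returns the row count."""
    cur = conn.execute("DELETE FROM instances WHERE deleted != 0")
    return cur.rowcount


for n in range(3):
    conn.execute("INSERT INTO instances (uuid) VALUES (?)", (f"vm-{n}",))

soft_delete(conn, "vm-0")
soft_delete(conn, "vm-1")

# Soft-deleted rows are still stored, only filtered out by "deleted = 0".
total_before = conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
active = conn.execute(
    "SELECT COUNT(*) FROM instances WHERE deleted = 0").fetchone()[0]

purged = purge_soft_deleted(conn)
total_after = conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
```

In this sketch the table holds three rows until the purge runs, even though only one instance is still active.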

Markus Zoeller (markus_z) (mzoeller) wrote :

@Zhenzan Zhou:

Are you still actively working on a patch for this bug? If "yes", please provide a patch in Gerrit in the near future; if "no", please remove yourself as assignee.

Changed in nova:
assignee: Zhenzan Zhou (zhenzan-zhou) → nobody
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
stgleb (gstepanov) wrote :

Could you provide a table dump? It would allow me to reproduce your problem without the tedious work of creating and deleting instances in
my environment.

Changed in nova:
assignee: nobody → stgleb (gstepanov)
Sean Dague (sdague)
Changed in nova:
assignee: stgleb (gstepanov) → nobody
Attila Fazekas (afazekas) wrote :

The situation can be even worse with the usage-list call (all tenants):
it can permanently grow the memory allocated by the n-api processes to a huge extent (multiple gigabytes per worker).

1. The aggregation should be done on the DB side.
2. n-api should never fetch more than osapi_max_limit items.
3. Most of these statistics should be handled by the telemetry service, or by a dependent service that consumes the telemetry data, instead of having nova (re)do this job.
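
Points 1 and 2 above can be sketched as follows. This is an illustration under stated assumptions, not nova's implementation: the table is a simplified in-memory SQLite stand-in with a precomputed `hours` column, the function names are hypothetical, and `max_limit` merely plays the role of osapi_max_limit.

```python
import sqlite3

# Simplified stand-in table (assumption): usage hours are precomputed per row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "uuid TEXT PRIMARY KEY, project_id TEXT, vcpus INTEGER, hours REAL)")
rows = [(f"vm-{n:04d}", "tenant-a" if n % 2 else "tenant-b", 2, 1.5)
        for n in range(10)]
conn.executemany("INSERT INTO instances VALUES (?, ?, ?, ?)", rows)


def usage_in_python(conn):
    """Fetch every row and aggregate in the API process (memory grows with rows)."""
    totals = {}
    query = "SELECT project_id, vcpus, hours FROM instances"
    for project_id, vcpus, hours in conn.execute(query):
        totals[project_id] = totals.get(project_id, 0.0) + vcpus * hours
    return totals


def usage_in_db(conn):
    """Let the database aggregate; only one row per tenant crosses the wire."""
    sql = ("SELECT project_id, SUM(vcpus * hours) "
           "FROM instances GROUP BY project_id")
    return dict(conn.execute(sql).fetchall())


def list_page(conn, max_limit, marker=None):
    """Return at most `max_limit` UUIDs that sort after `marker`."""
    sql = "SELECT uuid FROM instances WHERE uuid > ? ORDER BY uuid LIMIT ?"
    return [r[0] for r in conn.execute(sql, (marker or "", max_limit))]


db_totals = usage_in_db(conn)
py_totals = usage_in_python(conn)

# Walk the table in bounded pages instead of loading everything at once.
page1 = list_page(conn, max_limit=4)
page2 = list_page(conn, max_limit=4, marker=page1[-1])
page3 = list_page(conn, max_limit=4, marker=page2[-1])
```

Both aggregation functions return the same totals, but the DB-side version never materializes per-instance rows in the API worker, and the paged listing keeps any single response bounded by `max_limit`.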

Changed in nova:
assignee: nobody → Guillaume Espanel (guillaume-espanel)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/343734

Changed in nova:
status: Confirmed → In Progress
Sean Dague (sdague) wrote :

Automatically discovered version kilo in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.kilo
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague on branch: master
Review: https://review.openstack.org/343734
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
