"nova usage" taking too much time with many VMs in database

Bug #1481262 reported by Antonio Messina on 2015-08-04
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Guillaume Espanel
Milestone: —

Bug Description

Issue found on Kilo 2015.1.0 on Ubuntu Trusty (1:2015.1.0-0ubuntu1.1~cloud0) from http://ubuntu-cloud.archive.canonical.com/ubuntu

When running "nova usage" on a tenant that started many instances O(100k) during the current month, the following happens:

* nova-api is stuck at 100% for a long time
* as a consequence, nova CLI returns "ERROR (ConnectionRefused):
Unable to establish connection to ..."
* on MySQL slow query log I see there is a query like:

SELECT instance_system_metadata.created_at AS instance_system_metadata_created_at,
       instance_system_metadata.updated_at AS instance_system_metadata_updated_at,
       instance_system_metadata.deleted_at AS instance_system_metadata_deleted_at,
       instance_system_metadata.deleted AS instance_system_metadata_deleted,
       instance_system_metadata.id AS instance_system_metadata_id,
       instance_system_metadata.`key` AS instance_system_metadata_key,
       instance_system_metadata.value AS instance_system_metadata_value,
       instance_system_metadata.instance_uuid AS instance_system_metadata_instance_uuid
FROM instance_system_metadata
WHERE instance_system_metadata.deleted = 0
  AND instance_system_metadata.instance_uuid IN (<list of ~100k UUID>)

which took 1.8 seconds.
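
For a sense of scale, one quick check is to count how many live metadata rows that IN clause forces nova-api to pull for the tenant. A minimal sketch, assuming the standard Kilo nova schema (the instances table carries uuid and project_id; '<tenant-id>' is a placeholder):

-- count the metadata rows the usage query has to fetch for one tenant
SELECT COUNT(*)
FROM instance_system_metadata ism
JOIN instances i ON i.uuid = ism.instance_uuid
WHERE i.project_id = '<tenant-id>'
  AND ism.deleted = 0;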

Also, logging in from Horizon is very slow, and I get the error "Error: Unable to retrieve usage information."

Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
tags: added: db
tags: added: performance
Cale Rath (ctrath) wrote :

Have prior instances been "deleted"? When an instance is deleted, its data is not actually removed from the DB; it is only soft-deleted. There's a patch here that hasn't landed yet to purge soft-deleted instance data: https://review.openstack.org/#/c/203751/
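
A quick way to check whether soft-deleted rows dominate the table (a sketch; nova marks soft-deleted rows with a non-zero deleted value rather than removing them):

-- compare live vs. soft-deleted row counts
SELECT IF(deleted = 0, 'live', 'soft-deleted') AS state,
       COUNT(*) AS row_count
FROM instance_system_metadata
GROUP BY state;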

@Zhenzan Zhou:

Are you still actively working on a patch for this bug? If "yes", please provide a patch in Gerrit in the near future; if "no", please remove yourself as assignee.

Changed in nova:
assignee: Zhenzan Zhou (zhenzan-zhou) → nobody
Sean Dague (sdague) on 2016-02-17
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
stgleb (gstepanov) wrote :

Could you provide a table dump? It would allow me to reproduce your problem without the tedium of creating and deleting instances in my environment.

Changed in nova:
assignee: nobody → stgleb (gstepanov)
Sean Dague (sdague) on 2016-04-18
Changed in nova:
assignee: stgleb (gstepanov) → nobody
Attila Fazekas (afazekas) wrote :

The situation can be even worse with the usage-list call (across all tenants): it can permanently grow the memory allocated by the n-api processes by a huge amount (multiple gigabytes per worker).

1. The aggregation should be done on the DB side (see the sketch below).
2. n-api should never fetch more than osapi_max_limit things at once.
3. Most of these statistics should be handled by the telemetry service, or by a service consuming the telemetry data, instead of having nova (re)do this job.
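
As a sketch of point 1, the per-tenant totals could be computed by the database itself instead of materializing every instance row in nova-api. This assumes the standard nova instances columns (vcpus, memory_mb, root_gb, ephemeral_gb); the real usage API also has to handle the reporting time window and instances deleted within it, which this omits:

-- DB-side aggregation sketch: one summary row per tenant
SELECT project_id,
       COUNT(*) AS instance_count,
       SUM(vcpus) AS total_vcpus,
       SUM(memory_mb) AS total_memory_mb,
       SUM(root_gb + ephemeral_gb) AS total_disk_gb
FROM instances
WHERE deleted = 0
GROUP BY project_id;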

Changed in nova:
assignee: nobody → Guillaume Espanel (guillaume-espanel)

Fix proposed to branch: master
Review: https://review.openstack.org/343734

Changed in nova:
status: Confirmed → In Progress