OpenStack Compute (nova)

os-simple-tenant-usage performs poorly with many instances

Bug #1421471 reported by Richard Jones on 2015-02-13

This bug report is a duplicate of: Bug #1485025: The simple-tenant-usage API should pull instance flavor attrs rather than system_metadata now.

This bug affects 3 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Confirmed	Wishlist	Diana Clarke	

Bug Description

The SQL underlying the os-simple-tenant-usage API call results in very slow operations when the database has many (20,000+) instances. In testing, the objects.InstanceList.get_active_by_window_joined call in nova/api/openstack/compute/contrib/simple_tenant_usage.py:SimpleTenantUsageController._tenant_usages_for_period takes 24 seconds to run.

Some basic timing analysis has shown that the initial query in nova/db/sqlalchemy/api.py:instance_get_active_by_window_joined runs in *reasonable* time (though still 5-6 seconds) and the bulk of the time is spent in the subsequent _instances_fill_metadata call which pulls in system_metadata info by using a SELECT with an IN clause containing the 20,000 uuids listed, resulting in execution times over 15 seconds.
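
As a rough illustration of the access pattern described above (not nova's actual code), the sketch below uses an in-memory SQLite database with hypothetical, simplified `instances` and `system_metadata` tables. The table layout, the metadata key, and the helper names are all assumptions made for the example. It runs an initial query for the instances, then a second SELECT whose IN clause lists every uuid returned, and compares that with a single JOIN that returns the same rows.

```python
import sqlite3


def build_db(n):
    """Create a toy database with n instances (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, tenant_id TEXT)")
    conn.execute(
        "CREATE TABLE system_metadata (instance_uuid TEXT, key TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?)",
        [(f"uuid-{i}", f"tenant-{i % 3}") for i in range(n)],
    )
    # One illustrative metadata row per instance; the key name is an assumption.
    conn.executemany(
        "INSERT INTO system_metadata VALUES (?, ?, ?)",
        [(f"uuid-{i}", "example_flavor_vcpus", "2") for i in range(n)],
    )
    return conn


def metadata_two_step(conn):
    """First fetch the uuids, then fetch metadata with one large IN clause."""
    uuids = [row[0] for row in conn.execute("SELECT uuid FROM instances")]
    placeholders = ",".join("?" for _ in uuids)
    rows = conn.execute(
        "SELECT instance_uuid, key, value FROM system_metadata "
        f"WHERE instance_uuid IN ({placeholders})",
        uuids,
    ).fetchall()
    return sorted(rows)


def metadata_joined(conn):
    """Fetch the same metadata rows with a single JOIN."""
    rows = conn.execute(
        "SELECT m.instance_uuid, m.key, m.value FROM instances i "
        "JOIN system_metadata m ON m.instance_uuid = i.uuid"
    ).fetchall()
    return sorted(rows)
```

The example is kept to a few hundred rows because some SQLite builds cap the number of bound parameters per statement; the IN list grows with the number of instances, which is the part of the pattern the report identifies as slow at 20,000 uuids.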

Tony Breeds (o-tony) on 2015-02-13

Changed in nova:
status:	New → Confirmed

Joe Gordon (jogo) wrote on 2015-02-13:

If we can fix some low-hanging fruit here, that is great, but the name simple-tenant-usage says it all: this isn't a feature that should be used in production.

Changed in nova:
importance:	Undecided → Wishlist

Tony Breeds (o-tony) wrote on 2015-02-14:

Okay, the problem is that it's used by Horizon to show the stats on the login page. So while the intent may have been for it to be niche, it's being used a lot ("Build it and they will come" I guess).

So we need to see what can be done here. The real solution may be a different API for Liberty, and if that's the case, knowing that ASAP is a good thing (TM)

Richard Jones (r1chardj0n3s) wrote on 2015-02-15:

I'm afraid this can't be marked "wishlist" - it has a direct impact on users of Horizon. Or, we just accept that simple-tenant-usage is irredeemably broken, and write a new API call for Horizon to consume :)

OpenStack Infra (hudson-openstack) wrote on 2015-02-25: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/159062

Changed in nova:
assignee:	nobody → Ankit Agrawal (ankitagrawal)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-08-12: Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/159062
Reason: This patch has been stalled for a very long time, so I am going to abandon it to keep the review queue sane. Please restore the change when it's ready for review.

Davanum Srinivas (DIMS) (dims-v) on 2016-03-07

Changed in nova:
assignee:	Ankit Agrawal (ankitagrawal) → nobody
status:	In Progress → Confirmed

Markus Zoeller (markus_z) (mzoeller) wrote on 2016-05-18:

It's been a while since the performance was measured and there is no activity around this bug report. I'm closing it as "Opinion". If this issue is still observed with the latest release, then the report can be reopened.

Changed in nova:
status:	Confirmed → Opinion

Tony Breeds (o-tony) wrote on 2016-06-01:

Confirmed with origin/master SHA:ced89e7b26b3cff323852e1d8a9c6db80334f4dd

Changed in nova:
status:	Opinion → Confirmed

Diana Clarke (diana-clarke) on 2016-10-04

Changed in nova:
assignee:	nobody → Diana Clarke (diana-clarke)

Matt Riedemann (mriedem) wrote on 2016-10-17:

Hmm, this bug says it's spending time doing the joins on the system_metadata table, but that should have been resolved with bug 1485025 and fix https://review.openstack.org/#/c/213340/ so that we're only loading up the instance_extra/flavor information, as the REST API code doesn't need system_metadata for the flavors (assuming your instances have been migrated past kilo, where flavors were moved out of the instance_system_metadata table and into instance_extra). That was fixed in liberty.

Matt Riedemann (mriedem) wrote on 2016-10-17:

Given this bug was reported before https://review.openstack.org/#/c/213340/ landed, you wouldn't have had that fix, but it would be useful to know if it resolves your issue.

Diana Clarke (diana-clarke) wrote on 2016-10-17:

#10

Yes, before proposing pagination for these endpoints I spent some time profiling the current queries generated by the simple tenant usage endpoints, and can confirm that they were significantly improved since this bug was initially reported.

That said, 1 tenant with 20,000+ instances is still going to be problematic without paging of some kind unless the server_usages details (via detailed=1) are removed from the API response and the aggregation is moved to the SQL (with a GROUP BY tenant_id clause).

As of stable/newton, the query generated looks like this (note: I replaced the individual fields with stars for brevity):

SELECT instances.*, instance_extra.*
FROM instances
LEFT OUTER JOIN instance_extra ON instance_extra.instance_uuid = instances.uuid
WHERE (instances.terminated_at IS NULL OR instances.terminated_at > '2016-09-28 21:02:51') AND instances.launched_at < '2016-09-28 21:02:51';
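
For illustration only, here is a minimal sketch of what moving the aggregation into SQL with a GROUP BY tenant_id clause could look like. It uses an in-memory SQLite database and a hypothetical, heavily simplified `instances` table (the column names `tenant_id`, `vcpus` and `memory_mb`, and the helper names, are assumptions for the example, not nova's schema or API). The WHERE clause mirrors the window filter in the query above.

```python
import sqlite3


def build_usage_db():
    """Create a toy instances table (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE instances (
               uuid TEXT PRIMARY KEY, tenant_id TEXT,
               vcpus INTEGER, memory_mb INTEGER,
               launched_at TEXT, terminated_at TEXT)"""
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("a1", "tenant-a", 2, 2048, "2016-09-01 00:00:00", None),
            ("a2", "tenant-a", 4, 4096, "2016-09-10 00:00:00", "2016-09-30 00:00:00"),
            # Terminated before the window starts, so it is filtered out.
            ("a3", "tenant-a", 1, 512, "2016-08-01 00:00:00", "2016-09-01 00:00:00"),
            ("b1", "tenant-b", 8, 8192, "2016-09-20 00:00:00", None),
            # Launched after the window ends, so it is filtered out.
            ("b2", "tenant-b", 1, 1024, "2016-10-05 00:00:00", None),
        ],
    )
    return conn


def usage_summary(conn, period_start, period_end):
    """Return one aggregated row per tenant instead of one row per instance."""
    return conn.execute(
        """SELECT tenant_id, COUNT(*), SUM(vcpus), SUM(memory_mb)
           FROM instances
           WHERE (terminated_at IS NULL OR terminated_at > ?)
             AND launched_at < ?
           GROUP BY tenant_id
           ORDER BY tenant_id""",
        (period_start, period_end),
    ).fetchall()
```

With a query of this shape the result size depends on the number of tenants rather than the number of instances, which is why it would only fit a response that drops the per-server details.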

Richard Jones (r1chardj0n3s) wrote on 2016-10-17:

#11

I don't have the system available for testing this bug out any longer - I'll have to re-investigate setting up a 20,000 instance setup to re-test, which I'll add to my TODO.

As I noted on the proposed fix patch https://review.openstack.org/213340, the usage of this API in Horizon is for summary purposes only - we count the results (for quota and usage summary display). This is in the absence of a more appropriate API call.

I will look into re-testing my scenario and check the performance of the Horizon page in question, and file a followup bug which is more specific about the problem if necessary.
