The time of resource-list is too long

Bug #1264434 reported by Liusheng on 2013-12-27
This bug affects 9 people
Affects | Status | Importance | Assigned to | Milestone
Ceilometer | Fix Released | Medium | gordon chung |

Bug Description

In my all-in-one Havana environment, Ceilometer storage is configured as MySQL.
When using "time ceilometer meter-list", 193 meters are listed in 1.275s:

http://paste.openstack.org/show/57313/

When using "time ceilometer resource-list", 40 resources are listed in 2m14.775s:

http://paste.openstack.org/show/57319/

I have looked into the process of listing resources, and I found there are time-consuming traversals of lists.

I think the process should be optimized.

Liusheng (liusheng) on 2013-12-27
Changed in ceilometer:
assignee: nobody → Liusheng (liusheng)
Liusheng (liusheng) on 2013-12-27
description: updated
Julien Danjou (jdanjou) on 2013-12-27
Changed in ceilometer:
status: New → Triaged
importance: Undecided → Medium
David Hill (david-hill-ubisoft) wrote :

We are experiencing the same issue with Ceilometer and a MySQL backend.

# ceilometer resource-list
Error communicating with https://obfuscated_url:8777/ The read operation timed out

We only have 899 meters.

Dave

Al Bailey (albailey1974) wrote :

I am seeing the same performance issues using a PostgreSQL backend.

The horizon dashboard "Resource Usage" (admin/metering/view.py) queries the resource list to calculate its graphs. The time to render those graphs is comparable to the time to invoke ceilometer resource-list.

The Ceilometer code that queries the resource list populates the resource_links attribute with a second SQL query per resource.
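The per-resource query pattern described above can be sketched as follows. This is only an illustration using an in-memory SQLite database, not Ceilometer's code: the `resource` and `meter` tables, their columns, and both function names are hypothetical stand-ins. It counts how many queries each approach issues for 40 resources.

```python
import sqlite3

# Hypothetical, simplified schema; NOT Ceilometer's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id TEXT PRIMARY KEY);
    CREATE TABLE meter (resource_id TEXT, name TEXT);
""")
conn.executemany("INSERT INTO resource VALUES (?)",
                 [("res-%d" % i,) for i in range(40)])
conn.executemany("INSERT INTO meter VALUES (?, ?)",
                 [("res-%d" % i, name)
                  for i in range(40)
                  for name in ("cpu", "disk", "network")])


def list_with_query_per_resource(conn):
    """One query for the resources, then one extra query per resource."""
    queries = 1
    result = {}
    resources = conn.execute("SELECT id FROM resource").fetchall()
    for (res_id,) in resources:
        rows = conn.execute(
            "SELECT DISTINCT name FROM meter "
            "WHERE resource_id = ? ORDER BY name", (res_id,)).fetchall()
        queries += 1
        result[res_id] = [name for (name,) in rows]
    return result, queries


def list_with_single_query(conn):
    """Fetch resources and their meter names in one joined query."""
    rows = conn.execute(
        "SELECT r.id, m.name FROM resource r "
        "LEFT JOIN meter m ON m.resource_id = r.id "
        "GROUP BY r.id, m.name ORDER BY r.id, m.name").fetchall()
    result = {}
    for res_id, name in rows:
        result.setdefault(res_id, [])
        if name is not None:
            result[res_id].append(name)
    return result, 1


slow, slow_queries = list_with_query_per_resource(conn)
fast, fast_queries = list_with_single_query(conn)
print(slow_queries, fast_queries, slow == fast)
```

With a remote database, each extra query also pays a network round trip, which is why the per-resource variant tends to scale poorly as the resource count grows.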

ZhiQiang Fan (aji-zqfan) wrote :

In my all-in-one Havana 2013.2.2.dev on SLES 11 SP3 with MongoDB as backend storage,

here is my output:

# time ceilometer meter-list | wc -l
168

real 0m0.554s
user 0m0.352s
sys 0m0.028s

# time ceilometer resource-list | wc -l
37

real 0m3.089s
user 0m0.300s
sys 0m0.040s

The average is 3.089/37=0.083486s

# time ceilometer sample-list -m instance | wc -l
3921

real 0m7.610s
user 0m3.380s
sys 0m0.052s

The average of sample list is 7.610/3921=0.00194s
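For reference, the per-line averages can be recomputed from the `real` times and `wc -l` counts in the output above. This is a small sketch. The counts come from `wc -l`, so they likely include the table border and header lines the client prints, which makes the figures approximate.

```python
# Wall-clock ("real") seconds and `wc -l` line counts copied from the output above.
runs = {
    "meter-list": (0.554, 168),
    "resource-list": (3.089, 37),
    "sample-list -m instance": (7.610, 3921),
}


def seconds_per_line(real_seconds, lines):
    """Average wall-clock time per output line."""
    return real_seconds / lines


averages = {name: seconds_per_line(*values) for name, values in runs.items()}
for name, avg in averages.items():
    print("%-24s %.5f s/line" % (name, avg))
```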

I think the problem may be caused by an improper config option or by the impl_sqlalchemy implementation.

As you mentioned, "I found there are time-consuming traversals of lists." Can you provide more detailed information
or give a link that points to the specific line of code?

Thanks

ZhiQiang Fan (aji-zqfan) wrote :

Yes, I know your output is based on MySQL; I just provided MongoDB output for comparison.

thanks

gordon chung (chungg) wrote :

have you tried an icehouse build? we did change ceilometer resource-list to drop some code which pulled down performance (https://review.openstack.org/#/c/65671)

that said, at quick glance, i still think we can improve the query that exists for resource-list

Liusheng (liusheng) wrote :

gordon chung, thanks. Yes, I have noticed that. The change has improved the performance of resource-list.

But the time of resource-list is still too long. As you said, we can improve the query.

Eoghan Glynn (eglynn) wrote :

Note the suggestion I've made here pertaining to the performance of the mongodb driver:

  https://bugs.launchpad.net/ceilometer/+bug/1288372

could also potentially be applied to the sqlalchemy driver.

gordon chung (chungg) wrote :

i think i have a better query for sql - testing this now.

Fix proposed to branch: master
Review: https://review.openstack.org/80343

Changed in ceilometer:
assignee: Liusheng (liusheng) → gordon chung (chungg)
status: Triaged → In Progress
gordon chung (chungg) wrote :

i've uploaded a patch that should fix performance issue to make it a bit more bearable now.... i saw about a 60-70% reduction on my machine.

the resource-list query can actually return in seconds with over a million samples. the reason it takes so long is that the api currently builds a list of url links to the meters related to each resource. this accounts for almost all the time. i think we should change the behaviour so by default it doesn't generate these links.
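a rough sketch of the behaviour described above: one link is built per related meter for every resource, with a flag to skip that work. the function name `build_resource`, the way the `meter_links` flag is used here, and the URL layout are illustrative assumptions, not Ceilometer's actual API code.

```python
def build_resource(resource_id, meter_names, base_url, meter_links=True):
    """Return a dict for one resource, optionally with per-meter links.

    Hypothetical helper for illustration only.
    """
    resource = {"resource_id": resource_id, "links": []}
    if meter_links:
        # One link per related meter: total work grows with
        # (number of resources) x (meters per resource).
        for name in meter_names:
            resource["links"].append({
                "rel": name,
                "href": "%s/v2/meters/%s?q.field=resource_id&q.value=%s"
                        % (base_url, name, resource_id),
            })
    return resource


meters = ["cpu", "disk.read.bytes", "network.incoming.bytes"]
with_links = build_resource("res-1", meters, "http://localhost:8777")
without_links = build_resource("res-1", meters, "http://localhost:8777",
                               meter_links=False)
print(len(with_links["links"]), len(without_links["links"]))
```

in this sketch the meter names are passed in. in a real backend, finding the related meters for each resource is itself a lookup, so skipping link generation would avoid that lookup as well.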

Liusheng (liusheng) wrote :

thanks for the patch

Julien Danjou (jdanjou) on 2014-03-20
Changed in ceilometer:
milestone: none → icehouse-rc1

Reviewed: https://review.openstack.org/80343
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=13e2ebcfb0275347261ef631fe5ecad46c92d533
Submitter: Jenkins
Branch: master

commit 13e2ebcfb0275347261ef631fe5ecad46c92d533
Author: Gordon Chung <email address hidden>
Date: Thu Mar 13 13:00:58 2014 -0400

    improve performance of resource-list in sql

    simplify the query to retrieve list of resources. it is much simpler
    to query against raw Sample data than to join against Resource table.

    also, add meter_links param to allow ability to disable link generation of
    related meters.

    Change-Id: Ia4ceafc568c4a2e4c8e6b7586511135627b9335e
    Closes-Bug: #1264434
    Implements: blueprint big-data-sql

Changed in ceilometer:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2014-03-31
Changed in ceilometer:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-04-17
Changed in ceilometer:
milestone: icehouse-rc1 → 2014.1
Matt Lesko (mattlesko-nih) wrote :

Was the comment #10, https://bugs.launchpad.net/ceilometer/+bug/1264434/comments/10 ever acted upon? Even in Icehouse I still see incredible wait times for a "resource-list", often timing out:

/usr/bin/time ceilometer resource-list

Error communicating with http://$CEILOMETER:8777 timed out
      600.84 real 0.39 user 0.16 sys

mysql> select count(*) from ceilometer.resource;
+----------+
| count(*) |
+----------+
| 882 |
+----------+
1 row in set (0.00 sec)

gordon chung (chungg) wrote :

@Matt

there were minor improvements made to icehouse. larger improvements were made in juno.

just to dig deeper, can you confirm which ceilometer build you are using... also, what version of mysql do you have?

Matt Lesko (mattlesko-nih) wrote :

openstack-ceilometer-notification-2014.1.1-3.el6.noarch
openstack-ceilometer-api-2014.1.1-3.el6.noarch
openstack-ceilometer-common-2014.1.1-3.el6.noarch
openstack-ceilometer-central-2014.1.1-3.el6.noarch
openstack-ceilometer-collector-2014.1.1-3.el6.noarch
openstack-ceilometer-alarm-2014.1.1-3.el6.noarch
python-ceilometerclient-1.0.10-1.el6.noarch

RDO Icehouse I believe.

MySQL: mysql-server-5.1.73-3.el6_5.x86_64

So the issues raised in comment #10 were addressed in different commit(s)?

gordon chung (chungg) wrote :

there were additional commits that were not included/backported to icehouse.

in icehouse, we had a fundamental design issue in our sql backend which made reads and writes very slow. in juno we worked to simplify the sql backend. some performance values can be found here: https://tank.peermore.com/tanks/cdent-rhat/DatabasePerfTest.

i'm adding python-ceilometerclient because, technically, i believe the original performance boost, which includes skipping the generation of links to related meters, is not enabled in the client; enabling it could improve performance further.
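for anyone calling the api directly rather than through the client, here is a sketch of how a request that turns off link generation might be built. it assumes the `meter_links` parameter from the commit above is accepted as a query-string argument on `/v2/resources`. the helper name and host are placeholders, and this should be checked against the deployed api version.

```python
from urllib.parse import urlencode, urlunsplit


def resource_list_url(host, port=8777, meter_links=False):
    """Build a resource-list URL; helper name and host are placeholders."""
    # Assumption: the API reads meter_links=0 as "do not generate meter links".
    query = urlencode({"meter_links": 1 if meter_links else 0})
    return urlunsplit(("http", "%s:%d" % (host, port),
                       "/v2/resources", query, ""))


url = resource_list_url("ceilometer.example.com")
print(url)
```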

Matt Lesko (mattlesko-nih) wrote :

Is this one of the commits? https://review.openstack.org/#/c/111313/

If it's showing up in Juno that'll be great.

gordon chung (chungg) wrote :

@Matt,

yes, the above patch will be in Juno and will probably give the biggest performance benefits.
