Performance degradation for a cluster due to many events

Bug #1817604 reported by Pavel
Affects: senlin
Status: Fix Released
Importance: High
Assigned to: Jude Cross
Milestone: (none listed)

Bug Description

When the number of events for a cluster grows, performance degrades significantly.
This happens due to the combination of the relationships in models.py and the joinedload_all('*') option in api.py:
https://github.com/openstack/senlin/blob/ec532de1e4361e6ceb32a269fd6967829ef2f202/senlin/db/sqlalchemy/api.py#L92

Because of this combination, a basic call like cluster_get returns all dependent actions and policies, which then have to be converted to Python objects.
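
To make the cost concrete, here is a minimal sketch using only the standard-library sqlite3 module rather than Senlin's actual SQLAlchemy models. The `cluster` and `event` tables and their columns are hypothetical simplifications; the point is only that joining a parent row to its dependents returns one row per dependent, while the caller needed a single row.

```python
import sqlite3

# Hypothetical, simplified schema: one cluster with many dependent events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cluster (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, cluster_id INTEGER, message TEXT);
""")
conn.execute("INSERT INTO cluster (id, name) VALUES (1, 'c1')")
conn.executemany(
    "INSERT INTO event (cluster_id, message) VALUES (?, ?)",
    [(1, f"event {i}") for i in range(1000)],
)

# Roughly what an eager join asks for: the cluster joined to its dependents.
eager_rows = conn.execute(
    "SELECT c.id, c.name, e.id, e.message "
    "FROM cluster c LEFT OUTER JOIN event e ON e.cluster_id = c.id "
    "WHERE c.id = ?",
    (1,),
).fetchall()

# What a basic "get the cluster" call needs: the cluster row alone.
plain_rows = conn.execute(
    "SELECT id, name FROM cluster WHERE id = ?", (1,)
).fetchall()

print(len(eager_rows), len(plain_rows))  # 1000 1
```

Every extra row also has to be turned into a Python object by the ORM, so the overhead grows with the number of events even when the caller never looks at them.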

I have a fix for this issue and will push it shortly.

Jude Cross (problem-v) wrote :

Hey Pavel,

I have been addressing this issue too. Additionally, the conversion in the to_dict method is extremely inefficient, the worst offender being the Action object. I have a patch set that I will upload shortly to address both of these problems.

Jude Cross (problem-v) wrote :

Currently being addressed in this patch:
https://review.openstack.org/#/c/639420/

Some performance metrics.

Action Old Timing New Timing
list 0m11.050s 0m9.163s
action list 1m2.214s 0m17.405s
node list 0m7.397s 0m6.416s
show x 0m8.410s 0m5.098s
action show x 0m5.454s 0m5.110s
node show x 0m7.647s 0m5.029s
policy list 0m5.490s 0m3.946s
policy show x 0m5.485s 0m3.618s

Duc Truong (dtruong) wrote :

More readable ASCII table of the above data:

+---------------+------------+------------+
| Action        | Old Timing | New Timing |
+---------------+------------+------------+
| list          | 0m11.050s  | 0m9.163s   |
| action list   | 1m2.214s   | 0m17.405s  |
| node list     | 0m7.397s   | 0m6.416s   |
| show x        | 0m8.410s   | 0m5.098s   |
| action show x | 0m5.454s   | 0m5.110s   |
| node show x   | 0m7.647s   | 0m5.029s   |
| policy list   | 0m5.490s   | 0m3.946s   |
| policy show x | 0m5.485s   | 0m3.618s   |
+---------------+------------+------------+
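
For anyone who wants the relative improvement rather than raw timings, a small sketch follows. It assumes the values are in minutes-and-seconds form (for example `1m2.214s` meaning 62.214 seconds); the helper name `to_seconds` is made up for this example.

```python
import re

# Timings copied from the table above: (old, new) per command.
timings = {
    "list": ("0m11.050s", "0m9.163s"),
    "action list": ("1m2.214s", "0m17.405s"),
    "node list": ("0m7.397s", "0m6.416s"),
    "show x": ("0m8.410s", "0m5.098s"),
    "action show x": ("0m5.454s", "0m5.110s"),
    "node show x": ("0m7.647s", "0m5.029s"),
    "policy list": ("0m5.490s", "0m3.946s"),
    "policy show x": ("0m5.485s", "0m3.618s"),
}


def to_seconds(stamp):
    """Convert a string like '1m2.214s' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized timing: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Speedup ratio = old time / new time; larger means a bigger improvement.
speedups = {
    name: to_seconds(old) / to_seconds(new)
    for name, (old, new) in timings.items()
}

for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {ratio:.2f}x")
```

By this measure "action list" improves the most, at roughly 3.6 times faster, while the other commands improve more modestly.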

Duc Truong (dtruong) wrote :

@Jude: How many clusters / nodes did you have in your test environment for the above metrics?

OpenStack Infra (hudson-openstack) wrote : Fix merged to senlin (master)

Reviewed: https://review.openstack.org/639420
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=9f49cfdfb21a51f479fe2d548d67b039ebc217bd
Submitter: Zuul
Branch: master

commit 9f49cfdfb21a51f479fe2d548d67b039ebc217bd
Author: Jude Cross <email address hidden>
Date: Tue Feb 26 11:20:21 2019 -0800

    Fix Senlin performance issues

    This patch fixes the interaction of Senlin with the database.
    The standard model_query (joinload_all('*')) has been removed
    in favor of using more distinctive join statements.

    Additionally this patch removes the DB calls that were baked into the
    to_dict() method for the Senlin objects and instead retrieves that
    data with joins/single database calls. This allows cluster action
    show to actually return within an appropriate amount of time.

    This patch improves performance all around with considerably less CPU
    usage.

    Closes-Bug: #1817604
    Change-Id: Ie5c1fca080c82833941edc130568e76701ce394c
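
The second part of the commit message (database calls inside to_dict() replaced by joins or single calls) can be illustrated with a sketch like the one below. It is not the Senlin code: the `action` and `cluster` tables, the `run` helper, and both `to_dict_*` functions are hypothetical, and the standard-library sqlite3 module stands in for the real database layer. It only shows how one lookup per object compares with one joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cluster (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE action (id INTEGER PRIMARY KEY, cluster_id INTEGER, name TEXT);
""")
conn.executemany(
    "INSERT INTO cluster (id, name) VALUES (?, ?)",
    [(i, f"cluster-{i}") for i in range(1, 4)],
)
conn.executemany(
    "INSERT INTO action (cluster_id, name) VALUES (?, ?)",
    [(1 + i % 3, f"action-{i}") for i in range(300)],
)

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, so both approaches can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def to_dict_per_row():
    """One extra lookup per action, as when to_dict() itself hits the DB."""
    result = []
    for action_id, cluster_id, name in run(
        "SELECT id, cluster_id, name FROM action ORDER BY id"
    ):
        (cluster_name,) = run(
            "SELECT name FROM cluster WHERE id = ?", (cluster_id,)
        )[0]
        result.append({"id": action_id, "name": name, "cluster_name": cluster_name})
    return result


def to_dict_joined():
    """The same data fetched with a single joined query."""
    rows = run(
        "SELECT a.id, a.name, c.name FROM action a "
        "JOIN cluster c ON c.id = a.cluster_id ORDER BY a.id"
    )
    return [{"id": i, "name": n, "cluster_name": c} for i, n, c in rows]


query_count = 0
slow = to_dict_per_row()
per_row_queries = query_count

query_count = 0
fast = to_dict_joined()
joined_queries = query_count

print(per_row_queries, joined_queries, slow == fast)  # 301 1 True
```

Both functions return identical dictionaries; the difference is 301 round trips versus one, which is the kind of gap that would dominate a list call over many actions.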

Changed in senlin:
status: New → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/senlin 7.0.0.0rc1

This issue was fixed in the openstack/senlin 7.0.0.0rc1 release candidate.

Changed in senlin:
assignee: nobody → Jude Cross (problem-v)
importance: Undecided → High

OpenStack Infra (hudson-openstack) wrote : Change abandoned on senlin (master)

Change abandoned by XueFeng Liu (<email address hidden>) on branch: master
Review: https://review.opendev.org/639161
