performance degradation in placement with large number of resource providers

Bug #1786055 reported by Chris Dent
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Chris Dent

Bug Description

Using today's master, there is a large performance degradation in GET /allocation_candidates when there is a large number of resource providers (in my tests, 1000, each with the same inventory as described in [1]). The request takes 17s when querying all three resource classes with http://127.0.0.1:8081/allocation_candidates?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10

Using a limit does not make any difference; the cost is in generating the original data.

I did some LOG.debug-based benchmarking to identify three problem spots, and may even have fixed the worst one; see the diff below. The two main culprits are ResourceProvider.get_by_uuid calls made inside loops over the full set of providers. These can be replaced either by reusing data we already have from earlier queries, or by restructuring the code to make a single batched query.

In the diff I've already changed one of them (the second chunk) to use the data that _build_provider_summaries is already getting. (Functional tests still pass with this change.)

The third chunk is there because we have a big loop, but I suspect there is some duplication that can be avoided. I have not investigated that closely (yet).
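The per-provider lookup pattern described above is a classic N+1 query problem. A minimal sketch (not nova code; `fetch_provider` and `fetch_providers_by_uuids` are hypothetical stand-ins that just count round trips) shows why batching matters at 1000 providers:

```python
# Hypothetical sketch of the N+1 query pattern vs. a batched fetch.
# The two fetch functions stand in for the real per-UUID and batched
# database lookups; here they only count simulated round trips.

queries = {"count": 0}

def fetch_provider(uuid):
    queries["count"] += 1          # one round trip per provider
    return {"uuid": uuid}

def fetch_providers_by_uuids(uuids):
    queries["count"] += 1          # one round trip for the whole set
    return {u: {"uuid": u} for u in uuids}

uuids = ["rp-%d" % i for i in range(1000)]

# Slow path: a query inside the loop, as in the pre-fix code.
slow = [fetch_provider(u) for u in uuids]
slow_queries = queries["count"]    # 1000

queries["count"] = 0

# Fast path: one batched query, then dictionary lookups in the loop.
by_uuid = fetch_providers_by_uuids(uuids)
fast = [by_uuid[u] for u in uuids]
fast_queries = queries["count"]    # 1

print(slow_queries, fast_queries)
```

With 1000 providers the loop issues 1000 round trips versus 1 for the batched form, which is roughly the shape of the regression measured here.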

-=-=-
diff --git a/nova/api/openstack/placement/objects/resource_provider.py b/nova/api/openstack/placement/objects/resource_provider.py
index 851f9719e4..e6c894b8fe 100644
--- a/nova/api/openstack/placement/objects/resource_provider.py
+++ b/nova/api/openstack/placement/objects/resource_provider.py
@@ -3233,6 +3233,8 @@ def _build_provider_summaries(context, usages, prov_traits):
         if not summary:
             summary = ProviderSummary(
                 context,
+                # This is _expensive_ when there are a large number of rps.
+                # Building the objects differently may be better.
                 resource_provider=ResourceProvider.get_by_uuid(context,
                                                                uuid=rp_uuid),
                 resources=[],
@@ -3519,8 +3521,7 @@ def _alloc_candidates_multiple_providers(ctx, requested_resources,
         rp_uuid = rp_summary.resource_provider.uuid
         tree_dict[root_id][rc_id].append(
             AllocationRequestResource(
-                ctx, resource_provider=ResourceProvider.get_by_uuid(ctx,
-                                                                    rp_uuid),
+                ctx, resource_provider=rp_summary.resource_provider,
                 resource_class=_RC_CACHE.string_from_id(rc_id),
                 amount=requested_resources[rc_id]))

@@ -3535,6 +3536,8 @@ def _alloc_candidates_multiple_providers(ctx, requested_resources,
     alloc_prov_ids = []

     # Let's look into each tree
+    # With many resource providers this takes a long time, but each trip
+    # through the loop is not too bad.
     for root_id, alloc_dict in tree_dict.items():
         # Get request_groups, which is a list of lists of
         # AllocationRequestResource(ARR) per requested resource class(rc).
-=-=-

[1] https://github.com/cdent/placeload/blob/master/placeload/__init__.py#L23

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/589941

Changed in nova:
assignee: nobody → Chris Dent (cdent)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/589945

melanie witt (melwitt)
tags: added: rocky-rc-potential
Revision history for this message
melanie witt (melwitt) wrote :

Just adding an informational note here from discussion in #openstack-placement: the changes that regressed performance are these patches, which added calls to ResourceProvider.get_by_uuid:

https://review.openstack.org/567113
https://review.openstack.org/564351

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/589945
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f0cc6e68a9b6e6ca052866ba6583a4914ab6cb4e
Submitter: Zuul
Branch: master

commit f0cc6e68a9b6e6ca052866ba6583a4914ab6cb4e
Author: Jay Pipes <email address hidden>
Date: Wed Aug 8 11:36:51 2018 -0400

    get provider IDs once when building summaries

    We were calling ResourceProvider.get_by_uuid() inside
    _build_provider_summaries() main loop over all providers involved in the
    resulting allocation candidates. This results in a query per provider
    involved, which is quite obviously not going to perform well. This patch
    modifies the _build_provider_summaries() function to make a single call
    to a new _provider_ids_from_rp_ids() function instead of multiple calls
    to ResourceProvider.get_by_uuid().

    Change-Id: I0e0a44e833afece0775ec712fbdf9fcf4eae7a93
    Related-bug: #1786055
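The batching approach this commit describes, replacing a per-provider lookup with one query keyed on the whole ID set, can be illustrated with a small sqlite sketch. The table layout and the `provider_ids_from_rp_ids` helper are illustrative only, not nova's actual schema or function:

```python
import sqlite3

# Build a toy resource_providers table in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers "
    "(id INTEGER PRIMARY KEY, uuid TEXT, name TEXT)")
rows = [(i, "uuid-%d" % i, "rp-%d" % i) for i in range(100)]
conn.executemany("INSERT INTO resource_providers VALUES (?, ?, ?)", rows)

def provider_ids_from_rp_ids(conn, rp_ids):
    """Fetch all requested providers in one query via an IN clause,
    instead of one SELECT per provider inside the caller's loop."""
    placeholders = ",".join("?" * len(rp_ids))
    cur = conn.execute(
        "SELECT id, uuid, name FROM resource_providers "
        "WHERE id IN (%s)" % placeholders, list(rp_ids))
    return {r[0]: {"uuid": r[1], "name": r[2]} for r in cur}

ids = [3, 17, 42]
providers = provider_ids_from_rp_ids(conn, ids)
print(sorted(providers))   # [3, 17, 42]
```

The caller then does cheap dictionary lookups inside its loop rather than issuing a query per iteration.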

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/589941
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3c96ce9158924476d88c815d0889d6d52ff62c76
Submitter: Zuul
Branch: master

commit 3c96ce9158924476d88c815d0889d6d52ff62c76
Author: Chris Dent <email address hidden>
Date: Wed Aug 8 16:22:53 2018 +0100

    [placement] Avoid rp.get_by_uuid in allocation_candidates

    We already have a fully loaded resource provider object in the loop, so
    we don't need to create another one. Doing so has a very large
    performance impact, especially when there are many resource providers
    in the collection of summaries (which will be true in a large and
    sparsely used cloud).

    The code which creates the summaries used here as a data source has the
    same expensive use of get_by_uuid in a loop. That will be fixed in a
    separate patch.

    Existing functional tests cover this code.

    Change-Id: I6068db78240c33a1dcefedc0c94e76740fd8d6e2
    Partial-Bug: #1786055

Revision history for this message
melanie witt (melwitt) wrote :

I'm going to close this bug to help keep our backlog down since the two patches to fix the major factor in the performance degradation have landed.

"<cdent> melwitt: hmmm. There is more that can be done, but not likely that more will be done _now_, so I would guess closed is probably a reasonable state. The major factor has been addressed. Fixing the rest will involve considerable refactoring"

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/590388
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ea340eb0d3bdb103bd64ca40b999bd2b10b80aa
Submitter: Zuul
Branch: master

commit 9ea340eb0d3bdb103bd64ca40b999bd2b10b80aa
Author: Jay Pipes <email address hidden>
Date: Thu Aug 9 10:46:20 2018 -0400

    placement: use simple code paths when possible

    Somewhere in the past release, we started using extremely complex code
    paths involving sharing providers, anchor providers, and nested resource
    provider calculations when we absolutely don't need to do so.

    There was a _has_provider_trees() function in the
    nova/api/openstack/placement/objects/resource_provider.py file that used
    to be used for top-level switching between a faster, simpler approach to
    finding allocation candidates for a simple search of resources and
    traits when no sharing providers and no nesting was used. That was
    removed at some point and all code paths -- even for simple "get me
    these amounts of these resources" when no trees or sharing providers are
    present (which is the vast majority of OpenStack deployments) -- were
    going through the complex tree-search-and-match queries and algorithms.

    This patch changes that so that when there's a request for some
    resources and there's no trees or sharing providers, we do the simple
    code path. Hopefully this gets our performance for the simple, common
    cases back to where we were pre-Rocky.

    This change is a prerequisite for the following change which adds
    debugging output to help diagnose which resource classes are running
    out of inventory when GET /allocation_candidates returns 0 results.
    That code is not possible without the changes here as they only
    work if we can identify when a "simpler approach" is possible and
    call that simpler code.

    Related-Bug: #1786055
    Partial-Bug: #1786519
    Change-Id: I1fdbcdb7a1dd51e738924c8a30238237d7ac74e1
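The top-level dispatch this commit restores can be sketched as follows. All of the function bodies below are illustrative stubs (the real checks run SQL against the placement database); only the shape of the `_has_provider_trees()`-style switch is taken from the commit message:

```python
# Illustrative stubs: the real predicates query the placement DB for
# nested and sharing providers; here a dict-based "context" fakes them.

def _has_provider_trees(ctx):
    return ctx.get("has_nested", False)

def _has_sharing_providers(ctx):
    return ctx.get("has_sharing", False)

def _alloc_candidates_single_provider(ctx, resources):
    return "simple"       # fast path: plain inventory/usage query

def _alloc_candidates_multiple_providers(ctx, resources):
    return "complex"      # tree-search-and-match path

def get_allocation_candidates(ctx, resources):
    # Top-level dispatch mirroring the restored switch: take the complex
    # tree-aware path only when the deployment actually uses nesting or
    # sharing providers; otherwise use the simple code path.
    if _has_provider_trees(ctx) or _has_sharing_providers(ctx):
        return _alloc_candidates_multiple_providers(ctx, resources)
    return _alloc_candidates_single_provider(ctx, resources)

print(get_allocation_candidates({}, {"VCPU": 1}))                    # simple
print(get_allocation_candidates({"has_nested": True}, {"VCPU": 1}))  # complex
```

Since most deployments have neither trees nor sharing providers, the common case never pays for the complex algorithm.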

Matt Riedemann (mriedem)
tags: removed: rocky-rc-potential
tags: added: performance
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/600447

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/600447
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1cae9b8d4392ef597d93f2934ce18ead8828da98
Submitter: Zuul
Branch: stable/rocky

commit 1cae9b8d4392ef597d93f2934ce18ead8828da98
Author: Jay Pipes <email address hidden>
Date: Thu Aug 9 10:46:20 2018 -0400

    placement: use simple code paths when possible

    (commit message identical to the master commit 9ea340eb above)

    Related-Bug: #1786055
    Partial-Bug: #1786519
    Change-Id: I1fdbcdb7a1dd51e738924c8a30238237d7ac74e1
    (cherry picked from commit 9ea340eb0d3bdb103bd64ca40b999bd2b10b80aa)

tags: added: in-stable-rocky