[cellv2] performance issue when creating 500 instances concurrently

Bug #1737465 reported by xhzhf on 2017-12-11
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Unassigned

Bug Description

Description
===========
We tested the cells v2 solution by creating instances concurrently.
There is a performance issue caused by querying instance info in the scheduler process.

Steps to reproduce
==================
Use rally to create 500 instances concurrently.

Expected result
===============
all instances are created successfully

Actual result
=============
Many instances failed due to timeouts.

Environment
===========
1. Exact version of OpenStack you are running:
openstack-nova-scheduler-16.0.3-1.el7.noarch
openstack-nova-placement-api-16.0.3-1.el7.noarch

2. Which hypervisor did you use?
KVM. We have 2000 hosts

3. Which storage type did you use?
Our own storage device

4. Which networking type did you use?
Our own network device

Analysis
========
In the scheduler process, get_host_states_by_uuids calls _get_host_state,
which calls host_state.update and _get_instance_info.
So _get_instance_info is called for every request and for every host,
because cells v2 has the instance info cache disabled.
As a result, there is a performance issue even when we don't use an affinity filter policy.
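The call chain described above can be sketched as follows. This is a simplified illustration with stand-in names mirroring nova's HostManager, not the real nova code:

```python
# Simplified sketch of the scheduler call chain described above.
# Names mirror nova's HostManager, but this is an illustration, not nova code.

class HostState:
    def __init__(self, host):
        self.host = host
        self.instances = {}

    def update(self, instance_info):
        # With the instance info cache disabled, instance_info is always
        # fetched fresh, so this costs one DB query per host, per request.
        self.instances = instance_info


def _get_instance_info(db, host):
    # Stand-in for the per-host database query.
    return {uuid: inst for uuid, inst in db.items() if inst["host"] == host}


def get_host_states_by_uuids(db, hosts):
    states = []
    for host in hosts:
        state = HostState(host)
        # _get_host_state -> host_state.update -> _get_instance_info
        state.update(_get_instance_info(db, host))
        states.append(state)
    return states


db = {"uuid-1": {"host": "cn1"}, "uuid-2": {"host": "cn2"}}
states = get_host_states_by_uuids(db, ["cn1", "cn2"])
```

With 2000 hosts and 500 concurrent requests, that inner query runs on the order of a million times, which is the scaling problem the bug describes.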

Changed in nova:
assignee: nobody → Surya Seetharaman (tssurya)
melanie witt (melwitt) on 2018-02-14
tags: added: cells scheduler
melanie witt (melwitt) wrote :

Sorry for getting to this so long after it was opened. What do you mean by, "Because cellv2 have disable instance info cache."? I don't notice anything in the code that is deliberately disabling the instance info cache in the HostManager.

melanie witt (melwitt) wrote :

To be clear, if the cache was disabled, it was unintentional and it is a bug.

Jiang (jiangpf) wrote :

This is a performance issue. Compute instance info is only used by the affinity filters.
The anti-affinity filter decides whether an instance belongs to the current compute node; the current code does this by checking if the instance uuid is in the compute node's instance list.
Instead, we could look up the current instance directly: if it exists, get the host it belongs to, and then, in the anti-affinity filter, compare the compute node's host name with the instance's host name.
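The alternative check proposed above can be sketched like this. All names here are hypothetical illustrations, not nova code:

```python
# Hypothetical sketch of the proposed check: instead of loading every
# instance on a host and testing membership, look up the one instance
# by uuid and compare its host to the candidate host.

instances_by_uuid = {
    "uuid-1": {"host": "cn1"},
    "uuid-2": {"host": "cn2"},
}


def violates_anti_affinity(instance_uuid, candidate_host):
    """Return True if the instance already runs on the candidate host."""
    inst = instances_by_uuid.get(instance_uuid)
    if inst is None:
        return False  # instance doesn't exist yet, so no conflict
    return inst["host"] == candidate_host


conflict = violates_anti_affinity("uuid-1", "cn1")
```

This turns a per-host scan of all instances into a single per-instance lookup.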

Changed in nova:
assignee: Surya Seetharaman (tssurya) → Jiang (jiangpf)
status: New → Confirmed
Matt Riedemann (mriedem) wrote :

By "Because cellv2 have disable instance info cache." I assume you mean you have disabled the track_instance_changes configuration option, correct?

https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.track_instance_changes
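For reference, the option lives in the [filter_scheduler] group of nova.conf; disabling it looks like this:

```ini
[filter_scheduler]
# When False, computes stop pushing instance change updates to the
# scheduler, and the HostManager has to query instance info itself.
track_instance_changes = False
```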

Changed in nova:
assignee: Jiang (jiangpf) → Matt Riedemann (mriedem)
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

By the way, "track_instance_changes" doesn't really have anything to do with cells v2 unless you have a split MQ deployment where the scheduler and compute services (which are in the cells) are on different MQs - because then the computes can't RPC cast to the scheduler topic "at the top" of the deployment, as discussed here:

https://docs.openstack.org/nova/latest/user/cellsv2-layout.html

Fix proposed to branch: master
Review: https://review.openstack.org/569218

Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem) wrote :

You said, "So there's a performance issue even we don't use affinity filter policy" - are you actually not using any of the ServerGroup(Anti)AffinityFilter or SameHost/DifferentHostFilter filters? If so, we could potentially short-circuit the HostManager._get_instance_info() method to not even pull instances from the database if an affinity filter is not enabled.

That would break any out of tree filters that rely on the HostState.instances information, however, so we'd have to consider that if it were a change we made.
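The short-circuit idea above could look roughly like this. This is a sketch with hypothetical helper names, not the actual nova implementation:

```python
# Sketch of the short-circuit idea: skip the per-host instance query
# entirely when no enabled filter needs HostState.instances.
# Helper names are hypothetical; the filter names are nova's in-tree ones.

AFFINITY_FILTERS = {
    "ServerGroupAffinityFilter",
    "ServerGroupAntiAffinityFilter",
    "SameHostFilter",
    "DifferentHostFilter",
}


def get_instance_info(enabled_filters, query_instances, host):
    if not AFFINITY_FILTERS & set(enabled_filters):
        # No enabled in-tree filter looks at HostState.instances,
        # so skip the DB hit and return an empty dict.
        return {}
    return query_instances(host)


calls = []

def fake_query(host):
    calls.append(host)
    return {"uuid-1": object()}


info = get_instance_info(["ComputeFilter"], fake_query, "cn1")
```

As the comment notes, out-of-tree filters relying on HostState.instances would see an empty dict, which is why this was only a potential change.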

Changed in nova:
assignee: Matt Riedemann (mriedem) → Eric Fried (efried)

Reviewed: https://review.openstack.org/569218
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1a335f19a986a593487a802d7b9448f7efc5bab
Submitter: Zuul
Branch: master

commit a1a335f19a986a593487a802d7b9448f7efc5bab
Author: Matt Riedemann <email address hidden>
Date: Thu May 17 14:36:23 2018 -0400

    Avoid unnecessary joins in HostManager._get_instances_by_host

    While getting a HostState object for a given compute node during
    scheduling, if the HostState does not have its instance info
    set, either because it's out of date or because config option
    "track_instance_changes" is False, the HostManager still pulls
    the list of instances for that host from the database and stores
    it in HostState.instances.

    This is *only* used (in-tree) by the affinity filters and even
    then the only thing those filters use from HostState.instances
    is the set of keys from the dict, which is the list of instance
    UUIDs on a given host. The actual Instance objects aren't used
    at all. See blueprint put-host-manager-instance-info-on-a-diet
    for more details on that.

    The point of this change, is that when we go to pull the set
    of instances from the database for a given host, we don't need
    to join on the default columns (info_cache and security_groups)
    defined in the _instance_get_all_query() method in the DB API.

    This should be at least some minor optimization in scheduling
    for hosts that have several instances on them in a large cloud.
    As noted in the comment in the code, any out of tree filters
    that rely on using the info_cache or security_groups from the
    instance are now going to be hit with a lazy-load penalty
    per instance, but we have no contract on out of tree filters
    so if this happens, the people maintaining said filters can
    (1) live with it (2) fork the HostManager code or (3) upstream
    their filter so it's in-tree.

    A more impactful change would be to refactor
    HostManager._get_host_states to bulk query the instances on
    the given set of compute nodes in a single query per cell. But
    that is left for a later change.

    Change-Id: Iccefbfdfa578515a004ef6ac718bac1a49d5c5fd
    Partial-Bug: #1737465
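The join-avoidance described in the commit can be illustrated as follows. The function here is a stand-in sketch modeled on nova's DB API (which accepts a columns_to_join parameter), not the real implementation:

```python
# Illustrative sketch of the change above: ask the DB layer for instances
# without joining the default info_cache and security_groups columns,
# since the affinity filters only need the instance uuids.
# instance_get_all_by_host here is a stand-in, not nova's real DB API.

def instance_get_all_by_host(db, host, columns_to_join=None):
    # The real query joins info_cache and security_groups by default
    # unless an explicit join list is passed.
    joined = columns_to_join if columns_to_join is not None else [
        "info_cache", "security_groups"]
    return [dict(inst, joined=list(joined))
            for inst in db if inst["host"] == host]


db = [{"uuid": "uuid-1", "host": "cn1"}]

# Before: default joins are loaded for every instance on the host.
before = instance_get_all_by_host(db, "cn1")

# After: pass an empty join list so only the bare instance rows come back.
after = instance_get_all_by_host(db, "cn1", columns_to_join=[])
```

The trade-off, as the commit notes, is that any out-of-tree filter touching info_cache or security_groups now pays a lazy-load round trip per instance.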

Eric Fried (efried) on 2018-05-19
Changed in nova:
assignee: Eric Fried (efried) → Matt Riedemann (mriedem)

Reviewed: https://review.openstack.org/570083
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bfb9d3f179607c25f91d77585f97df6d0704c253
Submitter: Zuul
Branch: stable/queens

commit bfb9d3f179607c25f91d77585f97df6d0704c253
Author: Matt Riedemann <email address hidden>
Date: Thu May 17 14:36:23 2018 -0400

    Avoid unnecessary joins in HostManager._get_instances_by_host

    While getting a HostState object for a given compute node during
    scheduling, if the HostState does not have its instance info
    set, either because it's out of date or because config option
    "track_instance_changes" is False, the HostManager still pulls
    the list of instances for that host from the database and stores
    it in HostState.instances.

    This is *only* used (in-tree) by the affinity filters and even
    then the only thing those filters use from HostState.instances
    is the set of keys from the dict, which is the list of instance
    UUIDs on a given host. The actual Instance objects aren't used
    at all. See blueprint put-host-manager-instance-info-on-a-diet
    for more details on that.

    The point of this change, is that when we go to pull the set
    of instances from the database for a given host, we don't need
    to join on the default columns (info_cache and security_groups)
    defined in the _instance_get_all_query() method in the DB API.

    This should be at least some minor optimization in scheduling
    for hosts that have several instances on them in a large cloud.
    As noted in the comment in the code, any out of tree filters
    that rely on using the info_cache or security_groups from the
    instance are now going to be hit with a lazy-load penalty
    per instance, but we have no contract on out of tree filters
    so if this happens, the people maintaining said filters can
    (1) live with it (2) fork the HostManager code or (3) upstream
    their filter so it's in-tree.

    A more impactful change would be to refactor
    HostManager._get_host_states to bulk query the instances on
    the given set of compute nodes in a single query per cell. But
    that is left for a later change.

    Change-Id: Iccefbfdfa578515a004ef6ac718bac1a49d5c5fd
    Partial-Bug: #1737465
    (cherry picked from commit a1a335f19a986a593487a802d7b9448f7efc5bab)

tags: added: in-stable-queens

Reviewed: https://review.openstack.org/569247
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91f5af7ee7f7140eafb7237875f6cd6ea1abcd38
Submitter: Zuul
Branch: master

commit 91f5af7ee7f7140eafb7237875f6cd6ea1abcd38
Author: Matt Riedemann <email address hidden>
Date: Thu May 17 16:06:39 2018 -0400

    Trim the fat on HostState.instances

    The only in-tree filters that rely on HostState.instances
    are the affinity filters (and one of the weighers). And they
    don't even look at the values in the HostState.instances dict,
    just the keys (which are the instance uuids for the instances
    on the host).

    So rather than pull the full instance objects, we can just get
    the list of instance uuids off the host and fake out the object.

    Custom filters/weighers will still be able to lazy-load fields
    on the Instance objects in HostState.instances if needed, but
    it will mean a performance penalty due to the round trip to the
    database per instance, per host. Out of tree filters/weighers
    are encouraged to be contributed upstream.

    Related to blueprint put-host-manager-instance-info-on-a-diet

    Related-Bug: #1737465

    Change-Id: I766bb5645e3b598468d092fb9e4f18e720617c52
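The "trim the fat" approach from this commit can be sketched like so: rather than pulling full instance objects, build a dict of lightweight placeholders keyed by uuid. Names here are illustrative only:

```python
# Sketch of "trim the fat": the affinity filters only look at the keys of
# HostState.instances, so placeholder values are enough. FakeInstance and
# get_instances_for_filters are illustrative names, not nova code.

class FakeInstance:
    """Placeholder carrying only the uuid; in nova, other fields on such
    an object would lazy-load from the database on access."""

    def __init__(self, uuid):
        self.uuid = uuid


def get_instances_for_filters(instance_uuids):
    # Only the uuids are queried from the cell DB; no full objects.
    return {u: FakeInstance(u) for u in instance_uuids}


instances = get_instances_for_filters(["uuid-1", "uuid-2"])

# The filters' membership test still works unchanged:
on_host = "uuid-1" in instances
```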

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/571928

Matt Riedemann (mriedem) wrote :

I've pushed some related changes for this bug and am removing myself as assignee, since I think we still have some known performance issues during scheduling, specifically around this method that gets all host states from all cells and, for each compute (host state), gets the aggregates and instances from each cell DB per host:

https://github.com/openstack/nova/blob/b93b40c6c01a1161f40592d29353c3461669de19/nova/scheduler/host_manager.py#L704

Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Confirmed
Matt Riedemann (mriedem) wrote :

Some related changes:

1. Jay Pipes' patch https://review.opendev.org/#/c/623558/ to redo the host mapping / instances query to go from O(N) to O(1) which CERN is using in production.

2. Surya's patch https://review.opendev.org/#/c/635532/ to pre-filter cells based on the allocation candidates (hosts) returned from placement.

Related fix proposed to branch: master
Review: https://review.opendev.org/663388

Reviewed: https://review.opendev.org/663387
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=426a303ee4cb848e75b9ed2acc498479c651c539
Submitter: Zuul
Branch: master

commit 426a303ee4cb848e75b9ed2acc498479c651c539
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 5 12:54:45 2019 -0400

    Convert HostMapping.cells to a dict

    This changes the HostMapping.cells list to a dict,
    keyed by cell uuid, to each CellMapping object.
    This is going to be used in a later change [1] where
    we will create a cached mapping of host to cell uuid
    so we can go quickly from compute node host value to
    get the CellMapping without having to lookup the
    HostMapping again.

    [1] Ic6b1edfad2e384eb32c6942edc522ee301123cbc

    Change-Id: I75e5bf2c45688f68eeca27404641f4c438b023c6
    Related-Bug: #1737465
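The data-structure change this commit describes, a list of cells replaced by a dict keyed by cell uuid, can be sketched in miniature (illustrative classes, not the nova objects):

```python
# Sketch of the HostMapping.cells change: a dict keyed by cell uuid gives
# O(1) lookup instead of a linear scan over a list. Illustrative only.

class CellMapping:
    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name


cell_list = [
    CellMapping("cell1-uuid", "cell1"),
    CellMapping("cell2-uuid", "cell2"),
]

# Before: finding a cell by uuid means scanning the list.
found = next(c for c in cell_list if c.uuid == "cell2-uuid")

# After: key the cells by uuid once, then look them up directly.
cells = {c.uuid: c for c in cell_list}
same = cells["cell2-uuid"]
```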

Reviewed: https://review.opendev.org/663388
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a69fdb333e80f5ad2e025a9823b11483fad55c2
Submitter: Zuul
Branch: master

commit 3a69fdb333e80f5ad2e025a9823b11483fad55c2
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 5 13:30:25 2019 -0400

    Cache host to cell mapping in HostManager

    If the instances per host are not cached in the HostManager
    we lookup the HostMapping per candidate compute node during
    each scheduling request to get the CellMapping so we can target
    that cell database to pull the instance uuids on the given host.

    For example, if placement returned 20 compute node allocation
    candidates and we don't have the instances cached for any of those,
    we'll do 20 queries to the API DB to get host mappings.

    We can improve this by caching the host to cell uuid after the first
    lookup for a given host and then after that, get the CellMapping
    from the cells cache (which is a dict, keyed by cell uuid, to the
    CellMapping for that cell).

    Change-Id: Ic6b1edfad2e384eb32c6942edc522ee301123cbc
    Related-Bug: #1737465
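The caching scheme the commit describes can be sketched as follows, with hypothetical helper names standing in for the HostManager internals:

```python
# Sketch of the host-to-cell cache: after the first HostMapping lookup for
# a host, remember its cell uuid and resolve later requests from the cells
# dict without touching the API DB. Names here are illustrative.

api_db_lookups = []


def lookup_host_mapping(host):
    # Stand-in for the per-host API DB HostMapping query.
    api_db_lookups.append(host)
    return {"cn1": "cell1-uuid", "cn2": "cell1-uuid"}[host]


cells = {"cell1-uuid": {"name": "cell1"}}  # the cells cache, keyed by uuid
host_to_cell_uuid = {}  # the new host -> cell uuid cache


def get_cell_for_host(host):
    if host not in host_to_cell_uuid:
        host_to_cell_uuid[host] = lookup_host_mapping(host)
    return cells[host_to_cell_uuid[host]]


get_cell_for_host("cn1")
get_cell_for_host("cn1")  # served from the cache, no second DB lookup
```

In the commit's example of 20 allocation candidates, this turns 20 API DB queries per scheduling request into at most one per host over the cache's lifetime.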

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/676257

Reviewed: https://review.opendev.org/676256
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=30552298410605ef5710fd564f5e46e5b0df0597
Submitter: Zuul
Branch: stable/stein

commit 30552298410605ef5710fd564f5e46e5b0df0597
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 5 12:54:45 2019 -0400

    Convert HostMapping.cells to a dict

    This changes the HostMapping.cells list to a dict,
    keyed by cell uuid, to each CellMapping object.
    This is going to be used in a later change [1] where
    we will create a cached mapping of host to cell uuid
    so we can go quickly from compute node host value to
    get the CellMapping without having to lookup the
    HostMapping again.

    [1] Ic6b1edfad2e384eb32c6942edc522ee301123cbc

    Change-Id: I75e5bf2c45688f68eeca27404641f4c438b023c6
    Related-Bug: #1737465
    (cherry picked from commit 426a303ee4cb848e75b9ed2acc498479c651c539)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/676257
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=677c95280b21eb6628a7758f173cb98c4e3572c2
Submitter: Zuul
Branch: stable/stein

commit 677c95280b21eb6628a7758f173cb98c4e3572c2
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 5 13:30:25 2019 -0400

    Cache host to cell mapping in HostManager

    If the instances per host are not cached in the HostManager
    we lookup the HostMapping per candidate compute node during
    each scheduling request to get the CellMapping so we can target
    that cell database to pull the instance uuids on the given host.

    For example, if placement returned 20 compute node allocation
    candidates and we don't have the instances cached for any of those,
    we'll do 20 queries to the API DB to get host mappings.

    We can improve this by caching the host to cell uuid after the first
    lookup for a given host and then after that, get the CellMapping
    from the cells cache (which is a dict, keyed by cell uuid, to the
    CellMapping for that cell).

    Change-Id: Ic6b1edfad2e384eb32c6942edc522ee301123cbc
    Related-Bug: #1737465
    (cherry picked from commit 3a69fdb333e80f5ad2e025a9823b11483fad55c2)
