Nested stacks are still loaded into memory too often

Bug #1731349 reported by Zane Bitter on 2017-11-09
This bug affects 1 person
Affects: OpenStack Heat
Importance: High
Assigned to: Zane Bitter

Bug Description

Loading nested stacks into memory at the same time as the parent stack concentrates memory use into a single engine, and tends to result in a high-water-mark effect. Since Kilo, when we split operations on nested stacks so that they happen over RPC in different engine worker processes, we've been endeavouring to eliminate loading of nested stacks using the StackResource.nested() call, most notably in these patches:

https://review.openstack.org/#/c/384718/
https://review.openstack.org/#/c/383839/

which together reduced TripleO memory usage in the gate by >20%.[1]

However, there remain other places where nested() is still called - many hidden by the monstrosity that is grouputils. The following changes could further improve memory usage:

* StackResource._validate_nested_resources() should use the current template or something - anything - to ascertain the number of resources currently in the stack, in preference to loading the nested stack to count them. AFAICT the current implementation results in every nested stack being loaded into memory simultaneously for validation purposes whenever we do a stack update, unless the max_resources_per_stack config option is set to -1. (TripleO *does* set it to -1, but most installations do not.)

* InstanceGroup._replace() and ResourceGroup._replace() (including ResourceGroup._count_black_listed()) should use some other method of determining the current size when doing a rolling update.

* AutoscalingGroup, InstanceGroup, ResourceGroup, and ResourceChain should use outputs from the nested stack (added in Pike by https://review.openstack.org/#/c/475931/) to get attribute values from nested resources, and only fall back on the grouputils functions when the outputs don't exist (which would happen if the stack has not been updated since before Pike).

* ResourceChain should add outputs to the nested stack to get reference IDs for nested resources, and only fall back on the grouputils functions when the outputs don't exist.

* grouputils.get_members() should be reimplemented or replaced with something that does a resource list over RPC to get the list of non-failed child resources. (And get_size() should be reimplemented to just return len(get_members(...)).)
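
As a rough illustration of the last item, the sketch below derives the member list and the group size purely from a resource listing, never touching a nested stack object. It is not Heat's code: the list_resources callable stands in for whatever RPC call returns the child resources, and the "resource_name" and "resource_status" keys and the "FAILED" value are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Assumed shape of one entry in the RPC resource listing (hypothetical).
ResourceInfo = Dict[str, str]
ListResources = Callable[[], List[ResourceInfo]]


def get_members(list_resources: ListResources) -> List[ResourceInfo]:
    """Return the non-failed child resources using only the listing."""
    return [r for r in list_resources()
            if r.get("resource_status") != "FAILED"]


def get_size(list_resources: ListResources) -> int:
    """The group size is simply the number of non-failed members."""
    return len(get_members(list_resources))


def fake_list_resources() -> List[ResourceInfo]:
    """Stand-in for the RPC call; a real listing would come from an engine."""
    return [
        {"resource_name": "0", "resource_status": "COMPLETE"},
        {"resource_name": "1", "resource_status": "FAILED"},
        {"resource_name": "2", "resource_status": "IN_PROGRESS"},
    ]


if __name__ == "__main__":
    print(get_size(fake_list_resources))
    print([m["resource_name"] for m in get_members(fake_list_resources)])
```

Passing the listing function in as a parameter keeps the helpers independent of how the data is fetched, so the same code works against a stub in tests.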

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-January/109748.html

Fix proposed to branch: master
Review: https://review.openstack.org/529715

Changed in heat:
assignee: nobody → Zane Bitter (zaneb)
status: Triaged → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/530968

Fix proposed to branch: master
Review: https://review.openstack.org/530972

Fix proposed to branch: master
Review: https://review.openstack.org/531925

Fix proposed to branch: master
Review: https://review.openstack.org/531926

Fix proposed to branch: master
Review: https://review.openstack.org/531929

Fix proposed to branch: master
Review: https://review.openstack.org/531930

Fix proposed to branch: master
Review: https://review.openstack.org/531931

Zane Bitter (zaneb) wrote :

Looks like a 14% reduction in TripleO memory use, to 1.12GiB, with convergence enabled (see https://review.openstack.org/#/c/531931/ for details).

Zane Bitter (zaneb) on 2018-01-09
Changed in heat:
importance: Medium → High

Fix proposed to branch: master
Review: https://review.openstack.org/532301

Fix proposed to branch: master
Review: https://review.openstack.org/532302

Fix proposed to branch: master
Review: https://review.openstack.org/532341

Fix proposed to branch: master
Review: https://review.openstack.org/532571

Zane Bitter (zaneb) on 2018-01-10
Changed in heat:
milestone: none → queens-3

Reviewed: https://review.openstack.org/529715
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=e5707618f3a5cf22e2f4260ed1aa64b598d3a941
Submitter: Zuul
Branch: master

commit e5707618f3a5cf22e2f4260ed1aa64b598d3a941
Author: Zane Bitter <email address hidden>
Date: Mon Jan 8 17:23:12 2018 -0500

    Avoid always loading nested stack on update

    Previously, when calling StackResource._validate_nested_resources() (which
    we do whenever we create or update a nested stack), we would load the
    nested stack into memory to validate the number of resources in the nested
    stack, unless the max_resources_per_stack config option was set to -1. This
    meant we would load the nested stack into memory in the same engine as the
    parent on every update.

    To reduce the memory high-water mark, fetch the information we need over
    RPC from another engine instead.

    To ensure this is only called once, move the call into the validate code.
    (Previously it was called again in the create/update itself.)

    Change-Id: I78d12ecc8240c697e26893ae2d7172b60883fb93
    Partial-Bug: #1731349
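
The shape of the check described in this commit message might be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: validate_resource_count, fetch_remote_count and ResourceLimitExceeded are hypothetical names, and fetch_remote_count stands in for whatever RPC call supplies the existing resource count from another engine. The only detail taken from the report is that a limit of -1 disables the check.

```python
from typing import Callable

UNLIMITED = -1  # per the report, -1 disables the resource limit check


class ResourceLimitExceeded(Exception):
    """Hypothetical error type used only in this sketch."""


def validate_resource_count(new_count: int,
                            fetch_remote_count: Callable[[], int],
                            max_resources: int) -> None:
    """Check a resource limit without loading any stack locally.

    fetch_remote_count is assumed to wrap an RPC call to another engine,
    so the local process only ever receives an integer.
    """
    if max_resources == UNLIMITED:
        # No limit configured: skip the remote call entirely.
        return
    total = new_count + fetch_remote_count()
    if total > max_resources:
        raise ResourceLimitExceeded(
            "%d resources exceeds the limit of %d" % (total, max_resources))
```

Calling this once from the validation step, rather than again during create or update, means at most one remote call per operation.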

Reviewed: https://review.openstack.org/530968
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=ce120bfda848e55ba06b606681e7c1b8ae3de923
Submitter: Zuul
Branch: master

commit ce120bfda848e55ba06b606681e7c1b8ae3de923
Author: Zane Bitter <email address hidden>
Date: Mon Jan 8 17:23:12 2018 -0500

    Avoid loading nested stack in some grouputils functions

    We want to avoid loading a nested stack in memory wherever possible, since
    that is known to cause memory high-water-mark issues. The grouputils
    functions are among the worst offenders at doing this. Some of the data
    that they return is easily obtained from an RPC call to
    list_stack_resources, so swap out the implementations using nested().

    Rather than simply add more utility functions, a GroupInspector class is
    created that can cache the data returned. In future this will allow groups
    that need to access multiple functions from the grouputils to do so without
    making multiple RPC calls. (Previously, the data was cached in the group's
    nested Stack.)

    Change-Id: Icd35d91bce30ee36d9592b0516767ef273a9f34d
    Partial-Bug: #1731349
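
The caching idea in this commit message can be illustrated with a small class that fetches the resource listing at most once and answers several questions from it. This is a sketch only: CachingGroupInspector and its methods are hypothetical and are not the GroupInspector interface from the patch, and the dictionary keys are assumed for the example.

```python
from typing import Callable, Dict, List, Optional

ResourceInfo = Dict[str, str]  # assumed shape of a listing entry


class CachingGroupInspector:
    """Answer group queries from a single cached resource listing."""

    def __init__(self, list_resources: Callable[[], List[ResourceInfo]]):
        # list_resources is assumed to wrap one RPC call.
        self._list_resources = list_resources
        self._cache: Optional[List[ResourceInfo]] = None

    def _resources(self) -> List[ResourceInfo]:
        # Fetch lazily, and only on first use.
        if self._cache is None:
            self._cache = list(self._list_resources())
        return self._cache

    def _members(self, include_failed: bool) -> List[ResourceInfo]:
        return [r for r in self._resources()
                if include_failed or r.get("resource_status") != "FAILED"]

    def size(self, include_failed: bool = False) -> int:
        return len(self._members(include_failed))

    def member_names(self, include_failed: bool = False) -> List[str]:
        return [r["resource_name"] for r in self._members(include_failed)]
```

A group that needs both the size and the member names would create one inspector and call both methods, paying for a single remote call instead of two.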

Reviewed: https://review.openstack.org/530972
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=b023fa88d1ed7194faf0e54330814f520c7a010e
Submitter: Zuul
Branch: master

commit b023fa88d1ed7194faf0e54330814f520c7a010e
Author: Zane Bitter <email address hidden>
Date: Mon Jan 8 17:23:12 2018 -0500

    Don't load nested stack in batched ResourceGroup

    Previously whenever we did a batched operation on a ResourceGroup, we
    loaded the nested stack into memory in the local engine in order to
    determine the current capacity. Change to using the GroupInspector class to
    calculate the capacity using only RPC.

    Change-Id: Ie4c6791bf70df01a66e49cb8ef104e8155c90443
    Partial-Bug: #1731349
