max_resources_per_stack causes scaling issues on large stacks
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Heat | Fix Released | High | Steve Baker |
Bug Description
I've been doing some testing, trying to mimic some performance issues observed on TripleO, which makes heavy use of ResourceGroups containing nested stacks.
Here's my minimal reproducer, which I've been testing on devstack:
heat_template_
description: >
  Stress test, create many stacks in a RG
resources:
  NovaComputes:
    type: OS::Heat:
    properties:
      count: 400
      resource_def:
        type: dummy_node.yaml
-bash-4.3$ cat dummy_node.yaml
heat_template_
description: >
  Single cirros node
parameters:
resources:
  server:
    type: OS::Heat:
All it does is create 400 nested stacks inside a ResourceGroup, each containing one RandomString resource.
Initially, this took around 13 minutes (!) on my devstack (Core i7 laptop with 6 heat-engine workers):
$ heat event-list twostress2
+------
| resource_name | id | resource_
+------
| twostress2 | 0fa22280-
| NovaComputes | 491cf434-
| NovaComputes | edeaa3ff-
| twostress2 | d8127910-
+------
I then set max_resources_
$ heat event-list twostress2
+------
| resource_name | id | resource_
+------
| twostress2 | 19a65c8f-
| NovaComputes | 02663989-
| NovaComputes | 2aceaa54-
| twostress2 | 4b00e49f-
+------
I really think we should consider setting the default to -1 and adding a warning to the option documentation that the check is potentially very costly to enable. The services called by Heat implement their own quotas, so I think this resource count is fairly arbitrary anyway.
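The proposed setting can be sketched as a heat.conf fragment. This is a minimal sketch, not a tested recommendation: it assumes the option is read from the [DEFAULT] section and that -1 is treated as "unlimited", and heat-engine would need a restart to pick up the change.

```ini
[DEFAULT]
# Assumption: -1 means "unlimited", i.e. the per-stack resource
# count check is effectively disabled.
max_resources_per_stack = -1
```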
Changed in heat:
importance: Undecided → High
summary:
- max_resources_per_stack should be disabled by default
+ max_resources_per_stack causes scaling issues on large stacks
Changed in heat:
milestone: none → liberty-rc1
Changed in heat:
assignee: Steven Hardy (shardy) → Steve Baker (steve-stevebaker)
Changed in heat:
status: Fix Committed → Fix Released
Changed in heat:
milestone: liberty-rc1 → 5.0.0
OK, I know this has previously been discussed and rejected, but I think we should at least revisit that discussion, because by default this makes our performance suck for non-trivial deployments. We should give serious consideration to disabling the check unless we can make it much less expensive.