queuepool limit of size 5 overflow

Bug #1306743 reported by Robert Collins
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Heat | Fix Released | Critical | Clint Byrum |
Icehouse | Fix Released | High | Steve Baker |

Bug Description

2014-04-11 17:39:09.591 15980 ERROR heat.openstack.common.rpc.amqp [req-96c183a9-39b3-4922-95e7-8e8de9d4f87c None] Exception during message handling
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp Traceback (most recent call last):
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/openstack/common/rpc/amqp.py", line 462, in _process_data
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp **args)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/engine/service.py", line 63, in wrapped
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return func(self, ctx, *args, **kwargs)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/engine/service.py", line 852, in describe_stack_resource
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp s = self._get_stack(cnxt, stack_identity)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/engine/service.py", line 338, in _get_stack
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp show_deleted=show_deleted)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/db/api.py", line 110, in stack_get
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp tenant_safe=tenant_safe)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/heat/db/sqlalchemy/api.py", line 271, in stack_get
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp result = model_query(context, models.Stack).get(stack_id)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 827, in get
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return loading.load_on_ident(self, key)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 226, in load_on_ident
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return q.one()
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2317, in one
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp ret = list(self)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2360, in __iter__
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return self._execute_and_instances(context)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2373, in _execute_and_instances
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp close_with_result=True)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2364, in _connection_from_session
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp **kw)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 799, in connection
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp close_with_result=close_with_result)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 805, in _connection_for_bind
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return engine.contextual_connect(**kwargs)
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1661, in contextual_connect
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp self.pool.connect(),
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 326, in connect
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 485, in __init__
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 766, in _do_get
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp (self.size(), self.overflow(), self._timeout))
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
2014-04-11 17:39:09.591 15980 TRACE heat.openstack.common.rpc.amqp
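
The final TimeoutError line reports a pool of 5 persistent connections plus 10 overflow connections, so a 16th concurrent checkout waits for the timeout (30 seconds in the trace) and then fails. A minimal stdlib-only sketch of that behaviour follows. It is a toy model, not SQLAlchemy's implementation; the class `TinyPool` and its methods are hypothetical, and the timeout is shortened so the example finishes quickly.

```python
import threading


class TinyPool:
    """Toy model of a bounded connection pool (hypothetical, not SQLAlchemy)."""

    def __init__(self, size=5, max_overflow=10, timeout=30):
        self.size = size
        self.max_overflow = max_overflow
        self.timeout = timeout
        # At most size + max_overflow connections may be checked out at once.
        self._slots = threading.BoundedSemaphore(size + max_overflow)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "QueuePool limit of size %d overflow %d reached, "
                "connection timed out, timeout %s"
                % (self.size, self.max_overflow, self.timeout))
        return object()  # stand-in for a DB connection

    def checkin(self):
        self._slots.release()


pool = TinyPool(size=5, max_overflow=10, timeout=0.1)
held = [pool.checkout() for _ in range(15)]  # 15 checkouts succeed
try:
    pool.checkout()  # the 16th finds no free slot and times out
    sixteenth_failed = False
except TimeoutError as exc:
    sixteenth_failed = True
    message = str(exc)
```

In this model the error only appears when connections are held longer than the timeout while new requests keep arriving, which is consistent with a slow, query-heavy request path rather than a pool that is simply too small.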

Changed in heat:
assignee: nobody → Matthew Gilliard (matthew-gilliard-u)
Changed in heat:
status: New → Triaged
importance: Undecided → Critical
milestone: none → juno-1
Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

During a deploy of ~30 instances with heat, we see heat-engine take 100% of the CPU and load MySQL up. This causes the stack to fail with the following:

| stack_status_reason | Resource CREATE failed: TimeoutError: QueuePool limit
| | of size 5 overflow 10 reached, connection timed out,
| | timeout 30

There are approx 300-400 reqs/sec hitting mysql from heat-engine, mostly SELECT from resource and from resource_data tables.

We can make things better, but not fix the issue, by increasing the wait time in the heat scheduler:

heat/engine/scheduler.py line 196
def run_to_completion(self, wait_time=20): change this from the default of 1 to 20;

However, this seems only to reduce the issue rather than fix it (i.e. we get further, but the same failure still happens).
We also see the load stay high even after the failure; it only decreases once heat stack-delete is run.

Revision history for this message
Zane Bitter (zaneb) wrote :

Random notes from IRC:

* The trace above is during a metadata polling operation from os-collect-config.
* os-collect-config polls once every 30s from each server.
* The high rate of DB requests suggests that maybe we are creating multiple sessions for each incoming RPC request, and thus needing to repeatedly retrieve data that should already have been cached in the request's master session.
* Note that when retrieving the metadata we refresh the cache: https://github.com/openstack/heat/blob/master/heat/engine/resource.py#L81 - this may or may not be relevant.
* It would be interesting to compare the rate of incoming RPC requests to the rate of DB requests.
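
A rough back-of-the-envelope version of that comparison, using only the figures quoted in this report (~30 servers, one poll per server every 30s, 300-400 DB requests/sec). It assumes, purely for illustration, that every DB request is caused by metadata polling; that overstates the ratio, because stack creation also queries the database.

```python
servers = 30                 # approximate deployment size from the report
poll_interval_s = 30         # os-collect-config polls once every 30s per server
db_reqs_per_s = (300, 400)   # observed range of requests hitting MySQL

polls_per_s = servers / poll_interval_s
# Upper-bound estimate of DB queries per poll (assumes polls cause all queries).
queries_per_poll = tuple(r / polls_per_s for r in db_reqs_per_s)
print(polls_per_s, queries_per_poll)
```

Even as an upper bound, hundreds of queries per poll would point at repeated loading of the same data rather than at the polling rate itself.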

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Currently every os-collect-config metadata poll will cause the entire stack to be parsed and loaded, but ultimately just returns the results of a list-deployments query. This is not scalable in the long term.

The chain is currently this for the default Server software_config_transport: POLL_SERVER_CFN:
occ -> heat-api-cfn(resource-show) -> heat-engine(parse the stack to get the server resource) -> heat-api(deployments-list) -> heat-engine(metadata_software_deployments)

And it is no better for Server software_config_transport: POLL_SERVER_HEAT:
occ -> heat-api(resource-show) -> heat-engine(parse the stack to get the server resource) -> heat-api(deployments-list) -> heat-engine(metadata_software_deployments)

What we need (and what I've been working towards) is a software_config_transport: POLL_DEPLOYMENTS:
occ -> heat-api(deployments-list) -> heat-engine(metadata_software_deployments)

I'll continue to work towards POLL_DEPLOYMENTS, but Zane's optimisation process would still be helpful. If the first API calls could be directed to the heat-engine which is already IN_PROGRESS with that stack then we could find a way of reusing the existing stack rather than reloading it from the database.

Revision history for this message
Matthew Gilliard (matthew-gilliard-u) wrote :

I had assigned this to me, but I'm not making much progress and I don't want to prevent anyone else having a look.

Changed in heat:
assignee: Matthew Gilliard (matthew-gilliard-u) → nobody
Revision history for this message
Robert Collins (lifeless) wrote :

What about hooking in memcache in the api - the results are constant for a given last-event-in-a-stack right?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

So I think this is just good old fashioned optimization.

What we need is to know the user ids that are allowed to access this resource. We currently determine that by parsing _the entire_ stack from the database.

But we don't need to do that. We can simply look up the access rules and apply them to the user id. That would reduce the impact to two well indexed SQL queries.
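
A minimal sketch of what "two well indexed SQL queries" could look like, using an in-memory SQLite database. The table layout, index names, and the `access_allowed` helper are assumptions made for illustration and are not Heat's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, stack_id TEXT, name TEXT);
    CREATE UNIQUE INDEX ix_resource ON resource (stack_id, name);
    CREATE TABLE resource_data (resource_id INTEGER, key TEXT, value TEXT);
    CREATE INDEX ix_resource_data ON resource_data (resource_id, key);
    INSERT INTO resource VALUES (1, 'stack-a', 'server0');
    INSERT INTO resource_data VALUES (1, 'user_id', 'u-123');
""")


def access_allowed(stack_id, resource_name, user_id):
    # Query 1: find the resource row by its indexed (stack_id, name) pair.
    row = conn.execute(
        "SELECT id FROM resource WHERE stack_id = ? AND name = ?",
        (stack_id, resource_name)).fetchone()
    if row is None:
        return False
    # Query 2: check whether this user id is recorded against the resource.
    hit = conn.execute(
        "SELECT 1 FROM resource_data "
        "WHERE resource_id = ? AND key = 'user_id' AND value = ?",
        (row[0], user_id)).fetchone()
    return hit is not None
```

The point of the sketch is that neither query depends on the number of resources in the stack, unlike parsing the whole stack.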

Will look at doing an implementation tomorrow if somebody else hasn't picked this up.

Revision history for this message
Thom Leggett (tteggel) wrote :

Further analysis of this on HP side confirms clint-fewbar and zaneb's analysis. With the current impl of metadata_update in service.py we suspect we'll see DB queries for waitcondition requests = (number of resources in stack) ^ 2. Or worse. Not sure I have the skills to fix it.

Changed in heat:
assignee: nobody → Clint Byrum (clint-fewbar)
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

I attempted to reproduce this by doing the following:
* heat.conf [database] max_pool_size=1, max_overflow=1
* launch 8 stacks each with 4 resources including one server
* call resource-metadata on all 8 stacks in a tight loop in 8 different processes

I failed to reproduce the issue but I may have some dependencies which are not up to date. To me this doesn't rule out a regression caused by a newer dependency.

I do agree that there is a straight-up optimisation that needs to happen to minimise the number of db calls for a stack load. For my part I'm going to look at replacing all the calls to db_api.resource_data_get with a db_api.resource_data_get_all that is called only once per resource load.

Clint's approach will help too, but access rules are decided by callback in the resource rather than a static rule, which may complicate things.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

This is indeed a bit tougher to unwind than I had thought. Any incremental fixes to improve the general performance of the queries are a good idea. However, I think we need to do this immediately as well:

* Let resources add resource_data to a resource that indicates IDs that can be allowed access.
* Add a way for resource plugins to register a resource data key based access handler at load time. For example we could have them register 'user_ids' and 'ec2_access_keys'. Things that create users would add user_ids to the resources on which they need access. Things that create EC2 access keys would do likewise.
* Change classes that register resource access handlers to instead add resource data that indicates things like user IDs or EC2 access keys to give access to metadata.
* Add a configuration option to disable checking for generic resource access handlers once all use of that is out of core Heat. Default to false.
* Update describe_stack_resource to load the resource and any known handled access keys from resource_data first, and look for a positive match in resource_data. If there is no match, and we are still allowing generic resource handlers, load the stack and check with generic resource handlers.
* Deprecate letting resources register generic resource handlers.
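
The lookup order proposed in the list above could be sketched roughly as follows. All names here (`describe_access_allowed`, `generic_check`, the dict layout) are hypothetical and only illustrate the "cheap match first, stack load as fallback" idea.

```python
def describe_access_allowed(resource_data, credentials,
                            allow_generic_handlers, generic_check):
    """Return True if any presented credential matches stored resource data.

    resource_data: dict such as {'user_ids': {...}, 'ec2_access_keys': {...}}
    credentials:   dict with the same keys, holding the caller's values
    generic_check: callable standing in for the expensive stack-load path
    """
    for key, allowed in resource_data.items():
        if credentials.get(key) in allowed:
            return True  # cheap positive match, no stack load needed
    if allow_generic_handlers:
        return generic_check()  # legacy path: load the stack, ask handlers
    return False


calls = []


def slow_check():
    # Records each time the expensive fallback path is taken.
    calls.append("stack load")
    return False


data = {"user_ids": {"u-1"}, "ec2_access_keys": {"AK1"}}
fast = describe_access_allowed(data, {"user_ids": "u-1"}, True, slow_check)
```

With a positive match in resource data, `slow_check` is never called; it only runs when there is no match and generic handlers are still enabled.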

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Clint, as a plan to not require a stack load to perform access-allowed checks, the above looks fine.

However, I worry that after this is done you'll still need to do a stack load to do the thing *after* the access-allowed check (like getting the metadata for a server resource).

Revision history for this message
Robert Collins (lifeless) wrote :

@clint this sounds like a plausible approach; how much work is it?

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/88497

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/88498

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/88499

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/88457
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=0114afcf560de862eb8a28e4eb4302702cdbc53e
Submitter: Jenkins
Branch: master

commit 0114afcf560de862eb8a28e4eb4302702cdbc53e
Author: Zane Bitter <email address hidden>
Date: Thu Apr 17 18:42:03 2014 -0400

    Avoid redundant polling of DB for metadata

    Every time self.metadata is read, it triggers a refresh from the database.
    We were accessing it multiple times in a loop, which is not only wasteful
    but can get very O(n^2) for a WaitCondition with a high Count.

    Partial-Bug: #1306743

    Change-Id: I1a82afac6c4ef56dbf87722648034c05abef4010
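
The pattern described in this commit message can be illustrated with a small sketch. `FakeDB`, `WaitHandle`, and the two counting functions are hypothetical stand-ins, not the actual Heat code; they only show how an attribute that refreshes on every read multiplies queries inside a loop.

```python
class FakeDB:
    """Hypothetical database that counts metadata reads."""

    def __init__(self):
        self.queries = 0
        self.metadata = {"h%d" % i: "SUCCESS" for i in range(10)}

    def metadata_get(self):
        self.queries += 1
        return dict(self.metadata)


class WaitHandle:
    def __init__(self, db):
        self._db = db

    @property
    def metadata(self):
        # Every attribute read goes back to the database.
        return self._db.metadata_get()


def count_success_naive(handle):
    # Reads handle.metadata once for the keys and once more per key.
    return sum(1 for k in handle.metadata
               if handle.metadata[k] == "SUCCESS")


def count_success_once(handle):
    meta = handle.metadata  # single refresh, then work on the local copy
    return sum(1 for k in meta if meta[k] == "SUCCESS")
```

In this sketch the naive version costs one query per signal plus one, so if it runs once per incoming signal the total grows roughly quadratically with the Count, while the second version costs one query per call.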

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

I've figured out how to turn Server metadata back into a dumb store, so Clint's approach should have a fairly short payoff.

Calls to EngineService.(create|update)_software_deployment are the points at which the complete deployments metadata can be pushed to pollable stores (resource metadata, swift object).

Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 1306743] Re: queuepool limit of size 5 overflow

\o/

Excerpts from Steve Baker's message of 2014-04-21 22:58:16 UTC:
> I've figured out how to turn Server metadata back into a dumb store, so
> Clint's approach should have a fairly short payoff.
>
> calls to EngineService.(create|update)_software_deployment is when the
> complete deployments metadata can be pushed to pollable stores (resource
> metadata, swift object)

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/88497
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=5808fbffe000dcc894159f1a7b5cdb32f7545d88
Submitter: Jenkins
Branch: master

commit 5808fbffe000dcc894159f1a7b5cdb32f7545d88
Author: Steve Baker <email address hidden>
Date: Fri Apr 18 15:22:30 2014 +1200

    Optional data for resource_data_get_all

    resource_data_get_all queries the database for the data, and
    transforms the results to a decrypted dict.

    If the data is already loaded then it is unnecessary to do the
    database query again.

    This change allows data to be passed in as an optional argument.
    If the passed data is not None then no database call will be made
    and the dict transformation will occur on passed data.

    Change-Id: I579225d9a3f3b038e0ca41a5900c58413d6e25ac
    Related-Bug: #1306743
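
A minimal sketch of the optional-argument idea from this commit message. The signature, the `FakeResource` class, and `_plain_value` (a placeholder for decryption) are assumptions for illustration, not the merged code.

```python
def _plain_value(row):
    # Stand-in for decryption; real code would decrypt rows flagged as secret.
    return row["value"]


class FakeResource:
    """Hypothetical resource whose query_data() counts DB round trips."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def query_data(self):
        self.queries += 1
        return list(self._rows)


def resource_data_get_all(resource, data=None):
    """Return resource data as a dict, querying only when data is None."""
    if data is None:
        data = resource.query_data()
    return {row["key"]: _plain_value(row) for row in data}


rows = [{"key": "user_id", "value": "u-123"}]
res = FakeResource(rows)
from_db = resource_data_get_all(res)          # one query
from_arg = resource_data_get_all(res, rows)   # no additional query
```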

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/89735

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/89736

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/89737

Changed in heat:
status: Triaged → In Progress
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/88498
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=7383f40eb013d4188c1ae4e2f41ef4b3e6acff92
Submitter: Jenkins
Branch: master

commit 7383f40eb013d4188c1ae4e2f41ef4b3e6acff92
Author: Steve Baker <email address hidden>
Date: Fri Apr 18 15:37:08 2014 +1200

    An IO optimised method for accessing resource data

    This provides a wrapper over the db_api.resource_data_* methods that
    attempts to avoid unnecessary database calls for get operations.

    On resource create, all resource data is loaded with a single query and
    stored with the resource as a dict. Calling data_set() or data_delete()
    clears this dict so the next time data() is called all resource
    data is loaded again from the database.

    There is a future potential optimisation to prefetch the resource.data on
    resource_get_by_name_and_stack, which would result in zero resource data
    queries for a Stack.load.

    This is part of an effort to reduce the number of database queries required
    for a Stack.load operation.

    Change-Id: I2564a9f953841d895acd5b853a67aa4bc375b635
    Related-Bug: #1306743
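
The caching behaviour described in this commit message could look roughly like the sketch below. `FakeDataStore` and this `Resource` class are hypothetical; unlike the commit, which loads the data when the resource is created, the sketch loads lazily on the first `data()` call to keep it short.

```python
class FakeDataStore:
    """Hypothetical stand-in for the db_api.resource_data_* calls."""

    def __init__(self):
        self.rows = {}
        self.get_all_calls = 0

    def get_all(self):
        self.get_all_calls += 1
        return dict(self.rows)

    def set(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        del self.rows[key]


class Resource:
    def __init__(self, store):
        self._store = store
        self._data = None  # cached dict of all resource data

    def data(self):
        if self._data is None:
            self._data = self._store.get_all()  # one query for everything
        return self._data

    def data_set(self, key, value):
        self._store.set(key, value)
        self._data = None  # invalidate so the next data() reloads

    def data_delete(self, key):
        self._store.delete(key)
        self._data = None
```

Repeated `data()` calls between writes then cost a single query in total, at the price of a full reload after each write.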

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Reviewed: https://review.openstack.org/88499
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=9bd4f3fcc555a100cf1f15192e88a250daa7d2e8
Submitter: Jenkins
Branch: master

commit 9bd4f3fcc555a100cf1f15192e88a250daa7d2e8
Author: Steve Baker <email address hidden>
Date: Fri Apr 18 15:42:31 2014 +1200

    Port all resources to new resource data methods

    Change-Id: I9f7e984f5224cee16a2ffd3e9d6b118304766701
    Related-Bug: #1306743

Changed in heat:
assignee: Clint Byrum (clint-fewbar) → Steve Baker (steve-stevebaker)
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Reviewed: https://review.openstack.org/89735
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=7678f0c29c7678a2030eae71695ddb02c877c803
Submitter: Jenkins
Branch: master

commit 7678f0c29c7678a2030eae71695ddb02c877c803
Author: Steve Baker <email address hidden>
Date: Wed Apr 23 09:31:36 2014 +1200

    Prefetch data in resource_get_by_name_and_stack

    This loads resource and resource data in a single query, which means
    one less sql query per stack resource in a call to describe_stack_resource.

    Change-Id: Ic2e1ab68cf651f290be75206fd5c0c841de4ef82
    Related-Bug: #1306743

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/92033

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/89736
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=96791aaa107b54908e3c7a7531d96f08cb17acd8
Submitter: Jenkins
Branch: master

commit 96791aaa107b54908e3c7a7531d96f08cb17acd8
Author: Steve Baker <email address hidden>
Date: Wed Apr 23 14:04:57 2014 +1200

    Use resource methods for metadata get/set

    The current approach of using the Metadata descriptor
    class has some issues including:
    * Unintended database queries on accessing a metadata attribute
    * get/modify/set patterns since the attribute isn't backed by a real dict
    * lack of control over whether to fetch metadata locally or from the db

    This change creates Resource methods metadata_get and metadata_set to use
    for reading and modifying resource data. This is a refactoring change only so
    there should be no change in behaviour.

    This change will help in the future with
    Related-Bug: #1306743

    Change-Id: I7cd87a071ac410a388d787f132c9aee194030714

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Reviewed: https://review.openstack.org/89737
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=65c6c3bac874a6f978854895943c15c15eb542e3
Submitter: Jenkins
Branch: master

commit 65c6c3bac874a6f978854895943c15c15eb542e3
Author: Steve Baker <email address hidden>
Date: Wed Apr 23 14:49:25 2014 +1200

    Do not query database for every metadata_get

    This change stores the rsrc_metadata locally when the Resource object
    is created so that the database is not queried every time
    metadata_get is called. There are some instances where the metadata
    must come from the database (eg, polling for waitcondition signal) so
    an optional refresh arg is added to metadata_get to force a database
    refresh of the stored rsrc_metadata.

    This results in one less sql query for every time metadata_get is called,
    which will help the optimising effort for
    Related-Bug: #1306743

    Change-Id: Iad2d810c299347ae3b6a4a8329bbd314ee4b5c16
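
A small sketch of the `refresh` argument described in this commit message. The `Resource` class below and its `db_row` dict (standing in for the database row) are hypothetical simplifications.

```python
class Resource:
    def __init__(self, db_row):
        self._db_row = db_row  # mutable dict standing in for the DB row
        # Metadata is copied once when the Resource object is created.
        self._rsrc_metadata = dict(db_row["rsrc_metadata"])
        self.db_reads = 0

    def metadata_get(self, refresh=False):
        if refresh:
            # Only an explicit refresh goes back to the "database".
            self.db_reads += 1
            self._rsrc_metadata = dict(self._db_row["rsrc_metadata"])
        return self._rsrc_metadata


row = {"rsrc_metadata": {"signal": "pending"}}
rsrc = Resource(row)
row["rsrc_metadata"]["signal"] = "SUCCESS"   # another process updates the DB
cached = dict(rsrc.metadata_get())            # still the value seen at load
fresh = dict(rsrc.metadata_get(refresh=True)) # picks up the new value
```

Callers that poll for a changing value, such as a wait condition waiting for a signal, would pass `refresh=True`; everything else reads the cached copy.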

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/90002
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=7ed7031466cd77e4568de9f5ab0cee9cc5afafb8
Submitter: Jenkins
Branch: master

commit 7ed7031466cd77e4568de9f5ab0cee9cc5afafb8
Author: Steve Baker <email address hidden>
Date: Thu Apr 24 16:25:55 2014 +1200

    Fetch all db resources in one query

    Instead of calling resource_get_by_name_and_stack once for every
    resource in the stack, db_api.resource_get_all_by_stack is called
    only once for all resources.

    This reduces the number of sql queries during a describe_stack_resource
    call to 3:
    1. load the stack
    2. load the template
    3. load the resources

    This is a big improvement over the start of this patch series, which
    is 2 queries plus:
    * 1 per resource in the stack
    * 1 per access to a resource metadata attribute
    * 1 per access to a resource data value

    There is probably still potential to reduce queries from 3, but this
    may well be a fix for
    Partial-Bug: 1306743

    Change-Id: I80be5d3de8744813d974f2e9860c148ad258f385
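
The before/after query counts in this commit message can be written out as simple arithmetic. The stack size and access counts below are made-up illustrative numbers, not measurements from this bug.

```python
def queries_before(resources, metadata_reads, data_reads):
    # 2 fixed queries plus one per resource, per metadata attribute access,
    # and per resource data value access.
    return 2 + resources + metadata_reads + data_reads


def queries_after():
    # Load the stack, load the template, load all resources in one query.
    return 3


# Hypothetical stack: 30 resources, each read for metadata once and for
# resource data twice during one describe_stack_resource call.
before = queries_before(resources=30, metadata_reads=30, data_reads=60)
after = queries_after()
```

Under those assumed numbers a single call drops from 122 queries to 3, and the new count no longer depends on the size of the stack.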

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/93134

Changed in heat:
assignee: Steve Baker (steve-stevebaker) → Clint Byrum (clint-fewbar)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/92033
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f8b15273a25ea665b600a4a8a468245d6a42282f
Submitter: Jenkins
Branch: master

commit f8b15273a25ea665b600a4a8a468245d6a42282f
Author: Steve Baker <email address hidden>
Date: Mon May 5 11:43:38 2014 +1200

    Make Server metadata a passive store again

    This change reverts the software deployments change which polls
    the deployments API whenever a Server resource metadata is read.

    The Server metadata has reverted to being a standard resource
    metadata store, and the deployments are pushed into this metadata
    directly in the database whenever the deployments API is used to
    create or update a deployment.

    As part of the effort to not require a Stack.load for polling metadata
    this change is for
    Partial-Bug: #1306743

    Change-Id: Ib461e1bab816d4b9e332786004c53a2f73d773e8
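
The push model described in this commit message could be sketched as below. The `Server` and `DeploymentsAPI` classes and the metadata layout are hypothetical; the sketch only shows deployments being written into the server's metadata when a deployment is created or updated, so that reading the metadata needs no call to the deployments API.

```python
class Server:
    def __init__(self):
        # Passive store: pollers read this and nothing is computed on read.
        self.metadata = {"deployments": []}


class DeploymentsAPI:
    def __init__(self):
        self._by_server = {}

    def create_or_update(self, server, deployment_id, config):
        deployments = self._by_server.setdefault(id(server), {})
        deployments[deployment_id] = config
        # Push the complete list into the server's metadata at write time.
        server.metadata["deployments"] = [
            {"id": d, "config": c} for d, c in sorted(deployments.items())]


server = Server()
api = DeploymentsAPI()
api.create_or_update(server, "d1", "install")
api.create_or_update(server, "d1", "install-v2")   # update replaces d1
api.create_or_update(server, "d2", "configure")
```

The cost moves from every poll to every deployment write, and writes are far less frequent than polls in the scenario reported here.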

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Can this be marked Fix Committed? It can always be re-opened if it becomes a scaling limit again.

Revision history for this message
Thom Leggett (tteggel) wrote :

I think we can close this now - this is no longer the scaling limit.

Changed in heat:
status: In Progress → Fix Committed
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95940

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (stable/icehouse)

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95941

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95942

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95943

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95944

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95946

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95947

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/95950

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/icehouse)

Reviewed: https://review.openstack.org/95940
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=368b6d39be6006c3aef429d47bc644713255b31e
Submitter: Jenkins
Branch: stable/icehouse

commit 368b6d39be6006c3aef429d47bc644713255b31e
Author: Zane Bitter <email address hidden>
Date: Thu Apr 17 18:42:03 2014 -0400

    Avoid redundant polling of DB for metadata

    Every time self.metadata is read, it triggers a refresh from the database.
    We were accessing it multiple times in a loop, which is not only wasteful
    but can get very O(n^2) for a WaitCondition with a high Count.

    Partial-Bug: #1306743

    Change-Id: I1a82afac6c4ef56dbf87722648034c05abef4010

tags: added: in-stable-icehouse
Revision history for this message
Steve Baker (steve-stevebaker) wrote :

I don't think this is critical for icehouse. It's critical for tripleo, which runs heat trunk.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

According to https://wiki.openstack.org/wiki/Bugs

Critical == "Data corruption / complete failure affecting most users, no workaround"

I am on the fence if we can count users of 30+ server stacks with post-boot metadata as "most". Without data, I tend to err on the side of the higher importance.

Anyway, it has landed in stable/icehouse right? So is the importance going to drive any decision making at this point?

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

Most of the optimisations haven't landed, and some of them involve internal API changes so they may not be appropriate to backport.

(I think they can be backported as API additions with non-deprecated shims)

https://review.openstack.org/#/q/status:open+project:openstack/heat+branch:stable/icehouse+topic:bug/1306743,n,z

Alan Pevec (apevec)
tags: removed: in-stable-icehouse
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/icehouse)

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95941

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95942

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95943

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95944

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95946

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95947

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Steve Baker (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/95950

Thierry Carrez (ttx)
Changed in heat:
milestone: juno-1 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Clint 'SpamapS' Byrum (<email address hidden>) on branch: master
Review: https://review.openstack.org/93134
Reason: ENOTIME
