Slowdown in create stack request between Havana and Icehouse

Bug #1324102 reported by William C. Arnold
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
Medium
Unassigned

Bug Description

The time to respond to a heat stack create request, for a moderate size pattern with 10 VMs, increased 2-3X from Havana stable to Icehouse stable in devstack configurations. (Timings may have improved slightly post-icehouse). The delay for a response to a stack create can increase to 10s of seconds under load.
Config: single machine devstacks, with fake hypervisor driver, 16 fake hypervisors, running in VM with 8 vcpus, 16GB memory, rabbitmq/mysql, flat networking

Template:
heat_template_version: '2013-05-23'
description: Start 10 VMs
resources:
  VM_0:
    type: OS::Nova::Server
    properties:
      key_name: mykey
      flavor: m1.micro
      image: F17-i386-cfntools
      security_groups: [test]
  VM_1:
    type: OS::Nova::Server
...
through VM_9

Test did sequential stack create requests using CLI, with no (except cli and test script overhead) delay between requests.
The timings are representative. They changed very little (several percent) between test runs.

havana/stable (2014/05/27)

+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| e5f6f965-82a9-481a-a26d-0eb77c7b8a77 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T21:54:34Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m2.488s
user 0m0.280s
sys 0m0.096s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| e5f6f965-82a9-481a-a26d-0eb77c7b8a77 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T21:54:34Z |
| 959320a8-5e86-4138-a82c-fb859462be97 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-05-27T21:54:37Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m2.554s
user 0m0.256s
sys 0m0.120s

icehouse/stable 2014/05/27
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 0e502e52-6f51-4391-b2e5-8a6dfeabcc32 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T23:07:19Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m7.305s
user 0m0.308s
sys 0m0.076s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 0e502e52-6f51-4391-b2e5-8a6dfeabcc32 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T23:07:19Z |
| a00ecb2a-6fb5-4b0f-8029-d3608b4dcbd8 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-05-27T23:07:27Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m8.492s
user 0m0.208s
sys 0m0.152s

Master 2014/05/27

+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 3d104172-cdd9-4308-a9f8-95064630d1e0 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T21:55:10Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m6.554s
user 0m0.296s
sys 0m0.116s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 3d104172-cdd9-4308-a9f8-95064630d1e0 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-27T21:55:10Z |
| 37b681c4-170c-4d97-ae4c-e87857f83ec2 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-05-27T21:55:18Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m7.464s
user 0m0.328s
sys 0m0.068s

Revision history for this message
Steve Baker (steve-stevebaker) wrote :

The fake hypervisor driver would help, but it is still hard to know if the performance regression came from heat or another openstack component.

Can you try running stable/icehouse heat against stable/havana everything-else?

Changed in heat:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
William C. Arnold (barnold-8) wrote :

OK, here's the same test for stable/icehouse heat, stable/havana for the other openstack services, using the same VM. (Devstack cloned master for the python clients - can we assume that's not an issue for performance? (Devstack might have a bug here?))
The timings are very similar to those for pure stable/icehouse.
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| d2aeb4da-7668-4b06-93d7-74cef8559357 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-31T01:23:50Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m8.071s
user 0m0.324s
sys 0m0.052s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| d2aeb4da-7668-4b06-93d7-74cef8559357 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-05-31T01:23:50Z |
| b8b8921f-6bb1-4060-be79-ba04c9d1b041 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-05-31T01:24:00Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m10.745s
user 0m0.320s
sys 0m0.032s

Revision history for this message
William C. Arnold (barnold-8) wrote :
Download full text (7.5 KiB)

Some further investigation:
I added some trace decorators to a branch of the master heat code to track down time usage in heat stack create CLI calls. This is the time that it takes for the heat api to respond to a POST, and to a subsequent GET (list) of stacks. Using fake hypervisor driver to be consistent with the above.
The test case here was a simple single VM template, below. Validation and stack store are chewing up nearly all the time in a create. A stack with more VM resources would have multiple occurrences of heat.engine.resources.server.validate() section
Note also, this is from a post-icehouse devstack, which I found to be slightly faster than Icehouse for this particular test. (~15-20%, but only a couple of measurements of Icehouse)

description: Start 1 VMs
resources:
  VM_0:
    type: OS::Nova::Server
    properties:
      key_name: mykey
      flavor: m1.micro
      image: F17-i386-cfntools
      security_groups: [test]

2014-06-05 21:56:39.759 INFO heat.engine.service [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter create_stack
2014-06-05 21:56:39.768 INFO heat.engine.service [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter _validate_new_stack
2014-06-05 21:56:39.779 INFO heat.engine.service [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit _validate_new_stack
2014-06-05 21:56:39.781 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.781 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.782 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit validate
2014-06-05 21:56:39.782 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.782 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit validate
2014-06-05 21:56:39.783 INFO heat.engine.parameters [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit validate
2014-06-05 21:56:39.783 INFO heat.engine.service [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter _validate_deferred_auth_context
2014-06-05 21:56:39.784 INFO heat.engine.service [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit _validate_deferred_auth_context
2014-06-05 21:56:39.784 INFO heat.engine.parser [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.784 INFO heat.engine.template [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.785 INFO heat.engine.template [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit validate
2014-06-05 21:56:39.786 INFO heat.engine.parameter_groups [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.787 INFO heat.engine.parameter_groups [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] exit validate
2014-06-05 21:56:39.791 INFO heat.engine.resources.server [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter validate
2014-06-05 21:56:39.791 INFO heat.engine.resource [req-5a03a85f-2916-46c5-bfe9-9237da69bf88 admin admin] enter ...

Read more...

Revision history for this message
William C. Arnold (barnold-8) wrote :

I tried with patch https://review.openstack.org/#/c/97569/ (Use auth_token_info to initialize heat_keystoneclient sessions), using same vm, heat updated to master (2014/06/10 11:20 Eastern Time)
With this patch, the timings are much improved. Still not at havana levels, but closer.

ubuntu@wcadevstack01:~/heat_tests$ source start_4_stacks.sh
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 622c9c07-4f7a-4fc4-a459-9c6ab7ba782c | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-10T15:20:45Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m3.751s
user 0m0.320s
sys 0m0.096s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 622c9c07-4f7a-4fc4-a459-9c6ab7ba782c | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-10T15:20:45Z |
| ebade198-c634-49de-b11f-93109010f08e | T10VMs_02 | CREATE_IN_PROGRESS | 2014-06-10T15:20:50Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m4.605s
user 0m0.316s
sys 0m0.100s

Changed in heat:
status: Incomplete → New
Revision history for this message
Steven Hardy (shardy) wrote :

It's kind of interesting that you're seeing havana performance is much better, since that code did request a token via the code in heat_keystoneclient, just like the icehouse code does.

Sounds like further profiling ref comment #3 would be worthwhile, so we can figure out where else needs optimising.

Revision history for this message
Steven Hardy (shardy) wrote :

Possibly related to the validation overhead you're seeing:

https://github.com/openstack/heat/blob/master/heat/engine/properties.py#L338

We're revalidating every time a property is accessed, which could be very expensive, particularly if you're using any custom contraints which involve a call to another service.

Working on a patch to see if we can move that validation into the validate method instead.

Revision history for this message
William C. Arnold (barnold-8) wrote :

FYI,
Reversion of https://review.openstack.org/#/c/97569/ killed performance for this test, roughly doubling the time for the create-stack request to respond (note timings include the stack-list done by the cli).

ubuntu@wcadevstack01:~/heat_tests$ source start_4_stacks.sh
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 47bf4447-86bf-4a58-ad63-2879caf67b3f | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-17T15:32:08Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m10.257s
user 0m0.308s
sys 0m0.112s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 47bf4447-86bf-4a58-ad63-2879caf67b3f | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-17T15:32:08Z |
| 501ba918-1126-4e92-af21-0544c3475076 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-06-17T15:32:20Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m11.754s
user 0m0.356s
sys 0m0.072s

vs with https://review.openstack.org/#/c/97569/ reapplied

ubuntu@wcadevstack01:~/heat_tests$ source start_4_stacks.sh
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 67d37630-a6a0-46b3-9b97-0d4603211575 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-17T15:55:03Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m4.452s
user 0m0.320s
sys 0m0.120s
+--------------------------------------+------------+--------------------+----------------------+
| id | stack_name | stack_status | creation_time |
+--------------------------------------+------------+--------------------+----------------------+
| 67d37630-a6a0-46b3-9b97-0d4603211575 | T10VMs_01 | CREATE_IN_PROGRESS | 2014-06-17T15:55:03Z |
| f07dc5fd-27b3-4271-99a4-187b2f04d248 | T10VMs_02 | CREATE_IN_PROGRESS | 2014-06-17T15:55:07Z |
+--------------------------------------+------------+--------------------+----------------------+

real 0m4.075s
user 0m0.332s
sys 0m0.088s

Revision history for this message
William C. Arnold (barnold-8) wrote :
Download full text (6.4 KiB)

Re comment #6, I added some more tracing, and found that the major performance culprit in this test is constraint validation of the image and the key. Roughly constant-time hits include checking against quotas and template store. The image is being validated twice.
2014-06-17 15:13:26.167 8381 INFO heat.engine.service [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter create_stack
  2014-06-17 15:13:26.178 8381 INFO heat.engine.service [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter _validate_new_stack
  2014-06-17 15:13:26.433 8381 INFO heat.engine.service [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit _validate_new_stack
2014-06-17 15:13:26.435 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.435 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.436 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate
2014-06-17 15:13:26.436 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.436 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate
2014-06-17 15:13:26.437 8381 INFO heat.engine.parameters [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate
2014-06-17 15:13:26.438 8381 INFO heat.engine.service [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter _validate_deferred_auth_context
2014-06-17 15:13:26.438 8381 INFO heat.engine.service [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit _validate_deferred_auth_context
2014-06-17 15:13:26.438 8381 INFO heat.engine.parser [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.439 8381 INFO heat.engine.template [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.439 8381 INFO heat.engine.template [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate
2014-06-17 15:13:26.440 8381 INFO heat.engine.parameter_groups [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.441 8381 INFO heat.engine.parameter_groups [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate
2014-06-17 15:13:26.445 8381 INFO heat.engine.resources.server [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.446 8381 INFO heat.engine.resource [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
2014-06-17 15:13:26.447 8381 INFO heat.engine.properties [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate
[note these are image and key]
  2014-06-17 15:13:26.447 8381 INFO heat.engine.properties [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate_data
  2014-06-17 15:13:26.824 8381 INFO heat.engine.properties [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate_data
  2014-06-17 15:13:26.825 8381 INFO heat.engine.properties [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] enter validate_data
  2014-06-17 15:13:27.254 8381 INFO heat.engine.properties [req-a06a43f7-ad72-4e21-b513-ea5f54d79e2a None] exit validate_data
2014-06-17 15:13:27.255 8381 INFO heat.engine.propert...

Read more...

Revision history for this message
Thomas Spatzier (thomas-spatzier) wrote :

I have posted another patch which reduces the number of validations during stack create quite a bit. Can you check if this helps:

https://review.openstack.org/#/c/103397

Changed in heat:
assignee: nobody → Thomas Spatzier (thomas-spatzier)
status: New → In Progress
Revision history for this message
William C. Arnold (barnold-8) wrote :

Re patch https://review.openstack.org/#/c/103397, I tried it relative to today's master, with the same test case (flat 10 VM stack create) and same devstack configuration. The patch reduced the time for the stack create to respond by about 30-35 %.

Baseline (2014/06/30):
Run1:
real 0m11.961s
real 0m14.524s
Run2:
real 0m13.324s
real 0m17.462s

With patch https://review.openstack.org/#/c/103397
Run1:
real 0m7.826s
real 0m11.534s
Run2:
real 0m8.275s
real 0m9.458s

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/103397
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=b160972b08ba0c1c6aa90459c7c0da7da8452856
Submitter: Jenkins
Branch: master

commit b160972b08ba0c1c6aa90459c7c0da7da8452856
Author: Thomas Spatzier <email address hidden>
Date: Mon Jun 23 18:54:15 2014 +0200

    Only do property validation in validate()

    This patch changes resource property validation to only validate
    properties in the Properties.validate() method.

    So far, validation was done for each access to a property's value
    in the __getitem__ method of the Properties class. This causes a
    lot of un-necessary revalidation and especially with custom
    constraints for flavors, images etc. this can become very expensive.

    Doing validation in the validate() method should be sufficient since
    this invoked as part of the pre-processing of resource actions that
    receive new or updated properties (CREATE, UPDATE).

    As part of the change in properties.py the validate_data method of
    Property was also renamed to get_value with an optional validate
    flag, since to the most part this method is really used to get a
    sanitized representation of a property's value with only basic
    data type checking.

    Partial-Bug: #1324102

    Change-Id: Ibe0aefbe8fa554c0cf227f16f971baedaa52cd8d

Revision history for this message
William C. Arnold (barnold-8) wrote :

With the patch, before merging, of https://review.openstack.org/#/c/103397 (validate properties only on validation), I tried it relative to master as of 2014/06/30, with the same test case (flat 10 VM stack create) and same devstack configuration. The patch reduced the time for the stack create to respond by about 30-35 %.

Baseline (2014/06/30):
Run1:
real 0m11.961s
real 0m14.524s
Run2:
real 0m13.324s
real 0m17.462s

With patch https://review.openstack.org/#/c/103397
Run1:
real 0m7.826s
real 0m11.534s
Run2:
real 0m8.275s
real 0m9.458s

Revision history for this message
William C. Arnold (barnold-8) wrote :

With the merged patch https://review.openstack.org/#/c/103397 (validate properties only on validation), plus the https://review.openstack.org/#/c/99769 ("token patch"), I see additional improved on the same test case, same vm.
Response to the stack create request is now close to Havana performance for this test case. Other tests cases might show improvement over havana. (Note the numbers are for the heat cli, and have a small roughly constant overhead of about 0.5 second)

Devstack 2014/07/01 +
Run1:
real 0m4.432s
real 0m3.931s
Run2:
real 0m3.929s
real 0m3.156s

Baseline (devstack havana/stable installed 2014/07/01 same vm):
Run1:
real 0m3.456s
real 0m4.044s
Run2
real 0m3.135s
real 0m4.515s

Revision history for this message
William C. Arnold (barnold-8) wrote :

The "Use auth_token_info to initialize heat_keystoneclient sessions" patch (https://review.openstack.org/#/c/99769/) was merged yesterday, and now the baseline performance, in a devstack install, in the test used above is similar to that shown in comment #13.
Run1:
real 0m5.980s
real 0m3.894s
Run2:
real 0m3.530s
real 0m4.602s

Angus Salkeld (asalkeld)
Changed in heat:
assignee: Thomas Spatzier (thomas-spatzier) → nobody
status: In Progress → Triaged
Thomas Herve (therve)
Changed in heat:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.