OpenStack Heat

Convergence: Add waiting time throttle for sync point

Bug #1529567 reported by Rico Lin on 2015-12-28

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Heat	Fix Released	Medium	Anant Patil	OpenStack Heat newton-1 "n1"

Bug Description

The current retry method fetches the sync point and updates the database table in a while loop, with no limit on retries [1].
When a convergence stack with a huge number of resources performs any action, all resources try to grab the sync point at once, which causes contention and reduces the performance of the overall sync action.

We should add a mechanism to reduce the pointless contention caused by unlimited retries.

We can add a time throttle to rein in the unlimited retries. For example, allow 5 free retries and set the time throttle to a random value between 0.5 and 1 second.

After five retries, the resource should be considered under contention and should therefore wait for the throttle time before each subsequent retry.
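The proposal above can be sketched roughly as follows. This is a minimal illustration, not the Heat implementation: `update_with_throttle` and `try_update` are hypothetical names, and `try_update` stands in for whatever call attempts the sync point update and reports whether it won the race. The limit of 5 and the 0.5 to 1 second range are the example values from the description.

```python
import random
import time


def update_with_throttle(try_update, free_retry_limit=5,
                         min_wait=0.5, max_wait=1.0, sleep=time.sleep):
    """Call try_update() until it succeeds; return the number of retries.

    try_update is a hypothetical callable that returns True when the
    sync point row was updated and False when another writer won the race.
    """
    retries = 0
    while not try_update():
        retries += 1
        # The first few retries are "free": a short race is likely to
        # resolve itself, so retry immediately.
        if retries > free_retry_limit:
            # Past the limit, assume contention and wait for a random
            # interval so competing writers do not retry in lockstep.
            sleep(random.uniform(min_wait, max_wait))
    return retries
```

The `sleep` parameter is only there so the waiting can be replaced in tests; in an eventlet-based service one would presumably pass a cooperative sleep instead of `time.sleep`.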

Tested on a devstack with 2 vCPUs and 4 GB of memory.
config (without improvement):
max_resources_per_stack = -1
rpc_response_timeout = 3600 (to avoid hitting [2])

------------------------------------------
| Size | Time taken (secs) | max_db_cons |
------------------------------------------
|  100 |                70 |          90 |
|  200 |               180 |         123 |
|  400 |               665 |         125 |
|  600 |               950 |         199 |
------------------------------------------

config (with improvement):
free_sync_retry_limit = 5
time_throttle = 0.5
max_resources_per_stack = -1
rpc_response_timeout = 3600 (to avoid hitting [2])

------------------------------------------
| Size | Time taken (secs) | max_db_cons |
------------------------------------------
|  100 |                44 |         100 |
|  200 |                88 |         172 |
|  400 |               190 |         142 |
|  600 |               330 |         240 |
------------------------------------------

Runs of the same size were tested together, but different sizes were tested at different times, so we won't compare the rate of increase across sizes.

With a proper waiting time, the overall time taken improves considerably, for example from 950 to 330 seconds at size 600.

The DB connection count also seems to be affected slightly (max_db_cons is higher in the throttled runs above), but confirming this needs more detailed testing.
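To make the per-size improvement explicit, the timings from the two tables can be compared directly. The snippet below only restates the numbers reported above; the dictionary and function names are chosen for illustration.

```python
# Time taken in seconds, keyed by stack size, copied from the tables above.
baseline = {100: 70, 200: 180, 400: 665, 600: 950}
throttled = {100: 44, 200: 88, 400: 190, 600: 330}


def speedups(before, after):
    """Return the before/after time ratio for each stack size."""
    return {size: before[size] / after[size] for size in before}


for size, ratio in speedups(baseline, throttled).items():
    print(f"size {size}: {ratio:.2f}x faster")
```

Each ratio compares runs of the same size only, in line with the caveat that different sizes were measured at different times.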

[1] https://github.com/openstack/heat/blob/master/heat/engine/sync_point.py#L121

[2] https://bugs.launchpad.net/heat/+bug/1491185

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-28: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/261929

Changed in heat:
status:	New → In Progress

Rico Lin (rico-lin) on 2016-02-22

Changed in heat:
milestone:	none → mitaka-3

Sergey Kraynev (skraynev) on 2016-03-02

Changed in heat:
milestone:	mitaka-3 → newton-1

OpenStack Infra (hudson-openstack) on 2016-04-19

Changed in heat:
assignee:	Rico Lin (rico-lin) → Anant Patil (ananta)

OpenStack Infra (hudson-openstack) on 2016-04-19

Changed in heat:
assignee:	Anant Patil (ananta) → Rico Lin (rico-lin)

OpenStack Infra (hudson-openstack) on 2016-05-09

Changed in heat:
assignee:	Rico Lin (rico-lin) → Anant Patil (ananta)

OpenStack Infra (hudson-openstack) on 2016-05-30

Changed in heat:
assignee:	Anant Patil (ananta) → Rico Lin (rico-lin)

OpenStack Infra (hudson-openstack) on 2016-05-31

Changed in heat:
assignee:	Rico Lin (rico-lin) → Anant Patil (ananta)

Rico Lin (rico-lin) on 2016-06-01

Changed in heat:
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2016-06-02: Fix merged to heat (master)

Reviewed: https://review.openstack.org/261929
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f5e7a319cbb2b2fe113e3c22053e5d75424e787b
Submitter: Jenkins
Branch: master

commit f5e7a319cbb2b2fe113e3c22053e5d75424e787b
Author: Anant Patil <email address hidden>
Date: Tue Apr 19 13:49:46 2016 +0530

    Convergence: Throttle to sync point updates

    Throttle sync point updates by inducing some sleep after each conflict
    and before retry. The sleeping time is randomly generated based on
    number of potential conflicts. The randomness in sleep time is required
    to reduce the number of conflicts when updating sync points.

    Closes-Bug: 1529567

    Change-Id: Icd36d275a0c9fd15a86de34e79312e2a857d4621
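The approach described in the commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged code: the function names are hypothetical, and the per-conflict factor (0.01 s) and the 10 second cap are made-up values chosen only to show the upper bound of the random sleep growing with the number of potential conflicts.

```python
import random
import time


def max_wait_for(potential_conflicts, per_conflict=0.01, cap=10.0):
    """Upper bound in seconds for the random sleep (illustrative values)."""
    return min(max(0, potential_conflicts) * per_conflict, cap)


def sync_with_jitter(try_update, potential_conflicts, sleep=time.sleep):
    """Retry try_update() with a random sleep after each conflict.

    try_update is a hypothetical callable returning True once the sync
    point update succeeds. Returns the total number of attempts made.
    """
    max_wait = max_wait_for(potential_conflicts)
    attempts = 1
    while not try_update():
        # Sleep for a random time so that writers that collided on this
        # attempt are unlikely to collide again on the next one.
        sleep(random.uniform(0, max_wait))
        attempts += 1
    return attempts
```

Scaling the bound with the number of potential conflicts means a lightly contended sync point retries almost immediately, while a heavily contended one spreads its retries over a longer window.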

Changed in heat:
status:	In Progress → Fix Released

Doug Hellmann (doug-hellmann) wrote on 2016-06-02: Fix included in openstack/heat 7.0.0.0b1

This issue was fixed in the openstack/heat 7.0.0.0b1 development milestone.
