Convergence: Add waiting time throttle for sync point

Bug #1529567 reported by Rico Lin
This bug affects 1 person
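 ----------------------------------------------------------------------
| Affects        | Status       | Importance | Assigned to | Milestone |
 ----------------------------------------------------------------------
| OpenStack Heat | Fix Released | Medium     | Anant Patil |           |
 ----------------------------------------------------------------------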

Bug Description

The current sync method fetches the sync point and updates its database row in a while loop that retries without limit [1].
When a convergence stack with a huge number of resources performs any action, all resources try to grab the sync point at once and cause contention. This reduces the performance of the whole sync action.

We should add some way to reduce the pointless contention caused by unlimited retries.

We can add a time throttle and limit the free retries. For example, set the limit to 5 retries and the time throttle to a random number between 0.5 and 1 second.

After five retries, the resource should consider itself under contention and start adding a waiting time throttle before each subsequent retry.
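
To make the proposal concrete, here is a minimal sketch of the retry loop described above. It is not the Heat code: `sync_with_throttle`, `try_update`, `min_wait` and `max_wait` are hypothetical names, and the 5-retry limit and 0.5 to 1 second range are just the example values from this description.

```python
import random
import time

FREE_SYNC_RETRY_LIMIT = 5  # example value: retries allowed without waiting
MIN_WAIT = 0.5             # example value: lower bound of the wait (seconds)
MAX_WAIT = 1.0             # example value: upper bound of the wait (seconds)


def sync_with_throttle(try_update, retry_limit=FREE_SYNC_RETRY_LIMIT,
                       min_wait=MIN_WAIT, max_wait=MAX_WAIT,
                       sleep=time.sleep):
    """Call try_update() until it returns a truthy value.

    try_update is a hypothetical callable standing in for "read the sync
    point and try to update its row". Returns the number of failed attempts.
    """
    failures = 0
    while not try_update():
        failures += 1
        # The first retry_limit retries are immediate. After that the
        # resource assumes contention and waits a random time before retrying.
        if failures > retry_limit:
            sleep(random.uniform(min_wait, max_wait))
    return failures
```

The random wait matters more than its exact bounds: if every contending resource slept for the same fixed time, they would all wake up and collide again together.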

Tested on a devstack with 2 vCPUs and 4 GB of memory.
Config (without improvement):
max_resources_per_stack = -1
rpc_response_timeout = 3600 (to avoid hitting [2])

 ---------------------------------------------------------------
| Stack size (resources) | Time taken (secs) | Max DB connections |
 ---------------------------------------------------------------
| 100                    | 70                | 90                 |
 ---------------------------------------------------------------
| 200                    | 180               | 123                |
 ---------------------------------------------------------------
| 400                    | 665               | 125                |
 ---------------------------------------------------------------
| 600                    | 950               | 199                |
 ---------------------------------------------------------------

Config (with improvement):
free_sync_retry_limit = 5
time_throttle = 0.5
max_resources_per_stack = -1
rpc_response_timeout = 3600 (to avoid hitting [2])

 ---------------------------------------------------------------
| Stack size (resources) | Time taken (secs) | Max DB connections |
 ---------------------------------------------------------------
| 100                    | 44                | 100                |
 ---------------------------------------------------------------
| 200                    | 88                | 172                |
 ---------------------------------------------------------------
| 400                    | 190               | 142                |
 ---------------------------------------------------------------
| 600                    | 330               | 240                |
 ---------------------------------------------------------------

Runs of the same size were tested together, but different sizes were tested at different times, so we won't compare the rate of increase between sizes.

We can tell that with a proper waiting time the overall time taken improves considerably.

The number of DB connections also appears to go up a little with this change, but that can only be confirmed through more detailed testing.
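
For reference, the per-size ratios implied by the two tables can be computed as below. This only restates the measurements above; the variable and function names are made up for the example.

```python
# (stack size, without improvement, with improvement), copied from the tables
TIME_TAKEN = [(100, 70, 44), (200, 180, 88), (400, 665, 190), (600, 950, 330)]
MAX_DB_CONS = [(100, 90, 100), (200, 123, 172), (400, 125, 142), (600, 199, 240)]


def speedups(rows):
    """Return {size: time_without / time_with}, rounded to 2 decimals."""
    return {size: round(before / after, 2) for size, before, after in rows}


def increases(rows):
    """Return {size: value_with - value_without}."""
    return {size: after - before for size, before, after in rows}


print(speedups(TIME_TAKEN))    # {100: 1.59, 200: 2.05, 400: 3.5, 600: 2.88}
print(increases(MAX_DB_CONS))  # {100: 10, 200: 49, 400: 17, 600: 41}
```

So each size finished roughly 1.6x to 3.5x faster in this test, while the peak DB connection count was 10 to 49 higher.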

[1] https://github.com/openstack/heat/blob/master/heat/engine/sync_point.py#L121

[2] https://bugs.launchpad.net/heat/+bug/1491185

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/261929

Changed in heat:
status: New → In Progress
Rico Lin (rico-lin)
Changed in heat:
milestone: none → mitaka-3
Changed in heat:
milestone: mitaka-3 → newton-1
Changed in heat:
assignee: Rico Lin (rico-lin) → Anant Patil (ananta)
Changed in heat:
assignee: Anant Patil (ananta) → Rico Lin (rico-lin)
Changed in heat:
assignee: Rico Lin (rico-lin) → Anant Patil (ananta)
Changed in heat:
assignee: Anant Patil (ananta) → Rico Lin (rico-lin)
Changed in heat:
assignee: Rico Lin (rico-lin) → Anant Patil (ananta)
Rico Lin (rico-lin)
Changed in heat:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/261929
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f5e7a319cbb2b2fe113e3c22053e5d75424e787b
Submitter: Jenkins
Branch: master

commit f5e7a319cbb2b2fe113e3c22053e5d75424e787b
Author: Anant Patil <email address hidden>
Date: Tue Apr 19 13:49:46 2016 +0530

    Convergence: Throttle to sync point updates

    Throttle sync point updates by inducing some sleep after each conflict
    and before retry. The sleeping time is randomly generated based on
    number of potential conflicts. The randomness in sleep time is required
    to reduce the number of conflicts when updating sync points.

    Closes-Bug: 1529567

    Change-Id: Icd36d275a0c9fd15a86de34e79312e2a857d4621
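
The approach in the commit message (sleep after every conflict, with a random duration bounded by the number of potential conflicts) might look roughly like the sketch below. This is an illustration only, not the merged change: the function names, the 0.01 seconds per conflict factor and the 10 second cap are assumptions chosen for the example.

```python
import random
import time


def max_wait_for(n_conflicts, per_conflict=0.01, cap=10.0):
    """Upper bound for the sleep, growing with the potential conflicts.

    per_conflict and cap are hypothetical tuning values.
    """
    return min(max(0, n_conflicts) * per_conflict, cap)


def update_with_backoff(try_update, n_conflicts, sleep=time.sleep):
    """Retry try_update() until it succeeds, sleeping after each conflict.

    try_update is a hypothetical callable that returns a truthy value when
    the sync point row was updated. Returns the number of conflicts seen.
    """
    max_wait = max_wait_for(n_conflicts)
    conflicts = 0
    while not try_update():
        conflicts += 1
        # A random sleep spreads the retries out so the contending
        # updaters do not all hit the row again at the same moment.
        sleep(random.uniform(0, max_wait))
    return conflicts
```

Compared with the fixed retry limit proposed in the description, this variant needs no "free retry" counter: a sync point with few potential conflicts gets a near-zero wait automatically.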

Changed in heat:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/heat 7.0.0.0b1

This issue was fixed in the openstack/heat 7.0.0.0b1 development milestone.
