periodic tasks should avoid synchronized execution
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo-incubator | Fix Released | Medium | Tom Cammann |
Bug Description
Periodic tasks run at defined intervals measured from the start time of the last execution. The run_periodic_task method loops through a list of registered tasks, determines if each has passed its spacing interval since its last execution, and marks the current time as the last time it was executed. This is shown in the following code:
    def run_periodic_
        """Tasks to be run at a periodic interval."""
        idle_for = DEFAULT_INTERVAL
        for task_name, task in self._periodic_
            now = timeutils.utcnow()
            spacing = self._periodic_
            # If a periodic task is _nearly_ due, then we'll run it early
            if spacing is not None and last_run is not None:
                due = last_run + datetime.
                if not timeutils.
            if spacing is not None:
            <---- code to execute task here ---->
        return idle_for
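Because the excerpt above is incomplete, the following is a minimal, self-contained sketch of the loop as the description explains it, not the oslo-incubator code itself. The names `PeriodicRunner`, `add_task`, `run_due_tasks`, and the injectable `clock` are hypothetical, times are plain seconds, and the "nearly due" early-run check is deliberately left out.

```python
import time

DEFAULT_INTERVAL = 60.0  # assumed fallback idle time, in seconds


class PeriodicRunner:
    """Hypothetical stand-in for the object that owns the periodic tasks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the loop can be driven in tests
        self._tasks = []       # list of (name, callable)
        self._spacing = {}     # name -> seconds between runs
        self._last_run = {}    # name -> start time of the last run, or None

    def add_task(self, name, func, spacing):
        self._tasks.append((name, func))
        self._spacing[name] = spacing
        self._last_run[name] = None

    def run_due_tasks(self):
        """Run every task whose spacing has elapsed; return seconds to idle."""
        idle_for = DEFAULT_INTERVAL
        for name, func in self._tasks:
            now = self._clock()
            spacing = self._spacing[name]
            last_run = self._last_run[name]
            if last_run is not None:
                due = last_run + spacing
                if now < due:
                    # Not due yet: remember how long until it is.
                    idle_for = min(idle_for, due - now)
                    continue
            idle_for = min(idle_for, spacing)
            # The interval is measured from the start of this execution,
            # whatever the time happens to be when the loop reaches the task.
            self._last_run[name] = now
            func()
        return idle_for
```

The important detail for this bug is the assignment to `self._last_run[name]`: it records the time at which the loop actually reached the task, so any delay earlier in the loop shifts every later task's schedule.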
If a periodic task is blocked for a period of time and then becomes unblocked, the rest of the tasks in the loop will run at, or close to, the time that it became unblocked, and that time is recorded as their last execution.
If the cause of the blockage also affects other nodes, periodic tasks on all nodes will become synchronized.
An example of this behavior has been observed when the Nova database had blocked transactions for a period of time due to an error. Many periodic tasks access the database, so many of them became blocked. When the error in the database was cleared, the periodic tasks across all Nova compute managers executed at the same time and remained synchronized from then on.
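The effect can be illustrated with a small simulation. This is a simplified model rather than a reproduction of the Nova incident: each node runs a single task, the shared blockage is a fixed time window, and any run that falls inside the window is assumed to start when the window ends. The function name `run_times` and all numbers are made up for illustration.

```python
def run_times(offset, spacing, block_start, block_end, horizon):
    """Return the start times of one node's task up to `horizon`.

    A run that becomes due while the shared resource is blocked is delayed
    until the blockage clears, and the next run is scheduled relative to
    that delayed start.
    """
    times = []
    due = offset
    while due < horizon:
        blocked = block_start <= due < block_end
        start = block_end if blocked else due
        times.append(start)
        due = start + spacing  # interval measured from the actual start
    return times


# Four nodes that started at different moments, same 30-second spacing,
# and a blockage lasting from t=40 to t=110.
offsets = [0, 7, 13, 21]
runs = [run_times(o, 30, 40, 110, 200) for o in offsets]
for offset, times in zip(offsets, runs):
    print(offset, times)
```

Before the blockage the nodes run at staggered times. Afterwards every node runs at 110, 140, and 170, because each one recorded the unblock time as its last execution. Nothing in the loop restores the original stagger.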
Two changes can avoid or lessen this behavior; both are almost standard practice in distributed systems:
1. instead of setting the value of _periodic_
2. add some jitter to the value of _periodic_
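As a rough sketch of the jitter idea in the second item, the recorded timestamp can be shifted by a small random amount so that nodes that ran at the same moment do not all become due again at the same moment. The helper name `jittered_timestamp`, the `max_fraction` parameter, and the choice of a uniform distribution are assumptions made for illustration; the report does not specify how the jitter should be generated.

```python
import random


def jittered_timestamp(now, spacing, max_fraction=0.1, rng=random):
    """Return `now` shifted by up to +/- max_fraction * spacing seconds.

    Hypothetical helper: the result would be stored as the task's last-run
    time in place of the exact current time.
    """
    max_offset = spacing * max_fraction
    return now + rng.uniform(-max_offset, max_offset)


# Five nodes that all ran at t=110 with a 30-second spacing record slightly
# different last-run times, so their next due times are spread over a
# window of up to 6 seconds.
rng = random.Random(42)  # seeded only to make the example repeatable
stamps = [jittered_timestamp(110.0, 30.0, 0.1, rng) for _ in range(5)]
next_due = [s + 30.0 for s in stamps]
```

A small fraction of the spacing keeps each task close to its nominal period while still breaking up exact alignment across nodes.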
Changed in oslo:
assignee: nobody → Paul Murray (pmurray)
Changed in oslo:
milestone: none → juno-2
status: Fix Committed → Fix Released
Yeah, I feel like that nearly due stuff was intended to address some very specific case, and I wouldn't be sorry to see it go. That said, we've caused plenty of angst in the past by removing odd behaviors like that so we'll have to be a little careful if we do decide we want to remove it. :-)