Intervals should be staggered

Bug #1788518 reported by Ryan Finnie on 2018-08-22
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Landscape Client
High
Unassigned

Bug Description

The various intervals appear to be statically determined from program start. As a practical example, a rebooted VM host with a large number of guests means all the guests start landscape-client at about the same time. Then every 30 minutes, package-monitor runs (which is a rather heavy operation). Multiply by a hundred or so and you've got a large flood on the host.

When the client starts up, it would be good to have the initial schedule for each timer be ((interval / 2) + random(interval)), so the collective load is spread out over the entire interval.

Tags: is Edit Tag help
Changed in landscape-client:
importance: Undecided → High
tags: added: is
Joel Sing (jsing) wrote :

In addition to (or instead of) spreading the initial schedule, it would be beneficial to +/- a random percentage (e.g. between -10% and +10%) to the next interval on each run. This means that instead of running at hard intervals, the runs become skewed but remain at the given interval on average. This avoids synchronisation and consistent load spikes if the initial offset results in it matching another fixed processing schedule (e.g. from a cronjob). The same applies with load distribution on the server side (if applicable).

no longer affects: ubuntu
Changed in landscape-client:
status: New → Triaged
Ryan Finnie (fo0bar) wrote :

The attached patch spreads out the various reactor calls +/- 10% and was used as a cowboy in our use case. It's not production ready in that it should be a configurable option (and would also need to do the initial stagger which I couldn't figure out how to do -- mine works over time, not immediately), but should be a good starting point.

Ryan Finnie (fo0bar) wrote :

Sorry, I should clarify this patch does NOT solve the problem over time, since the +/- 10% theoretically averages out to zero over time. What solved our synchronized load spikes was pushing out the patched landscape-client via landscape, so the units updated themselves over an N-hour window and spread out the initial timers.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers