Change how the upstream jobs percentage is applied

Bug #2062537 reported by Skia
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Auto Package Testing | Fix Released | Undecided | Skia | |

Bug Description

This code [1] is mostly broken: whether a worker takes `upstream` jobs is decided only once, when it connects to the queues, and that decision is random. I see the following flaws:
* with a low number of workers (20 is low), there is a high chance that the actual split will be far from the target, meaning you can end up with far more or far fewer workers taking upstream jobs than intended.
* the dice are rolled only when the worker starts, so if you end up with too many or too few workers taking upstream jobs, the situation persists until the workers are restarted.
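A minimal model of the startup-time decision, assuming every worker processes jobs at the same rate, so the share of capacity given to upstream jobs equals the fraction of workers that accepted them. `assign_upstream_workers` and `upstream_share` are hypothetical names for illustration, not functions from the worker code [1]:

```python
import random


def assign_upstream_workers(n_workers: int, percentage: int, rng: random.Random) -> list[bool]:
    """Hypothetical model of the current scheme: one roll per worker at startup.

    Each worker accepts upstream jobs with probability percentage/100, and the
    decision is kept for the worker's whole lifetime.
    """
    return [rng.randint(1, 100) <= percentage for _ in range(n_workers)]


def upstream_share(assignment: list[bool]) -> float:
    """Fraction of workers taking upstream jobs (assumes equal worker throughput)."""
    return sum(assignment) / len(assignment)


rng = random.Random(0)
workers = assign_upstream_workers(20, 50, rng)
print(f"before restart: {upstream_share(workers):.0%}")
# Processing more jobs never changes the share; only a restart re-rolls it.
workers = assign_upstream_workers(20, 50, rng)
print(f"after restart:  {upstream_share(workers):.0%}")
```

However many jobs go through the queue, the realized share is stuck at whatever the 20 startup rolls produced.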

Here is a quick Python one-liner to play with (after `import random`). It gives the number of workers, out of a total of 20, that will pick up upstream jobs when each has a 50% chance:
`len([r for r in [random.randint(1, 100) for _ in range(20)] if r > 50])`
With only 20 workers and a 50% chance, it's common to see a run deviate from the objective by 3 or more (objective: 10; we often see <=7 or >=13), which happens in about 26% of runs. This gets worse with a lower threshold (e.g. 15%), because we more often hit the extreme where no worker at all picks up upstream jobs (about 4% of runs at 15%).
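The odds for the one-liner can also be computed exactly rather than sampled: with independent rolls, the number of workers taking upstream jobs follows a binomial distribution. This sketch assumes each of the `n` workers accepts independently with probability `p`; the helper names are hypothetical:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n workers take upstream jobs."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_off_by_at_least(d: int, n: int, p: float) -> float:
    """Probability that the number of upstream workers differs from n*p by d or more."""
    target = n * p
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - target) >= d)


n = 20
print(round(prob_off_by_at_least(3, n, 0.5), 3))  # <=7 or >=13 at 50%: prints 0.263
print(round(binom_pmf(0, n, 0.15), 3))            # no upstream worker at 15%: prints 0.039
```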

A better solution would be to roll the dice in the `request` callback [2], so that the decision to process is made for every upstream test request. The number of rolls then grows with the number of jobs processed, which is far larger than the number of workers, so over time the fraction of upstream jobs accepted converges much more closely to the configured threshold.
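A minimal sketch of the per-request roll, assuming the threshold is a percentage and that each upstream request is accepted independently. `should_process_upstream` and `accepted_fraction` are hypothetical helpers, not the actual `request` callback [2]; what happens to a rejected request (e.g. leaving it for another worker) is not modelled here:

```python
import random


def should_process_upstream(percentage: int, rng: random.Random) -> bool:
    """Roll the dice for a single upstream request (hypothetical helper).

    Returns True with probability percentage/100, independently for each request.
    """
    return rng.randint(1, 100) <= percentage


def accepted_fraction(n_requests: int, percentage: int, rng: random.Random) -> float:
    """Fraction of n_requests upstream requests that pass the per-request roll."""
    accepted = sum(should_process_upstream(percentage, rng) for _ in range(n_requests))
    return accepted / n_requests


rng = random.Random(42)
for n in (20, 1_000, 100_000):
    # The spread around the 15% target shrinks roughly like 1/sqrt(n).
    print(n, f"{accepted_fraction(n, 15, rng):.3f}")
```

The sample size is now the number of requests rather than the number of workers, so the observed fraction tightens around the threshold as jobs accumulate, and a bad streak does not persist until a restart.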

[1]: https://git.launchpad.net/autopkgtest-cloud/tree/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker#n1482
[2]: https://git.launchpad.net/autopkgtest-cloud/tree/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker#n640 (approximate location of the `request` callback)


Skia (hyask)
Changed in auto-package-testing:
status: New → Fix Committed
assignee: nobody → Skia (hyask)
Tim Andersson (andersson123) wrote:

This is deployed, but before marking it as Fix Released, Skia would like to do some testing and take some measurements to make sure the code changes give the intended behaviour.

Skia (hyask) wrote:

I've measured the last 12 hours of test runs; the proportion of upstream tests looks appropriate, so I'll mark this as "Fix Released".

Changed in auto-package-testing:
status: Fix Committed → Fix Released