Long running builds can monopolise the build farm

Bug #393546 reported by Rockwalrus on 2009-06-29
This bug affects 2 people
Affects: Launchpad itself
Importance: High
Assigned to: Michael Nelson

Bug Description

Right now the turnaround time on a Jaunty PPA build is about five hours. When I look at https://launchpad.net/builders, it looks like much of the build time is monopolized by large numbers of automated daily builds. I have nothing against daily builds, but it would be nice if people running such scripts could voluntarily lower their build scores so that "interactive" PPA builds get through faster. I realize I could (ab)use the urgency field in the changelog to get a few extra build points, but I feel that the default of "low" is the correct urgency to use according to policy.


Julian Edwards (julian-edwards) wrote :

My immediate reaction to this is that auto builds have the same right to the PPA build resources as anything else does.

However, I'd like to invite comments from others before I go any further with this.

tags: added: ppa soyuz-build
Changed in soyuz:
status: New → Incomplete
Rockwalrus (rockwalrus) wrote :

I'm not saying that autobuilds should be restricted from using PPA build resources, or even that they should be forced to build at a lower priority. I'm just saying that they should be given the option to do so if they so desire. If I were running an autobuild, I'd want to be able to run it at a lower priority for the same reason that I often nice CPU-intensive background tasks.

William Grant (wgrant) wrote :

The daily builds that exist now (I'm thinking of the Mozilla and Chromium ones in particular) take forever to build. They also build across multiple series and three architectures. That's a lot of buildd time. With the current scheduling, they can monopolize the often-reduced buildd farm for several hours. While this is a legitimate use of buildd time, it makes the PPA service less attractive for everyone else when waiting times run to many hours.

But, of course, this couldn't be an easy problem to solve. How should the builds be scheduled to avoid this monopoly?

Most PPA builds take on the order of a few minutes, and code already exists to estimate build duration. Maybe the builddmaster could refuse to fill up the buildds with builds expected to take more than, say, 10 minutes. By always leaving some fraction of the buildds free for short tasks, the effect of the daily DoS could be reduced substantially.
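To make the "never fill every buildd with long builds" idea concrete, here is a minimal sketch, not builddmaster code. It assumes a score-sorted queue, a per-build duration estimate, and two hypothetical policy knobs (LONG_BUILD_THRESHOLD, MAX_LONG_FRACTION); the Build class and pick_candidate() are illustrative names only.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

# Hypothetical policy knobs (not existing Launchpad settings).
LONG_BUILD_THRESHOLD = timedelta(minutes=10)  # "long" = estimated above this
MAX_LONG_FRACTION = 0.5  # share of builders that long builds may occupy


@dataclass
class Build:
    name: str
    estimated_duration: timedelta


def is_long(build: Build) -> bool:
    return build.estimated_duration > LONG_BUILD_THRESHOLD


def pick_candidate(queue: List[Build], running: List[Build],
                   total_builders: int) -> Optional[Build]:
    """Return the first queued build allowed to start on an idle builder.

    `queue` is assumed to be sorted by score, highest first. A long build
    is skipped when long builds already hold their share of builders, so
    some builders stay available for short builds.
    """
    long_running = sum(1 for b in running if is_long(b))
    # Allow at least one long build so long builds can never starve completely.
    max_long = max(1, int(total_builders * MAX_LONG_FRACTION))
    for build in queue:
        if is_long(build) and long_running >= max_long:
            continue  # over the long-build share; look further down the queue
        return build
    return None
```

With 4 builders and two 3-hour builds already running, a queued 3-hour build is passed over in favour of a 2-minute one further down the queue; once a long build finishes, the next long build can start again.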

Julian Edwards (julian-edwards) wrote :

I think this situation can be ameliorated using the fix from bug 382804. It's not automatic, but we could ask the owners of the PPAs who do daily builds if they would be happy to see their scores adjusted.

Let me know what you think.

William Grant (wgrant) wrote :

That wouldn't help too much on its own -- as soon as the queue is empty (apart from the problematic builds) for just a moment, the queued long-running low-priority builds will start, inhibiting the start of any subsequent builds.

On Tuesday 30 June 2009 10:59:12 William Grant wrote:
> That wouldn't help too much on its own -- as soon as the queue is empty
> (apart from the problematic builds) for just a moment, the queued long-
> running low-priority builds will start, inhibiting the start of any
> subsequent builds.

Well, they need to build *sometime*; is there a better time to start them than when the queue is empty?

There is no good time to let long-running builds start on all of the buildds.

Michael Bienia (geser) wrote :

What about limiting a PPA to one buildd per arch at a time? That way an upload of a long-running build for several series won't occupy all available buildds for an architecture at the same time, but will build them one after another.
IMHO it's fair to expect that someone (or a team) uploading a huge number of packages in a short time has to wait a little to get them all built. If the packages build quickly, that shouldn't introduce much extra delay (depending on the queue, of course), and long builds won't take all available buildds.
Of course a small number of teams doing daily builds will still be able to clog the buildds, but they have the same right to use them as any other PPA user.

Another option would be to reserve a certain number of buildds for "general" use, so they are not used by the daily upload builds. That way "normal" PPA users have a chance to get their packages built in a timely manner.

Which option is preferable also depends on the usage rate of the buildds. Are there any statistics about it (what part of a day they spend building versus waiting for jobs)?
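The "reserved buildds" option could look roughly like this. It is only a sketch under assumptions: some set of archives is flagged as doing daily uploads, and some builders are marked as reserved. Build, pick_candidate(), reserved_builders, daily_archives and the archive names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Build:
    name: str
    archive: str  # hypothetical archive identifier, e.g. a PPA name


def pick_candidate(builder: str, queue: List[Build], reserved_builders: Set[str],
                   daily_archives: Set[str]) -> Optional[Build]:
    """Return the highest-scoring build this builder is allowed to take.

    `queue` is assumed to be sorted by score, highest first. Builders in
    `reserved_builders` never take builds from `daily_archives`, so they
    stay free for "general" PPA uploads.
    """
    reserved = builder in reserved_builders
    for build in queue:
        if reserved and build.archive in daily_archives:
            continue  # keep reserved builders away from daily-upload archives
        return build
    return None
```

The trade-off compared with the per-PPA limit is that the reserved builders can sit idle while daily builds are queued, which is where the usage statistics asked about above would matter.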

Julian Edwards (julian-edwards) wrote :

I'm going to triage this, but will also mention that:
 * I /think/ you only noticed this when there were some builders temporarily taken out of the rotation which made the queue get bigger
 * We're working on making the existing builders work for both PPA and Ubuntu builds
 * We're working on making the builders build for any architecture

The last two points there will help quite a lot, as will adding more builders as the PPA service increases in popularity.

Changed in soyuz:
importance: Undecided → Low
status: Incomplete → Triaged
tags: added: feature

I think Michael's suggestion of "limiting that a PPA can only use one buildd per arch at a time" is sounding quite good now.

Changed in soyuz:
importance: Low → Medium
Steve Magoun (smagoun) on 2009-10-05
tags: added: oem-services
Changed in soyuz:
importance: Medium → High
milestone: none → 3.1.10
assignee: nobody → Michael Nelson (michael.nelson)
summary: - Super-low priority option for automated PPA builds
+ Long running builds can monopolise the build farm

Just some initial planning/investigation: currently BuilddManager.scan() iterates over each builder and, for each idle builder, effectively calls builder.findBuildCandidate(); if that finds *one* build, it dispatches it.

So, afaics, the main issue for implementing the above is that all our builders pick single builds off the one primary build queue, whereas it would be handy to have individual queues for each builder instead, and to dispatch a set of builds from the primary queue to each builder's queue.

Eventually I think it would be great to use a messaging system (RabbitMQ - where are you?) to do the above, but in the meantime another possible solution might be to add a new build state between NEEDSBUILD and BUILDING, perhaps BUILDER_ASSIGNED. During builder.findBuildCandidate(), in addition to returning the next build, we could take all the other builds of the same priority for the same SPR.creator (? hrm... uploader?), assign them to the builder and set their status to BuildStatus.BUILDER_ASSIGNED. Subsequent calls to builder.findBuildCandidate() would then always find and dispatch the assigned builds first, before checking the general queue again?

Actually, as a slight improvement, the builder would first check whether there are any *higher* priority builds in the build queue (there are a number of options for what it could do if it finds some, the fairest probably being to reset the existing assigned builds to NEEDSBUILD).

This behaviour in IBuilder.findBuildCandidate() would have to be optional, as I assume we would *not* want it for urgent builds (e.g. security). There are a number of ways we could make it optional: a PPA setting, a priority threshold, etc.
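A minimal sketch of the BUILDER_ASSIGNED idea, including the "release the batch if something higher-priority turns up" rule and a score threshold for leaving urgent builds unbatched. This is not the real IBuilder.findBuildCandidate(); the Build class, its fields, and the urgent_score parameter are hypothetical stand-ins, and the caller is assumed to mark the returned build as BUILDING when it dispatches it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class BuildStatus(Enum):
    NEEDSBUILD = "needs build"
    BUILDER_ASSIGNED = "builder assigned"  # proposed intermediate state
    BUILDING = "building"


@dataclass
class Build:
    name: str
    creator: str
    score: int
    status: BuildStatus = BuildStatus.NEEDSBUILD
    builder: Optional[str] = None


def find_build_candidate(builder: str, builds: List[Build],
                         urgent_score: int) -> Optional[Build]:
    assigned = sorted(
        (b for b in builds
         if b.status is BuildStatus.BUILDER_ASSIGNED and b.builder == builder),
        key=lambda b: b.score, reverse=True)
    waiting = [b for b in builds if b.status is BuildStatus.NEEDSBUILD]
    if assigned:
        top_waiting = max((b.score for b in waiting), default=None)
        if top_waiting is None or top_waiting <= assigned[0].score:
            return assigned[0]  # keep working through this builder's batch
        # A higher-priority build is waiting: release the batch back to the queue.
        for b in assigned:
            b.status, b.builder = BuildStatus.NEEDSBUILD, None
        waiting.extend(assigned)
    if not waiting:
        return None
    candidate = max(waiting, key=lambda b: b.score)
    if candidate.score < urgent_score:  # urgent builds are never batched
        for b in waiting:
            if (b is not candidate and b.creator == candidate.creator
                    and b.score == candidate.score):
                b.status, b.builder = BuildStatus.BUILDER_ASSIGNED, builder
    return candidate
```

In this sketch, once builder "b1" takes the first of a creator's same-score builds, the rest are parked on "b1", so other idle builders move on to other people's uploads.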

Thoughts?

Sorry, I was fixated on ensuring that all the builds are assigned to and built by the same builder, but that isn't necessary at all. As Julian suggested, we can simply have findBuildCandidate() skip a potential build iff a build for the same architecture+archive is already building on any builder.
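Here is a rough sketch of that skip rule inside a scan loop, loosely modelled on the BuilddManager.scan() behaviour described earlier. Build, Builder, their fields, and these two functions are hypothetical simplifications, not the actual Soyuz classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Build:
    name: str
    archive: str
    arch: str
    score: int
    building: bool = False


@dataclass
class Builder:
    name: str
    arch: str
    idle: bool = True


def find_build_candidate(builder: Builder, queue: List[Build]) -> Optional[Build]:
    """Highest-scoring pending build for this builder's architecture, skipping
    any build whose (archive, arch) pair already has a build in progress."""
    busy = {(b.archive, b.arch) for b in queue if b.building}
    pending = [b for b in queue if not b.building and b.arch == builder.arch]
    # sorted() is stable (even with reverse=True), so equal scores keep queue order.
    for build in sorted(pending, key=lambda b: b.score, reverse=True):
        if (build.archive, build.arch) in busy:
            continue
        return build
    return None


def scan(builders: List[Builder], queue: List[Build]) -> List[Tuple[str, str]]:
    """Give each idle builder at most one build; return (builder, build) pairs."""
    dispatched = []
    for builder in builders:
        if not builder.idle:
            continue
        candidate = find_build_candidate(builder, queue)
        if candidate is not None:
            candidate.building = True
            builder.idle = False
            dispatched.append((builder.name, candidate.name))
    return dispatched
```

With three idle i386 builders and a queue holding two Chromium daily builds plus one unrelated upload, only one Chromium build starts; the second builder takes the unrelated upload and the third stays idle until the first Chromium build finishes.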

Rockwalrus (rockwalrus) wrote :

The build start time estimation code will have to take the new assignment algorithm into account as well.

Very good point! But that should be a separate bug as it's more important to get this one fixed in its own right first.

Changed in soyuz:
status: Triaged → In Progress
Changed in soyuz:
status: In Progress → Fix Committed

I've outlined a few ways this initial implementation could be improved/tweaked in bug 450124.

Jeremy Bicha (jbicha) wrote :

Is it possible to restrict which builders the excessive daily builds can use? I believe Chromium takes about 3 hours for each of its 4 i386 builds every single day, which contributes to a ridiculous backlog for everyone else.

I don't think restricting by builder will help. Let's see how this fix works out and then we can decide if further action is needed.

Changed in soyuz:
status: Fix Committed → Fix Released