slave-scanner shouldn't block on chroot extraction
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
launchpad-buildd | Fix Released | Medium | Celso Providelo |
Bug Description
When the slave-scanner dispatches jobs to launchpad-buildd, it blocks on the buildd returning from the tarball extraction. Not only does this lead to timeouts on cold caches (as seen on the PPA buildds constantly, and now on some hardy-upgraded buildds), but it also means the scanner is blocking for 60+ seconds on one buildd, when it could be looping through them all to dispatch/gather other builds.
Fixing this to make chroot dispatching an async job may or may not require code changes to launchpad-buildd as well as slave-scanner. If that's the case, I'm happy to work with you guys on this, but we do need it fixed -- the overall impact of this is going from "irritating time-waster" to "(relatively) large portions of my day are wasted re-enabling timed-out buildds".
Note that this change would have a positive effect on buildd performance in general, as queues would clear faster due to faster job polling (and we'd even be able to reduce the current timeout to detect broken buildds faster!), so it seems like an all-around win.
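To put a rough number on the cost described above, here is a back-of-the-envelope model of one scanner pass. The 60-second extraction figure comes from this report; the buildd count and the one-second dispatch overhead are made-up illustration values, and the function name is hypothetical. It assumes the worst case, where every buildd in the pass needs a fresh chroot.

```python
def scan_pass_seconds(num_buildds, dispatch_seconds, blocking_extract_seconds=0.0):
    """Worst-case time for one sequential pass over all buildds.

    Hypothetical model: every buildd gets a job in this pass, and the
    scanner waits `blocking_extract_seconds` on each one before moving on.
    """
    return num_buildds * (dispatch_seconds + blocking_extract_seconds)


# Illustration values (assumed): 20 buildds, 1 second of dispatch overhead each.
blocking = scan_pass_seconds(20, 1.0, blocking_extract_seconds=60.0)
non_blocking = scan_pass_seconds(20, 1.0)

print(blocking)      # 1220.0 seconds when the scanner waits on each extraction
print(non_blocking)  # 20.0 seconds when extraction happens in the background
```

Under these assumed numbers, a full pass drops from about twenty minutes to twenty seconds, which is where the faster polling would come from.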
Changed in soyuz:
status: New → Confirmed
Changed in soyuz:
assignee: nobody → cprov
importance: Undecided → Medium
status: Confirmed → In Progress
Agreed, but I believe this bug/fix belongs in the launchpad-buildd code, which is the part that actually blocks, not in slave-scanner.
It isn't as simple as it looks, because unpacking the chroot is part of the start_build XML-RPC command, and making this call asynchronous will possibly involve postponing the unpacking to build time, i.e. 'start_build' would return right after downloading the chroot.
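A minimal sketch of that idea, assuming the unpack step can be moved into a background thread. Everything here is hypothetical: `SketchSlave`, its state names, and the `status()`/`wait()` helpers are invented for illustration and are not the launchpad-buildd API; only the `start_build` name comes from this discussion, and the sleep stands in for the real tarball extraction.

```python
import threading
import time


class SketchSlave:
    """Hypothetical stand-in for a buildd; not the launchpad-buildd API."""

    def __init__(self, unpack_seconds=0.5):
        self.unpack_seconds = unpack_seconds  # simulated extraction time
        self.state = "IDLE"
        self._lock = threading.Lock()
        self._worker = None

    def status(self):
        """Cheap call the scanner could poll on each pass."""
        with self._lock:
            return self.state

    def _unpack_then_build(self):
        time.sleep(self.unpack_seconds)  # stands in for tarball extraction
        with self._lock:
            self.state = "BUILDING"

    def start_build(self):
        """Return immediately; unpacking continues in a background thread."""
        with self._lock:
            if self.state != "IDLE":
                raise RuntimeError("slave is busy")
            self.state = "UNPACKING"
        self._worker = threading.Thread(
            target=self._unpack_then_build, daemon=True
        )
        self._worker.start()
        return "UNPACKING"

    def wait(self):
        """Block until the background unpack finishes (for demos and tests)."""
        if self._worker is not None:
            self._worker.join()


if __name__ == "__main__":
    slave = SketchSlave()
    print(slave.start_build())  # returns without waiting for the unpack
    slave.wait()
    print(slave.status())
```

With something along these lines the scanner would no longer wait on the call itself; it would see the intermediate state on its next pass and move on to the other buildds in the meantime.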
Anyway, this bug exists and I'm more than happy to help you with the implementation.