buildd handling lives in ivory tower of perfect networks
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself | Fix Released | High | Julian Edwards | | |
Bug Description
The buildd design seems to assume perfect networks. If a connection drops, not only is the current build lost forever, but the buildd itself is marked NOT OK and never retried until a human comes along to reset it.
This is really bad for a couple of reasons:
(1) Even in our data centre, the network is not perfect: cables get knocked out, typos happen in firewall scripts, and so on. If it's the right cable or machine, the failure can hit all buildds at once, which affects a lot of builds and takes a lot of human work to recover from.
(2) The Launchpad buildd system is meant to scale to remote buildds connected to the build-master over the internet. Connections there will be even more fragile, and it's entirely possible that connection drops will be routine.
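To make the complaint concrete, here is a minimal sketch of a more forgiving policy: count consecutive connection failures, put the interrupted job back on the queue, and only mark the builder NOT OK once a threshold is crossed. This is an illustration under stated assumptions, not the code from the related branch. `Builder`, `BuildQueue`, `FAILURE_THRESHOLD`, `handle_connection_failure` and `handle_success` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold: consecutive failures tolerated before a builder
# is taken out of rotation. Not a value from the Launchpad code base.
FAILURE_THRESHOLD = 5


@dataclass
class Builder:
    """Hypothetical stand-in for a buildd record."""
    name: str
    ok: bool = True                    # False corresponds to "NOT OK"
    failure_count: int = 0             # consecutive connection failures
    current_job: Optional[str] = None  # job being built, if any


@dataclass
class BuildQueue:
    """Hypothetical queue of jobs waiting for a builder."""
    pending: List[str] = field(default_factory=list)


def handle_connection_failure(builder: Builder, queue: BuildQueue) -> None:
    """Record a dropped connection without giving up on the builder."""
    builder.failure_count += 1
    # Put the interrupted job back at the front of the queue so that
    # it is retried instead of being lost.
    if builder.current_job is not None:
        queue.pending.insert(0, builder.current_job)
        builder.current_job = None
    # Only mark the builder NOT OK after repeated consecutive failures.
    if builder.failure_count >= FAILURE_THRESHOLD:
        builder.ok = False


def handle_success(builder: Builder) -> None:
    """Any successful contact resets the consecutive-failure counter."""
    builder.failure_count = 0
```

With a policy of this shape, a single knocked-out cable costs one retry per affected build rather than a manual reset of every buildd. The threshold trades recovery speed against the risk of repeatedly dispatching to a builder that is genuinely broken.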
Related branches
- Jonathan Lange (community): Approve
Diff: 7192 lines (+2211/-3509), 24 files modified
lib/lp/buildmaster/doc/builder.txt (+2/-118)
lib/lp/buildmaster/interfaces/builder.py (+83/-62)
lib/lp/buildmaster/manager.py (+205/-469)
lib/lp/buildmaster/model/builder.py (+240/-224)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+60/-52)
lib/lp/buildmaster/model/packagebuild.py (+6/-0)
lib/lp/buildmaster/tests/mock_slaves.py (+157/-32)
lib/lp/buildmaster/tests/test_builder.py (+582/-154)
lib/lp/buildmaster/tests/test_manager.py (+248/-782)
lib/lp/buildmaster/tests/test_packagebuild.py (+12/-0)
lib/lp/code/model/recipebuilder.py (+32/-28)
lib/lp/soyuz/browser/tests/test_builder_views.py (+1/-1)
lib/lp/soyuz/doc/buildd-dispatching.txt (+0/-371)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+0/-876)
lib/lp/soyuz/model/binarypackagebuildbehavior.py (+59/-41)
lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+290/-8)
lib/lp/soyuz/tests/test_doc.py (+0/-6)
lib/lp/testing/factory.py (+8/-2)
lib/lp/translations/doc/translationtemplatesbuildbehavior.txt (+0/-114)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+20/-14)
lib/lp/translations/stories/buildfarm/xx-build-summary.txt (+1/-1)
lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+202/-153)
lib/lp_sitecustomize.py (+3/-0)
utilities/migrater/file-ownership.txt (+0/-1)
- Brad Crittenden (community): Approve (code)
Changed in soyuz: | |
status: | New → Triaged |
importance: | Undecided → Medium |
tags: | added: buildd-manager |
tags: | added: canonical-losa-lp |
Changed in soyuz: | |
status: | Triaged → In Progress |
assignee: | nobody → Julian Edwards (julian-edwards) |
tags: | added: bad-commit-11801 |
tags: | added: qa-ok removed: qa-needstesting |
tags: | added: buildd-scalability |
Changed in soyuz: | |
status: | Fix Committed → Fix Released |
I don't think this happens any more, so I'm marking it as released.