buildd-manager fails to deal with "Fault 8002" errors

Bug #496574 reported by Tom Haddon
This bug affects 2 people
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Julian Edwards

Bug Description

The buildd-manager was returning "Fault 8002" in the logs. This was preventing it from processing new builds, and we were only made aware of the problem from a user report.

Julian Edwards (julian-edwards) wrote :

See also bug 451351 and bug 369109

tags: added: buildd-manager soyuz-build
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
Jeroen T. Vermeulen (jtv) wrote :

I also got this while testing the translation templates build jobs on dogfood. Failure of the jobs themselves was sort of expected, what with some setup remaining to be done, but my job just kept being restarted.

Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Tom Haddon (mthaddon) wrote :

Happened again on four i386 buildds just now.

Julian Edwards (julian-edwards) wrote :

In the most recent case, the builders were disabled, which is the correct thing to do. So in terms of "handling" the problem, I'm not sure what else it should be doing. I don't think performing an automatic reset is really correct in case someone needs to debug a builder problem.

Tom Haddon (mthaddon) wrote : Re: [Bug 496574] Re: buildd-manager fails to deal with "Fault 8002" errors

On Mon, 2010-06-07 at 10:22 +0000, Julian Edwards wrote:
> In the most recent case, the builders were disabled, which is the
> correct thing to do. So in terms of "handling" the problem, I'm not sure
> what else it should be doing. I don't think performing an automatic
> reset is really correct in case someone needs to debug a builder
> problem.
>

Maybe we should be approaching it slightly differently. What causes a
"Fault 8002"? If we don't exactly know, perhaps that's where the focus
of this bug should be...

Julian Edwards (julian-edwards) wrote :

On Monday 07 June 2010 12:44:29 Tom Haddon wrote:
> Maybe we should be approaching it slightly differently. What causes a
> "Fault 8002"? If we don't exactly know, perhaps that's where the focus
> of this bug should be...

It's a genuine error on the slave, and this is how Twisted XMLRPC materialises
it on the client (buildd-manager) side.

It could be a bunch of different problems, but it usually indicates a fatal
problem on the slave, such as a coding error or something similar that leads
to an exception. Disabling the builder is the right thing to do until we work
out what the problem is. In this particular case we need to investigate more,
though; I suspect that Twisted is throwing a wobbly somewhere.
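
For illustration: Twisted's XML-RPC server resource uses fault code 8002 as its generic failure code when a remote method raises an unhandled exception, so the client sees only "Fault 8002" and not the original traceback. A minimal sketch of the "disable the builder" reaction follows, assuming Python's standard xmlrpc.client.Fault as the fault object. The Builder class and handle_slave_fault function are hypothetical names for this sketch, not buildd-manager's actual code.

```python
from dataclasses import dataclass
from xmlrpc.client import Fault

# Generic failure code that Twisted's XML-RPC resource reports when a
# remote method raises an unhandled exception.
TWISTED_FAILURE = 8002


@dataclass
class Builder:
    """Hypothetical stand-in for a builder record."""
    name: str
    enabled: bool = True
    failure_notes: str = ""


def handle_slave_fault(builder: Builder, fault: Fault) -> bool:
    """Disable the builder on a generic slave failure.

    Returns True if the builder was disabled. The builder is left
    disabled (no automatic reset) so that someone can debug it.
    """
    if fault.faultCode == TWISTED_FAILURE:
        builder.enabled = False
        builder.failure_notes = f"Fault {fault.faultCode}: {fault.faultString}"
        return True
    return False
```

Other fault codes are deliberately left alone here; how they should be treated is a separate decision.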

Julian Edwards (julian-edwards) wrote :

The linked branch is the buildd-manager almost-re-write. It handles failures *much* better and will shut a job down if it Goes Bad.

Changed in soyuz:
status: Triaged → In Progress
assignee: nobody → Julian Edwards (julian-edwards)
Brad Crittenden (bac)
tags: added: bad-commit-11801
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
Changed in soyuz:
milestone: none → 10.11
tags: added: qa-needstesting
Changed in soyuz:
status: In Progress → Fix Committed
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-ok
removed: qa-needstesting
Launchpad QA Bot (lpqabot) wrote :

Fixed in stable r11815 (http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/11815) by a commit, but not testable.

tags: added: qa-untestable
removed: qa-ok
Robert Collins (lifeless) wrote :

Needs a deployment to the buildd-manager machine to fix-release this.

Launchpad QA Bot (lpqabot) wrote :

Fixed in stable r11815 (http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/11815) by a commit, but not testable.

tags: added: buildd-scalability
Changed in soyuz:
status: Fix Committed → Fix Released