Build-farm slave FSM fails early on

Bug #539499 reported by Jeroen T. Vermeulen on 2010-03-16
This bug affects 1 person
Affects: Launchpad itself
Importance: High
Assigned to: Jeroen T. Vermeulen

Bug Description

On my system at least, running in a chroot, the TranslationTemplatesBuildManager fails at an early stage when processing a real TranslationTemplatesBuildJob fed to it by the buildd master.

The slave does set up its chroot, but after that it goes (IIRC) from the "unpacking" state straight to cleanup, where it breaks even more for apparently unrelated or loosely-related reasons. The FSM code receives an unexpected success code of None, which is not equal to 0 and is therefore interpreted as a failure. An attempt to shut down the build then triggers an exception, because shutdown is supposed to happen only while in the BUILDING state.
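To make the two symptoms concrete, here is a minimal sketch, not the actual build manager code. It assumes a state machine that compares each step's result against 0 and a slave that refuses to complete a build outside the BUILDING state. The names SketchSlave, step_failed and next_state_after_unpack are hypothetical.

```python
# Hypothetical state names, for illustration only.
UNPACK, BUILDING, CLEANUP = "UNPACK", "BUILDING", "CLEANUP"


class SketchSlave:
    """Hypothetical stand-in for the slave; it only tracks its state."""

    def __init__(self, state):
        self.state = state

    def build_complete(self):
        # The rule from the report: shutdown is only legal while BUILDING.
        if self.state != BUILDING:
            raise RuntimeError(
                "build_complete() called in state %s" % self.state)
        self.state = "WAITING"


def step_failed(success):
    """A step counts as failed unless its result code is exactly 0."""
    return success != 0


def next_state_after_unpack(success):
    # None is not equal to 0, so a missing result code is routed to
    # cleanup exactly like a real failure.
    return CLEANUP if step_failed(success) else BUILDING
```

Under these assumptions, next_state_after_unpack(None) goes to cleanup without ever passing through BUILDING, and a later build_complete() call on a slave in that state raises.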

Changed in rosetta:
importance: Undecided → High
Jeroen T. Vermeulen (jtv) wrote :

This is following the instructions on https://dev.launchpad.net/Translations/BuildTemplatesOnLocalBuildFarm

Those instructions may be wrong, or it may be something with my system... big mystery so far.

Henning Eggers (henninge) wrote :

To be clear: These are two unrelated issues.

1) Why does the "UNPACK" state come back with a "None" result?

2) iterate_CLEANUP calls _slave.buildComplete() on both failed and successful runs but that throws an exception as mentioned. Should it be doing that?

Jeroen T. Vermeulen (jtv) wrote :

wgrant reports he's seen this working, so it's probably just something in my local setup.

Let's see if we can confirm this on another machine before we worry about it.

Changed in rosetta:
status: New → Incomplete
Jeroen T. Vermeulen (jtv) wrote :

Got the little creep. TranslationTemplatesBuildJob.getName used self.job.id instead of getUtility(IBuildQueueSet).getByJob(self.job).id, so verification of the name failed.
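For anyone following along, a rough sketch of the mismatch. It uses stand-in classes rather than the real Launchpad model: Job, BuildQueue and BuildQueueSet here are simplified, and the "sketch-%d" name format is an assumption made up for the example.

```python
class Job:
    """Simplified stand-in for a Job row."""

    def __init__(self, job_id):
        self.id = job_id


class BuildQueue:
    """Simplified stand-in for a BuildQueue row that wraps a Job."""

    def __init__(self, queue_id, job):
        self.id = queue_id
        self.job = job


class BuildQueueSet:
    """Stand-in for what getUtility(IBuildQueueSet) would return."""

    def __init__(self, entries):
        self._by_job_id = {entry.job.id: entry for entry in entries}

    def getByJob(self, job):
        return self._by_job_id[job.id]


def name_from_job_id(job):
    # The buggy variant: builds the name from the Job's own id.
    return "sketch-%d" % job.id


def name_from_queue_id(job, queue_set):
    # The fixed variant: builds the name from the matching BuildQueue's id.
    return "sketch-%d" % queue_set.getByJob(job).id
```

With a Job whose id is 7 and a BuildQueue whose id is 42, the two functions produce different names, so any check that expects the BuildQueue-based name rejects the Job-based one.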

Told you this id stuff was unnecessarily complex!

Changed in rosetta:
status: Incomplete → In Progress
assignee: nobody → Jeroen T. Vermeulen (jtv)
milestone: none → 10.03
Jeroen T. Vermeulen (jtv) wrote :

And in case anyone was wondering why I didn't test for this: I did. It just so happened that the matched Job and BuildQueue objects I created in the test got the same ids, so the test passed by sheer accident.

Changed in rosetta:
status: In Progress → Fix Committed
tags: added: qa-needstesting
tags: added: qa-ok; removed: qa-needstesting
Changed in rosetta:
status: Fix Committed → Fix Released