Comment 2 for bug 618955

William Grant (wgrant) wrote : Re: [Bug 618955] Re: Resume trigger hangs buildd-manager

On Tue, 2010-08-17 at 08:31 +0000, Julian Edwards wrote:
> Actually I saw this happening on dogfood but it always came back to
> life. There was also no break in activity on other builders.

Can you try to confirm the lack of an activity break on other builders?
My testing locally showed that it did indeed block other builders, and
the production issue this morning strongly suggests it.

> The traceback shows the call stack to be updateBuilderStatus ->
> checkSlaveAlive -> self.slave.echo

Yes. The exception handler then prints the exception and calls
handleTimeout, which resumes the slave (or at least sets the flag to
request it).
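
As a rough illustration of that path, here is a minimal sketch. Only the
method names updateBuilderStatus, checkSlaveAlive, handleTimeout and
slave.echo come from the traceback; FakeSlave, the resume_requested
flag, the exception type and the handleTimeout signature are
assumptions made up for the example, not the real Builder code.

```python
class FakeSlave:
    """Hypothetical stand-in for the slave proxy; echo() always fails."""

    def echo(self, *args):
        raise ConnectionError("slave unreachable")


class Builder:
    """Hypothetical, heavily simplified model of the call chain."""

    def __init__(self, slave):
        self.slave = slave
        self.resume_requested = False  # assumed flag, for illustration only

    def checkSlaveAlive(self):
        # A failed echo is what surfaces as the exception in the traceback.
        self.slave.echo("ping")

    def handleTimeout(self, error):
        # In this sketch the handler only records that a resume is wanted.
        self.resume_requested = True

    def updateBuilderStatus(self):
        try:
            self.checkSlaveAlive()
        except ConnectionError as error:
            print("slave check failed:", error)
            self.handleTimeout(error)


builder = Builder(FakeSlave())
builder.updateBuilderStatus()
```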

> Can you explain what you mean by "timeout code doesn't work when a
> resume is triggered from within Builder" ?

There is meant to be a timeout applied to the resume trigger. That
timeout doesn't seem to take effect when Builder.handleTimeout triggers
the resume, but I'm not sure if it even works in the normal case (when
the resume is triggered on build start).
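
For reference, this is roughly what I would expect a timeout on an
external trigger command to look like. It is only a sketch using the
Python standard library, not the buildd-manager code: the function name
run_resume_trigger and its parameters are made up, and the "hung"
command is just a Python child process that sleeps.

```python
import subprocess
import sys


def run_resume_trigger(command, timeout_seconds):
    """Run a trigger command, giving up after timeout_seconds.

    Returns True if the command exited with status 0, and False if it
    failed or had to be killed because it ran past the timeout.
    """
    try:
        completed = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return False
    return completed.returncode == 0


# A stand-in for a trigger that hangs: a child process sleeping 5 seconds.
hung_command = [sys.executable, "-c", "import time; time.sleep(5)"]
hung_result = run_resume_trigger(hung_command, timeout_seconds=0.5)
```

Note that a call like this still blocks its caller for up to
timeout_seconds, so inside an event loop it would stall every other
builder for that long even when the timeout itself works.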