On Tue, 2010-08-17 at 08:31 +0000, Julian Edwards wrote:
> Actually I saw this happening on dogfood but it always came back to
> life. There was also no break in activity on other builders.
Can you try to confirm the lack of an activity break on other builders?
My testing locally showed that it did indeed block other builders, and
the production issue this morning strongly suggests it.
> The traceback shows the call stack to be updateBuilderStatus ->
> checkSlaveAlive -> self.slave.echo
Yes. The exception handler then prints the exception and calls
handleTimeout, which resumes the slave (or at least sets the flag to
request it).
> Can you explain what you mean by "timeout code doesn't work when a
> resume is triggered from within Builder" ?
There is meant to be a timeout applied to the resume trigger. This
doesn't seem to work when Builder.handleTimeout calls it, but I'm not
sure if it even works in the normal case (when called on build start).
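
To illustrate what I mean by a timeout on the resume trigger, here is a
minimal sketch. It is not the buildd-manager code: resume_with_timeout and
its arguments are hypothetical names, and I am assuming the trigger is an
external command. The point is only that the wait is bounded, so a hung
resume fails instead of stalling forever.

```python
import subprocess
import sys


def resume_with_timeout(command, timeout_seconds):
    """Run a (hypothetical) resume command, giving up after timeout_seconds.

    Returns True if the command exited with status 0 in time, and False if
    it failed or did not finish before the timeout.
    """
    try:
        completed = subprocess.run(
            command, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is
        # left running once we get here.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for a resume trigger that hangs: sleeps longer than allowed.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(resume_with_timeout(hung, 0.5))
```

Note that this still blocks the caller for up to timeout_seconds, so it
limits the damage of a hung resume rather than removing the blocking.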