Comment 12 for bug 1357578

Brent Eagles (beagles) wrote :

Okay, after reviewing the commit messages in the patch(es) and the
comments in the code, the picture seems pretty clear: the gist of the
bug is simply "this times out but we don't know why at the moment".
Unless the timeout is set wicked short, it seems pretty improbable that
the test is hitting a legitimate timeout. Good test to find something
so weird.

So, are these some of the possibilities?

- Workers don't shut down properly when they get the signal.

- Somehow the signal isn't getting to a worker in the first place. At
  the moment, this seems possible only if something were to happen to
  the ProcessLauncher so that it would "forget" about the child/worker
  PID, i.e. leaking a child process. One scenario would be some kind of
  oddness where a worker process caused os.waitpid() to return some kind
  of status for itself in _wait_child(). The logging Jay Pipes added
  will highlight any weirdness like that.

- By some very strange twist, a previous test case didn't clean up
  properly and there is a leftover timeout hanging around to cause
  trouble? Is that even possible?
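
To make the second possibility concrete, here is a minimal sketch of how
a "forgotten" PID would leave a worker running after shutdown. This is
not the real ProcessLauncher: ToyLauncher, forget() and demo() are
hypothetical names made up for illustration, and the only assumption
carried over is that the launcher signals just the children it still has
in its tracking table.

```python
import signal
import subprocess
import sys


class ToyLauncher:
    """Hypothetical stand-in for a process launcher (not the real one)."""

    def __init__(self):
        # pid -> Popen handle for every worker the launcher knows about
        self.children = {}

    def start_worker(self):
        # The worker only sleeps; it keeps the default SIGTERM behaviour,
        # so a delivered signal terminates it.
        proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )
        self.children[proc.pid] = proc
        return proc

    def forget(self, pid):
        # Simulates the suspected failure: the PID is dropped from the
        # tracking table while the process is still alive.
        self.children.pop(pid, None)

    def stop(self, timeout=10.0):
        # Only tracked workers are signalled and waited for.
        for proc in self.children.values():
            proc.send_signal(signal.SIGTERM)
        for proc in self.children.values():
            proc.wait(timeout=timeout)
        self.children.clear()


def demo():
    launcher = ToyLauncher()
    tracked = launcher.start_worker()
    leaked = launcher.start_worker()
    launcher.forget(leaked.pid)

    launcher.stop()

    tracked_exited = tracked.poll() is not None
    leaked_still_running = leaked.poll() is None

    # Clean up the leaked worker so the demo leaves nothing behind.
    leaked.kill()
    leaked.wait()
    return tracked_exited, leaked_still_running
```

Calling demo() shows the tracked worker exiting on the signal while the
forgotten one keeps running, which from the outside would look just like
a shutdown that hangs until the timeout fires.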