> So if they are still failing, that would indicate the single test is
> taking more than 1 minute to complete.
AIUI, your first diagnosis was that the test itself needed more than 4s to
complete under heavy load.
That doesn't seem to be the case with 1 min, so something else is going on.
> Note that if this really is the failure, the connection retry code
> that I wrote should also handle this by just retrying from the client
True. But that wouldn't address the underlying (and yet unknown) issue.
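
To illustrate the concern, here is a minimal sketch (Python 3, with
hypothetical names; this is not the actual retry code being discussed, and
it assumes the dropped connection surfaces as ConnectionResetError). A
client-side retry turns a transient server-side close into a success, so
the test passes and the root cause stays hidden, while a permanent failure
still surfaces once the retries are exhausted:

```python
import time


def call_with_retry(send_request, retries=1, delay=0.0):
    """Call send_request(), retrying if the peer resets the connection.

    send_request is a hypothetical zero-argument callable standing in
    for one client request; retries is the number of extra attempts.
    """
    attempt = 0
    while True:
        try:
            return send_request()
        except ConnectionResetError:
            if attempt >= retries:
                # Still failing after all retries: likely a permanent
                # problem, so let the caller see it.
                raise
            attempt += 1
            time.sleep(delay)
```

With this in place, a request that fails once and then succeeds looks
exactly like a request that never failed, which is why the retry alone
tells us nothing about why the connection was closed.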
> Though in that traceback there isn't a ConnectionTimeout. Instead it
> looks like one side just closes the connection.
I had concerns about closing too much too often server-side, that may be
> Specifically, it looks like the client sends some data, and indicates
> that the message is done. The server reads this information, but feels
> there is more that needs to be said, and raises an exception, closing
> the connection to the client.
> Now, if the client is genuinely sending a malformed message, that is
> how it is supposed to work.
But in this case the failure should be permanent, not transient, no?
> I find it odd that the test highlighted here is also the one that was
> failing because of timeouts, though.
Could it be that the server is closing the wrong client connection, then?