Comment 18 for bug 1322407

Revision history for this message
Stefan Bader (smb) wrote :

Deeper inspection of the logs looks like the problem is some connection attempt when xprt is not connected. Part of that procedure is to re-use the connection which forces the xprt to disconnect (so the socket can be re-used). This triggers a state change (TCP_CLOSE) and wakes up the task waiting for the connection. But the connection state then in INPROGRESS which somehow gets translated into EGAIN and that triggers call_bind which repeats the re-use of socket process.

With that lead, I found two commits upstream referring to this commit that introduces that behaviour:

* 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)

The two fixes related to that are:

* 1fa3e2e SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
* 485f225 SUNRPC: Ensure that call_connect times out correctly

The latter would at least cause timeouts to be re-adjusted before looping back into call_bind. So it might be worth trying those. I build a trusty kernel with those two patches added. The debs are at http://people.canonical.com/~smb/lp1322407/
Could you install those on the server side and see whether this helps with the problem?