Deeper inspection of the logs suggests the problem is a connection attempt made while the xprt is not connected. Part of that procedure is to re-use the connection, which forces the xprt to disconnect (so the socket can be re-used). This triggers a state change (TCP_CLOSE) and wakes up the task waiting for the connection. But the connection state is then still INPROGRESS, which somehow gets translated into EAGAIN, and that triggers call_bind, which repeats the socket re-use process.
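To make the suspected loop easier to follow, here is a toy model of it in Python. This is not kernel code: the state names are hypothetical stand-ins for call_bind, call_connect and call_connect_status, the model simply assumes the connection never completes, and the `timeout_attempts` parameter is an invented illustration of how a timeout would bound the loop. It does not reproduce what either upstream patch actually changes.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical stand-ins for the SUNRPC call states named in the comment.
    BIND = auto()
    CONNECT = auto()
    CONNECT_STATUS = auto()
    TIMED_OUT = auto()


def run_task(timeout_attempts=None, demo_cap=50):
    """Simulate one task stuck in the bind -> connect -> connect_status loop.

    Returns (final_state, connect_attempts). A final_state of None means the
    demo cap was hit, i.e. the task would otherwise keep looping forever.
    """
    state = State.BIND
    attempts = 0
    while True:
        if state is State.BIND:
            state = State.CONNECT
        elif state is State.CONNECT:
            # Re-using the socket forces a disconnect (TCP_CLOSE), which wakes
            # the waiting task before the new connection is established.
            attempts += 1
            state = State.CONNECT_STATUS
        elif state is State.CONNECT_STATUS:
            # The connection is still INPROGRESS; in this model that is treated
            # like EAGAIN and sends the task back to BIND.
            if timeout_attempts is not None and attempts >= timeout_attempts:
                return State.TIMED_OUT, attempts
            if attempts >= demo_cap:
                return None, attempts
            state = State.BIND
```

Without a timeout, `run_task()` only stops because of the artificial demo cap and returns `(None, 50)`; with `run_task(timeout_attempts=3)` the task gives up after three connect attempts and returns `(State.TIMED_OUT, 3)`.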
With that lead, I found two upstream commits that refer back to the commit that introduced this behaviour:
* 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)
The two fixes related to that are:
* 1fa3e2e SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
* 485f225 SUNRPC: Ensure that call_connect times out correctly
The latter would at least cause the timeouts to be re-adjusted before looping back into call_bind, so it might be worth trying both. I built a trusty kernel with those two patches added. The debs are at http://people.canonical.com/~smb/lp1322407/
Could you install those on the server side and see whether this helps with the problem?