NFS server "shutting down socket" during large file copies on Ubuntu 14.04.4 LTS

Bug #1602827 reported by Gus Hoppes on 2016-07-13
This bug affects 2 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

This is very similar to bug 585657: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/585657
The difference is that my server doesn't freeze; it closes the socket. The system itself remains responsive.

I'm running NFS version 4 on Ubuntu 14.04.4 LTS, and large rsync file copies (200 MB+) die midway. This is killing my Zimbra email backups :(. The system hasn't frozen in this scenario; what happens instead is that the copy fails on the client side and then my NFS server shuts the socket down due to excessive failures. Smaller file copies work perfectly fine.

Here are the client/server logs.

#Client
Jul 9 01:00:39 mailserv01 kernel: [591473.216168] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:00:39 mailserv01 kernel: [591473.216174] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:00:39 mailserv01 kernel: [591473.218841] nfs: server NFS-Backup01 OK
Jul 9 01:00:39 mailserv01 kernel: [591473.219159] nfs: server NFS-Backup01 OK
Jul 9 01:01:14 mailserv01 kernel: [591508.710863] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:08:16 mailserv01 kernel: [591929.870358] nfs: server NFS-Backup01 OK
Jul 9 01:08:44 mailserv01 kernel: [591958.514953] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:10:23 mailserv01 kernel: [592056.840273] nfs: server NFS-Backup01 OK
Jul 9 01:11:19 mailserv01 kernel: [592113.298900] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:13:07 mailserv01 kernel: [592221.275966] nfs: server NFS-Backup01 OK
Jul 9 01:15:22 mailserv01 kernel: [592355.580401] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:15:22 mailserv01 kernel: [592355.615649] nfs: server NFS-Backup01 OK
Jul 9 01:16:03 mailserv01 kernel: [592397.459848] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:16:08 mailserv01 kernel: [592401.663381] nfs: server NFS-Backup01 OK
Jul 9 01:17:06 mailserv01 kernel: [592460.381059] nfs: server NFS-Backup01 not responding, still trying
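
To see how long each client-side stall lasted, the bracketed kernel timestamps (seconds since boot) can be paired up: the first "not responding" after an "OK" opens a stall, and the next "OK" closes it. The following is a minimal sketch, assuming Python 3 and using only the client log lines quoted above; the names CLIENT_LOG and stall_episodes are hypothetical, chosen for illustration.

```python
import re

# Client kernel log lines exactly as quoted in this report.
CLIENT_LOG = """\
Jul 9 01:00:39 mailserv01 kernel: [591473.216168] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:00:39 mailserv01 kernel: [591473.216174] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:00:39 mailserv01 kernel: [591473.218841] nfs: server NFS-Backup01 OK
Jul 9 01:00:39 mailserv01 kernel: [591473.219159] nfs: server NFS-Backup01 OK
Jul 9 01:01:14 mailserv01 kernel: [591508.710863] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:08:16 mailserv01 kernel: [591929.870358] nfs: server NFS-Backup01 OK
Jul 9 01:08:44 mailserv01 kernel: [591958.514953] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:10:23 mailserv01 kernel: [592056.840273] nfs: server NFS-Backup01 OK
Jul 9 01:11:19 mailserv01 kernel: [592113.298900] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:13:07 mailserv01 kernel: [592221.275966] nfs: server NFS-Backup01 OK
Jul 9 01:15:22 mailserv01 kernel: [592355.580401] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:15:22 mailserv01 kernel: [592355.615649] nfs: server NFS-Backup01 OK
Jul 9 01:16:03 mailserv01 kernel: [592397.459848] nfs: server NFS-Backup01 not responding, still trying
Jul 9 01:16:08 mailserv01 kernel: [592401.663381] nfs: server NFS-Backup01 OK
Jul 9 01:17:06 mailserv01 kernel: [592460.381059] nfs: server NFS-Backup01 not responding, still trying
"""

# Captures the uptime timestamp, server name, and state from each NFS client message.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\] nfs: server (\S+) (not responding|OK)")


def stall_episodes(log_text):
    """Return (episodes, open_start): completed (start, end) stall pairs in
    seconds since boot, plus the start of a stall with no OK yet (or None)."""
    episodes = []
    start = None
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if not m:
            continue
        ts, state = float(m.group(1)), m.group(3)
        if state == "not responding":
            if start is None:  # repeated warnings inside one stall are ignored
                start = ts
        elif start is not None:  # the first OK after a stall closes it
            episodes.append((start, ts))
            start = None
    return episodes, start


if __name__ == "__main__":
    episodes, open_start = stall_episodes(CLIENT_LOG)
    for s, e in episodes:
        print(f"stalled at {s:.3f}s for {e - s:.1f}s")
    if open_start is not None:
        print(f"still stalled since {open_start:.3f}s at end of log")
```

On this excerpt the longest stall is the one starting at 01:01:14, roughly 421 seconds before the client logged OK again.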

#NFS Server
Jul 9 01:01:51 NFS-Backup01 kernel: [130980.762798] RPC request reserved 156 but used 176
Jul 9 01:08:28 NFS-Backup01 kernel: [131377.839675] RPC request reserved 156 but used 176
Jul 9 01:08:46 NFS-Backup01 kernel: [131395.943517] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:10:17 NFS-Backup01 kernel: [131486.962508] RPC request reserved 156 but used 176
Jul 9 01:10:22 NFS-Backup01 kernel: [131492.447565] RPC request reserved 156 but used 176
Jul 9 01:10:27 NFS-Backup01 kernel: [131496.837561] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:10:54 NFS-Backup01 kernel: [131524.423501] RPC request reserved 156 but used 176
Jul 9 01:15:31 NFS-Backup01 kernel: [131801.603553] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:16:55 NFS-Backup01 kernel: [131885.232973] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
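
The kernel reports negative errno values, so "got error -32" corresponds to errno 32, which is EPIPE ("Broken pipe") on Linux; on a send this typically means the connection was already closed from the other end. A minimal sketch that tallies the two server messages above, decodes the error code with the standard errno module, and computes the reserve overrun (176 - 156 = 20 bytes) might look like this; SERVER_LOG and summarize are hypothetical names.

```python
import errno
import os
import re
from collections import Counter

# Server kernel log lines exactly as quoted in this report.
SERVER_LOG = """\
Jul 9 01:01:51 NFS-Backup01 kernel: [130980.762798] RPC request reserved 156 but used 176
Jul 9 01:08:28 NFS-Backup01 kernel: [131377.839675] RPC request reserved 156 but used 176
Jul 9 01:08:46 NFS-Backup01 kernel: [131395.943517] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:10:17 NFS-Backup01 kernel: [131486.962508] RPC request reserved 156 but used 176
Jul 9 01:10:22 NFS-Backup01 kernel: [131492.447565] RPC request reserved 156 but used 176
Jul 9 01:10:27 NFS-Backup01 kernel: [131496.837561] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:10:54 NFS-Backup01 kernel: [131524.423501] RPC request reserved 156 but used 176
Jul 9 01:15:31 NFS-Backup01 kernel: [131801.603553] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
Jul 9 01:16:55 NFS-Backup01 kernel: [131885.232973] rpc-srv/tcp: nfsd: got error -32 when sending 136 bytes - shutting down socket
"""

SHUTDOWN = re.compile(r"got error (-\d+) when sending (\d+) bytes - shutting down socket")
RESERVED = re.compile(r"RPC request reserved (\d+) but used (\d+)")


def summarize(log_text):
    """Count socket shutdowns by error code and reserve overruns by byte excess."""
    errors = Counter()
    overruns = Counter()
    for line in log_text.splitlines():
        if m := SHUTDOWN.search(line):  # walrus operator needs Python 3.8+
            errors[int(m.group(1))] += 1
        elif m := RESERVED.search(line):
            reserved, used = int(m.group(1)), int(m.group(2))
            overruns[used - reserved] += 1
    return errors, overruns


if __name__ == "__main__":
    errors, overruns = summarize(SERVER_LOG)
    for code, n in errors.items():
        # Kernel errors are negative; errno/strerror expect the positive value.
        print(f"error {code}: {errno.errorcode[-code]} ({os.strerror(-code)}), {n} times")
    for extra, n in overruns.items():
        print(f"reply used {extra} bytes more than reserved, {n} times")
```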

#Client fstab
NFS-Backup01:/mnt/backup /mnt/backup nfs rsize=8192,wsize=8192,nfsvers=4,hard,timeo=14,intr

#NFS Server fstab
/mnt/backup /mnt/backup none bind 0 0
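
The client mount options are easier to read once split apart. The nfs(5) man page documents timeo in tenths of a second and gives 600 (60 seconds) as the default for NFS over TCP, which NFSv4 uses by default, so timeo=14 means the client retransmits after only 1.4 seconds. Whether that contributes to this bug is not established here. A minimal sketch of the decoding, with parse_options as a hypothetical helper name:

```python
# Client mount options copied from the fstab line above.
OPTIONS = "rsize=8192,wsize=8192,nfsvers=4,hard,timeo=14,intr"


def parse_options(opts):
    """Split a comma-separated mount option string into a dict.
    Bare flags map to True; numeric values become ints."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True  # bare flag such as "hard" or "intr"
        elif value.isdigit():
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed


opts = parse_options(OPTIONS)
# Per nfs(5), timeo is expressed in tenths of a second.
timeo_seconds = opts["timeo"] / 10
print(f"timeo={opts['timeo']} -> {timeo_seconds} s before the first retransmission")
print(f"rsize={opts['rsize']} bytes, wsize={opts['wsize']} bytes, NFS version {opts['nfsvers']}")
```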

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1602827

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Gus Hoppes (g-style) on 2016-07-13
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Gus Hoppes (g-style) on 2016-07-15
Changed in linux (Ubuntu):
status: Confirmed → New
status: New → Confirmed
Gus Hoppes (g-style) on 2016-07-18
Changed in linux (Ubuntu):
status: Confirmed → New
Brad Figg (brad-figg) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1602827

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Gus Hoppes (g-style) on 2016-07-18
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.7 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc7

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Gus Hoppes (g-style) wrote :

No, this did not start after an upgrade. I built three brand-new Ubuntu 14.04.4 LTS servers, all pushing nightly backups to my NFS server. I have no problem with the other two servers because their data is not large; they still work fine. It's always the one with the large data that seems to choke, which of course is my main mail server.

I'll try the new upstream kernel and let you know how it goes. Thanks!

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Luigi Molinaro (luigi-molinaro) wrote :

It has expired?
No resolution yet?
I have upgraded my kernel to 3.19.0-80-generic and it's still affected.
