NFSv4 client hang under network load

Bug #1074470 reported by Konstantin L. Metlov
This bug affects 1 person
Affects: nfs-utils (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

While upgrading some of my systems to Ubuntu 12.04 "Precise", I'm seeing hangs of various processes that work with files on an NFSv4-mounted /home. KDE sessions in particular hang very often on startup or after brief use.

In the hung state, all processes accessing the NFS-mounted /home enter uninterruptible sleep (D). Sometimes, after a long wait (around 10-15 minutes), some of these processes wake up and continue, but realistically a reboot is the only way to bring the machine back online, and then only for a brief period before the next hang. After the hang, dmesg displays a number of kernel stack traces of the form "process XXX blocked for more than YYY seconds", with "ktime_get_ts" and "rpc_make_runnable" at the top of the call stack. It happens with both TCP and UDP transports.
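
To see which processes are stuck at a given moment, the state field of /proc/<pid>/stat can be inspected. The following is only a minimal sketch, not something from the original investigation; the helper names parse_stat and find_blocked are made up for illustration, and it assumes a Linux /proc layout.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep (D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling find_blocked() during a hang and printing the result should show the processes waiting on /home. Processes pass through the D state briefly during normal I/O too, so only entries that persist across several calls are interesting.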

The hang happens only when the network is loaded. When the client is connected directly to the NFS server (running Ubuntu Lucid with a backported Oneiric kernel) via a separate Ethernet switch, NFS works perfectly. But if there is network congestion, NFS accesses hang at random.

It is also possible to reproduce the hang by running a large rsync file transfer to the client while accessing the NFS-mounted /home. In this case the processes reading from NFS hang almost instantly, even when logging in via the console.
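
While running such a reproduction, a small probe can record when accesses to the mount start to stall. This is a rough sketch under stated assumptions, not part of the original report: the function names probe and watch are hypothetical, and the path passed in is whatever directory sits on the NFS mount. Note that os.stat may be answered from the client's attribute cache, so a real test might read file data instead.

```python
import os
import threading
import time


def probe(path, timeout=5.0):
    """Return seconds taken by os.stat(path), or None if it did not finish in time."""
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.stat(path)
        except OSError as exc:
            result["error"] = exc  # a failed stat still counts as "returned"
        result["elapsed"] = time.monotonic() - start

    # Run the call in a separate thread so that a stuck NFS access
    # cannot block the caller as well.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result.get("elapsed")


def watch(path, interval=1.0, rounds=60, timeout=5.0):
    """Probe the path repeatedly and print a line for each stall."""
    for _ in range(rounds):
        elapsed = probe(path, timeout)
        if elapsed is None:
            print(time.strftime("%H:%M:%S"), "stall: no reply within", timeout, "s")
        time.sleep(interval)
```

For example, watch("/home") could be started on the client just before the rsync transfer begins. A worker thread that is stuck in uninterruptible sleep cannot be cancelled, so stalled probes stay around until the kernel lets the call return.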

By all symptoms this hang resembles the one fixed by "SUNRPC: Fix a UDP transport regression" in the 3.2.0-32.51 Ubuntu kernel (exactly the kernel I'm using and seeing hangs on). RPC traces show a number of hung requests in the "q:xprt_sending" state, like this:

Nov 2 20:22:51 XXX kernel: [15060.853376] -pid- flgs status -client- --rqstp- -timeout ---ops--
Nov 2 20:22:51 XXX kernel: [15060.853393] 9903 0821 -11 f243f000 f256d700 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853401] 9904 0821 -11 f243f000 f256d600 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853408] 9916 0080 -11 f243f000 f256d500 0 f86c1b18 nfsv4 STATFS a:call_connect_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853415] 9917 0080 -11 f243f000 f256d200 0 f86c1b18 nfsv4 ACCESS a:call_connect_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853423] 9914 0281 -11 f256d800 f256d300 0 f870d8ec nfsv4 RENEW a:call_status q:xprt_sending
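
When the task dump is long, it can help to tally the entries by wait queue and procedure. Below is a minimal sketch of such a parser, assuming the column layout shown in the header line above; the names TASK_RE, parse_rpc_tasks and summarize are illustrative only, and the sample reuses lines from this report. The status value -11 in these lines corresponds to -EAGAIN on Linux.

```python
import re
from collections import Counter

# Columns after the kernel timestamp, as in the header line:
# pid, flags, status, client, rqstp, timeout, ops, then "program procedure",
# the current action (a:) and the wait queue (q:).
TASK_RE = re.compile(
    r"\]\s+(?P<pid>\d+)\s+(?P<flags>[0-9a-f]{4})\s+(?P<status>-?\d+)\s+"
    r"(?P<client>[0-9a-f]+)\s+(?P<rqstp>[0-9a-f]+)\s+(?P<timeout>\d+)\s+"
    r"(?P<ops>[0-9a-f]+)\s+(?P<program>\S+)\s+(?P<procedure>\S+)\s+"
    r"a:(?P<action>\S+)\s+q:(?P<queue>\S+)"
)

SAMPLE = """\
Nov 2 20:22:51 XXX kernel: [15060.853376] -pid- flgs status -client- --rqstp- -timeout ---ops--
Nov 2 20:22:51 XXX kernel: [15060.853393] 9903 0821 -11 f243f000 f256d700 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853408] 9916 0080 -11 f243f000 f256d500 0 f86c1b18 nfsv4 STATFS a:call_connect_status q:xprt_sending
"""


def parse_rpc_tasks(log_text):
    """Return one dict per RPC task line; header and unrelated lines are skipped."""
    tasks = []
    for line in log_text.splitlines():
        match = TASK_RE.search(line)
        if match:
            task = match.groupdict()
            for key in ("pid", "status", "timeout"):
                task[key] = int(task[key])
            tasks.append(task)
    return tasks


def summarize(tasks):
    """Count tasks per (wait queue, procedure) pair."""
    return Counter((t["queue"], t["procedure"]) for t in tasks)


if __name__ == "__main__":
    for (queue, procedure), count in sorted(summarize(parse_rpc_tasks(SAMPLE)).items()):
        print(queue, procedure, count)
```

The regular expression only matches lines in which every column holds a value, so entries printed in a different layout by other kernel versions would be skipped silently.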

The problem may be similar to the one fixed by "SUNRPC: Fix a UDP transport regression", but in NFSv4.

I'm ready to provide more information on my configuration if necessary.
