Comment 2 for bug 420508

In , Flavio (flavio-redhat-bugs) wrote :

Created attachment 322418
suggested patch

POSIX requires that ctime and mtime, as reported by the stat(2) call,
reflect the activity of the most recent write(2). To that end, nfs_getattr()
flushes pending dirty writes to a file before doing a GETATTR to allow the
NFS server to set the file's size, ctime, and mtime properly.

However, nfs_getattr() can be starved when a constant stream of application
writes to a file prevents nfs_sync_inode_wait() from completing. This usually
results in hangs of programs doing a stat against an NFS file that is being
written. "ls -l" is a common victim of this behavior.

[root@node3 ~]# uname -a
Linux node3 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686 i386 GNU/Linux

[root@node3 ~]# ls -lh /tmp/Test1G
-rw-r--r-- 1 root root 1000M Oct 24 12:37 /tmp/Test1G

(from /proc/mounts):
cluster1:/usr/local /usr/local nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=cluster1

[root@node3 tmp]# ls /usr/local/tmp
locking
[root@node3 tmp]# time cp Test1G /usr/local/tmp &
[1] 25030
[root@node3 tmp]# time ls -lh /usr/local/tmp

real 2m13.872s
user 0m0.230s
sys 0m7.897s
total 1001M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 1000M Oct 28 09:39 Test1G
[1]+ Done time cp -i Test1G /usr/local/tmp

real 2m2.414s
user 0m0.230s
sys 0m8.062s
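
For anyone who wants to reproduce the measurement without eyeballing "time ls",
here is a minimal sketch that times individual stat(2) calls on a file while a
copy to the NFS mount runs in another shell. The helper names (timed_stat,
sample_stat_latency) and the sample count/interval are made up for
illustration; the path is passed on the command line.

```python
import os
import sys
import time


def timed_stat(path):
    """Return (elapsed_seconds, stat_result) for a single stat(2) call."""
    start = time.monotonic()
    st = os.stat(path)
    return time.monotonic() - start, st


def sample_stat_latency(path, samples=5, interval=1.0):
    """Call stat() on path several times and collect (latency, size) pairs.

    samples and interval are arbitrary illustrative defaults.
    """
    results = []
    for _ in range(samples):
        elapsed, st = timed_stat(path)
        results.append((elapsed, st.st_size))
        time.sleep(interval)
    return results


# Only runs when a path is given, e.g.:
#   python stat_latency.py /usr/local/tmp/Test1G
if __name__ == "__main__" and len(sys.argv) > 1:
    for elapsed, size in sample_stat_latency(sys.argv[1]):
        print(f"stat took {elapsed:.3f}s, size {size} bytes")
```

On an affected kernel the first stat() would be expected to block for roughly
as long as the copy runs, matching the "ls -lh" timing above.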

To prevent starvation, hold the file's i_mutex in nfs_getattr() to
temporarily freeze application writes, so the client can more quickly obtain
clean values for the file's size, mtime, and ctime.
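
As a rough illustration of why freezing writers helps, the following toy model
counts how many "ticks" a flush needs to reach zero dirty pages. It is only a
sketch and not the kernel code: the function name, the tick model, and the
page rates are all assumptions chosen for the example. With writers running,
new dirty pages arrive as fast as the flush retires them and the flush never
finishes; with writers frozen (the effect of holding i_mutex), it drains in
bounded time.

```python
def ticks_until_clean(dirty, write_rate, flush_rate,
                      freeze_writers, max_ticks=10_000):
    """Toy model of flushing dirty pages before a GETATTR.

    dirty          -- dirty pages when the flush starts
    write_rate     -- pages the application dirties per tick
    flush_rate     -- pages the flush cleans per tick
    freeze_writers -- True models holding the lock that blocks writers
    Returns the tick at which no dirty pages remain, or None if the
    flush is still not done after max_ticks (i.e. it is starved).
    """
    for tick in range(1, max_ticks + 1):
        if not freeze_writers:
            dirty += write_rate          # application keeps dirtying pages
        dirty = max(0, dirty - flush_rate)
        if dirty == 0:
            return tick
    return None


# Writers not blocked: the flush never catches up.
print(ticks_until_clean(100, 8, 8, freeze_writers=False))  # None
# Writers frozen for the duration of the flush: done in 13 ticks.
print(ticks_until_clean(100, 8, 8, freeze_writers=True))   # 13
```

The model ignores everything NFS-specific (RPC latency, commit handling); it
only shows that the wait becomes bounded once no new dirty pages can appear.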

Below is the upstream patch fixing this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=28c494c5c8d425e15b7b82571e4df6d6bc34594d

Another interesting patch to be applied:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=634707388baa440d9c9d082cfc4c950500c8952b

Both patches were backported to -92.el5; the feedback from testing follows:
----
After booting the test kernel with that change, I was able to slip a few "ls" commands in during the write:

[root@node3 tmp]# time cp Test1G /usr/local/tmp &
[1] 10142
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 68M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 68M Nov 3 10:44 Test1G

real 0m6.090s
user 0m0.001s
sys 0m0.054s
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 114M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 114M Nov 3 10:44 Test1G

real 0m5.185s
user 0m0.005s
sys 0m0.016s
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 148M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 148M Nov 3 10:44 Test1G

real 0m3.077s
user 0m0.000s
sys 0m0.030s
[root@node3 tmp]#
real 2m26.065s
user 0m0.211s
sys 0m8.042s

The bonus is that the impact to the write seemed to be minimal.
----

Attaching backported patch for RHEL-5 -92.el5