nfs_getattr starvation with heavy NFS write activity
Bug #420508 reported by Brent Nelson
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux-ports-meta (Fedora) | Fix Released | Critical | | |
linux-ports-meta (Ubuntu) | Triaged | Undecided | Unassigned | |
Bug Description
Binary package hint: linux-image
This is a known bug in kernels prior to 2.6.25 (I'm not sure when it was introduced). If you have a long-running write (such as a dd) to an NFS mount, an "ls -l" on that mount won't complete until the write finishes. If you are copying a file that takes 20 minutes to complete, a simple "ls -l" will also take 20 minutes. A "\ls" (plain ls with no options, bypassing any shell alias) works fine.
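The difference between the two commands comes down to the system calls involved: a plain ls only reads directory entry names, while "ls -l" also calls lstat() on every entry, and on an NFS mount that stat is what ends up in nfs_getattr(). The Python sketch below separates the two steps so they can be timed independently. The function names are made up for illustration and are not part of any existing tool.

```python
import os
import time


def time_plain_listing(directory):
    """Roughly what plain `ls` needs: the directory entry names only."""
    start = time.perf_counter()
    names = sorted(os.listdir(directory))
    return names, time.perf_counter() - start


def time_long_listing(directory):
    """Roughly what `ls -l` needs: one lstat() per entry for size and times."""
    start = time.perf_counter()
    sizes = {}
    for name in sorted(os.listdir(directory)):
        # On an NFS mount this is the call that can block behind a writer.
        st = os.lstat(os.path.join(directory, name))
        sizes[name] = st.st_size
    return sizes, time.perf_counter() - start
```

Pointing both functions at an NFS directory while a large copy into it is in progress should, on an affected kernel, show the first returning quickly and the second stalling until the copy ends.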
This was fixed in a really tiny patch:
Changed in linux-ports-meta (Ubuntu):
status: New → Triaged
Changed in linux-ports-meta (Fedora):
status: Unknown → Fix Released
importance: Unknown → Critical
Created attachment 322418
suggested patch
POSIX requires that ctime and mtime, as reported by the stat(2) call,
reflect the activity of the most recent write(2). To that end, nfs_getattr()
flushes pending dirty writes to a file before doing a GETATTR to allow the
NFS server to set the file's size, ctime, and mtime properly.
However, nfs_getattr() can be starved when a constant stream of application
writes to a file prevents nfs_sync_inode_wait() from completing. This usually
results in hangs of programs doing a stat against an NFS file that is being
written. "ls -l" is a common victim of this behavior.
[root@node3 ~]# uname -a
Linux node3 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:12 EDT 2008 i686 i686
i386 GNU/Linux
[root@node3 ~]# ls -lh /tmp/Test1G
-rw-r--r-- 1 root root 1000M Oct 24 12:37 /tmp/Test1G
(from /proc/mounts):
cluster1:/usr/local /usr/local nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=cluster1
[root@node3 tmp]# ls /usr/local/tmp
locking
[root@node3 tmp]# time cp Test1G /usr/local/tmp &
[1] 25030
[root@node3 tmp]# time ls -lh /usr/local/tmp
real 2m13.872s
user 0m0.230s
sys 0m7.897s
total 1001M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 1000M Oct 28 09:39 Test1G
[1]+ Done time cp -i Test1G /usr/local/tmp
real 2m2.414s
user 0m0.230s
sys 0m8.062s
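The transcript above can be approximated without two terminals. The following is a minimal sketch, assuming Python 3 on the client: it writes a file from a background thread and times os.stat() calls against it while the write is in progress. The function name and default sizes are hypothetical, and on a local filesystem the latencies will simply be small; the stall is only expected on an affected NFS client.

```python
import os
import threading
import time


def stat_latencies_during_write(path, total_bytes, chunk_bytes=1 << 20, samples=5):
    """Write total_bytes to path in a thread and time os.stat() calls meanwhile."""

    def writer():
        block = b"\0" * chunk_bytes
        written = 0
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(block[:n])
                written += n

    thread = threading.Thread(target=writer)
    thread.start()
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            os.stat(path)  # same attribute lookup that "ls -l" triggers
        except FileNotFoundError:
            pass  # the writer thread has not created the file yet
        latencies.append(time.perf_counter() - start)
    thread.join()
    return latencies
```

Calling it with a path under the NFS mount and a size of around 1000M would mirror the cp and ls -lh pair shown in the transcript.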
To prevent starvation, hold the file's i_mutex in nfs_getattr() to
freeze application writes temporarily so the client can more quickly obtain
clean values for a file's size, mtime, and ctime.
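The effect of freezing the writer can be illustrated with a toy model. This is not the kernel code, only a sketch under simplifying assumptions: time advances in ticks, the application dirties a fixed number of pages per tick, the client flushes a fixed number per tick, and the stat returns once no dirty pages remain. All names and rates are invented for the illustration.

```python
def ticks_until_getattr_returns(initial_dirty, pending_writes, write_rate,
                                flush_rate, freeze_writer):
    """Count ticks until the flush that getattr waits on has no dirty pages left.

    freeze_writer=True models holding i_mutex: the application cannot dirty
    new pages while the flush is in progress.
    """
    if flush_rate <= 0:
        raise ValueError("flush_rate must be positive")
    dirty = initial_dirty
    remaining = pending_writes  # pages the application still wants to write
    ticks = 0
    while dirty > 0:
        ticks += 1
        if remaining > 0 and not freeze_writer:
            n = min(write_rate, remaining)
            remaining -= n
            dirty += n  # the writer keeps re-dirtying the file
        dirty -= min(flush_rate, dirty)
    return ticks
```

With 8 dirty pages at the time of the stat, 400 pages still to be written, and both rates set to 4 pages per tick, the unfrozen case returns 102 (the stat finishes only after the writer does), while the frozen case returns 2.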
Below is the upstream patch fixing this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=28c494c5c8d425e15b7b82571e4df6d6bc34594d
Another interesting patch to be applied:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=634707388baa440d9c9d082cfc4c950500c8952b
They were backported to -92.EL; see the feedback below:
----
After booting the test kernel with that change, I was able to slip a few "ls" commands in during the write:
[root@node3 tmp]# time cp Test1G /usr/local/tmp &
[1] 10142
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 68M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 68M Nov 3 10:44 Test1G
real 0m6.090s
user 0m0.001s
sys 0m0.054s
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 114M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 114M Nov 3 10:44 Test1G
real 0m5.185s
user 0m0.005s
sys 0m0.016s
[root@node3 tmp]# time ls -lh /usr/local/tmp
total 148M
drwxr-xr-x 2 root root 4.0K Jun 23 12:51 locking
-rw-r--r-- 1 root root 148M Nov 3 10:44 Test1G
real 0m3.077s
user 0m0.000s
sys 0m0.030s
[root@node3 tmp]#
real 2m26.065s
user 0m0.211s
sys 0m8.042s
The bonus is that the impact on the write seemed to be minimal.
---
Attaching backported patch for RHEL-5 -92.el5