Transferring large files to nfs mount causes system freeze
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Tim Gardner |
Hardy | Fix Released | Undecided | Tim Gardner |
Lucid | Fix Released | Undecided | Tim Gardner |
Maverick | Fix Released | Undecided | Tim Gardner |
Natty | Fix Released | Undecided | Tim Gardner |
Bug Description
Binary package hint: nfs-kernel-server
I have verified this bug on both karmic and lucid on both the server and client:
-------
Description: Ubuntu 9.10
Release: 9.10
nfs-common:
Installed: 1:1.2.0-2ubuntu8
nfs-kernel-server:
Installed: 1:1.2.0-2ubuntu8
portmap:
Installed: 6.0-10ubuntu2
-------
Description: Ubuntu 10.04 LTS
Release: 10.04
nfs-common:
Installed: 1:1.2.0-4ubuntu4
nfs-kernel-server:
Installed: 1:1.2.0-4ubuntu4
portmap:
Installed: 6.0.0-1ubuntu2
-------
Expected behavior:
Copying large files from local directories to an nfs mounted directory should complete without error.
-------
Actual behavior:
The system freezes while trying to copy large files from a local directory (e.g. /tmp) to an nfs mounted directory. This causes various things to fail to respond, ultimately resulting in a hard reboot and potential loss of data. When this occurs I am able to log into the box via ssh, but even sudo is unable to kill -9 the wayward file copy or reboot the machine gracefully.
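The `D` in the trace below marks the task as being in uninterruptible sleep, and a task in that state does not act on signals until the blocking I/O returns, which would explain why `kill -9` has no effect. As a rough diagnostic sketch (the function names `parse_stat` and `uninterruptible_tasks` are made up for illustration and are not part of any NFS tooling), something like this could list the processes currently stuck in that state by reading `/proc/<pid>/stat`:

```python
import os

def parse_stat(stat_text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process whose state field is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or entry was unreadable during the scan
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_tasks()` over ssh on the frozen client should show the stuck `cp` if it is in this state. It only inspects each process's main thread, so it is a quick check rather than a complete one.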
-------
Details:
The server exports several directories, for example:
/home/shared
/home/user1/
/home/user1/
The client mounts these as follows:
server1:
server1:
server1:
I see lots of messages like this in /var/log/syslog:
May 22 10:44:31 client1 kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.
May 22 10:44:31 client1 kernel: [ 1680.390488] "echo 0 > /proc/sys/
May 22 10:44:31 client1 kernel: [ 1680.390492] cp D 00000000ffffffff 0 2791 2503 0x00000000
May 22 10:44:31 client1 kernel: [ 1680.390501] ffff88012a457c48 0000000000000082 0000000000015bc0 0000000000015bc0
May 22 10:44:31 client1 kernel: [ 1680.390508] ffff8801291331a0 ffff88012a457fd8 0000000000015bc0 ffff880129132de0
May 22 10:44:31 client1 kernel: [ 1680.390516] 0000000000015bc0 ffff88012a457fd8 0000000000015bc0 ffff8801291331a0
May 22 10:44:31 client1 kernel: [ 1680.390523] Call Trace:
May 22 10:44:31 client1 kernel: [ 1680.390545] [<ffffffffa0cff
May 22 10:44:31 client1 kernel: [ 1680.390552] [<ffffffff8153e
May 22 10:44:31 client1 kernel: [ 1680.390573] [<ffffffffa0cff
May 22 10:44:31 client1 kernel: [ 1680.390579] [<ffffffff8153f
May 22 10:44:31 client1 kernel: [ 1680.390587] [<ffffffff812b6
May 22 10:44:31 client1 kernel: [ 1680.390608] [<ffffffffa0cff
May 22 10:44:31 client1 kernel: [ 1680.390615] [<ffffffff8153f
May 22 10:44:31 client1 kernel: [ 1680.390622] [<ffffffff81085
May 22 10:44:31 client1 kernel: [ 1680.390643] [<ffffffffa0cff
May 22 10:44:31 client1 kernel: [ 1680.390665] [<ffffffffa0d03
May 22 10:44:31 client1 kernel: [ 1680.390688] [<ffffffffa0d04
May 22 10:44:31 client1 kernel: [ 1680.390711] [<ffffffffa0d04
May 22 10:44:31 client1 kernel: [ 1680.390733] [<ffffffffa0d04
May 22 10:44:31 client1 kernel: [ 1680.390751] [<ffffffffa0cf3
May 22 10:44:31 client1 kernel: [ 1680.390770] [<ffffffffa0cf4
May 22 10:44:31 client1 kernel: [ 1680.390777] [<ffffffff81140
May 22 10:44:31 client1 kernel: [ 1680.390783] [<ffffffff81140
May 22 10:44:31 client1 kernel: [ 1680.390790] [<ffffffff81013
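To see how often this happens and which commands are affected, the hung-task header lines can be pulled out of the log. The following is only a sketch based on the message format shown above; the helper name `find_hung_tasks` and the regular expression are my own and may need adjusting for other kernel versions:

```python
import re

# Matches the hung-task header line the kernel prints, e.g.
# "INFO: task cp:2791 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(lines):
    """Return (comm, pid, seconds) for every hung-task report in the given lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits

sample = [
    "May 22 10:44:31 client1 kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.",
    "May 22 10:44:31 client1 kernel: [ 1680.390523] Call Trace:",
]
print(find_hung_tasks(sample))
```

To run it against the real log, pass an open file instead of `sample`, for example `find_hung_tasks(open("/var/log/syslog"))`.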
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released
Changed in linux (Ubuntu Hardy):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Hardy):
status: In Progress → Fix Committed
tags: added: verification-needed-hardy verification-needed-lucid verification-needed-maverick
I'm seeing the same thing on 10.04 64-bit.
[773760.910061] INFO: task tar:14596 blocked for more than 120 seconds.
[773760.926430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[773760.958906] tar D 00000000ffffffff 0 14596 14568 0x00000004
[773760.958912] ffff8802b217dc48 0000000000000082 0000000000015bc0 0000000000015bc0
[773760.958917] ffff8801f5fc1ab0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc16f0
[773760.958921] 0000000000015bc0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc1ab0
[773760.958925] Call Trace:
[773760.958951] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958960] [<ffffffff815555f7>] io_schedule+0x47/0x70
[773760.958972] [<ffffffffa01b228e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[773760.958976] [<ffffffff81555c1f>] __wait_on_bit+0x5f/0x90
[773760.958988] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958993] [<ffffffff81555cc8>] out_of_line_wait_on_bit+0x78/0x90
[773760.958999] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[773760.959011] [<ffffffffa01b226f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[773760.959024] [<ffffffffa01b666f>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
[773760.959037] [<ffffffffa01b7aae>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
[773760.959050] [<ffffffffa01b7e99>] nfs_write_mapping+0x79/0xb0 [nfs]
[773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
[773760.959073] [<ffffffffa01a6e9a>] nfs_do_fsync+0x2a/0x60 [nfs]
[773760.959084] [<ffffffffa01a70e5>] nfs_file_flush+0x75/0xa0 [nfs]
[773760.959089] [<ffffffff81140f2c>] filp_close+0x3c/0x90
[773760.959092] [<ffffffff81141037>] sys_close+0xb7/0x120
[773760.959098] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b