Transferring large files to NFS mount causes system freeze

Reported by Nathan Adams on 2010-05-26
This bug affects 24 people
Affects           Importance   Assigned to
linux (Ubuntu)    Undecided    Tim Gardner
  Hardy           Undecided    Tim Gardner
  Lucid           Undecided    Tim Gardner
  Maverick        Undecided    Tim Gardner
  Natty           Undecided    Tim Gardner

Bug Description

Binary package hint: nfs-kernel-server

I have verified this bug on both karmic and lucid on both the server and client:

-------------------------------------------------------------------------------

Description: Ubuntu 9.10
Release: 9.10

nfs-common:
  Installed: 1:1.2.0-2ubuntu8

nfs-kernel-server:
  Installed: 1:1.2.0-2ubuntu8

portmap:
  Installed: 6.0-10ubuntu2

-------------------------------------------------------------------------------

Description: Ubuntu 10.04 LTS
Release: 10.04

nfs-common:
  Installed: 1:1.2.0-4ubuntu4

nfs-kernel-server:
  Installed: 1:1.2.0-4ubuntu4

portmap:
  Installed: 6.0.0-1ubuntu2

-------------------------------------------------------------------------------

Expected behavior:

Copying large files from local directories to an NFS-mounted directory should complete without error.

-------------------------------------------------------------------------------

Actual behavior:

The system freezes while trying to copy large files from a local directory (e.g. /tmp) to an NFS-mounted directory. This causes various things to stop responding, ultimately requiring a hard reboot with potential loss of data. When this occurs I am able to log into the box via ssh, but even with sudo I am unable to kill -9 the wayward file copy or reboot the machine gracefully.

-------------------------------------------------------------------------------

Details:

The server exports several directories, for example:

/home/shared
/home/user1/Documents
/home/user1/Development

The client mounts these as follows:

server1:/home/shared /home/shared nfs rw,soft,intr 0 0
server1:/home/user1/Development /home/server1/user1/Development nfs rw,soft,intr 0 0
server1:/home/user1/Documents /home/server1/user1/Documents nfs rw,soft,intr 0 0

I see lots of messages like this in /var/log/syslog:

May 22 10:44:31 client1 kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.
May 22 10:44:31 client1 kernel: [ 1680.390488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 10:44:31 client1 kernel: [ 1680.390492] cp D 00000000ffffffff 0 2791 2503 0x00000000
May 22 10:44:31 client1 kernel: [ 1680.390501] ffff88012a457c48 0000000000000082 0000000000015bc0 0000000000015bc0
May 22 10:44:31 client1 kernel: [ 1680.390508] ffff8801291331a0 ffff88012a457fd8 0000000000015bc0 ffff880129132de0
May 22 10:44:31 client1 kernel: [ 1680.390516] 0000000000015bc0 ffff88012a457fd8 0000000000015bc0 ffff8801291331a0
May 22 10:44:31 client1 kernel: [ 1680.390523] Call Trace:
May 22 10:44:31 client1 kernel: [ 1680.390545] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390552] [<ffffffff8153eb87>] io_schedule+0x47/0x70
May 22 10:44:31 client1 kernel: [ 1680.390573] [<ffffffffa0cff2be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390579] [<ffffffff8153f3df>] __wait_on_bit+0x5f/0x90
May 22 10:44:31 client1 kernel: [ 1680.390587] [<ffffffff812b6234>] ? __lookup_tag+0x64/0x120
May 22 10:44:31 client1 kernel: [ 1680.390608] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390615] [<ffffffff8153f488>] out_of_line_wait_on_bit+0x78/0x90
May 22 10:44:31 client1 kernel: [ 1680.390622] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
May 22 10:44:31 client1 kernel: [ 1680.390643] [<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390665] [<ffffffffa0d036af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390688] [<ffffffffa0d04aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390711] [<ffffffffa0d04ed9>] nfs_write_mapping+0x79/0xb0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390733] [<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390751] [<ffffffffa0cf3eba>] nfs_do_fsync+0x2a/0x60 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390770] [<ffffffffa0cf4105>] nfs_file_flush+0x75/0xa0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390777] [<ffffffff8114051c>] filp_close+0x3c/0x90
May 22 10:44:31 client1 kernel: [ 1680.390783] [<ffffffff81140627>] sys_close+0xb7/0x120
May 22 10:44:31 client1 kernel: [ 1680.390790] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
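
To see which processes are affected and how often, the hung-task reports can be tallied per task name. This is a minimal sketch that assumes the log format shown above; the function name `count_hung_tasks` is hypothetical, and the log path is whatever file holds the kernel messages on the client.

```shell
#!/bin/sh
# Hypothetical helper: tally "blocked for more than N seconds" reports
# per task name, assuming lines shaped like the syslog excerpt above.
count_hung_tasks() {
    logfile="$1"
    # Pull out the task name that precedes ":<pid>" in each report,
    # then count how many reports each name produced.
    sed -n 's/.*INFO: task \([^:]*\):[0-9]* blocked for more than.*/\1/p' "$logfile" \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

For example, `count_hung_tasks /var/log/syslog` prints one line per task name followed by its report count. Task names containing spaces are not handled by this sketch.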

Thag (bruce-edge) wrote :

I'm seeing the same thing on 10.04 64-bit.

[773760.910061] INFO: task tar:14596 blocked for more than 120 seconds.
[773760.926430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[773760.958906] tar D 00000000ffffffff 0 14596 14568 0x00000004
[773760.958912] ffff8802b217dc48 0000000000000082 0000000000015bc0 0000000000015bc0
[773760.958917] ffff8801f5fc1ab0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc16f0
[773760.958921] 0000000000015bc0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc1ab0
[773760.958925] Call Trace:
[773760.958951] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958960] [<ffffffff815555f7>] io_schedule+0x47/0x70
[773760.958972] [<ffffffffa01b228e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[773760.958976] [<ffffffff81555c1f>] __wait_on_bit+0x5f/0x90
[773760.958988] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958993] [<ffffffff81555cc8>] out_of_line_wait_on_bit+0x78/0x90
[773760.958999] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[773760.959011] [<ffffffffa01b226f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[773760.959024] [<ffffffffa01b666f>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
[773760.959037] [<ffffffffa01b7aae>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
[773760.959050] [<ffffffffa01b7e99>] nfs_write_mapping+0x79/0xb0 [nfs]
[773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
[773760.959073] [<ffffffffa01a6e9a>] nfs_do_fsync+0x2a/0x60 [nfs]
[773760.959084] [<ffffffffa01a70e5>] nfs_file_flush+0x75/0xa0 [nfs]
[773760.959089] [<ffffffff81140f2c>] filp_close+0x3c/0x90
[773760.959092] [<ffffffff81141037>] sys_close+0xb7/0x120
[773760.959098] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

Sutanto Kurniawan (tanto) wrote :

Could this bug be related to this report:
https://bugzilla.kernel.org/show_bug.cgi?id=15552 ?

Sutanto Kurniawan (tanto) wrote :

Oops, I meant this one (which included the fix):
https://bugzilla.kernel.org/show_bug.cgi?id=15578

Bug 15552 is the duplicate.

vertex.vr4 (vertex-vr4) wrote :

The patch referred to in the last post appears to be in the current kernel-image.
I believe this issue can be closed as fixed.

Regards,
John

Nathan Adams (nadams) wrote :

Please do not close this bug until:

1) a tester is able to reproduce the bug on an unpatched system, and

2) that same tester is able to verify, with certainty, that the patch resolves the problem.

Perhaps that is what you meant?

David McBride (david-mcbride) wrote :

This appears to be a duplicate of Launchpad bug #561210.

David Ressman (davidressman) wrote :

I'm not certain it's a duplicate of #561210, but I'm not certain it isn't either. This one starts from within nfs_wb_all() and the other hang starts in nfs_wb_page(). At any rate, I see this problem in 10.04 with both Ubuntu's 2.6.32-24.39 and with the stock kernel.org 2.6.32.18.

cotillion (tobias-schwan) wrote :

Is it possible that the NFS client opens more ports than the hardware can handle?

How large are the files that trigger this problem?

Andrew Soroka (andrew-soroka) wrote :

It happens for me every time.

I want to back up my 1.7TB file server; I get about 250GB through and then the client freezes. My files are 1-4GB in size.

Reading from NFS, writing to a local mdadm RAID5 array.

cotillion (tobias-schwan) wrote :

Hmm, I don't have such big files, so I cannot reproduce the bug.

Have you tried to use the option "async" in your exports? Maybe your problem is related to the problem discussed and solved here: http://art.ubuntuforums.org/showthread.php?t=1478413

David McBride (david-mcbride) wrote :

Using "async" is not a viable workaround. From `man exports`:

       async
              This option allows the NFS server to violate the NFS protocol
              and reply to requests before any changes made by that request
              have been committed to stable storage (e.g. disc drive).

              Using this option usually improves performance, but at the cost
              that an unclean server restart (i.e. a crash) can cause data to
              be lost or corrupted.

The fact that using 'async' results in higher performance is not a surprise, as it is much less careful with data handling. The fact that (according to the forum thread) enabling it happens not to trigger this particular bug is perhaps interesting from a debugging perspective, but it is not an acceptable solution to the problem for most organisations.

If you need to make a large file for testing, `dd if=/dev/zero of=my-large-file bs=1M count=$SIZE` will make you an arbitrarily sized file containing all zeroes, where $SIZE is the size in MiB. (Other nodes in /dev may well produce more interesting output.)
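
For repeated test runs, the `dd` invocation can be wrapped in a small function. This is only a sketch: the name `make_test_file` is hypothetical, and it assumes GNU dd (which accepts `bs=1M`).

```shell
#!/bin/sh
# Hypothetical helper: create a zero-filled file of SIZE_MIB mebibytes.
# Assumes GNU dd, which understands the "1M" block-size suffix.
make_test_file() {
    path="$1"
    size_mib="$2"
    # dd writes its transfer statistics to stderr; silence them.
    dd if=/dev/zero of="$path" bs=1M count="$size_mib" 2>/dev/null
}
```

For example, `make_test_file /tmp/my-large-file 4096` creates a 4GiB file, which can then be copied to the NFS mount with `cp` to try to reproduce the hang.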

Thag (bruce-edge) wrote :

10.04.1 still has the same problem.

4 months later - critical failure and still "unassigned"?

What is Canonical spending all its time on, eye candy? Come on people, this is a core failure. This is very bad. There are dozens of the same report that are all "unassigned", with one a "medium".
Jeez, Mark S should be lying awake at night over this one.

BlueBuntu (bluebuntu) wrote :

Ah, disregard my comment, it appears that you are already on that bug.

Nrm (smith32-35) wrote :

Hi everyone,

I've got the same problem, and if I use my Wi-Fi card instead, it's "solved".
My Ethernet card is:

09:00.0 Ethernet controller: Atheros Communications Atheros AR8132 / L1c Gigabit Ethernet Adapter (rev c0)

David Ressman (davidressman) wrote :

I believe this issue is solved by commit 0702099bd86c33c2dcdbd3963433a61f3f503901 (NFS: fix the return value of nfs_file_fsync()).

Tim Gardner (timg-tpi) wrote :

David - to test your theory, how about subscribing to 'deb http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu lucid main' and installing linux-image-server-lts-backport-natty?

affects: nfs-utils (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress

David Ressman (davidressman) wrote :

Unfortunately, in the environment we have, the latest we can run is 2.6.32.xx (IB drivers, filesystem modules, etc.), so even if I installed it, I wouldn't be able to use NFS. I can verify that I added the patch from that commit into 2.6.32.24-generic and the problem disappeared. When we booted back into the stock 2.6.32.24-generic, it reappeared.

Sorry.

Tim Gardner (timg-tpi) wrote :

That works for me. Did your patch look like this:

Tim Gardner (timg-tpi) on 2011-02-11
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released

David Ressman (davidressman) wrote :

It looked precisely like that. :)

Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: Large NFS file copies can orphan resources and block tasks

Patch Description: NFS: fix the return value of nfs_file_fsync()

Changed in linux (Ubuntu Lucid):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → Fix Committed
Changed in linux (Ubuntu Maverick):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → Fix Committed
Tim Gardner (timg-tpi) on 2011-02-14
Changed in linux (Ubuntu Hardy):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Stefan Bader (smb) on 2011-02-15
Changed in linux (Ubuntu Hardy):
status: In Progress → Fix Committed

David Ressman (davidressman) wrote :

You're a scholar and a gentleman, Tim.

Sean Clarke (sean-clarke) wrote :

I am hitting this problem under 10.10 x64:

uname -a
Linux enterprise 2.6.35-27-server #47-Ubuntu SMP Fri Feb 11 23:09:19 UTC 2011 x86_64 GNU/Linux

I've reported it under a couple of other open bugs relating to issues around this area. Can you let me know which kernel version to expect the change to be rolled out in? It is a huge problem for us as we run KVM images over NFS and this happens every time.

After it happens, we also get "false" timeouts on the NFS server and the whole system stutters and stalls. The NFS server continues to serve files to other systems and can be ping'd from the failed client, it even serves files to it - but you get very regular (5 seconds?) timeout messages in the logs:

[15594.126931] nfs: server XXXXXX not responding, timed out
[15598.336861] nfs: server XXXXXX not responding, timed out
[15602.546851] nfs: server XXXXXX not responding, timed out
[15606.757764] nfs: server XXXXXX not responding, timed out
[15610.966788] nfs: server XXXXXX not responding, timed out
[15615.176756] nfs: server XXXXXX not responding, timed out

PING XXXXXX 56(84) bytes of data.
64 bytes from XXXXXX: icmp_req=1 ttl=64 time=0.097 ms
64 bytes from XXXXXX: icmp_req=2 ttl=64 time=0.059 ms
64 bytes from XXXXXX: icmp_req=3 ttl=64 time=0.079 ms

Tim Gardner (timg-tpi) wrote :

Sean - it's best if you start your own bug report using 'ubuntu-bug linux'. Your symptoms appear unrelated to this bug.

Accepted linux into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote :

Accepted linux into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote :

Accepted linux into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Brad Figg (brad-figg) on 2011-03-03
tags: added: verification-needed-hardy verification-needed-lucid verification-needed-maverick

Martin Pitt (pitti) wrote :

Accepted linux-ec2 into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Steve Conklin (sconklin) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-<release>' to 'verification-done-<release>'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

jeffetflo (jeff-jeffetflo) wrote :

Sorry, but which package do I have to test?
I don't see anything NFS-related.

Tim Gardner (timg-tpi) wrote :

Steve - the only kernel that I'm comfortable marking verified is Lucid. I'm happy to have you revert Maverick and Hardy so that it forces someone to do the testing as I don't have a reproducer.

tags: added: verification-done-lucid
removed: verification-needed-lucid

Tim Gardner (timg-tpi) wrote :

Steve - I'm changing my position and am going to advocate for keeping this patch in Maverick and Hardy as it's been officially accepted as a stable patch for 2.6.32.y. The code in Hardy is substantially identical with respect to the use of the return value of nfs_do_fsync(). Therefore I'm marking all releases as verification-done.

tags: added: verification-done-hardy verification-done-maverick
removed: verification-needed-hardy verification-needed-maverick

Dan Bishop (danbishop) wrote :

This patch works perfectly! I can finally use NFS home directories again! :D Well... so long as I enable -proposed for now

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-30.59

---------------
linux (2.6.32-30.59) lucid-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #727336

  [ Tim Gardner ]

  * [Config] CONFIG_IRQ_TIME_ACCOUNTING=n
    - LP: #723819

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * staging: usbip: remove double giveback of URB
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD SB800
    - LP: #723819
  * rt2x00: add device id for windy31 usb device
    - LP: #723819
  * ALSA: snd-usb-us122l: Fix missing NULL checks
    - LP: #723819
  * hwmon: (via686a) Initialize fan_div values
    - LP: #723819
  * USB: serial: handle Data Carrier Detect changes
    - LP: #723819
  * USB: CP210x Add two device IDs
    - LP: #723819
  * USB: CP210x Removed incorrect device ID
    - LP: #723819
  * USB: usb-storage: unusual_devs update for Cypress ATACB
    - LP: #723819
  * USB: usb-storage: unusual_devs update for TrekStor DataStation maxi g.u
    external hard drive enclosure
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for CamSport Evo
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for Coby MP3 player
    - LP: #723819
  * USB: serial: Updated support for ICOM devices
    - LP: #723819
  * USB: adding USB support for Cinterion's HC2x, EU3 and PH8 products
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD Hudson
    - LP: #723819
  * USB: EHCI: fix DMA deallocation bug
    - LP: #723819
  * USB: g_printer: fix bug in module parameter definitions
    - LP: #723819
  * USB: io_edgeport: fix the reported firmware major and minor
    - LP: #723819
  * USB: ti_usb: fix module removal
    - LP: #723819
  * USB: Storage: Add unusual_devs entry for VTech Kidizoom
    - LP: #723819
  * USB: ftdi_sio: add ST Micro Connect Lite uart support
    - LP: #723819
  * USB: cdc-acm: Adding second ACM channel support for Nokia N8
    - LP: #723819
  * USB: ftdi_sio: Add VID=0x0647, PID=0x0100 for Acton Research
    spectrograph
    - LP: #723819
  * USB: prevent buggy hubs from crashing the USB stack
    - LP: #723819
  * staging: comedi: add support for newer jr3 1-channel pci board
    - LP: #723819
  * staging: comedi: ni_labpc: Use shared IRQ for PCMCIA card
    - LP: #723819
  * Staging: hv: fix sysfs symlink on hv block device
    - LP: #723819
  * staging: hv: Enable sending GARP packet after live migration
    - LP: #723819
  * hvc_iucv: allocate memory buffers for IUCV in zone DMA
    - LP: #723819
  * iwlagn: enable only rfkill interrupt when device is down
    - LP: #723819
  * ath9k: Fix bug in delimiter padding computation
    - LP: #723819
  * correct vdso version string
    - LP: #723819
  * fix medium error problems with so...

Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released

Martin Pitt (pitti) wrote :

Accepted linux into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-28.49

---------------
linux (2.6.35-28.49) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #726796

  [ Colin Ian King ]

  * SAUCE: Dell All-In-One: Remove need for Dell module alias

  [ Manoj Iyer ]

  * SAUCE: add ricoh 0xe823 pci id.
    - LP: #717435

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * mmc: make sdhci work with ricoh mmc controller
    - LP: #717435
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * rt2x00: Pad beacon to multiple of 32 bits.
    - LP: #659143
  * rt2x00: Fix firmware loading regression on x86_64.
    - LP: #659143
  * rt2x00: Check for errors from skb_pad() calls
    - LP: #659143
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
    - CVE-2010-4076
  * rds: Integer overflow in RDS cmsg handling, CVE-2010-4175
    - LP: #721455
    - CVE-2010-4175
 -- Brad Figg <email address hidden> Mon, 28 Feb 2011 13:02:53 -0800

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-29.88

---------------
linux (2.6.24-29.88) hardy-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #736290

  [Steve Conklin]

  * Ubuntu-2.6.24-29.87
  * [Config] Allow insertchanges to work in later version chroots

  [Upstream Kernel Changes]

  * do_exit(): make sure that we run with get_fs() == USER_DS,
    CVE-2010-4258
    - LP: #723945
    - CVE-2010-4258
  * Make the bulkstat_one compat ioctl handling more sane
    - LP: #692848
  * Fix xfs_bulkstat_one size checks & error handling
    - LP: #692848
  * xfs: always use iget in bulkstat
    - LP: #692848
  * x25: Prevent crashing when parsing bad X.25 facilities CVE-2010-4164
    - LP: #731199
    - CVE-2010-4164
  * Revised [CVE-2010-4346 Hardy] install_special_mapping skips
    security_file_mmap check. CVE-2010-4346
    - LP: #731971
    - CVE-2010-4346

linux (2.6.24-29.87) hardy-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #725138

  [Upstream Kernel Changes]

  * bluetooth: Fix missing NULL check, CVE-2010-4242
    - LP: #714846
    - CVE-2010-4242
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * bio: take care not overflow page count when mapping/copying user data,
    CVE-2010-4162
    - LP: #721441
    - CVE-2010-4162
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
 -- Brad Figg <email address hidden> Wed, 16 Mar 2011 09:43:35 -0700

Changed in linux (Ubuntu Hardy):
status: Fix Committed → Fix Released

David McGiven (davidmcgivenn) wrote :

Sorry, I'm still having the same problem with Ubuntu 10.04.2, using either:

2.6.32-30.59
or
2.6.35-020635rc1

What should I do to fix this problem?

Thanks.

Tim Gardner (timg-tpi) wrote :

David - this issue was fixed after 10.04.2, so you'll need to subscribe to '-updates' in your favorite package manager.
Select System/Administration/Synaptic Package Manager, then Settings/Repositories/Updates.

David McGiven (davidmcgivenn) wrote :

Thanks Tim, this seems to work with 2.6.32-31-server.

Zach (zivester) wrote :

Apologies for being late to the party, but I'm also plagued by this bug...

I'm running 64-bit Maverick, and I'm still experiencing this lockup. Isn't it supposed to be fixed with this kernel:

Linux mycomp 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

If not, how do I get this fix?

draven (draven-sol) wrote :

I'm still facing this issue.

Client kernel: Linux hyponoia 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
Server kernel: Linux nale 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux

Paul Crawford (psc-sat) wrote :

I may be seeing the same problem, but I am not sure.

I have a new Thecus N5200XXX NAS and when I read via NFS I get all transfers to/from the NFS mount blocked after 5-7GB typically, but for writing I got 47GB today (and managed to copy a 202GB file earlier). However, I don't know if this is a Thecus issue or Ubuntu/Linux issue.

I see it with both my 64-bit 10.04.2 LTS installation (kernel 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux) and my 32-bit 10.04.2 LTS installation with the 'proposed' updates and the 2.6.32-33-generic #69-Ubuntu SMP Mon Jun 27 15:36:47 UTC 2011 i686 GNU/Linux kernel (same PC, dual-boot).

I don't seem to see it on my home PC (similar 10.04 LTS 32-bit 'proposed' kernel and the older Thecus N5200pro NAS) which is why initially I assumed it was a Thecus issue.

What I do see is I can access the NAS via CIFS and its web interface at the same time NFS is blocked, and can access NFS mounts on other servers as well.

There are no odd high CPU loads on PC client or NAS server, or syslog messages on the PC.

While this may be unrelated, I see others are still having problems after the fix has apparently been released so thought it may be of interest.

Ken Pratt (kenpratt) wrote :

This bug is still alive and kicking in kernel: 3.0.0-12-generic (as part of Mint 12)

Linux fit3.thepratts.info 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

This is running on an unmodified up-to-date fitPC3 running Mint 12.

When I attempt to write large files from another Linux box over NFSv4, I get several hundred MB into the copy and then the NFS client connection hangs indefinitely. The NFS server still responds to other clients without problems.

Is there any chance the patch that fixed this problem has been reverted in current kernels?

I have gigabit Ethernet through a gigabit switch to a gigabit port on a fitPC3 running Mint 12 and exporting an NFS share via NFSv4. I cannot successfully copy a file (movie rip, large file) from a client Linux box running Ubuntu 12.04 to this server. It hangs. I can read large files without a hitch.

I realize I am late to this dialog, but am I now the only one suffering this problem? I do not want to redo everything using CIFS (SMB). However, I have never had a problem with CIFS; I just don't like the permission and ownership mapping.

StormForge (br-cs) wrote :

I think I'm experiencing this bug as well. Copying large files (250GB) to an NFS server. Copy hangs with many "NFS not responding" messages. Many OS functions (like df) seem to lock up. Only recourse is a reboot.

This is on 12.04.2 running 3.2.0-48-generic x86_64 with latest updates as of July 4 2013.

Per-Inge (per-inge-hallin) wrote :

I also have this problem. Both on an Ubuntu 13.10 installation with kernel 3.12 and on a Trusty Tahr installation with kernel 3.12.0.4.6.
A file copy starts fine, but hangs soon after.
My server is a fully updated Ubuntu 12.04 server using RAID 5.1.

Per-Inge (per-inge-hallin) wrote :

I have used my test server to get some more information.
All copies are made with Nautilus.
The server is a fully updated Ubuntu 12.04 server.
The client is Ubuntu Trusty Tahr.
Copying from the server to the client with NFS works fine.
Copying from the client to the server works.
Copying from the client to the server with NFS has problems. It takes about an hour to copy. When I open a new Nautilus window, all three Nautilus windows are grayed out, but recover when the copy is finished.
See the pictures.

Rob van der Linde (robvdl) wrote :

I am experiencing the same issue as described in comment #47 on Ubuntu 13.10

Both server and client are Ubuntu 13.10. Copying from server to client does not cause lockups, but copying to the server will cause Dolphin to lock up until the copy is complete.

I am not sure whether this is a new issue or an old bug coming back: it has been marked as fixed in older versions of Ubuntu, yet the problem still seems to persist in 13.10.

I faced a similar issue using Ubuntu 13.10 (fully updated) on my notebook when copying/moving large files to my IOmega IX2-200 NAS (running its own Linux brand with kernel 2.6.31.8).

The system slows down and locks for some time. I've never experienced a fatal lock, but I had to wait for several minutes.

Copying from NAS to the notebook presents no noticeable locks or hangs.

Forgot to mention: the notebook runs kernel 2.6.31.8 and mounts NFS with autofs. All mounts use the default settings.

Oops. A copy/paste problem.
The notebook actually runs kernel 3.11.0-15-generic.

Sorry for that and for the multiple postings...

Rob van der Linde (robvdl) wrote :

The systems I am running are also Ubuntu 13.10 and the 3.11 kernel.

This bug is quite annoying, as every time I copy a file to my server over NFS, the client doing the copying will stall/freeze for a few seconds at the end of the copy and lock up Nautilus/Dolphin for a few seconds and then wake up again. I have been considering going back to SSHFS, as NFS in its current state is almost unusable with all that stalling.

I don't know if this is a bug that has resurfaced on Ubuntu 13.10 / kernel 3.11, or if this is actually a new bug that needs to be opened.
