Transferring large files to nfs mount causes system freeze

Bug #585657 reported by Nathan Adams
This bug affects 31 people
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Tim Gardner
Hardy            Fix Released   Undecided    Tim Gardner
Lucid            Fix Released   Undecided    Tim Gardner
Maverick         Fix Released   Undecided    Tim Gardner
Natty            Fix Released   Undecided    Tim Gardner

Bug Description

Binary package hint: nfs-kernel-server

I have verified this bug on both karmic and lucid on both the server and client:

-------------------------------------------------------------------------------

Description: Ubuntu 9.10
Release: 9.10

nfs-common:
  Installed: 1:1.2.0-2ubuntu8

nfs-kernel-server:
  Installed: 1:1.2.0-2ubuntu8

portmap:
  Installed: 6.0-10ubuntu2

-------------------------------------------------------------------------------

Description: Ubuntu 10.04 LTS
Release: 10.04

nfs-common:
  Installed: 1:1.2.0-4ubuntu4

nfs-kernel-server:
  Installed: 1:1.2.0-4ubuntu4

portmap:
  Installed: 6.0.0-1ubuntu2

-------------------------------------------------------------------------------

Expected behavior:

Copying large files from local directories to an nfs mounted directory should complete without error.

-------------------------------------------------------------------------------

Actual behavior:

The system freezes while trying to copy large files from a local directory (e.g. /tmp) to an nfs mounted directory. This causes various things to fail to respond, ultimately resulting in a hard reboot and potential loss of data. When this occurs I am able to log into the box via ssh, but even sudo is unable to kill -9 the wayward file copy or reboot the machine gracefully.

-------------------------------------------------------------------------------

Details:

The server exports several directories, for example:

/home/shared
/home/user1/Documents
/home/user1/Development

The client mounts these as follows:

server1:/home/shared /home/shared nfs rw,soft,intr 0 0
server1:/home/user1/Development /home/server1/user1/Development nfs rw,soft,intr 0 0
server1:/home/user1/Documents /home/server1/user1/Documents nfs rw,soft,intr 0 0

I see lots of messages like this in /var/log/syslog:

May 22 10:44:31 client1 kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.
May 22 10:44:31 client1 kernel: [ 1680.390488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 10:44:31 client1 kernel: [ 1680.390492] cp D 00000000ffffffff 0 2791 2503 0x00000000
May 22 10:44:31 client1 kernel: [ 1680.390501] ffff88012a457c48 0000000000000082 0000000000015bc0 0000000000015bc0
May 22 10:44:31 client1 kernel: [ 1680.390508] ffff8801291331a0 ffff88012a457fd8 0000000000015bc0 ffff880129132de0
May 22 10:44:31 client1 kernel: [ 1680.390516] 0000000000015bc0 ffff88012a457fd8 0000000000015bc0 ffff8801291331a0
May 22 10:44:31 client1 kernel: [ 1680.390523] Call Trace:
May 22 10:44:31 client1 kernel: [ 1680.390545] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390552] [<ffffffff8153eb87>] io_schedule+0x47/0x70
May 22 10:44:31 client1 kernel: [ 1680.390573] [<ffffffffa0cff2be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390579] [<ffffffff8153f3df>] __wait_on_bit+0x5f/0x90
May 22 10:44:31 client1 kernel: [ 1680.390587] [<ffffffff812b6234>] ? __lookup_tag+0x64/0x120
May 22 10:44:31 client1 kernel: [ 1680.390608] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390615] [<ffffffff8153f488>] out_of_line_wait_on_bit+0x78/0x90
May 22 10:44:31 client1 kernel: [ 1680.390622] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
May 22 10:44:31 client1 kernel: [ 1680.390643] [<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390665] [<ffffffffa0d036af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390688] [<ffffffffa0d04aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390711] [<ffffffffa0d04ed9>] nfs_write_mapping+0x79/0xb0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390733] [<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390751] [<ffffffffa0cf3eba>] nfs_do_fsync+0x2a/0x60 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390770] [<ffffffffa0cf4105>] nfs_file_flush+0x75/0xa0 [nfs]
May 22 10:44:31 client1 kernel: [ 1680.390777] [<ffffffff8114051c>] filp_close+0x3c/0x90
May 22 10:44:31 client1 kernel: [ 1680.390783] [<ffffffff81140627>] sys_close+0xb7/0x120
May 22 10:44:31 client1 kernel: [ 1680.390790] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
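
For anyone checking whether their own logs show the same symptom, the hung-task warning quoted above can be picked out of a syslog with a short script. This is only a sketch: the function name `find_hung_tasks` is made up for illustration, and the pattern assumes the message wording shown in this report.

```python
import re

# Pattern for the kernel's hung-task warning as quoted in this report.
# The wording is taken from the log lines above; other kernels may differ.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for every hung-task warning found."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# One line copied from the syslog excerpt in this report.
sample = (
    "May 22 10:44:31 client1 kernel: [ 1680.390484] "
    "INFO: task cp:2791 blocked for more than 120 seconds.\n"
)
print(find_hung_tasks(sample))
```

To scan a real log, the contents of /var/log/syslog could be read and passed in instead of `sample`.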

Bruce Edge (bruce-edge) wrote :

I'm seeing the same thing on 10.04 64-bit.

[773760.910061] INFO: task tar:14596 blocked for more than 120 seconds.
[773760.926430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[773760.958906] tar D 00000000ffffffff 0 14596 14568 0x00000004
[773760.958912] ffff8802b217dc48 0000000000000082 0000000000015bc0 0000000000015bc0
[773760.958917] ffff8801f5fc1ab0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc16f0
[773760.958921] 0000000000015bc0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc1ab0
[773760.958925] Call Trace:
[773760.958951] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958960] [<ffffffff815555f7>] io_schedule+0x47/0x70
[773760.958972] [<ffffffffa01b228e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[773760.958976] [<ffffffff81555c1f>] __wait_on_bit+0x5f/0x90
[773760.958988] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958993] [<ffffffff81555cc8>] out_of_line_wait_on_bit+0x78/0x90
[773760.958999] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[773760.959011] [<ffffffffa01b226f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[773760.959024] [<ffffffffa01b666f>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
[773760.959037] [<ffffffffa01b7aae>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
[773760.959050] [<ffffffffa01b7e99>] nfs_write_mapping+0x79/0xb0 [nfs]
[773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
[773760.959073] [<ffffffffa01a6e9a>] nfs_do_fsync+0x2a/0x60 [nfs]
[773760.959084] [<ffffffffa01a70e5>] nfs_file_flush+0x75/0xa0 [nfs]
[773760.959089] [<ffffffff81140f2c>] filp_close+0x3c/0x90
[773760.959092] [<ffffffff81141037>] sys_close+0xb7/0x120
[773760.959098] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

Sutanto Kurniawan (tanto) wrote :

Could this bug be related to this report:
https://bugzilla.kernel.org/show_bug.cgi?id=15552 ?

Sutanto Kurniawan (tanto) wrote :

Oops, I meant this one (which included the fix):
https://bugzilla.kernel.org/show_bug.cgi?id=15578

The 15552 one is the duplicate.

vertex.vr4 (vertex-vr4) wrote :

The patch referred to in the last post appears to be in the current kernel-image.
I believe this issue can be closed as fixed.

Regards,
John

Nathan Adams (nadams) wrote :

Please do not close this bug until:

1) a tester is able to reproduce the bug on an unpatched system, and

2) that same tester is able to verify, with certainty, that the patch resolves the problem.

Perhaps that is what you meant?

David McBride (david-mcbride) wrote :

This appears to be a duplicate of Launchpad bug #561210.

David Ressman (davidressman) wrote :

I'm not certain it's a duplicate of #561210, but I'm not certain it isn't either. This one starts from within nfs_wb_all() and the other hang starts in nfs_wb_page(). At any rate, I see this problem in 10.04 with both Ubuntu's 2.6.32-24.39 and with the stock kernel.org 2.6.32.18.
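
One way to compare the two hangs is to pull the function names out of each call trace and see which write-back entry point appears. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the trace format shown in this report, where frames prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain.

```python
import re

# Matches one frame of a kernel call trace, for example:
#   [<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
# Group 1 captures the optional "? " marker on unreliable frames.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(?P<func>[A-Za-z_][\w.]*)\+0x"
)


def trace_functions(trace_text, include_unreliable=False):
    """List function names in trace order, skipping '?' frames by default."""
    funcs = []
    for m in FRAME_RE.finditer(trace_text):
        if m.group(1) and not include_unreliable:
            continue
        funcs.append(m.group("func"))
    return funcs


def writeback_entry(trace_text):
    """Return 'nfs_wb_all' or 'nfs_wb_page' if either is in the trace."""
    for func in trace_functions(trace_text):
        if func in ("nfs_wb_all", "nfs_wb_page"):
            return func
    return None


# Frames copied from the trace in the bug description.
sample = """\
[<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[<ffffffffa0d04aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
[<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
[<ffffffffa0cf3eba>] nfs_do_fsync+0x2a/0x60 [nfs]
"""
print(writeback_entry(sample))
```

A trace that contains neither function returns None, which would suggest a different code path from both reports.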

cotillion (tobias-schwan) wrote :

Is it possible that the NFS client opens more ports than the hardware can handle?

How large are the files that trigger this problem?

Andrew Soroka (andrew-soroka) wrote :

It happens for me every time.

I want to back up my 1.7TB file server; I get about 250GB through and then the client freezes. My files are 1-4GB in size.

Reading from NFS, writing to a local mdadm RAID 5 array.

cotillion (tobias-schwan) wrote :

Hmm, I don't have such big files, so I cannot reproduce the bug.

Have you tried to use the option "async" in your exports? Maybe your problem is related to the problem discussed and solved here: http://art.ubuntuforums.org/showthread.php?t=1478413

David McBride (david-mcbride) wrote :

Using "async" is not a viable workaround. From `man exports`:

       async
              This option allows the NFS server to violate the NFS protocol
              and reply to requests before any changes made by that request
              have been committed to stable storage (e.g. disc drive).

              Using this option usually improves performance, but at the cost
              that an unclean server restart (i.e. a crash) can cause data to
              be lost or corrupted.

The fact that using 'async' results in higher performance is not a surprise, as it is much less careful with data handling. The fact that (according to the forum thread) enabling it happens not to trigger this particular bug is perhaps interesting from a debugging perspective, but it is not an acceptable solution to the problem for most organisations.

If you need to make a large file for testing, `dd if=/dev/zero of=my-large-file bs=1M count=$SIZE` will make you an arbitrarily sized file containing all zeroes, where $SIZE is the size in MiB. (Other nodes in /dev may well produce more interesting output.)
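
Where dd is not convenient, roughly the same test file can be produced from a script. The following is a minimal sketch, not a replacement for the command above: `make_zero_file` is a hypothetical helper, and the explicit fsync is an assumption about what makes a useful reproducer, chosen because the traces in this report show the task blocked while flushing data on close.

```python
import os
import tempfile

MIB = 1024 * 1024


def make_zero_file(path, size_mib):
    """Write size_mib MiB of zero bytes to path and return the file size."""
    chunk = bytes(MIB)  # 1 MiB of zeroes, like dd's bs=1M
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(chunk)
        f.flush()
        # Ask the OS to push the data to storage before closing.
        os.fsync(f.fileno())
    return os.path.getsize(path)


# Small demonstration in a temporary directory; point the path at an
# NFS mount and raise size_mib to build an actual test case.
with tempfile.TemporaryDirectory() as tmpdir:
    print(make_zero_file(os.path.join(tmpdir, "my-large-file"), 4))
```

Writing in 1 MiB chunks keeps memory use flat regardless of the target size.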

Bruce Edge (bruce-edge) wrote :

10.04.1 still has the same problem.

4 months later - critical failure and still "unassigned"?

What is Canonical spending all their time on, eye candy? Come on people, this is a core failure. This is very bad. There are dozens of the same report that are all "unassigned", with one a "medium".
Jeez, Mark S should be lying awake at night over this one.

getnuked (getnuked) wrote :
getnuked (getnuked) wrote :

Ah, disregard my comment, it appears that you are already on that bug.

Nrm (smith32-35) wrote :

Hi everyone,

I've got the same problem, and if I use my WIFI card, it's "solved".
My ethernet card is :

09:00.0 Ethernet controller: Atheros Communications Atheros AR8132 / L1c Gigabit Ethernet Adapter (rev c0)

David Ressman (davidressman) wrote :

I believe this issue is solved by commit 0702099bd86c33c2dcdbd3963433a61f3f503901 (NFS: fix the return value of nfs_file_fsync()).

Tim Gardner (timg-tpi) wrote :

David - to test your theory, how about subscribing to 'deb http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu lucid main' and installing linux-image-server-lts-backport-natty?

affects: nfs-utils (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
David Ressman (davidressman) wrote :

Unfortunately, in the environment we have, the latest we can run is 2.6.32.xx (IB drivers, filesystem modules, etc.), so even if I installed it, I wouldn't be able to use NFS. I can verify that I added the patch from that commit into 2.6.32.24-generic and the problem disappeared. When we booted back into the stock 2.6.32.24-generic, it reappeared.

Sorry.

Tim Gardner (timg-tpi) wrote :

That works for me. Did your patch look like this:

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released
David Ressman (davidressman) wrote :

It looked precisely like that. :)

Tim Gardner (timg-tpi) wrote :

SRU Justification

Impact: Large NFS file copies can orphan resources and block tasks

Patch Description: NFS: fix the return value of nfs_file_fsync()

Changed in linux (Ubuntu Lucid):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → Fix Committed
Changed in linux (Ubuntu Maverick):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → Fix Committed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Hardy):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Stefan Bader (smb)
Changed in linux (Ubuntu Hardy):
status: In Progress → Fix Committed
David Ressman (davidressman) wrote :

You're a scholar and a gentleman, Tim.

Sean Clarke (sean-clarke) wrote :

I am hitting this problem under 10.10 x64:

uname -a
Linux enterprise 2.6.35-27-server #47-Ubuntu SMP Fri Feb 11 23:09:19 UTC 2011 x86_64 GNU/Linux

I've reported it under a couple of other open bugs relating to issues in this area. Can you let me know which kernel version the change will be rolled out in? It is a huge problem for us, as we run KVM images over NFS and this happens every time.

After it happens, we also get "false" timeouts on the NFS server and the whole system stutters and stalls. The NFS server continues to serve files to other systems and can be ping'd from the failed client, it even serves files to it - but you get very regular (5 seconds?) timeout messages in the logs:

[15594.126931] nfs: server XXXXXX not responding, timed out
[15598.336861] nfs: server XXXXXX not responding, timed out
[15602.546851] nfs: server XXXXXX not responding, timed out
[15606.757764] nfs: server XXXXXX not responding, timed out
[15610.966788] nfs: server XXXXXX not responding, timed out
[15615.176756] nfs: server XXXXXX not responding, timed out

PING XXXXXX 56(84) bytes of data.
64 bytes from XXXXXX: icmp_req=1 ttl=64 time=0.097 ms
64 bytes from XXXXXX: icmp_req=2 ttl=64 time=0.059 ms
64 bytes from XXXXXX: icmp_req=3 ttl=64 time=0.079 ms
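
The spacing of the timeout messages above can be read straight off the kernel timestamps. The sketch below computes the gaps between consecutive messages; the function name is hypothetical and the pattern assumes the exact message text quoted in this comment. On the six lines shown, the gaps come out at roughly 4.2 seconds rather than 5.

```python
import re

# Kernel timestamp followed by the NFS client timeout message quoted above.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] nfs: server \S+ not responding, timed out"
)


def timeout_intervals(log_text):
    """Return the gaps in seconds between consecutive timeout messages."""
    stamps = [float(m.group("ts")) for m in TIMEOUT_RE.finditer(log_text)]
    return [round(later - earlier, 3)
            for earlier, later in zip(stamps, stamps[1:])]


# The six lines from this comment.
sample = """\
[15594.126931] nfs: server XXXXXX not responding, timed out
[15598.336861] nfs: server XXXXXX not responding, timed out
[15602.546851] nfs: server XXXXXX not responding, timed out
[15606.757764] nfs: server XXXXXX not responding, timed out
[15610.966788] nfs: server XXXXXX not responding, timed out
[15615.176756] nfs: server XXXXXX not responding, timed out
"""
print(timeout_intervals(sample))
```

A steady interval like this points at a retry timer rather than random network loss, though that reading is an inference and not something confirmed in this thread.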

Tim Gardner (timg-tpi) wrote :

Sean - it's best if you start your own bug report using 'ubuntu-bug linux'. Your symptoms appear unrelated to this bug.

Martin Pitt (pitti) wrote : Please test proposed package

Accepted linux into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote :

Accepted linux into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote :

Accepted linux into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Brad Figg (brad-figg)
tags: added: verification-needed-hardy verification-needed-lucid verification-needed-maverick
Martin Pitt (pitti) wrote :

Accepted linux-ec2 into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Steve Conklin (sconklin) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-<release>' to 'verification-done-<release>'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

jeffetflo (jeff-jeffetflo) wrote :

Sorry, but which package do I have to test?
I don't see anything NFS-related...

Tim Gardner (timg-tpi) wrote :

Steve - the only kernel that I'm comfortable marking verified is Lucid. I'm happy to have you revert Maverick and Hardy so that it forces someone to do the testing as I don't have a reproducer.

tags: added: verification-done-lucid
removed: verification-needed-lucid
Tim Gardner (timg-tpi) wrote :

Steve - I'm changing my position and am going to advocate for keeping this patch in Maverick and Hardy, as it's been officially accepted as a stable patch for 2.6.32.y. The code in Hardy is substantially identical wrt the use of the return value of nfs_do_fsync(). Therefore I'm marking all releases as verification-done.

tags: added: verification-done-hardy verification-done-maverick
removed: verification-needed-hardy verification-needed-maverick
Dan Bishop (danbishop) wrote :

This patch works perfectly! I can finally use NFS home directories again! :D Well... so long as I enable -proposed for now

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-30.59

---------------
linux (2.6.32-30.59) lucid-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #727336

  [ Tim Gardner ]

  * [Config] CONFIG_IRQ_TIME_ACCOUNTING=n
    - LP: #723819

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * staging: usbip: remove double giveback of URB
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD SB800
    - LP: #723819
  * rt2x00: add device id for windy31 usb device
    - LP: #723819
  * ALSA: snd-usb-us122l: Fix missing NULL checks
    - LP: #723819
  * hwmon: (via686a) Initialize fan_div values
    - LP: #723819
  * USB: serial: handle Data Carrier Detect changes
    - LP: #723819
  * USB: CP210x Add two device IDs
    - LP: #723819
  * USB: CP210x Removed incorrect device ID
    - LP: #723819
  * USB: usb-storage: unusual_devs update for Cypress ATACB
    - LP: #723819
  * USB: usb-storage: unusual_devs update for TrekStor DataStation maxi g.u
    external hard drive enclosure
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for CamSport Evo
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for Coby MP3 player
    - LP: #723819
  * USB: serial: Updated support for ICOM devices
    - LP: #723819
  * USB: adding USB support for Cinterion's HC2x, EU3 and PH8 products
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD Hudson
    - LP: #723819
  * USB: EHCI: fix DMA deallocation bug
    - LP: #723819
  * USB: g_printer: fix bug in module parameter definitions
    - LP: #723819
  * USB: io_edgeport: fix the reported firmware major and minor
    - LP: #723819
  * USB: ti_usb: fix module removal
    - LP: #723819
  * USB: Storage: Add unusual_devs entry for VTech Kidizoom
    - LP: #723819
  * USB: ftdi_sio: add ST Micro Connect Lite uart support
    - LP: #723819
  * USB: cdc-acm: Adding second ACM channel support for Nokia N8
    - LP: #723819
  * USB: ftdi_sio: Add VID=0x0647, PID=0x0100 for Acton Research
    spectrograph
    - LP: #723819
  * USB: prevent buggy hubs from crashing the USB stack
    - LP: #723819
  * staging: comedi: add support for newer jr3 1-channel pci board
    - LP: #723819
  * staging: comedi: ni_labpc: Use shared IRQ for PCMCIA card
    - LP: #723819
  * Staging: hv: fix sysfs symlink on hv block device
    - LP: #723819
  * staging: hv: Enable sending GARP packet after live migration
    - LP: #723819
  * hvc_iucv: allocate memory buffers for IUCV in zone DMA
    - LP: #723819
  * iwlagn: enable only rfkill interrupt when device is down
    - LP: #723819
  * ath9k: Fix bug in delimiter padding computation
    - LP: #723819
  * correct vdso version string
    - LP: #723819
  * fix medium error problems with so...


Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Accepted linux into hardy-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.35-28.49

---------------
linux (2.6.35-28.49) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #726796

  [ Colin Ian King ]

  * SAUCE: Dell All-In-One: Remove need for Dell module alias

  [ Manoj Iyer ]

  * SAUCE: add ricoh 0xe823 pci id.
    - LP: #717435

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * mmc: make sdhci work with ricoh mmc controller
    - LP: #717435
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * rt2x00: Pad beacon to multiple of 32 bits.
    - LP: #659143
  * rt2x00: Fix firmware loading regression on x86_64.
    - LP: #659143
  * rt2x00: Check for errors from skb_pad() calls
    - LP: #659143
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
    - CVE-2010-4076
  * rds: Integer overflow in RDS cmsg handling, CVE-2010-4175
    - LP: #721455
    - CVE-2010-4175
 -- Brad Figg <email address hidden> Mon, 28 Feb 2011 13:02:53 -0800

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-29.88

---------------
linux (2.6.24-29.88) hardy-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #736290

  [Steve Conklin]

  * Ubuntu-2.6.24-29.87
  * [Config] Allow insertchanges to work in later version chroots

  [Upstream Kernel Changes]

  * do_exit(): make sure that we run with get_fs() == USER_DS,
    CVE-2010-4258
    - LP: #723945
    - CVE-2010-4258
  * Make the bulkstat_one compat ioctl handling more sane
    - LP: #692848
  * Fix xfs_bulkstat_one size checks & error handling
    - LP: #692848
  * xfs: always use iget in bulkstat
    - LP: #692848
  * x25: Prevent crashing when parsing bad X.25 facilities CVE-2010-4164
    - LP: #731199
    - CVE-2010-4164
  * Revised [CVE-2010-4346 Hardy] install_special_mapping skips
    security_file_mmap check. CVE-2010-4346
    - LP: #731971
    - CVE-2010-4346

linux (2.6.24-29.87) hardy-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #725138

  [Upstream Kernel Changes]

  * bluetooth: Fix missing NULL check, CVE-2010-4242
    - LP: #714846
    - CVE-2010-4242
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * bio: take care not overflow page count when mapping/copying user data,
    CVE-2010-4162
    - LP: #721441
    - CVE-2010-4162
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
 -- Brad Figg <email address hidden> Wed, 16 Mar 2011 09:43:35 -0700

Changed in linux (Ubuntu Hardy):
status: Fix Committed → Fix Released
David McGiven (davidmcgivenn) wrote :

Sorry, I'm still having the same problem with Ubuntu 10.04.2, using either:

2.6.32-30.59
or
2.6.35-020635rc1

What should I do to fix this problem?

Thanks.

Tim Gardner (timg-tpi) wrote :

David - this issue was fixed after 10.04.2, so you'll need to subscribe to '-updates' in your favorite package manager.
Select System/Administration/Synaptic Package Manager, then Settings/Repositories/Updates
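
To check whether a running kernel already carries the fix, the `uname -a` string can be compared against the package versions named in the changelogs above. This is a rough sketch under stated assumptions: the helper name is made up, the table only covers the three kernel series whose changelogs appear in this report, and it assumes the "#NN-Ubuntu" field in `uname -a` is the upload number that follows the dot in the package version.

```python
import re

# (ABI, upload) of the first package whose changelog in this report lists
# "NFS: fix the return value of nfs_file_fsync()".
FIXED_IN = {
    "2.6.24": (29, 87),  # hardy: listed under linux 2.6.24-29.87
    "2.6.32": (30, 59),  # lucid: linux 2.6.32-30.59
    "2.6.35": (28, 49),  # maverick: linux 2.6.35-28.49
}

# Example input: "Linux nale 2.6.32-31-server #61-Ubuntu SMP ..."
UNAME_RE = re.compile(
    r"(?P<base>\d+\.\d+\.\d+)-(?P<abi>\d+)-\S+\s+#(?P<upload>\d+)-Ubuntu"
)


def has_fsync_fix(uname_output):
    """Return True or False, or None when the kernel series is not listed."""
    m = UNAME_RE.search(uname_output)
    if m is None or m.group("base") not in FIXED_IN:
        return None
    running = (int(m.group("abi")), int(m.group("upload")))
    return running >= FIXED_IN[m.group("base")]


print(has_fsync_fix(
    "Linux nale 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 "
    "x86_64 GNU/Linux"
))
```

A True result only says the package is at or past the listed version; it does not rule out a different bug with similar symptoms.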

David McGiven (davidmcgivenn) wrote :

Thanks Tim, this seems to work with 2.6.32-31-server.

Zach (zivester) wrote :

Apologies for being late to the party, but I'm also plagued by this bug...

I'm running 64-bit Maverick, and I'm still experiencing this lockup. Isn't it supposed to be fixed with this kernel:

Linux mycomp 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

If not, how do I get this fix?

draven (draven-sol) wrote :

I'm still facing this issue.

Client kernel: Linux hyponoia 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
Server kernel: Linux nale 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux

Paul Crawford (psc-sat) wrote :

I may be seeing the same problem, but I am not sure.

I have a new Thecus N5200XXX NAS and when I read via NFS I get all transfers to/from the NFS mount blocked after 5-7GB typically, but for writing I got 47GB today (and managed to copy a 202GB file earlier). However, I don't know if this is a Thecus issue or Ubuntu/Linux issue.

I see it with both my 64-bit 10.04.2 LTS installation (kernel 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux) and my 32-bit 10.04.2 LTS installation with the 'proposed' updates with the 2.6.32-33-generic #69-Ubuntu SMP Mon Jun 27 15:36:47 UTC 2011 i686 GNU/Linux kernel (same PC dual-boot).

I don't seem to see it on my home PC (similar 10.04 LTS 32-bit 'proposed' kernel and the older Thecus N5200pro NAS) which is why initially I assumed it was a Thecus issue.

What I do see is I can access the NAS via CIFS and its web interface at the same time NFS is blocked, and can access NFS mounts on other servers as well.

There are no odd high CPU loads on PC client or NAS server, or syslog messages on the PC.

While this may be unrelated, I see others are still having problems after the fix has apparently been released so thought it may be of interest.

Ken Pratt (kenpratt) wrote :

This bug is still alive and kicking in kernel: 3.0.0-12-generic (as part of Mint 12)

Linux fit3.thepratts.info 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

This is running on an unmodified up-to-date fitPC3 running Mint 12.

When I attempt to write large files from another Linux box over NFSv4, I get several hundred MB into the copy and then the NFS client connection hangs indefinitely. The NFS server still responds to other clients without problems.

Is there any chance the patch that fixed this problem has been reverted in current kernels?

I have a Gig ethernet through a gig switch to a gig port on a Fit PC3 running Mint 12 and exporting a NFS share via NFSv4. I can not successfully copy a file (movie rip - large file) from a client Linux box running Ubuntu 12.04 to this server. It hangs. I can read large files without a hitch.

I realize I am late to this dialog, but am I now the only one suffering this problem? I do not want to redo everything using CIFS (SMB). However, I have never had a problem with CIFS; I just don't like the permission and ownership mapping.

StormForge (br-cs) wrote :

I think I'm experiencing this bug as well. Copying large files (250GB) to an NFS server. Copy hangs with many "NFS not responding" messages. Many OS functions (like df) seem to lock up. Only recourse is a reboot.

This is on 12.04.2 running 3.2.0-48-generic x86_64 with latest updates as of July 4 2013.

Per-Inge (per-inge-hallin) wrote :

I also have this problem. Both on an Ubuntu 13.10 installation with kernel 3.12 and on a Trusty Tahr installation with kernel 3.12.0.4.6.
A file copy starts fine, but hangs soon after.
My server is a fully updated Ubuntu 12.04 server using RAID 5.1.

Per-Inge (per-inge-hallin) wrote :

I have used my test server to get some more information.
All copies are made with Nautilus.
The server is a fully updated Ubuntu 12.04 server.
The client is Ubuntu Trusty Tahr.
Copy from the server to the client with NFS works fine
Copy from the client to the server works.
Copy from the client to the server with NFS has problems. It takes about an hour to copy. When I open a new Nautilus window, all three Nautilus windows are grayed out, but recover when the copy is finished.
See the pictures.

Rob van der Linde (robvdl) wrote :

I am experiencing the same issue as described in comment #47 on Ubuntu 13.10

Both server and client are Ubuntu 13.10, copying from server to client does not cause lockups, but copying to the server will cause Dolphin to lockup until the copy is complete.

I am not sure whether this is a new issue or an old bug coming back: it has been marked as fixed in older versions of Ubuntu, yet the problem still seems to persist in 13.10.

Jander Moreira (moreira-jander) wrote :

I faced a similar issue using Ubuntu 13.10 (fully updated) on my notebook when copying/moving large files to my IOmega IX2-200 NAS (running its own Linux brand with kernel 2.6.31.8).

The system slows down and locks for some time. I've never experienced a fatal lock, but I had to wait for several minutes.

Copying from NAS to the notebook presents no noticeable locks or hangs.

Jander Moreira (moreira-jander) wrote :

Forgot to mention: the notebook runs kernel 2.6.31.8 and mounts NFS with autofs. All mounts use the default settings.

Jander Moreira (moreira-jander) wrote :

Oops. A copy/paste problem.
The notebook actually runs kernel 3.11.0-15-generic.

Sorry for that and for the multiple postings...

Rob van der Linde (robvdl) wrote :

The systems I am running are also Ubuntu 13.10 and the 3.11 kernel.

This bug is quite annoying, as every time I copy a file to my server over NFS, the client doing the copying will stall/freeze for a few seconds at the end of the copy and lock up Nautilus/Dolphin for a few seconds and then wake up again. I have been considering going back to SSHFS, as NFS in its current state is almost unusable with all that stalling.

I don't know if this is a bug that has resurfaced on Ubuntu 13.10 / kernel 3.11, or if this is actually a new bug that needs to be opened.

MBr (m-emanuel) wrote :

Same problem with Ubuntu Trusty "Ubuntu 14.04.1 LTS" / kernel 3.13.0-32-generic.
Both NFS client and server run this version of Ubuntu and kernel version. Trying to transfer 500GB files from mdadm raid5 to NFS using lbzip2:

[48693.533918] INFO: task lbzip2:14344 blocked for more than 120 seconds.
[48693.536784] Not tainted 3.13.0-32-generic #57-Ubuntu
[48693.539750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[48693.542764] lbzip2 D ffff88011fc94440 0 14344 14341 0x00000000
[48693.542773] ffff8800d235bb38 0000000000000002 ffff8801194497f0 ffff8800d235bfd8
[48693.542782] 0000000000014440 0000000000014440 ffff8801194497f0 ffff88011fc94cd8
[48693.542789] ffff88011ffd3f28 0000000000000002 ffffffffa0219fe0 ffff8800d235bbb0
[48693.542795] Call Trace:
[48693.542838] [<ffffffffa0219fe0>] ? nfs_free_request+0xb0/0xb0 [nfs]
[48693.542851] [<ffffffff817203fd>] io_schedule+0x9d/0x140
[48693.542877] [<ffffffffa0219fee>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[48693.542884] [<ffffffff81720882>] __wait_on_bit+0x62/0x90
[48693.542908] [<ffffffffa0219fe0>] ? nfs_free_request+0xb0/0xb0 [nfs]
[48693.542917] [<ffffffff81720927>] out_of_line_wait_on_bit+0x77/0x90
[48693.542926] [<ffffffff810aaf40>] ? autoremove_wake_function+0x40/0x40
[48693.542948] [<ffffffffa021a383>] nfs_wait_on_request+0x33/0x40 [nfs]
[48693.542971] [<ffffffffa021f2d0>] nfs_updatepage+0x150/0x650 [nfs]
[48693.542991] [<ffffffffa021096b>] nfs_write_end+0x5b/0x340 [nfs]
[48693.543000] [<ffffffff8114e616>] generic_file_buffered_write+0x156/0x250
[48693.543009] [<ffffffff8114fc81>] __generic_file_aio_write+0x1c1/0x3d0
[48693.543016] [<ffffffff8114fee8>] generic_file_aio_write+0x58/0xa0
[48693.543036] [<ffffffffa020fbdb>] nfs_file_write+0xbb/0x1d0 [nfs]
[48693.543043] [<ffffffff811bc3da>] do_sync_write+0x5a/0x90
[48693.543050] [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[48693.543056] [<ffffffff811bd599>] SyS_write+0x49/0xa0
[48693.543063] [<ffffffff8172c87f>] tracesys+0xe1/0xe6

NFS client hardware: Dell T605 server. NFS server hardware: HP ProLiant ML150 G2.

kevinf (kevinf) wrote :

Same lockups when copying or rsyncing 200 MB files to a FreeNAS NFS server; reading caused no problems.

Gigabyte Brix using Asix ax88179_178a Gigabit USB3 nic, tried a second nic with same result.

Kernel: 3.17.4-031704-generic (mainline)

Xubuntu Trusty 14.0.1

ii nfs-kernel-server 1:1.2.8-6ubuntu1.1
ii nfs-common 1:1.2.8-6ubuntu1.1

Jan 14 14:08:44 brix kernel: [ 360.548908] INFO: task cp:5165 blocked for more than 120 seconds.
Jan 14 14:08:44 brix kernel: [ 360.548912] Tainted: G OE 3.17.4-031704-generic #201411211317
Jan 14 14:08:44 brix kernel: [ 360.548913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 14 14:08:44 brix kernel: [ 360.548914] cp D 0000000000000007 0 5165 1 0x00000004
Jan 14 14:08:44 brix kernel: [ 360.548917] ffff8801eb9b7c88 0000000000000086 ffff8801eb9b7c28 ffffffff8101e5c9
Jan 14 14:08:44 brix kernel: [ 360.548919] ffff8801eb9b7fd8 00000000000145c0 ffff8800bbeee200 00000000000145c0
Jan 14 14:08:44 brix kernel: [ 360.548920] ffff880213235000 ffff88003643e400 ffff8801eb9b7c88 ffff88021ebd4ec0
Jan 14 14:08:44 brix kernel: [ 360.548922] Call Trace:
Jan 14 14:08:44 brix kernel: [ 360.548928] [<ffffffff8101e5c9>] ? read_tsc+0x9/0x10
Jan 14 14:08:44 brix kernel: [ 360.548932] [<ffffffff817a2970>] ? bit_wait+0x50/0x50
Jan 14 14:08:44 brix kernel: [ 360.548933] [<ffffffff817a20c9>] schedule+0x29/0x70
Jan 14 14:08:44 brix kernel: [ 360.548935] [<ffffffff817a219f>] io_schedule+0x8f/0xd0
Jan 14 14:08:44 brix kernel: [ 360.548937] [<ffffffff817a299b>] bit_wait_io+0x2b/0x50
Jan 14 14:08:44 brix kernel: [ 360.548939] [<ffffffff817a2865>] __wait_on_bit+0x65/0x90
Jan 14 14:08:44 brix kernel: [ 360.548942] [<ffffffff811731eb>] ? find_get_pages_tag+0xcb/0x170
Jan 14 14:08:44 brix kernel: [ 360.548944] [<ffffffff81172637>] wait_on_page_bit+0xc7/0xd0
Jan 14 14:08:44 brix kernel: [ 360.548947] [<ffffffff810b3fd0>] ? wake_atomic_t_function+0x40/0x40
Jan 14 14:08:44 brix kernel: [ 360.548949] [<ffffffff81172804>] filemap_fdatawait_range+0xf4/0x180
Jan 14 14:08:44 brix kernel: [ 360.548951] [<ffffffff811747fd>] filemap_write_and_wait_range+0x4d/0x80
Jan 14 14:08:44 brix kernel: [ 360.548969] [<ffffffffc01f9223>] nfs_file_fsync+0x53/0x150 [nfs]
Jan 14 14:08:44 brix kernel: [ 360.548974] [<ffffffff81219899>] vfs_fsync+0x29/0x40
Jan 14 14:08:44 brix kernel: [ 360.548980] [<ffffffffc01f9cfa>] nfs_file_flush+0x8a/0xd0 [nfs]
Jan 14 14:08:44 brix kernel: [ 360.548982] [<ffffffff811e743a>] filp_close+0x3a/0x90
Jan 14 14:08:44 brix kernel: [ 360.548984] [<ffffffff8120709f>] __close_fd+0x8f/0xc0
Jan 14 14:08:44 brix kernel: [ 360.548986] [<ffffffff811e8cd3>] SyS_close+0x23/0x50
Jan 14 14:08:44 brix kernel: [ 360.548988] [<ffffffff817a656d>] system_call_fastpath+0x1a/0x1f

After reading: http://art.ubuntuforums.org/showthread.php?t=1478413 this is REALLY embarrassing.

Per-Inge (per-inge-hallin) wrote :

I added the tcp option to the NFS mount entry in fstab. With that option I can copy files of at least 10 GB to my Ubuntu server.
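
For illustration, an fstab entry with the tcp option could look like the sketch below. The server name, paths, and remaining options are copied from the example in the original bug description and are assumptions here, not the settings actually used on this system.

```
# /etc/fstab (illustrative sketch; server1 and /home/shared are placeholders
# taken from the bug description, not from this comment)
# "tcp" asks the NFS client to use TCP as the transport (same as proto=tcp)
server1:/home/shared /home/shared nfs rw,soft,intr,tcp 0 0
```

The share typically has to be unmounted and mounted again before a changed transport option takes effect.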

bananenkasper (bananenkasper) wrote :

Still the same problem after all this time.

DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=17.2
DISTRIB_CODENAME=rafaela
DISTRIB_DESCRIPTION="Linux Mint 17.2 Rafaela"

Linux 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Savage-w (savage-w) wrote :

Still happening in 14.04 LTS

Client (1Gbps Link):
[ 3038.818986] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818991] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818996] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819001] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819006] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819012] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819017] nfs: server 10.0.0.200 not responding, timed out
[ 3038.958559] nfs: server 10.0.0.200 not responding, timed out

Pings are under 1ms

Crash:
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.799988] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.815847] BUG: unable to handle kernel paging request at ffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.824363] IP: [<ffffea00084c2540>] 0xffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.832785] PGD 82fff5067 PUD 82fff4067 PMD 80000008176001e3
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.841143] Oops: 0011 [#2] SMP
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.849239] Modules linked in: mptctl xt_comment iptable_filter xt_multiport ip_tables x_tables rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache nf_conntrack_netlink nf_conntrack nfnetlink intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw ipmi_devintf serio_raw joydev gf128mul glue_helper dcdbas ablk_helper i7core_edac acpi_power_meter gpio_ich lpc_ich ipmi_si edac_core cryptd ipmi_msghandler shpchp mac_hid lp parport tcp_htcp hid_generic mptsas mptscsih usbhid mptbase psmouse hid scsi_transport_sas pata_acpi bnx2
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.919101] CPU: 6 PID: 210 Comm: kswapd0 Tainted: G D W 3.16.0-51-generic #69~14.04.1-Ubuntu
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.937875] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.957034] task: ffff8808043c1e90 ti: ffff880802204000 task.ti: ffff880802204000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.976975] RIP: 0010:[<ffffea00084c2540>] [<ffffea00084c2540>] 0xffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.997565] RSP: 0018:ffff880802207a40 EFLAGS: 00010282
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.007973] RAX: ffff8807f94c8848 RBX: ffff880802207db0 RCX: 0000000000000000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.028679] RDX: ffffea00084c2540 RSI: 0000000000000002 RDI: ffffea001623df80
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.049907] RBP: ffff880802207b40 R08: ffff880002d078e8 R09: ffff880005eaf478
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.071886] R10: ffff8808022079c8 R11: ffffea003f7e0980 R12: ffffea002f1b4b60
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.094261] R13: ffff880802207bc8 R14: ffffea002f1b4b40 R15: 0000000000000001
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.116663] FS: 0000000000000000(0000) GS:ffff88102fc60000(0000) knlGS:0000000000000000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.139733] CS: 001...
