Bug #585657 “Transfering large files to nfs mount causes system ...” : Bugs : linux package : Ubuntu

Bruce Edge (bruce-edge) wrote on 2010-05-27:

#1

I'm seeing the same thing on 10.04 64-bit.

[773760.910061] INFO: task tar:14596 blocked for more than 120 seconds.
[773760.926430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[773760.958906] tar D 00000000ffffffff 0 14596 14568 0x00000004
[773760.958912] ffff8802b217dc48 0000000000000082 0000000000015bc0 0000000000015bc0
[773760.958917] ffff8801f5fc1ab0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc16f0
[773760.958921] 0000000000015bc0 ffff8802b217dfd8 0000000000015bc0 ffff8801f5fc1ab0
[773760.958925] Call Trace:
[773760.958951] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958960] [<ffffffff815555f7>] io_schedule+0x47/0x70
[773760.958972] [<ffffffffa01b228e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[773760.958976] [<ffffffff81555c1f>] __wait_on_bit+0x5f/0x90
[773760.958988] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958993] [<ffffffff81555cc8>] out_of_line_wait_on_bit+0x78/0x90
[773760.958999] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[773760.959011] [<ffffffffa01b226f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[773760.959024] [<ffffffffa01b666f>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
[773760.959037] [<ffffffffa01b7aae>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
[773760.959050] [<ffffffffa01b7e99>] nfs_write_mapping+0x79/0xb0 [nfs]
[773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
[773760.959073] [<ffffffffa01a6e9a>] nfs_do_fsync+0x2a/0x60 [nfs]
[773760.959084] [<ffffffffa01a70e5>] nfs_file_flush+0x75/0xa0 [nfs]
[773760.959089] [<ffffffff81140f2c>] filp_close+0x3c/0x90
[773760.959092] [<ffffffff81141037>] sys_close+0xb7/0x120
[773760.959098] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

Sutanto Kurniawan (tanto) wrote on 2010-06-19:

#2

Could this bug be related to this report:
https://bugzilla.kernel.org/show_bug.cgi?id=15552 ?

Sutanto Kurniawan (tanto) wrote on 2010-06-19:

#3

Oops, I meant this one (which included the fix):
https://bugzilla.kernel.org/show_bug.cgi?id=15578

Bug 15552 is the duplicate.

vertex.vr4 (vertex-vr4) wrote on 2010-07-23:

#4

The patch referred to in the last post appears to be in the current kernel-image.
I believe this issue can be closed as fixed.

Regards,
John

Nathan Adams (nadams) wrote on 2010-07-23:

#5

Please do not close this bug until:

1) a tester is able to reproduce the bug on an unpatched system, and

2) that same tester is able to verify, with certainty, that the patch resolves the problem.

Perhaps that is what you meant?

David McBride (david-mcbride) wrote on 2010-07-26:

#6

This appears to be a duplicate of Launchpad bug #561210.

David Ressman (davidressman) wrote on 2010-08-20:

#7

I'm not certain it's a duplicate of #561210, but I'm not certain it isn't either. This one starts from within nfs_wb_all() and the other hang starts in nfs_wb_page(). At any rate, I see this problem in 10.04 with both Ubuntu's 2.6.32-24.39 and with the stock kernel.org 2.6.32.18.
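
To compare two hung-task reports like these, it can help to reduce each trace to its list of NFS frames. Below is a minimal Python sketch, assuming the dmesg call-trace format shown in comment #1. The helper name `nfs_frames` and the regular expression are illustrative only and are not part of any kernel tool.

```python
import re

# Matches call-trace lines such as:
#   [773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
# Frames prefixed with "?" are unreliable stack entries and are skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def nfs_frames(dmesg_text, module="nfs"):
    """Return the reliable frames belonging to `module`, innermost first."""
    frames = []
    for line in dmesg_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable") and m.group("module") == module:
            frames.append(m.group("func"))
    return frames

# A few lines copied from the trace in comment #1.
SAMPLE = """\
[773760.958951] [<ffffffffa01b2280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
[773760.958960] [<ffffffff815555f7>] io_schedule+0x47/0x70
[773760.959011] [<ffffffffa01b226f>] nfs_wait_on_request+0x2f/0x40 [nfs]
[773760.959062] [<ffffffffa01b7f07>] nfs_wb_all+0x17/0x20 [nfs]
[773760.959073] [<ffffffffa01a6e9a>] nfs_do_fsync+0x2a/0x60 [nfs]
"""

frames = nfs_frames(SAMPLE)
print(frames)
print("via nfs_wb_all" if "nfs_wb_all" in frames else "via another path")
```

Running the same helper over a trace from the other bug would show whether `nfs_wb_page` appears in place of `nfs_wb_all`.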

cotillion (tobias-schwan) wrote on 2010-08-25:

#8

Is it possible that the NFS client opens more ports than the hardware can handle?

How large are the files that trigger this problem?

Andrew Soroka (andrew-soroka) wrote on 2010-09-02:

#9

It happens for me every time.

I want to back up my 1.7TB file server; I get about 250GB through and then the client system freezes. My files are 1-4GB in size.

Reading from NFS, writing to a local mdadm RAID5 array.

cotillion (tobias-schwan) wrote on 2010-09-03:

#10

Hmm, I don't have such big files, so I cannot reproduce the bug.

Have you tried to use the option "async" in your exports? Maybe your problem is related to the problem discussed and solved here: http://art.ubuntuforums.org/showthread.php?t=1478413

David McBride (david-mcbride) wrote on 2010-09-03:

#11

Using "async" is not a viable workaround. From `man exports`:

       async
              This option allows the NFS server to violate the NFS protocol
              and reply to requests before any changes made by that request
              have been committed to stable storage (e.g. disc drive).

              Using this option usually improves performance, but at the cost
              that an unclean server restart (i.e. a crash) can cause data to
              be lost or corrupted.

The fact that using 'async' results in higher performance is not a surprise, as it is much less careful with data handling. The fact that (according to the forum thread) enabling it happens not to trigger this particular bug is perhaps interesting from a debugging perspective, but it is not an acceptable solution to the problem for most organisations.

If you need to make a large file for testing, `dd if=/dev/zero of=my-large-file bs=1M count=$SIZE` will make you an arbitrarily sized file containing all zeroes. (Other nodes in /dev may well produce more interesting output.)
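
For anyone who would rather script the reproduction, here is a rough Python equivalent of the `dd` command that also times the final `close()`, since the trace in comment #1 shows the task blocked under `sys_close` in `nfs_file_flush`. This is only a sketch: the function name and its parameters are hypothetical, and it has not been shown to trigger the bug.

```python
import os
import time

def write_zero_file(path, size_mib, block_mib=1):
    """Write size_mib MiB of zeroes to path; return (bytes_written, close_seconds)."""
    block = bytes(block_mib * 1024 * 1024)  # one zero-filled block, reused
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = size_mib * 1024 * 1024
        while remaining > 0:
            chunk = block[:min(len(block), remaining)]
            n = os.write(fd, chunk)  # may write fewer bytes than requested
            written += n
            remaining -= n
    finally:
        start = time.monotonic()
        os.close(fd)  # on an NFS mount, close() flushes dirty pages to the server
        close_seconds = time.monotonic() - start
    return written, close_seconds
```

Calling `write_zero_file("/path/on/nfs/my-large-file", size_mib=4096)` (the path is a placeholder for a file on the mount under test) and watching whether the call ever returns would be one way to see if the hang shows up at close time.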

Bruce Edge (bruce-edge) wrote on 2010-09-30:

#12

10.04.1 still has the same problem.

4 months later - critical failure and still "unassigned"?

What is Canonical spending all its time on, eye candy? Come on people, this is a core failure. This is very bad. There are dozens of the same report that are all "unassigned", with one a "medium".
Jeez, Mark S should be lying awake at night over this one.

getnuked (getnuked) wrote on 2010-09-30:

#13

@Thag you might be interested in bug 561210 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/561210)

getnuked (getnuked) wrote on 2010-09-30:

#14

Ah, disregard my comment, it appears that you are already on that bug.

Nrm (smith32-35) wrote on 2010-11-30:

#15

Hi everyone,

I've got the same problem, and if I use my Wi-Fi card instead, it's "solved".
My Ethernet card is:

09:00.0 Ethernet controller: Atheros Communications Atheros AR8132 / L1c Gigabit Ethernet Adapter (rev c0)

David Ressman (davidressman) wrote on 2011-02-11:

#16

I believe this issue is solved by commit 0702099bd86c33c2dcdbd3963433a61f3f503901 (NFS: fix the return value of nfs_file_fsync()).

Tim Gardner (timg-tpi) wrote on 2011-02-11:

#17

David - to test your theory, how about subscribing to 'deb http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu lucid main' and installing linux-image-server-lts-backport-natty?

affects:	nfs-utils (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee:	nobody → Tim Gardner (timg-tpi)
status:	New → In Progress

David Ressman (davidressman) wrote on 2011-02-11:

#18

Unfortunately, in the environment we have, the latest we can run is 2.6.32.xx (IB drivers, filesystem modules, etc.), so even if I installed it, I wouldn't be able to use NFS. I can verify that I added the patch from that commit into 2.6.32.24-generic and the problem disappeared. When we booted back into the stock 2.6.32.24-generic, it reappeared.

Sorry.

Tim Gardner (timg-tpi) wrote on 2011-02-11:

#19

Attachment: NFS: fix the return value of nfs_file_fsync() (1.1 KiB, text/plain)

That works for me. Did your patch look like the attached one?

Tim Gardner (timg-tpi) on 2011-02-11

Changed in linux (Ubuntu Natty):
status:	In Progress → Fix Released

David Ressman (davidressman) wrote on 2011-02-14:

#20

It looked precisely like that. :)

Tim Gardner (timg-tpi) wrote on 2011-02-14:

#21

Attachment: Lucid: NFS: fix the return value of nfs_file_fsync() (1.1 KiB, text/plain)

SRU Justification

Impact: Large NFS file copies can orphan resources and block tasks

Patch Description: NFS: fix the return value of nfs_file_fsync()

Changed in linux (Ubuntu Lucid):
assignee:	nobody → Tim Gardner (timg-tpi)
status:	New → Fix Committed
Changed in linux (Ubuntu Maverick):
assignee:	nobody → Tim Gardner (timg-tpi)
status:	New → Fix Committed

Tim Gardner (timg-tpi) on 2011-02-14

Changed in linux (Ubuntu Hardy):
assignee:	nobody → Tim Gardner (timg-tpi)
status:	New → In Progress

Stefan Bader (smb) on 2011-02-15

Changed in linux (Ubuntu Hardy):
status:	In Progress → Fix Committed

David Ressman (davidressman) wrote on 2011-02-15:

#22

You're a scholar and a gentleman, Tim.

Sean Clarke (sean-clarke) wrote on 2011-02-17:

#23

I am hitting this problem under 10.10 x64:

uname -a
Linux enterprise 2.6.35-27-server #47-Ubuntu SMP Fri Feb 11 23:09:19 UTC 2011 x86_64 GNU/Linux

I've reported it under a couple of other open bugs relating to issues around this area. Can you let me know which kernel version the change is expected to be rolled out in? It is a huge problem for us as we run KVM images over NFS and this happens every time.

After it happens, we also get "false" timeouts on the NFS server and the whole system stutters and stalls. The NFS server continues to serve files to other systems and can be pinged from the failed client; it even serves files to it. However, you get very regular (5 seconds?) timeout messages in the logs:

[15594.126931] nfs: server XXXXXX not responding, timed out
[15598.336861] nfs: server XXXXXX not responding, timed out
[15602.546851] nfs: server XXXXXX not responding, timed out
[15606.757764] nfs: server XXXXXX not responding, timed out
[15610.966788] nfs: server XXXXXX not responding, timed out
[15615.176756] nfs: server XXXXXX not responding, timed out

PING XXXXXX 56(84) bytes of data.
64 bytes from XXXXXX: icmp_req=1 ttl=64 time=0.097 ms
64 bytes from XXXXXX: icmp_req=2 ttl=64 time=0.059 ms
64 bytes from XXXXXX: icmp_req=3 ttl=64 time=0.079 ms
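
The interval between the timeout messages can be read off the kernel timestamps rather than guessed. Below is a small Python sketch, assuming the `[seconds.microseconds]` prefix shown in the log above; the helper name `timeout_intervals` is illustrative.

```python
import re

# Matches lines such as:
#   [15594.126931] nfs: server XXXXXX not responding, timed out
TIMEOUT_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+nfs: server \S+ not responding")

def timeout_intervals(log_text):
    """Return the gaps, in seconds, between consecutive NFS timeout messages."""
    stamps = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.match(line.strip())
        if m:
            stamps.append(float(m.group(1)))
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]

# The six messages quoted above.
LOG = """\
[15594.126931] nfs: server XXXXXX not responding, timed out
[15598.336861] nfs: server XXXXXX not responding, timed out
[15602.546851] nfs: server XXXXXX not responding, timed out
[15606.757764] nfs: server XXXXXX not responding, timed out
[15610.966788] nfs: server XXXXXX not responding, timed out
[15615.176756] nfs: server XXXXXX not responding, timed out
"""

print(timeout_intervals(LOG))
```

For the six messages quoted here, every gap comes out at roughly 4.21 seconds, so the messages are slightly more frequent than the 5 seconds estimated above.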

Tim Gardner (timg-tpi) wrote on 2011-02-17:

#24

Sean - it's best if you start your own bug report using 'ubuntu-bug linux'. Your symptoms appear unrelated to this bug.

Martin Pitt (pitti) wrote on 2011-03-02: Please test proposed package

#25

Accepted linux into hardy-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote on 2011-03-02:

#26

Accepted linux into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote on 2011-03-02:

#27

Accepted linux into maverick-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Brad Figg (brad-figg) on 2011-03-03

tags:

added: verification-needed-hardy verification-needed-lucid verification-needed-maverick

Martin Pitt (pitti) wrote on 2011-03-03:

#28

Accepted linux-ec2 into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Steve Conklin (sconklin) wrote on 2011-03-03:

#29

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-<release>' to 'verification-done-<release>'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

jeffetflo (jeff-jeffetflo) wrote on 2011-03-05:

#30

Sorry, but which package do I have to test?
I don't see anything NFS-related...

Tim Gardner (timg-tpi) wrote on 2011-03-07:

#31

Steve - the only kernel that I'm comfortable marking verified is Lucid. I'm happy to have you revert Maverick and Hardy so that it forces someone to do the testing as I don't have a reproducer.

tags:

added: verification-done-lucid
removed: verification-needed-lucid

Tim Gardner (timg-tpi) wrote on 2011-03-08:

#32

Steve - I'm changing my position and am going to advocate for keeping this patch in Maverick and Hardy, as it's been officially accepted as a stable patch for 2.6.32.y. The code in Hardy is substantially identical with respect to the use of the return value of nfs_do_fsync(). Therefore I'm marking all releases as verification-done.

tags:

added: verification-done-hardy verification-done-maverick
removed: verification-needed-hardy verification-needed-maverick

Dan Bishop (danbishop) wrote on 2011-03-08:

#33

This patch works perfectly! I can finally use NFS home directories again! :D Well... so long as I enable -proposed for now

Launchpad Janitor (janitor) wrote on 2011-03-17:

#34

This bug was fixed in the package linux - 2.6.32-30.59

---------------
linux (2.6.32-30.59) lucid-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #727336

  [ Tim Gardner ]

  * [Config] CONFIG_IRQ_TIME_ACCOUNTING=n
    - LP: #723819

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * staging: usbip: remove double giveback of URB
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD SB800
    - LP: #723819
  * rt2x00: add device id for windy31 usb device
    - LP: #723819
  * ALSA: snd-usb-us122l: Fix missing NULL checks
    - LP: #723819
  * hwmon: (via686a) Initialize fan_div values
    - LP: #723819
  * USB: serial: handle Data Carrier Detect changes
    - LP: #723819
  * USB: CP210x Add two device IDs
    - LP: #723819
  * USB: CP210x Removed incorrect device ID
    - LP: #723819
  * USB: usb-storage: unusual_devs update for Cypress ATACB
    - LP: #723819
  * USB: usb-storage: unusual_devs update for TrekStor DataStation maxi g.u
    external hard drive enclosure
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for CamSport Evo
    - LP: #723819
  * USB: usb-storage: unusual_devs entry for Coby MP3 player
    - LP: #723819
  * USB: serial: Updated support for ICOM devices
    - LP: #723819
  * USB: adding USB support for Cinterion's HC2x, EU3 and PH8 products
    - LP: #723819
  * USB: EHCI: ASPM quirk of ISOC on AMD Hudson
    - LP: #723819
  * USB: EHCI: fix DMA deallocation bug
    - LP: #723819
  * USB: g_printer: fix bug in module parameter definitions
    - LP: #723819
  * USB: io_edgeport: fix the reported firmware major and minor
    - LP: #723819
  * USB: ti_usb: fix module removal
    - LP: #723819
  * USB: Storage: Add unusual_devs entry for VTech Kidizoom
    - LP: #723819
  * USB: ftdi_sio: add ST Micro Connect Lite uart support
    - LP: #723819
  * USB: cdc-acm: Adding second ACM channel support for Nokia N8
    - LP: #723819
  * USB: ftdi_sio: Add VID=0x0647, PID=0x0100 for Acton Research
    spectrograph
    - LP: #723819
  * USB: prevent buggy hubs from crashing the USB stack
    - LP: #723819
  * staging: comedi: add support for newer jr3 1-channel pci board
    - LP: #723819
  * staging: comedi: ni_labpc: Use shared IRQ for PCMCIA card
    - LP: #723819
  * Staging: hv: fix sysfs symlink on hv block device
    - LP: #723819
  * staging: hv: Enable sending GARP packet after live migration
    - LP: #723819
  * hvc_iucv: allocate memory buffers for IUCV in zone DMA
    - LP: #723819
  * iwlagn: enable only rfkill interrupt when device is down
    - LP: #723819
  * ath9k: Fix bug in delimiter padding computation
    - LP: #723819
  * correct vdso version string
    - LP: #723819
  * fix medium error problems with some arrays which can cause data
    corruption
    - LP: #723819
  * libsas: fix runaway error handler problem
    - LP: #723819
  * mpt2sas: Fix device removal handshake for zoned devices
    - LP: #723819
  * mpt2sas: Correct resizing calculation for max_queue_depth
    - LP: #723819
  * mpt2sas: Kernel Panic during Large Topology discovery
    - LP: #723819
  * radio-aimslab.c: Fix gcc 4.5+ bug
    - LP: #723819
  * em28xx: Fix audio input for Terratec Grabby
    - LP: #723819
  * ALSA : au88x0 - Limit number of channels to fix Oops via OSS emu
    - LP: #723819
  * ALSA: HDA: Fix dmesg output of HDMI supported bits
    - LP: #723819
  * ALSA: hda - Fix memory leaks in conexant jack arrays
    - LP: #723819
  * input: bcm5974: Add support for MacBookAir3
    - LP: #723819
  * ALSA: hrtimer: handle delayed timer interrupts
    - LP: #723819
  * ASoC: WM8990: msleep() takes milliseconds not jiffies
    - LP: #723819
  * ASoC: Blackfin AC97: fix build error after multi-component update
    - LP: #723819
  * NFS: Fix "kernel BUG at fs/aio.c:554!"
    - LP: #723819
  * rtc-cmos: fix suspend/resume
    - LP: #723819
  * iwlagn: Re-enable RF_KILL interrupt when down
    - LP: #723819
  * rapidio: fix hang on RapidIO doorbell queue full condition
    - LP: #723819
  * PCI: pci-stub: ignore zero-length id parameters
    - LP: #723819
  * virtio: remove virtio-pci root device
    - LP: #723819
  * ds2760_battery: Fix calculation of time_to_empty_now
    - LP: #723819
  * p54: fix sequence no. accounting off-by-one error
    - LP: #723819
  * i2c: Unregister dummy devices last on adapter removal
    - LP: #723819
  * serial: unbreak billionton CF card
    - LP: #723819
  * ptrace: use safer wake up on ptrace_detach()
    - LP: #723819
  * x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
    - LP: #723819
  * fix jiffy calculations in calibrate_delay_direct to handle overflow
    - LP: #723819
  * USB: serial: pl2303: Hybrid reader Uniform HCR331
    - LP: #723819
  * drivers: update to pl2303 usb-serial to support Motorola cables
    - LP: #723819
  * klist: Fix object alignment on 64-bit.
    - LP: #723819
  * powerpc: Fix some 6xx/7xxx CPU setup functions
    - LP: #723819
  * parisc : Remove broken line wrapping handling pdc_iodc_print()
    - LP: #723819
  * kernel/smp.c: fix smp_call_function_many() SMP race
    - LP: #723819
  * hostap_cs: fix sleeping function called from invalid context
    - LP: #723819
  * md: fix regression with re-adding devices to arrays with no metadata
    - LP: #723819
  * pata_mpc52xx: inherit from ata_bmdma_port_ops
    - LP: #723819
  * TPM: Long default timeout fix
    - LP: #723819
  * tpm_tis: Use timeouts returned from TPM
    - LP: #723819
  * SELinux: define permissions for DCB netlink messages
    - LP: #723819
  * SELinux: do not compute transition labels on mountpoint labeled
    filesystems
    - LP: #723819
  * ieee80211: correct IEEE80211_ADDBA_PARAM_BUF_SIZE_MASK macro
    - LP: #723819
  * dm: dont take i_mutex to change device size
    - LP: #723819
  * dm mpath: disable blk_abort_queue
    - LP: #723819
  * x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask
    after switching mm
    - LP: #723819
  * usb: Realloc xHCI structures after a hub is verified.
    - LP: #723819
  * sched: Remove USER_SCHED
    - LP: #723819
  * sched: Remove remaining USER_SCHED code
    - LP: #723819
  * sched: Move sched_avg_update() to update_cpu_load()
    - LP: #723819
  * sched: Increment cache_nice_tries only on periodic lb
    - LP: #723819
  * sched: Try not to migrate higher priority RT tasks
    - LP: #723819
  * sched: Give CPU bound RT tasks preference
    - LP: #723819
  * sched: suppress RCU lockdep splat in task_fork_fair
    - LP: #723819
  * sched: fix RCU lockdep splat from task_group()
    - LP: #723819
  * sched: Do not consider SCHED_IDLE tasks to be cache hot
    - LP: #723819
  * sched: Set group_imb only a task can be pulled from the busiest cpu
    - LP: #723819
  * sched: Force balancing on newidle balance if local group has capacity
    - LP: #723819
  * sched: Drop group_capacity to 1 only if local group has extra capacity
    - LP: #723819
  * sched: Fix softirq time accounting
    - LP: #723819
  * sched: Consolidate account_system_vtime extern declaration
    - LP: #723819
  * sched: Remove unused PF_ALIGNWARN flag
    - LP: #723819
  * sched: Add a PF flag for ksoftirqd identification
    - LP: #723819
  * sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
    - LP: #723819
  * x86: Add IRQ_TIME_ACCOUNTING
    - LP: #723819
  * sched: Do not account irq time to current task
    - LP: #723819
  * sched: Remove irq time from available CPU power
    - LP: #723819
  * sched: Call tick_check_idle before __irq_enter
    - LP: #723819
  * sched: Export account_system_vtime()
    - LP: #723819
  * sched, cgroup: Fixup broken cgroup movement
    - LP: #723819
  * sched: Use group weight, idle cpu metrics to fix imbalances during idle
    - LP: #723819
  * sched: Fix cross-sched-class wakeup preemption
    - LP: #723819
  * sched: Fix volanomark performance regression
    - LP: #723819
  * sched: Fix idle balancing
    - LP: #723819
  * sched: Fix wake_affine() vs RT tasks
    - LP: #723819
  * sched: Remove some dead code
    - LP: #723819
  * kernel/user.c: add lock release annotation on free_user()
    - LP: #723819
  * Linux 2.6.32.29
    - LP: #723819
  * rds: Integer overflow in RDS cmsg handling, CVE-2010-4175
    - LP: #721455
    - CVE-2010-4175
 -- Steve Conklin <sconklin@canonical.com>   Tue, 01 Mar 2011 12:09:35 -0600

Changed in linux (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Martin Pitt (pitti) wrote on 2011-03-17:

#35

Accepted linux into hardy-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Launchpad Janitor (janitor) wrote on 2011-03-18:

#36

This bug was fixed in the package linux - 2.6.35-28.49

---------------
linux (2.6.35-28.49) maverick-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #726796

  [ Colin Ian King ]

  * SAUCE: Dell All-In-One: Remove need for Dell module alias

  [ Manoj Iyer ]

  * SAUCE: add ricoh 0xe823 pci id.
    - LP: #717435

  [ Upstream Kernel Changes ]

  * virtio_net: Add schedule check to napi_enable call
    - LP: #579276
  * mmc: make sdhci work with ricoh mmc controller
    - LP: #717435
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * rt2x00: Pad beacon to multiple of 32 bits.
    - LP: #659143
  * rt2x00: Fix firmware loading regression on x86_64.
    - LP: #659143
  * rt2x00: Check for errors from skb_pad() calls
    - LP: #659143
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
    - CVE-2010-4076
  * rds: Integer overflow in RDS cmsg handling, CVE-2010-4175
    - LP: #721455
    - CVE-2010-4175
-- Brad Figg <email address hidden> Mon, 28 Feb 2011 13:02:53 -0800

Changed in linux (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote on 2011-04-04:

#37

This bug was fixed in the package linux - 2.6.24-29.88

---------------
linux (2.6.24-29.88) hardy-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #736290

  [ Steve Conklin ]

  * Ubuntu-2.6.24-29.87
  * [Config] Allow insertchanges to work in later version chroots

  [ Upstream Kernel Changes ]

  * do_exit(): make sure that we run with get_fs() == USER_DS,
    CVE-2010-4258
    - LP: #723945
    - CVE-2010-4258
  * Make the bulkstat_one compat ioctl handling more sane
    - LP: #692848
  * Fix xfs_bulkstat_one size checks & error handling
    - LP: #692848
  * xfs: always use iget in bulkstat
    - LP: #692848
  * x25: Prevent crashing when parsing bad X.25 facilities CVE-2010-4164
    - LP: #731199
    - CVE-2010-4164
  * Revised [CVE-2010-4346 Hardy] install_special_mapping skips
    security_file_mmap check. CVE-2010-4346
    - LP: #731971
    - CVE-2010-4346

linux (2.6.24-29.87) hardy-proposed; urgency=low

  [ Steve Conklin ]

  * Release Tracking Bug
    - LP: #725138

  [ Upstream Kernel Changes ]

  * bluetooth: Fix missing NULL check, CVE-2010-4242
    - LP: #714846
    - CVE-2010-4242
  * NFS: fix the return value of nfs_file_fsync()
    - LP: #585657
  * bio: take care not overflow page count when mapping/copying user data,
    CVE-2010-4162
    - LP: #721441
    - CVE-2010-4162
  * filter: make sure filters dont read uninitialized memory
    - LP: #721282
    - CVE-2010-4158
  * tty: Make tiocgicount a handler, CVE-2010-4076, CVE-2010-4077
    - LP: #720189
    - CVE-2010-4077
  * block: check for proper length of iov entries earlier in
    blk_rq_map_user_iov(), CVE-2010-4163
    - LP: #721504
    - CVE-2010-4163
-- Brad Figg <email address hidden> Wed, 16 Mar 2011 09:43:35 -0700

Changed in linux (Ubuntu Hardy):
status:	Fix Committed → Fix Released

David McGiven (davidmcgivenn) wrote on 2011-04-29:

#38

Sorry, I'm still having the same problem with Ubuntu 10.04.2, using either:

2.6.32-30.59
or
2.6.35-020635rc1

What should I do to fix this problem?

Thanks.

Tim Gardner (timg-tpi) wrote on 2011-04-29:

#39

David - this issue was fixed after 10.04.2, so you'll need to subscribe to '-updates' in your favorite package manager.
Select System/Administration/Synaptic Package Manager, then Settings/Repositories/Updates.
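
To check whether a running kernel is at least as new as the fixed packages named in the Launchpad Janitor comments, the `uname -a` output can be compared against them. Here is a hedged Python sketch. It assumes the Ubuntu convention that `2.6.32-31-server #61-Ubuntu` corresponds to package version 2.6.32-31.61; the `FIXED_IN` table is copied from the changelog entries in this thread, and the function names are hypothetical.

```python
import re

# Package versions whose changelog entries in this thread list the NFS fix.
FIXED_IN = {
    "2.6.24": (2, 6, 24, 29, 87),  # hardy
    "2.6.32": (2, 6, 32, 30, 59),  # lucid
    "2.6.35": (2, 6, 35, 28, 49),  # maverick
}

# Assumed layout: "<x>.<y>.<z>-<abi>-<flavour> #<upload>-Ubuntu"
UNAME_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)-(\d+)-\w+ #(\d+)-Ubuntu")

def package_version(uname_line):
    """Extract (major, minor, patch, abi, upload) from a `uname -a` line."""
    m = UNAME_RE.search(uname_line)
    if m is None:
        raise ValueError("not an Ubuntu kernel uname line")
    return tuple(int(g) for g in m.groups())

def has_fix(uname_line):
    """True or False if the kernel series is in FIXED_IN, None if it is unknown."""
    version = package_version(uname_line)
    series = ".".join(str(n) for n in version[:3])
    fixed = FIXED_IN.get(series)
    return None if fixed is None else version >= fixed

# The uname line from comment #23.
print(has_fix("Linux enterprise 2.6.35-27-server #47-Ubuntu SMP "
              "Fri Feb 11 23:09:19 UTC 2011 x86_64 GNU/Linux"))
```

For the `uname -a` line in comment #23 this prints `False`, meaning that kernel predates the maverick package carrying the fix. A `True` result only says the package is new enough; it does not prove the hang is gone on a given system.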

David McGiven (davidmcgivenn) wrote on 2011-04-29:

#40

Thanks, Tim. This seems to work with 2.6.32-31-server.

Zach (zivester) wrote on 2011-05-07:

#41

Apologies for being late to the party, but I'm also plagued by this bug...

I'm running 64-bit Maverick, and I'm still experiencing this lockup. Isn't it supposed to be fixed with this kernel:

Linux mycomp 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

If not, how do I get this fix?

draven (draven-sol) wrote on 2011-05-15:

#42

I'm still facing this issue.

Client kernel: Linux hyponoia 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
Server kernel: Linux nale 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux

Paul Crawford (psc-sat) wrote on 2011-07-11:

#43

I may be seeing the same problem, but I am not sure.

I have a new Thecus N5200XXX NAS and when I read via NFS I get all transfers to/from the NFS mount blocked after 5-7GB typically, but for writing I got 47GB today (and managed to copy a 202GB file earlier). However, I don't know if this is a Thecus issue or Ubuntu/Linux issue.

I see it with both my 64-bit 10.04.2 LTS installation (kernel 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux) and my 32-bit 10.04.2 LTS installation with the 'proposed' updates and the 2.6.32-33-generic #69-Ubuntu SMP Mon Jun 27 15:36:47 UTC 2011 i686 GNU/Linux kernel (same PC, dual-boot).

I don't seem to see it on my home PC (similar 10.04 LTS 32-bit 'proposed' kernel and the older Thecus N5200pro NAS) which is why initially I assumed it was a Thecus issue.

What I do see is I can access the NAS via CIFS and its web interface at the same time NFS is blocked, and can access NFS mounts on other servers as well.

There are no odd high CPU loads on PC client or NAS server, or syslog messages on the PC.

While this may be unrelated, I see others are still having problems after the fix has apparently been released so thought it may be of interest.

Ken Pratt (kenpratt) wrote on 2012-05-30:

#44

This bug is still alive and kicking in kernel: 3.0.0-12-generic (as part of Mint 12)

Linux fit3.thepratts.info 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

This is running on an unmodified up-to-date fitPC3 running Mint 12.

When I attempt to write large files from another Linux box over NFSv4, I get several hundred MB into the copy and then the NFS client connection hangs indefinitely. The NFS server still responds to other clients without problems.

Is there any chance that the patch that fixed this problem has been reverted in current kernels?

I have a Gig ethernet through a gig switch to a gig port on a Fit PC3 running Mint 12 and exporting a NFS share via NFSv4. I can not successfully copy a file (movie rip - large file) from a client Linux box running Ubuntu 12.04 to this server. It hangs. I can read large files without a hitch.

I realize I am late to this dialog, but am I now the only one suffering from this problem? I do not want to redo everything using CIFS (SMB). However, I have never had a problem with CIFS; I just don't like the permission and ownership mapping.

StormForge (br-cs) wrote on 2013-07-05:

#45

I think I'm experiencing this bug as well. Copying large files (250GB) to an NFS server. Copy hangs with many "NFS not responding" messages. Many OS functions (like df) seem to lock up. Only recourse is a reboot.

This is on 12.04.2 running 3.2.0-48-generic x86_64 with latest updates as of July 4 2013.

Per-Inge (per-inge-hallin) wrote on 2013-12-03:

#46

I also have this problem. Both on an Ubuntu 13.10 installation with kernel 3.12 and on a Trusty Tahr installation with kernel 3.12.0.4.6.
A file copy starts fine, but hangs soon after.
My server is a fully updated Ubuntu 12.04 server using RAID 5.1.

Per-Inge (per-inge-hallin) wrote on 2013-12-04:

#47

nfs.png (45.1 KiB, image/png)

I have used my test server to get some more information.
All copies were made with Nautilus.
The server is a fully updated Ubuntu 12.04 server.
The client is Ubuntu Trusty Tahr.
Copy from the server to the client with NFS works fine
Copy from the client to the server works.
Copy from the client to the server with NFS has problems. It takes about an hour to copy. When I open a new Nautilus window, all three Nautilus windows are grayed out, but recover when the copy is finished.
See the pictures.

Rob van der Linde (robvdl) wrote on 2014-01-19:

#48

I am experiencing the same issue as described in comment #47 on Ubuntu 13.10

Both server and client are Ubuntu 13.10. Copying from server to client does not cause lockups, but copying to the server causes Dolphin to lock up until the copy is complete.

I am not sure whether this is a new issue or an old bug coming back: it has been marked as fixed in older versions of Ubuntu, yet the problem still seems to persist in 13.10.

Jander Moreira (moreira-jander) wrote on 2014-01-21:

#49

I faced a similar issue using Ubuntu 13.10 (fully updated) on my notebook when copying/moving large files to my IOmega IX2-200 NAS (running its own Linux brand with kernel 2.6.31.8).

The system slows down and locks for some time. I've never experienced a fatal lock, but I had to wait for several minutes.

Copying from NAS to the notebook presents no noticeable locks or hangs.

Jander Moreira (moreira-jander) wrote on 2014-01-21:

#50

Forgot to mention: the notebook runs kernel 2.6.31.8 and mounts NFS with autofs. All mounts use the default settings.

Jander Moreira (moreira-jander) wrote on 2014-01-21:

#51

Oops. A copy/paste problem.
The notebook actually runs kernel 3.11.0-15-generic.

Sorry for that and for the multiple postings...

Rob van der Linde (robvdl) wrote on 2014-01-22:

#52

The systems I am running are also Ubuntu 13.10 and the 3.11 kernel.

This bug is quite annoying: every time I copy a file to my server over NFS, the client doing the copying stalls for a few seconds at the end of the copy, locking up Nautilus/Dolphin, and then wakes up again. I have been considering going back to SSHFS, as NFS in its current state is almost unusable with all that stalling.

I don't know if this is a bug that has resurfaced on Ubuntu 13.10 / kernel 3.11, or if this is actually a new bug that needs to be opened.

MBr (m-emanuel) wrote on 2014-09-10:

#53

Same problem with Ubuntu Trusty "Ubuntu 14.04.1 LTS" / kernel 3.13.0-32-generic.
Both NFS client and server run this version of Ubuntu and kernel version. Trying to transfer 500GB files from mdadm raid5 to NFS using lbzip2:

[48693.533918] INFO: task lbzip2:14344 blocked for more than 120 seconds.
[48693.536784] Not tainted 3.13.0-32-generic #57-Ubuntu
[48693.539750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[48693.542764] lbzip2 D ffff88011fc94440 0 14344 14341 0x00000000
[48693.542773] ffff8800d235bb38 0000000000000002 ffff8801194497f0 ffff8800d235bfd8
[48693.542782] 0000000000014440 0000000000014440 ffff8801194497f0 ffff88011fc94cd8
[48693.542789] ffff88011ffd3f28 0000000000000002 ffffffffa0219fe0 ffff8800d235bbb0
[48693.542795] Call Trace:
[48693.542838] [<ffffffffa0219fe0>] ? nfs_free_request+0xb0/0xb0 [nfs]
[48693.542851] [<ffffffff817203fd>] io_schedule+0x9d/0x140
[48693.542877] [<ffffffffa0219fee>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[48693.542884] [<ffffffff81720882>] __wait_on_bit+0x62/0x90
[48693.542908] [<ffffffffa0219fe0>] ? nfs_free_request+0xb0/0xb0 [nfs]
[48693.542917] [<ffffffff81720927>] out_of_line_wait_on_bit+0x77/0x90
[48693.542926] [<ffffffff810aaf40>] ? autoremove_wake_function+0x40/0x40
[48693.542948] [<ffffffffa021a383>] nfs_wait_on_request+0x33/0x40 [nfs]
[48693.542971] [<ffffffffa021f2d0>] nfs_updatepage+0x150/0x650 [nfs]
[48693.542991] [<ffffffffa021096b>] nfs_write_end+0x5b/0x340 [nfs]
[48693.543000] [<ffffffff8114e616>] generic_file_buffered_write+0x156/0x250
[48693.543009] [<ffffffff8114fc81>] __generic_file_aio_write+0x1c1/0x3d0
[48693.543016] [<ffffffff8114fee8>] generic_file_aio_write+0x58/0xa0
[48693.543036] [<ffffffffa020fbdb>] nfs_file_write+0xbb/0x1d0 [nfs]
[48693.543043] [<ffffffff811bc3da>] do_sync_write+0x5a/0x90
[48693.543050] [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[48693.543056] [<ffffffff811bd599>] SyS_write+0x49/0xa0
[48693.543063] [<ffffffff8172c87f>] tracesys+0xe1/0xe6

NFS client hardware Dell T605 server, NFS server HP Proliant ML150 G2
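
For anyone comparing the traces posted in this thread, here is a minimal Python sketch that pulls the hung-task warnings out of a saved dmesg or syslog dump and checks whether the accompanying trace touches the nfs module. The function names `find_hung_tasks` and `blocked_in_nfs` are made up for illustration, and the message format is assumed to match the lines quoted above; this is not an official diagnostic tool.

```python
import re

# Matches the kernel's hung-task warning as quoted in this thread, e.g.
# "[48693.533918] INFO: task lbzip2:14344 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


def blocked_in_nfs(log_text):
    """Return True if any trace line ends with the "[nfs]" module tag."""
    return any(line.rstrip().endswith("[nfs]") for line in log_text.splitlines())


if __name__ == "__main__":
    # Two lines copied from the trace above, used as sample input.
    sample = (
        "[48693.533918] INFO: task lbzip2:14344 blocked for more than 120 seconds.\n"
        "[48693.542877] [<ffffffffa0219fee>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]\n"
    )
    print(find_hung_tasks(sample))
    print(blocked_in_nfs(sample))
```

To use it on a real machine, save the output of dmesg to a file and pass the file contents to both functions.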

kevinf (kevinf) wrote on 2015-01-14:

#54

Same lockups when copying/rsyncing 200MB files TO a FreeNAS NFS server; I didn't have problems reading.

Gigabyte Brix using Asix ax88179_178a Gigabit USB3 nic, tried a second nic with same result.

Kernel: 3.17.4-031704-generic (mainline)

Xubuntu Trusty 14.0.1

ii nfs-kernel-server 1:1.2.8-6ubuntu1.1
ii nfs-common 1:1.2.8-6ubuntu1.1

22699-Jan 14 14:08:44 brix kernel: [ 360.548908] INFO: task cp:5165 blocked for more than 120 seconds.
22700-Jan 14 14:08:44 brix kernel: [ 360.548912] Tainted: G OE 3.17.4-031704-generic #201411211317
22701:Jan 14 14:08:44 brix kernel: [ 360.548913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22702-Jan 14 14:08:44 brix kernel: [ 360.548914] cp D 0000000000000007 0 5165 1 0x00000004
22703-Jan 14 14:08:44 brix kernel: [ 360.548917] ffff8801eb9b7c88 0000000000000086 ffff8801eb9b7c28 ffffffff8101e5c9
22704-Jan 14 14:08:44 brix kernel: [ 360.548919] ffff8801eb9b7fd8 00000000000145c0 ffff8800bbeee200 00000000000145c0
22705-Jan 14 14:08:44 brix kernel: [ 360.548920] ffff880213235000 ffff88003643e400 ffff8801eb9b7c88 ffff88021ebd4ec0
22706-Jan 14 14:08:44 brix kernel: [ 360.548922] Call Trace:
22707-Jan 14 14:08:44 brix kernel: [ 360.548928] [<ffffffff8101e5c9>] ? read_tsc+0x9/0x10
22708-Jan 14 14:08:44 brix kernel: [ 360.548932] [<ffffffff817a2970>] ? bit_wait+0x50/0x50
22709-Jan 14 14:08:44 brix kernel: [ 360.548933] [<ffffffff817a20c9>] schedule+0x29/0x70
22710-Jan 14 14:08:44 brix kernel: [ 360.548935] [<ffffffff817a219f>] io_schedule+0x8f/0xd0
22711-Jan 14 14:08:44 brix kernel: [ 360.548937] [<ffffffff817a299b>] bit_wait_io+0x2b/0x50
22712-Jan 14 14:08:44 brix kernel: [ 360.548939] [<ffffffff817a2865>] __wait_on_bit+0x65/0x90
22713-Jan 14 14:08:44 brix kernel: [ 360.548942] [<ffffffff811731eb>] ? find_get_pages_tag+0xcb/0x170
22714-Jan 14 14:08:44 brix kernel: [ 360.548944] [<ffffffff81172637>] wait_on_page_bit+0xc7/0xd0
22715-Jan 14 14:08:44 brix kernel: [ 360.548947] [<ffffffff810b3fd0>] ? wake_atomic_t_function+0x40/0x40
22716-Jan 14 14:08:44 brix kernel: [ 360.548949] [<ffffffff81172804>] filemap_fdatawait_range+0xf4/0x180
22717-Jan 14 14:08:44 brix kernel: [ 360.548951] [<ffffffff811747fd>] filemap_write_and_wait_range+0x4d/0x80
22718-Jan 14 14:08:44 brix kernel: [ 360.548969] [<ffffffffc01f9223>] nfs_file_fsync+0x53/0x150 [nfs]
22719-Jan 14 14:08:44 brix kernel: [ 360.548974] [<ffffffff81219899>] vfs_fsync+0x29/0x40
22720-Jan 14 14:08:44 brix kernel: [ 360.548980] [<ffffffffc01f9cfa>] nfs_file_flush+0x8a/0xd0 [nfs]
22721-Jan 14 14:08:44 brix kernel: [ 360.548982] [<ffffffff811e743a>] filp_close+0x3a/0x90
22722-Jan 14 14:08:44 brix kernel: [ 360.548984] [<ffffffff8120709f>] __close_fd+0x8f/0xc0
22723-Jan 14 14:08:44 brix kernel: [ 360.548986] [<ffffffff811e8cd3>] SyS_close+0x23/0x50
22724-Jan 14 14:08:44 brix kernel: [ 360.548988] [<ffffffff817a656d>] system_call_fastpath+0x1a/0x1f

After reading: http://art.ubuntuforums.org/showthread.php?t=1478413 this is REALLY embarrassing.

Per-Inge (per-inge-hallin) wrote on 2015-01-15:

#55

I have added the option tcp to the mount entry in fstab. With that option I can copy files of at least 10 GB to my Ubuntu server.
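
To check which transport an existing NFS mount is actually using, a small Python sketch like the one below may help. It relies on the Linux NFS client listing a `proto=` option for each mount in /proc/mounts. The function name `nfs_transports` is hypothetical, and the comment above does not say which transport was in use before the change, so treat this only as a way to inspect your own setup.

```python
import os


def nfs_transports(mounts_text):
    """Map each NFS mount point to its transport ("tcp", "udp", ...) or None."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        proto = None
        for opt in fields[3].split(","):
            if opt.startswith("proto="):
                proto = opt[len("proto="):]
        result[fields[1]] = proto
    return result


if __name__ == "__main__":
    # Only present on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            for mount_point, proto in nfs_transports(fh.read()).items():
                print(mount_point, proto)
```

If a mount shows `udp`, adding `tcp` (or `proto=tcp`) to its options in fstab and remounting is the change described above.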

bananenkasper (bananenkasper) wrote on 2015-10-31:

#56

Still the same problem, after all this time.

DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=17.2
DISTRIB_CODENAME=rafaela
DISTRIB_DESCRIPTION="Linux Mint 17.2 Rafaela"

Linux 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Savage-w (savage-w) wrote on 2015-11-02:

#57

Still happening in 14.04 LTS

Client (1Gbps Link):
[ 3038.818986] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818991] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818996] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819001] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819006] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819012] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819017] nfs: server 10.0.0.200 not responding, timed out
[ 3038.958559] nfs: server 10.0.0.200 not responding, timed out

Pings are under 1ms

Crash:
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.799988] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.815847] BUG: unable to handle kernel paging request at ffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.824363] IP: [<ffffea00084c2540>] 0xffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.832785] PGD 82fff5067 PUD 82fff4067 PMD 80000008176001e3
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.841143] Oops: 0011 [#2] SMP
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.849239] Modules linked in: mptctl xt_comment iptable_filter xt_multiport ip_tables x_tables rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache nf_conntrack_netlink nf_conntrack nfnetlink intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw ipmi_devintf serio_raw joydev gf128mul glue_helper dcdbas ablk_helper i7core_edac acpi_power_meter gpio_ich lpc_ich ipmi_si edac_core cryptd ipmi_msghandler shpchp mac_hid lp parport tcp_htcp hid_generic mptsas mptscsih usbhid mptbase psmouse hid scsi_transport_sas pata_acpi bnx2
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.919101] CPU: 6 PID: 210 Comm: kswapd0 Tainted: G D W 3.16.0-51-generic #69~14.04.1-Ubuntu
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.937875] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.957034] task: ffff8808043c1e90 ti: ffff880802204000 task.ti: ffff880802204000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.976975] RIP: 0010:[<ffffea00084c2540>] [<ffffea00084c2540>] 0xffffea00084c2540
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.997565] RSP: 0018:ffff880802207a40 EFLAGS: 00010282
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.007973] RAX: ffff8807f94c8848 RBX: ffff880802207db0 RCX: 0000000000000000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.028679] RDX: ffffea00084c2540 RSI: 0000000000000002 RDI: ffffea001623df80
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.049907] RBP: ffff880802207b40 R08: ffff880002d078e8 R09: ffff880005eaf478
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.071886] R10: ffff8808022079c8 R11: ffffea003f7e0980 R12: ffffea002f1b4b60
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.094261] R13: ffff880802207bc8 R14: ffffea002f1b4b40 R15: 0000000000000001
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.116663] FS: 0000000000000000(0000) GS:ffff88102fc60000(0000) knlGS:0000000000000000
Nov 2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.139733] CS: 001...

	Status	Importance	Assigned to
linux (Ubuntu)	Fix Released	Undecided	Tim Gardner
Hardy	Fix Released	Undecided	Tim Gardner
Lucid	Fix Released	Undecided	Tim Gardner
Maverick	Fix Released	Undecided	Tim Gardner
Natty	Fix Released	Undecided	Tim Gardner
