Ubuntu
nfs-utils package

Copying large files to NFS mount blocks system

Bug #591947 reported by KÁDÁR Balázs on 2010-06-09

This bug affects 4 people

Affects             Status  Importance  Assigned to  Milestone
nfs-utils (Ubuntu)  New     Undecided   Unassigned

Bug Description

While a large file is being copied to an NFS mount, the system becomes completely unresponsive for several seconds. Then the screen updates and the mouse can be moved for a few seconds before the freeze repeats.

The server runs Debian Lenny:
Linux gurul 2.6.32-00007-g56678ec #1 PREEMPT Mon Feb 8 03:49:55 PST 2010 armv5tel GNU/Linux
unfs3 0.9.21+dfsg-1

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: nfs-common 1:1.2.0-4ubuntu4
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
NonfreeKernelModules: nvidia
Architecture: i386
Date: Wed Jun 9 23:06:06 2010
InstallationMedia: Kubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100427)
ProcEnviron:
LANGUAGE=
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: nfs-utils

Tags:

Revision history for this message

KÁDÁR Balázs (balazs-kadar) wrote on 2010-06-09:

Dependencies.txt (2.2 KiB, text/plain; charset="utf-8")

KÁDÁR Balázs (balazs-kadar) wrote on 2010-06-09:

syslog (178.9 KiB, text/plain)

Syslog contains lots of error messages starting with:

Jun 9 22:21:11 mithrim kernel: [53384.935724] rpciod/0: page allocation failure. order:0, mode:0x4020
Jun 9 22:21:14 mithrim kernel: [53384.935733] Pid: 911, comm: rpciod/0 Tainted: P 2.6.32-22-generic #36-Ubuntu
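
To get a sense of how often this happens and which processes are hit, the matching syslog lines can be tallied per process. This is only a minimal sketch, assuming the log lines have the same shape as the two above. The function name count_alloc_failures is made up for illustration, and a process name containing spaces would be cut short.

```shell
#!/bin/sh
# Sketch: count "page allocation failure" lines per process name.
# count_alloc_failures is a hypothetical helper; $1 is the path to a syslog file.
count_alloc_failures() {
    awk '
        /page allocation failure/ {
            # The process name is the field just before "page allocation".
            for (i = 2; i < NF; i++) {
                if ($i == "page" && $(i + 1) == "allocation") {
                    name = $(i - 1)
                    sub(/:$/, "", name)   # drop the trailing colon
                    count[name]++
                    break
                }
            }
        }
        END { for (name in count) print count[name], name }
    ' "$1" | sort -rn
}
```

Calling count_alloc_failures /var/log/syslog prints one "count name" line per process, most frequent first.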

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

The calculations that set vm.min_free_kbytes are too parsimonious. This leads to log messages that start with the text:

fooprog: page allocation failure. order:0, mode:0x4020

and go on for dozens of lines.

By doubling the value set in vm.min_free_kbytes I was able to squelch those messages.

See https://gist.github.com/790577 https://gist.github.com/792128 https://gist.github.com/790584 for log messages

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

Run:
sysctl vm.min_free_kbytes
then take the number of KB it prints and multiply it by two.

Then run:
sysctl -w vm.min_free_kbytes=new-number-of-KB

(Substitute the value you calculated for "new-number-of-KB"; in my case it was 16266 KB doubled to 32532 KB.)

To make this persist across reboots, put it in a file such as /etc/sysctl.d/e1000e-bug-fix.conf:

#
# double amount of memory kept free
#
# 16266 KB -> 32532 KB
#
vm.min_free_kbytes = 32532
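
The doubling and the config fragment can be generated rather than typed by hand. What follows is a rough sketch, not a tested fix: the function name doubled_min_free_conf is hypothetical, and it only prints text, so nothing is changed on the system until the output is redirected somewhere.

```shell
#!/bin/sh
# Sketch: print a sysctl.d fragment that doubles a given vm.min_free_kbytes value.
# doubled_min_free_conf is a hypothetical helper; $1 is the current value in KB.
doubled_min_free_conf() {
    current_kb=$1
    # Reject anything that is not a plain non-negative integer.
    case $current_kb in
        ''|*[!0-9]*) echo "expected a whole number of KB" >&2; return 1 ;;
    esac
    new_kb=$((current_kb * 2))
    printf '# %s KB -> %s KB\n' "$current_kb" "$new_kb"
    printf 'vm.min_free_kbytes = %s\n' "$new_kb"
}
```

For example, doubled_min_free_conf "$(sysctl -n vm.min_free_kbytes)" prints the fragment for the running system; as root, the output can be redirected into a file under /etc/sysctl.d/.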

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

This is probably not a regression. I'm seeing both Lucid and Jaunty KVM guests with this problem too. The KVM host is running Lucid.

Jaunty VM guest with virtio IRQ page allocation failure: https://gist.github.com/793522

Lucid VM guest with virtio IRQ page allocation failure: https://gist.github.com/793545

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

Additional report of this issue: http://ubuntuforums.org/showthread.php?t=1452659

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

Another report with the same pattern: a network driver IRQ occurs just before the page allocation failure.

http://ubuntuforums.org/showthread.php?p=10393393

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

I found a Karmic KVM VM guest with virtio where it also happened.
https://gist.github.com/793807

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-01-24:

Here is how the default min_free_kbytes size is calculated. Note that the comments mention network bandwidth.

https://gist.github.com/793880

The comments say:
min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
min_free_kbytes = sqrt(lowmem_kbytes * 16)

So perhaps
min_free_kbytes = sqrt(lowmem_kbytes * 32)
is more realistic in terms of what is actually needed to prevent this from happening?
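
To see what changing the multiplier would do, the formula from the comments can be evaluated directly. This is a back-of-the-envelope sketch only: the function name estimate_min_free_kbytes and the 16 GiB lowmem figure in the usage note are assumptions for illustration, and any clamping the kernel applies afterwards is ignored.

```shell
#!/bin/sh
# Sketch: evaluate min_free_kbytes = sqrt(lowmem_kbytes * factor), rounded down.
# estimate_min_free_kbytes is a hypothetical helper.
#   $1: lowmem size in KB   $2: multiplier (16 in the quoted kernel comment)
estimate_min_free_kbytes() {
    awk -v lowmem="$1" -v factor="$2" \
        'BEGIN { printf "%d\n", int(sqrt(lowmem * factor)) }'
}
```

With a hypothetical lowmem of 16777216 KB (16 GiB), a factor of 16 gives 16384, a factor of 32 gives 23170, and a factor of 64 gives 32768. Because the factor sits under the square root, going from 16 to 32 raises the result by about 41% rather than doubling it; matching the manual doubling above would take a factor of 64.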

Divinsa Development (dev-divinsa) wrote on 2011-03-07:

#10

Reproduced multiple times on 10.04

On 10.04 this is also happening to us with vm.min_free_kbytes set to 11140:

# sysctl vm.min_free_kbytes
vm.min_free_kbytes = 11140

We run multiple (10+) 10.04 instances on EC2 and have reproduced this over 15 times. Most often the result is a hung, non-responsive server rather than a recovery.

It is fairly easy to reproduce by raising the MTU to 9000 from the default 1500 and moving large files, at which point the system hangs or crashes.

I'm now increasing vm.min_free_kbytes to 32252 to see how we fare at that point.

We would love to use jumbo frames as well, but they cause a crash within a few days (when we get high NFS load, and therefore high network load).

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-03-07:

#11

@Divinsa Development: for virtual machines there is a second issue. See:
https://bugs.launchpad.net/bugs/579276

nutznboltz (nutznboltz-deactivatedaccount) wrote on 2011-03-09:

#12

The Lucid proposed kernel with the virtio-net napi patch passed all of the QA Team's regression testing:
https://wiki.ubuntu.com/QATeam/KernelSRU-lucid-2.6.32-30.59
