nfsd from nfs-kernel-server very slow and system load from 25%-100% from nfsd

Bug #879334 reported by Vagelis Nonas on 2011-10-21
This bug affects 39 people
Affects            Status        Importance  Assigned to  Milestone
linux (Debian)     Fix Released  Unknown
linux (Ubuntu)                   Undecided   Unassigned
nfs-utils (Ubuntu)               Undecided   Unassigned

Bug Description

I have a diskless Ubuntu 10.10 machine which I boot regularly via PXE from another Ubuntu machine, where I have the root filesystem of the diskless machine exported over NFS.

I set it up about a year ago using 10.10. In the meantime the server machine was upgraded to 11.04 and, as of yesterday, to 11.10.

After the upgrade to 11.10 the diskless machine is dead slow (most of the time it won't even boot completely) and the load on the server machine is high (25%-100% as shown by top). If I restart the NFS server in the middle of the diskless computer's boot, the client proceeds a bit further and then gets stuck again. I have to restart the NFS server 3-4 times in order to get the GDM login screen on the client machine.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: nfs-kernel-server 1:1.2.4-1ubuntu2
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
Uname: Linux 3.0.0-12-generic i686
ApportVersion: 1.23-0ubuntu3
Architecture: i386
Date: Fri Oct 21 12:53:02 2011
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nfs-utils
UpgradeStatus: Upgraded to oneiric on 2011-10-20 (1 days ago)

Vagelis Nonas (vnonas) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status: New → Confirmed
Tom Vijlbrief (tvijlbrief) wrote :

while true;
do
sleep 1
rm -f data*
echo data > data1
echo data > data2
done

This file access pattern, which results from a more complex script that writes more data, hangs my client ("nfs server not responding") and creates looping nfsd processes on the server.

Tom Vijlbrief (tvijlbrief) wrote :

Looks similar to bug #585657 (similar dmesg on the client), which was solved for 10.10.

This morning my workstation hung on reading a large file after
I wrote a big file.

Looping nfsd daemons on the server...

Vagelis Nonas (vnonas) wrote :

I have made a fresh install of the diskless machine with the latest ubuntu (11.10) and the problem has gone away.

However, now that you mention big files: I noticed that in the old diskless file system (Ubuntu 10.10) there are many hidden .nfsxxxxxxxxxx files, some of them quite big. I suppose they are leftovers.

Here are a few examples (the size appears before the date):
115351481 16252 -rw-r--r-- 1 root root 16642048 Feb 26 2011 /mnt1/ubuntu/var/cache/apt/.nfs0000000002a50011000000e2
115351436 16792 -rw-r--r-- 1 root root 17236093 Sep 11 15:57 /mnt1/ubuntu/var/cache/apt/.nfs0000000006e01f8c00000063
115351478 16020 -rw-r--r-- 1 root root 16404288 Dec 20 2010 /mnt1/ubuntu/var/cache/apt/.nfs000000000057c00900000032
115351480 16208 -rw-r--r-- 1 root root 16594500 Feb 16 2011 /mnt1/ubuntu/var/cache/apt/.nfs0000000002a5001200000036
115351483 16704 -rw-r--r-- 1 root root 17102785 May 5 2011 /mnt1/ubuntu/var/cache/apt/.nfs0000000002a5004d00000032
115351460 16824 -rw-r--r-- 1 root root 17283067 Oct 22 07:45 /mnt1/ubuntu/var/cache/apt/.nfs0000000006e01fa40000009d
115351839 16820 -rw-r--r-- 1 root root 17284552 Oct 13 07:26 /mnt1/ubuntu/var/cache/apt/.nfs0000000006e0211f0000001b
115351479 16152 -rw-r--r-- 1 root root 16536128 Feb 2 2011 /mnt1/ubuntu/var/cache/apt/.nfs0000000002a5000f00000026
115351470 16804 -rw-r--r-- 1 root root 17258153 Sep 17 07:51 /mnt1/ubuntu/var/cache/apt/.nfs0000000006e01fae000000bd
115351755 16820 -rw-r--r-- 1 root root 17261588 Oct 5 07:33 /mnt1/ubuntu/var/cache/apt/.nfs0000000006e020cb00000019
115351482 16380 -rw-r--r-- 1 root root 16769594 Apr 22 2011 /mnt1/ubuntu/var/cache/apt/.nfs0000000002a5004c0000001b
115352738 4 -rw-r--r-- 1 root root 1681 Feb 2 2011 /mnt1/ubuntu/etc/.nfs0000000006e024a200000026
115353070 2097156 -rw-r--r-- 1 root root 2147483648 Mar 18 2011 /mnt1/ubuntu/etc/.nfs00000000001b807b00000028


I wonder if your fresh install uses different mount options, e.g. NFS 4, while the old install used version 3?

I have an old exports file on my server, so I expect it uses 3. Did you convert your exports?

Vagelis Nonas (vnonas) wrote :

I don't think my exports use NFS v3 options; this is my exports file (unchanged from the previous install):

/mnt1/ubuntu_new 192.168.1.46(rw,no_root_squash,sync,no_subtree_check) 192.168.1.194(rw,no_root_squash,sync,no_subtree_check)
/mnt1/mac 192.168.1.47(rw,no_root_squash,sync,no_subtree_check) 192.168.1.50(rw,no_root_squash,sync,no_subtree_check)

However, I can run another test with the old file system, after deleting the big hidden .nfsxxxxxxxx files, and see if it defaults to NFS v3, if you think it might be of use to you.

Tom Vijlbrief (tvijlbrief) wrote :

According to https://help.ubuntu.com/community/SettingUpNFSHowTo

you need fsid in exports to use nfs 4. So I think we both use 4.

I wonder what is different after your reinstall...
Tom Vijlbrief (tvijlbrief) wrote :

I meant I think we both use 3.

Vagelis Nonas (vnonas) wrote :

Yes, you are right, it must be v3 that we both use. I'll test the old file system again tomorrow and post back my observations.

Do you know why those hidden .nfsxxxxxxxxxxxxx files are there? I noticed they exist in the new file system too. Are they leftovers, or are they necessary for the operation of NFS? Do you have such files on your server in the file systems you export?

Vagelis Nonas (vnonas) wrote :

The hidden .nfsxxxxxxx files can be removed without problems; they are "leftovers" created by NFS protocol inefficiencies.
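Assuming no client still holds these files open, the stale leftovers can be swept on the server with a small find-based helper. This is only a sketch: the path in the example comes from this report, and the age threshold is an arbitrary heuristic, not a guarantee that a file is unused.

```shell
#!/bin/sh
# clean_stale_nfs DIR DAYS: delete .nfs* silly-rename leftovers under
# DIR that have not been modified for more than DAYS days. The age
# threshold is only a heuristic -- a client could in principle still
# hold such a file open -- so run this on the server in a quiet period.
clean_stale_nfs() {
    dir=$1
    days=$2
    find "$dir" -name '.nfs*' -type f -mtime +"$days" -delete
}

# Example, using a path from this report:
#   clean_stale_nfs /mnt1/ubuntu 7
```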

Tom Vijlbrief (tvijlbrief) wrote :

The .nfs files are created when you remove (unlink(2) in Unix lingo) a
file while it is still open (in use) by a program. This is quite
normal behaviour for Unix programs.
In a local file system the name is merely removed from the directory
(the file can no longer be opened by other programs); the file is not
really destroyed until the kernel detects that the last program
accessing it has closed it.
NFS implements the remove by having the client rename the file to
.nfsXXX on the server, so that the client can keep using it. The
server does not know when the last accessing client is done, so it
keeps the file until a cleanup job is run.
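The unlink-while-open semantics described above can be seen on any local file system with a few lines of Python (purely illustrative; on an NFS mount, this same pattern is what makes the client create a .nfsXXXXXXXX file on the server instead):

```python
import os
import tempfile

# Create a file, keep it open, then unlink it: the directory entry
# disappears immediately, but the data survives until the last close.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                 # name removed from the directory
print(os.path.exists(path))     # False: no longer reachable by name
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 32)          # ...yet the open fd still reads it
print(data)                     # b'still here'
os.close(fd)                    # kernel frees the storage here
```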


Nobuaki Nakamura (yubird) wrote :

This happens with kernel 3.0.0-14-server too.

Tom Vijlbrief (tvijlbrief) wrote :

I converted my NFS server (which started life many Ubuntus ago,
probably as a 7.04 server) to v4 exports, which solved my problem.

A newer Ubuntu server installation works fine as a v3 server, so the
problem is probably caused by some old leftover configuration
files....
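For reference, the v4-style conversion described above hinges on marking one export as the NFSv4 pseudo-root with fsid=0. A minimal sketch (the path and subnet here are illustrative, not the actual configuration from this report):

```
# /etc/exports on the server: fsid=0 makes /export the NFSv4 root
/export  192.168.1.0/24(rw,fsid=0,no_subtree_check,sync)
```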


Ivan Frederiks (idfred) wrote :

Got exactly the same symptoms after upgrading from 11.04 to 12.04.

Originally the NFS server was set up on Ubuntu 11.04 i386.

The line in /etc/exports looks like:
/srv/share 192.168.2.0/24(rw,no_root_squash,async,no_subtree_check)

Diskless client is Debian 6.0.5 i386

nfs-kernel-server:
  Installed: 1:1.2.5-3ubuntu3
  Candidate: 1:1.2.5-3ubuntu3
  Version table:
 *** 1:1.2.5-3ubuntu3 0
        500 http://de.archive.ubuntu.com/ubuntu/ precise/main i386 Packages

tags: added: precise
removed: running-unity
Jeff Ebert (jeffrey-ebertland) wrote :

This thread on linux-nfs seems to be the same issue:
http://www.spinics.net/lists/linux-nfs/msg30552.html

Also, Bug #1006446 seems to be the same issue (marked as Duplicate).

Jeff Ebert (jeffrey-ebertland) wrote :

Another thread on linux-nfs that appears to be the same issue, this one using kernel version 3.3.3.
http://www.spinics.net/lists/linux-nfs/msg29935.html

No response to this thread, however.

Jeff Ebert (jeffrey-ebertland) wrote :

There is an upstream bug here:
https://bugzilla.kernel.org/show_bug.cgi?id=40912

I have tried the latest mainline kernel (3.5.0) using the instructions here:
https://wiki.ubuntu.com/KernelTeam/GitKernelBuild

I still see the high CPU load on the NFS server.

I then reversed the patch suggested in the above bug.

$ git show 9660439861aa8dbd5e2b8087f33e20760c2c9afc
commit 9660439861aa8dbd5e2b8087f33e20760c2c9afc
Author: Olga Kornievskaia <email address hidden>
Date: Tue Oct 21 14:13:47 2008 -0400

    svcrpc: take advantage of tcp autotuning

I also reversed the patch mentioned here manually, since I could not find the commit hash for it:
http://lists.openwall.net/netdev/2012/01/20/81

Unfortunately, this patched version of 3.5.0 does not boot. I may have screwed up something else along the way, but I wanted to report this in case somebody has more time to experiment.

This particular patch looks like an ongoing source of problems for nfsd. It was reverted once before, in 2009, due to performance issues.

commit 7f4218354fe312b327af06c3d8c95ed5f214c8ca
Author: J. Bruce Fields <email address hidden>
Date: Wed May 27 18:51:06 2009 -0400

    nfsd: Revert "svcrpc: take advantage of tcp autotuning"

    This reverts commit 47a14ef1af48c696b214ac168f056ddc79793d0e "svcrpc:
    take advantage of tcp autotuning", which uncovered some further problems
    in the server rpc code, causing significant performance regressions in
    common cases.

    We will likely reinstate this patch after releasing 2.6.30 and applying
    some work on the underlying fixes to the problem (developed by Trond).

    Reported-by: Jeff Moyer <email address hidden>
    Cc: Olga Kornievskaia <email address hidden>
    Cc: Jim Rees <email address hidden>
    Cc: Trond Myklebust <email address hidden>
    Signed-off-by: J. Bruce Fields <email address hidden>

It was reintroduced in May 2011 in commit a74d70b63f1a0230831bcca3145d85ae016f9d4c.

Hope this helps somebody...

Jeff Ebert (jeffrey-ebertland) wrote :

I reverted to linux-image-2.6.38-15-generic-pae (2.6.38-15.61) and the NFS performance is back to normal, and the CPU load dropped down to almost nothing, as before. This is clearly a linux kernel regression.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 879334

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Karsten Suehring (suehring) wrote :

I have added logs of a test setup in Bug #1077612 and steps to reproduce in Bug #1006446 (before noticing that it was marked as a duplicate).

Citing from the summary in Bug #1077612:

I tested with upstream Debian in my virtual machines: Squeeze has a server load of 7-10% (which seems high, but might be related to using a VM). When upgraded to Debian Wheezy the load goes up to 40% as in Ubuntu 12.04. When I boot the old 2.6 kernel from Squeeze, the load goes back to the original values.

On Ubuntu 12.04 I tried several share and mount options. The only change that showed an effect was mounting with -o proto=udp, which reduced the load to around 15%; still more than with the old kernel, but much better than the 40% with TCP.

(end cite)
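The proto=udp workaround Karsten mentions can be made persistent on the client via /etc/fstab. A sketch only; the server name and mount point are illustrative:

```
# /etc/fstab on the client: force NFSv3 over UDP for this mount
server:/srv/share  /mnt/share  nfs  vers=3,proto=udp,rw  0  0
```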

I'm rather surprised that more people are not running into this issue, because it seems to be a show-stopper for Ubuntu NFS servers.

Since then I have been forced to leave the Ubuntu server platform and
move to Red Hat Enterprise Linux.
Canonical needs to understand that these are critical bugs that should be
fixed in hours or days, not weeks or months.

On Sun, Nov 11, 2012 at 1:24 PM, Karsten Suehring <<email address hidden>
> wrote:

> ** Bug watch added: Debian Bug tracker #692957
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692957
>
> ** Also affects: linux (Debian) via
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692957
> Importance: Unknown
> Status: Unknown

Vagelis Nonas (vnonas) wrote :

I agree 100%. It's been over a year since the initial report. Personally, I don't think it is going to be solved any time soon.

So the bottom line is that if you need a production NFS server, you had better use an "old stable" kernel. This looks really sad to me, because I can see that neither Canonical nor the mainline kernel developers can fix a bug introduced (probably) back in 2008, affecting a very important piece of functionality (NFS servers and clients).

Karsten Suehring (suehring) wrote :

I'm adding some more test data here:

As a workaround I tried to install an old Ubuntu 2.6 kernel (linux-image-2.6.35-31-generic_2.6.35-31.63_amd64.deb) into 12.04.1.

I saw a number of locking issues reported and thought these might be caused by using the kernel in the wrong environment. But now, after downgrading the servers back to 10.10 while keeping the clients at 12.04.1, I still see kernel messages like the following:

[ 5474.132324] ------------[ cut here ]------------
[ 5474.132346] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 5474.132349] Hardware name: PowerEdge R710
[ 5474.132351] Modules linked in: ipmi_si mpt2sas raid_class mptctl ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas megaraid_sas [last unloaded: ipmi_si]
[ 5474.132386] Pid: 1746, comm: rpciod/16 Tainted: G W 2.6.35-32-server #67-Ubuntu
[ 5474.132388] Call Trace:
[ 5474.132399] [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 5474.132403] [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 5474.132414] [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 5474.132426] [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 5474.132437] [<ffffffffa016c7f0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[ 5474.132448] [<ffffffffa016c805>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 5474.132455] [<ffffffff8107b395>] run_workqueue+0xc5/0x1a0
[ 5474.132460] [<ffffffff8107b513>] worker_thread+0xa3/0x110
[ 5474.132464] [<ffffffff810801a0>] ? autoremove_wake_function+0x0/0x40
[ 5474.132468] [<ffffffff8107b470>] ? worker_thread+0x0/0x110
[ 5474.132472] [<ffffffff8107fc26>] kthread+0x96/0xa0
[ 5474.132477] [<ffffffff8100aea4>] kernel_thread_helper+0x4/0x10
[ 5474.132481] [<ffffffff8107fb90>] ? kthread+0x0/0xa0
[ 5474.132484] [<ffffffff8100aea0>] ? kernel_thread_helper+0x0/0x10
[ 5474.132487] ---[ end trace 5a3838b115992a79 ]---
[ 6091.800511] ------------[ cut here ]------------
[ 6091.800532] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 6091.800536] Hardware name: PowerEdge R710
[ 6091.800537] Modules linked in: ipmi_si mpt2sas raid_class mptctl ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas megaraid_sas [last unloaded: ipmi_si]
[ 6091.800572] Pid: 1744, comm: rpciod/14 Tainted: G W 2.6.35-32-server #67-Ubuntu
[ 6091.800575] Call Trace:
[ 6091.800585] [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 6091.800590] [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 6091.800601] [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 6091.800612] [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 6091.800623] [<ffffffffa016c7f0>] ? rpc...


Changed in linux (Debian):
status: Unknown → Incomplete
Gordon Dracup (gordon-dracup) wrote :

I am not sure if this is related, but I recently upgraded my server from 10.04 to 12.04 LTS. Any large files on the server, e.g. ISO files, showed incorrect file sizes when opened in Nautilus. These large files were unusable from the clients (also running 12.04), although they were fine on the server. It is an old 32-bit server with an Athlon processor, only used for backups and serving audio, video, etc.

Solved the problem by moving to NFSv4. Changed the exports file on the server to:

/nfs/srv 192.168.xx.0/24(rw,fsid=0,insecure,no_subtree_check,async)

and the fstab on the clients to:

192.168.xx.x:/ /nfs/srv nfs4 _netdev,auto 0 0

Only been running with this setup for a couple of days, but so far, so good.

Apologies if this is unrelated to this bug. I wasn't sure what to do with this information, as it is possibly of use to others out there.

Giuseppe Vacanti (gvacanti) wrote :

Running Ubuntu 12.04, 3.2.0-35-generic-pae, when clients access data on NFS mounted partitions the load on the server goes through the roof (>50). I'm testing during the holiday period when there is nobody else running anything on the machines. Had this problem with NFS3 and moved to NFS4 hoping to fix it, but it is still there. Adding my comment to keep the pressure going.

pascal (pascal-pascallen) wrote :

Same here.
Mounting homedirs is a pain.
Rsync of homedirs for laptops is taking ages.
Loads of >20.
Bug confirmed.

Sven Rudolph (rudolph) wrote :

Same here.
NFSv4-mounted home dirs (from an Ubuntu 12.04 LTS NFSv4 server) become very slow. Eventually the client machine freezes completely -> reset button.
Using openSUSE 12.2 as the NFSv4 client produces no problems at all.
This is reproducible.

Same on some of our servers with NFSv4 mounted directories. All Ubuntu 12.04 LTS, NFSv4 servers and clients.
Frequent messages in /var/log/syslog:
[...]
Jan 25 11:49:39 xxx kernel: [ 8996.289241] Call Trace:
Jan 25 11:49:39 xxx kernel: [ 8996.289246] [<ffffffff81659ebf>] schedule+0x3f/0x60
Jan 25 11:49:39 xxx kernel: [ 8996.289249] [<ffffffff8165acc7>] __mutex_lock_slowpath+0xd7/0x150
Jan 25 11:49:39 xxx kernel: [ 8996.289253] [<ffffffff8165a8da>] mutex_lock+0x2a/0x50
Jan 25 11:49:39 xxx kernel: [ 8996.289256] [<ffffffff81186404>] do_last+0x2b4/0x730
Jan 25 11:49:39 xxx kernel: [ 8996.289260] [<ffffffff81187c21>] path_openat+0xd1/0x3f0
Jan 25 11:49:39 xxx kernel: [ 8996.289263] [<ffffffff81183565>] ? putname+0x35/0x50
Jan 25 11:49:39 xxx kernel: [ 8996.289266] [<ffffffff81187fc3>] ? user_path_at_empty+0x63/0xa0
Jan 25 11:49:39 xxx kernel: [ 8996.289275] [<ffffffffa01337db>] ? nfs_attribute_cache_expired+0x1b/0x70 [nfs]
Jan 25 11:49:39 xxx kernel: [ 8996.289279] [<ffffffff81188062>] do_filp_open+0x42/0xa0
Jan 25 11:49:39 xxx kernel: [ 8996.289284] [<ffffffff81319c11>] ? strncpy_from_user+0x31/0x40
Jan 25 11:49:39 xxx kernel: [ 8996.289287] [<ffffffff811833aa>] ? do_getname+0x10a/0x180
Jan 25 11:49:39 xxx kernel: [ 8996.289291] [<ffffffff8165bdce>] ? _raw_spin_lock+0xe/0x20
Jan 25 11:49:39 xxx kernel: [ 8996.289294] [<ffffffff81195377>] ? alloc_fd+0xf7/0x150
Jan 25 11:49:39 xxx kernel: [ 8996.289298] [<ffffffff81177688>] do_sys_open+0xf8/0x240
Jan 25 11:49:39 xxx kernel: [ 8996.289301] [<ffffffff811777f0>] sys_open+0x20/0x30
Jan 25 11:49:39 xxx kernel: [ 8996.289304] [<ffffffff816643c2>] system_call_fastpath+0x16/0x1b
[...]

Eventually the servers which act as NFS clients freeze completely -> remote reset. (It's just a test system.)
This is reproducible.
Needless to say, this disqualifies Ubuntu 12.04 LTS as an NFS client.

A fix of this bug would be highly appreciated!

Doug Schaapveld (djschaap) wrote :

I am still seeing slow NFS performance and high CPU with 3.5.0-22, but I found a thread suggesting a fix went into 3.5.4 in September. I haven't been able to test it myself yet.

J. Bruce Fields (4):
nfsd4: fix security flavor of NFSv4.0 callback
svcrpc: fix BUG() in svc_tcp_clear_pages
svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
svcrpc: sends on closed socket should stop immediately

http://lwn.net/Articles/516478/

cheryl (exwyeorzee) wrote :
Download full text (9.9 KiB)

Sorry, I did not know what to do with my report, so I am attaching it here since it seems to be the same problem.

I am running desktop Ubuntu 12.04 LTS on 4 separate gigabit-networked machines as my whole-home media center, with the tuner installed in the 'server' (a desktop install running the NFS server) under MythTV, and 3 desktop-install NFS clients in separate rooms. I had to upgrade the pre-existing 'server' (my learning platform) from 11.10 to 12.04 to match the clients, because MythTV does not interoperate between differing versions and I did not want to downgrade the clients to 11.10; I wanted a long-term network install that is reliable and low-maintenance.

Now I have terrible network performance. The tuner works fine within the 'server', and I can view shows on the server, channel surf, record, play back, etc. with no problems. Over the NFS network at the desktop clients, the media center system is almost completely broken.

If I open any media file stored on the server from a client over the network (viewing a video, listening to ripped audio, or even opening a text file), or if I attempt to edit the commercials out of shows within the MythTV editor on a client, the client will pause/hang for at least 30 seconds while 'loading' the file. It will then finally start streaming the media sequentially with OK performance on one or maybe two clients at most, but when using VideoLAN VLC to view server media files on a client, I had to increase the buffer tenfold (from roughly 3 to 30 seconds of standard-definition programming) to avoid long stuttering pauses in playback. Within the MythTV frontend on the client side, video editing over the network is abominably slow: the editor needs tenths of seconds, then seconds, then minutes, then hours to respond to each keypress, getting slower all the time until it eventually grinds to a halt.

Listing directories, editing files, and viewing media with any of the text editors or media players I have installed all have at least 30 seconds of delay on 'opening' (sending a command from a terminal window, a Nautilus window, a text editor, or whatever), and the entire network eventually grinds to a standstill, with MythTV locked in an unusable state on the clients even though it still works fine on the server.

My server is a Core 2 Duo and so is my main media center client. The server is fully populated with 8 GB of memory and terabytes of storage; the client is sparsely populated with 2 GB of memory. I realize this is underpowered for HDTV media applications, but surely a Core 2 Duo should be able to serve at least one standard-definition media file at a time without any performance issues at all, and should handle text editors with its eyes closed. I also have an i7 laptop client with 8 GB of memory and a terabyte of storage that suffers from the same poor network performance, even after disabling the troublesome Broadcom wireless power management, and even after plugging in the gigabit wired connection and disabling wireless.

I have no security at all configured on t...

Tom Vijlbrief (tvijlbrief) wrote :
Download full text (12.2 KiB)

@cheryl

Converting your exports and mounts to NFS version 4 will probably fix your issue. I had similar issues, and that fixed it for me and others.
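For reference, a minimal NFSv4 conversion along the lines Tom suggests might look like the following sketch (the export path, subnet, and mount point are illustrative, not taken from this thread):

```
# server /etc/exports — export a single NFSv4 root (fsid=0)
/srv/nfs4  192.168.1.0/24(rw,sync,fsid=0,no_subtree_check)

# client /etc/fstab — mount relative to that root with the nfs4 type
server:/  /mnt/nfs4  nfs4  rw,hard  0  0
```

After editing /etc/exports, running `exportfs -ra` on the server re-reads the export table.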

Torsten Bronger (bronger) wrote :

System load figures are hard to compare. I have an AMD Turion II processor (approximately twice as fast as an Atom), and I manage 40 MB/s of NFS throughput with the CPU at 80%. Is that already an unusually high number, meaning that I may be affected by this bug?

I'm using Ubuntu Server 13.04.

Karsten Suehring (suehring) wrote :

I did some testing with a newer kernel on Debian a while ago. The 3.x series did seem to have higher load, but as far as I could test, it did not kill the server. If you have multiple clients connected to the server, a good test would be to start several writes (e.g. dd from /dev/random) over NFS and see whether the server can handle that. It did not take many writes to completely kill my server on 12.04.
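Karsten's suggested stress test might be sketched like this. The loop of parallel writers stands in for multiple clients; TARGET defaults to a temp directory here so the sketch is safe to run, but in a real test it would be an NFS mount point, and /dev/urandom replaces /dev/random so the writers do not block on entropy:

```shell
# Parallel-write stress test sketch (TARGET is a placeholder path).
TARGET=${TARGET:-$(mktemp -d)}    # replace with an NFS mount point
for i in 1 2 3 4; do
    # each writer streams 16 MiB of random data concurrently
    dd if=/dev/urandom of="$TARGET/stress-$i" bs=1M count=16 2>/dev/null &
done
wait
ls -l "$TARGET"
```

While this runs against a real NFS mount, watching `top` on the server shows whether nfsd load climbs as described in this bug.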

It would be good news if this issue were resolved simply by upgrading to the newer kernel versions in later releases, even without Ubuntu proactively working on a fix. But it would still remain an issue in the "Long Term Support" release 12.04.

My bottom line here is that Ubuntu is apparently caring more about mobile phones now than servers which is especially unfortunate after many people finally talked Dell into more support for Ubuntu. I did not even get a reply from the Ubuntu sales department (except for an automatic reply which promised to do so within two days) when we offered to pay for resolving the issue...

I have been struggling at work with the same problem for the last few months, and I accidentally stumbled on a fix that seems to work under Ubuntu 12.04 (3.5.0-32-generic 64-bit kernel):

1. On the clients, set rsize=8192,wsize=8192 in /etc/fstab (smaller values of rsize/wsize will also work, but reduce throughput).

Previously, with rsize=wsize=32K, any 2 clients writing large files to our server (1 Gbit NICs on the clients, 2x1 Gbit NICs on the server) would freeze it: all clients would appear to hang when accessing the NFS server, but the hang would eventually resolve itself (after several hours of writes at around 100k/sec with jbd2 on the server grinding away continuously, or within minutes if one of the 2 writing clients temporarily had its Ethernet turned off and then back on after the other client's write had completed).

Now 4 clients can each write to the server at 53 MB/sec, saturating the server bandwidth at about 210 MB/sec, and the network and server remain responsive from other interactive clients.
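As a sketch, the client-side /etc/fstab entry from step 1 might look like the following (the server name, export path, and mount point are placeholders; only the rsize/wsize options come from the reported fix):

```
# client /etc/fstab — hypothetical names, reduced NFS block sizes
server:/export  /mnt/nfs  nfs  rw,hard,rsize=8192,wsize=8192  0  0
```

The change takes effect after unmounting and remounting the export on each client.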

Changed in linux (Debian):
status: Incomplete → Fix Released
masakre (informatikoa1) wrote :

Is this bug solved? I am having similar issues with my Ubuntu Server 12.04.4 64-bit. I am using LDAP authentication with NFS to mount the users' home directories. The server uses two gigabit Ethernet links with bonding, there are 20 users, and it has 2x4 processors with 12 GB of RAM (it is an Acer Gateway GR320 F1 server). I moved the server to 12.04 because we were running an old version (7.10) and having some problems, but since the change the clients are too slow. I have noticed that the server's processor load is high, and I think this bug could be affecting me.

Sorry about my English, and thank you in advance :)

dilan (dilanasanga-x) wrote :

Hi All,

I have also set up the same environment at my office for developers working with Java, PHP, etc., and it was very slow initially.

Later I learned that programs like NetBeans, Firefox, and Chrome, as well as the users' login sessions, create their caches inside the users' home directories (as hidden directories), so when a lot of users are logged in, a huge amount of disk I/O results from those data operations, because all users are writing to home directories on the same disk (I have RAID 1 set up there).

What I did was simply move all those cache directories to the users' local hard disks and create symbolic links to them. This gave a significant performance improvement.

Like this:
root@rcapladm:/home/dilan# pwd
/home/dilan

root@rcapladm:/home/dilan# ls -la

lrwxrwxrwx 1 dilan users 25 May 5 09:55 .local -> /rcapl/home/dilan/.local
lrwxrwxrwx 1 dilan users 27 May 5 09:55 .mozilla -> /rcapl/home/dilan/.mozilla
lrwxrwxrwx 1 dilan users 25 Aug 18 08:37 .mysql -> /rcapl/home/dilan/.mysql
lrwxrwxrwx 1 dilan users 29 Aug 18 08:38 .mysqlgui -> /rcapl/home/dilan/.mysqlgui/
lrwxrwxrwx 1 dilan users 28 May 5 09:55 .netbeans -> /rcapl/home/dilan/.netbeans
lrwxrwxrwx 1 dilan users 34 May 5 09:55 .netbeans-derby -> /rcapl/home/dilan/.netbeans-derby

Because their caching (which accounts for most of the disk I/O) now happens on the local machine instead of the server, the server no longer sees heavy disk I/O. Compiling a Java application with "mvn clean compile" was still slow, however, so I also applied a simple trick in their pom.xml to put the compilation output directory on the local machine rather than the server. That made compilation fast as well, and all their source code remains on the server, protected.
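The relocate-and-symlink trick described above can be sketched as follows. It runs in a temp sandbox so it is safe to execute; in real use the paths would be the NFS-mounted home directory and a directory on the local disk:

```shell
# Sketch: move a cache directory off the shared (NFS) home onto local
# disk and leave a symlink where it used to live. All paths are
# placeholders created inside a temp sandbox.
SANDBOX=$(mktemp -d)
NFS_HOME="$SANDBOX/nfs-home"     # stands in for the NFS-mounted home
LOCAL_DISK="$SANDBOX/local"      # stands in for the local hard disk

mkdir -p "$NFS_HOME/.mozilla" "$LOCAL_DISK"
echo profile > "$NFS_HOME/.mozilla/prefs.js"

# Relocate the cache and symlink it back into the home directory.
mv "$NFS_HOME/.mozilla" "$LOCAL_DISK/.mozilla"
ln -s "$LOCAL_DISK/.mozilla" "$NFS_HOME/.mozilla"
ls -la "$NFS_HOME"
```

Applications keep using `~/.mozilla` as before, but the actual reads and writes land on the local disk instead of the NFS export.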

I don't claim the system is 100% perfect; a little slowness remains.

My biggest problem is still this: if a user's network connection goes away, the NFS mount suddenly disappears and the system gets stuck with increasing load. I cannot see any program consuming the load when I check with "top", but the system gets stuck, and even when the network comes back, it fails to remount the user's home automatically 90 times out of 100. We cannot even do it manually, because when I type "df -h" it shows nothing and just hangs trying to read the mount information.

Any idea or solution?

Thanks all.

Specs of my NFS & LDAP server:

Intel Core 2 Duo, 4 GB RAM, 500 GB HD (normal) with RAID 1.
Around 10 users working in this environment.
