NFSv4 performance problem with newer kernels

Bug #1960826 reported by Andy
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am seeing an NFSv4 performance problem on Ubuntu 20.04.3 LTS, which runs on both the clients and the server (tested with kernels 5.4 to 5.16). The server is connected by 10Gbit Ethernet and multiple clients are connected by 1Gbit. I am reading a large file from an NFS mount with "dd", sending the output to /dev/null with bs=4096. With default sysctl and mount options, speeds max out below 1Gbit/sec. If I force NFSv3, I see speeds close to 10Gbit/sec once enough clients are connected. I also see no issue with NFSv4 when Ubuntu 16.04 is used for both server and clients. I have attached the output of two iftop runs, showing the situation with NFSv4 and with NFSv3. In the NFSv4 output you can clearly see one client reading at full speed while all the others apparently throttle back to practically nothing.
I have additionally tested a range of mount options, BBR congestion control, and the following kernel settings, none of which made any difference:

net.core.netdev_max_backlog=250000
net.core.rmem_max=4194304
net.core.wmem_max=4194304
net.core.rmem_default=4194304
net.core.wmem_default=4194304
net.core.optmem_max=4194304
net.ipv4.tcp_rmem=4096 87380 4194304
net.ipv4.tcp_wmem=4096 65536 4194304
net.ipv4.tcp_mem=786432 1048576 26777216
net.ipv4.udp_mem=1529892 2039859 3059784
net.ipv4.udp_rmem_min=16384
net.ipv4.udp_wmem_min=16384

The problem is seen on dissimilar hardware, i.e. it occurs when testing with an HP DL380 G10 with Mellanox 10Gbit Ethernet connected to a Cisco switch, and also on a Dell R430 with Broadcom 10Gbit Ethernet connected to a Netgear switch (to name just two of the several configurations tested). The clients also vary between test cases, but are desktop PCs and laptops.
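
The read test described above can be sketched roughly as follows. This is only an illustration: the server name, export path, mount point and file name are placeholders rather than values from this report, and the helper name nfs_read_test is made up for the example. The vers=3 and vers=4 mount options are the usual way to force a protocol version.

```shell
#!/bin/sh
# Sketch of the read test from the description. All paths and the server
# name are placeholders (assumptions), not taken from the report.

# Mount the same export with each protocol version (run as root):
#   mount -t nfs -o vers=4 server:/export /mnt/nfs
#   mount -t nfs -o vers=3 server:/export /mnt/nfs

# Read a file sequentially in 4 KiB blocks and discard the data.
# dd reports the achieved throughput on stderr when it finishes.
nfs_read_test() {
    dd if="$1" of=/dev/null bs=4096
}

# Example (hypothetical file name):
#   nfs_read_test /mnt/nfs/largefile
```

When repeating the test on one client, a second read of the same file may be served from the client's page cache, so reading a different file or remounting between runs is probably needed to measure the network path.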
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw----+ 1 root audio 116, 1 Feb 15 10:23 seq
 crw-rw----+ 1 root audio 116, 33 Feb 15 10:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2022-02-14 (0 days ago)
InstallationMedia: Ubuntu-Server 20.04.2 LTS "Focal Fossa" - Release amd64 (20210201.2)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge R430
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-28-generic root=UUID=d227a745-134e-4633-9c26-bcc062655f95 ro
ProcVersionSignature: Ubuntu 5.13.0-28.31~20.04.1-generic 5.13.19
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-28-generic N/A
 linux-backports-modules-5.13.0-28-generic N/A
 linux-firmware 1.187.26
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.13.0-28-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 06/07/2021
dmi.bios.release: 2.13
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.13.0
dmi.board.name: 0CN7X8
dmi.board.vendor: Dell Inc.
dmi.board.version: A04
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.13.0:bd06/07/2021:br2.13:svnDellInc.:pnPowerEdgeR430:pvr:rvnDellInc.:rn0CN7X8:rvrA04:cvnDellInc.:ct23:cvr:skuSKU=NotProvided;ModelName=PowerEdgeR430:
dmi.product.name: PowerEdge R430
dmi.product.sku: SKU=NotProvided;ModelName=PowerEdge R430
dmi.sys.vendor: Dell Inc.

Andy (andyaiken) wrote :

Where both client and server are connected at 10Gbit, speeds are as expected with NFSv4 (tested with a single client).

Andy (andyaiken)
description: updated
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1960826/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1960826

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Andy (andyaiken) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andy (andyaiken) wrote : apport information

Attachments: CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

tags: added: apport-collected focal uec-images
description: updated

Andy (andyaiken) wrote :

I have found the root cause: in my test environment the clients are all clones with unique machine-ids but the same hostname. The issue is resolved either by changing the hostname of each client or by setting a unique value here:

echo options nfs nfs4_unique_id=[string] > /etc/modprobe.d/nfsclient.conf
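
For clones that already have distinct machine-ids, as in the test environment described above, one possible way to fill in the [string] is to reuse the contents of /etc/machine-id. The sketch below is an assumption about how that could be scripted, not something taken from this report; the function name write_nfs4_unique_id_conf is hypothetical, and both paths can be overridden so that it can be tried on scratch files first.

```shell
#!/bin/sh
# Sketch: write a modprobe option that sets nfs4_unique_id from a
# per-machine identifier. Using /etc/machine-id as the source is an
# assumption for illustration; any string unique to the client would do.
write_nfs4_unique_id_conf() {
    conf="${1:-/etc/modprobe.d/nfsclient.conf}"
    id_src="${2:-/etc/machine-id}"

    # Read the identifier and strip the trailing newline.
    id=$(tr -d '\n' < "$id_src")

    # Refuse to write an empty id, which would not make the client unique.
    if [ -z "$id" ]; then
        echo "no identifier found in $id_src" >&2
        return 1
    fi

    printf 'options nfs nfs4_unique_id=%s\n' "$id" > "$conf"
}

# Example (as root, before the nfs module is loaded):
#   write_nfs4_unique_id_conf
```

The value is sent to the NFS server, and the machine-id documentation advises against exposing the raw machine-id on untrusted networks, so a value derived from it may be preferable there. This approach only helps when each client has its own /etc, which is not the case for a shared NFS root.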

In my production environment the clients all share the same NFS root, and I'm not sure how to set a random value in nfsclient.conf (I tried nfs4_unique_id=`cat /proc/sys/kernel/random/uuid` but this doesn't work). Currently I can work around by setting a random hostname. If anyone can suggest how to do this via nfsclient.conf instead that might be neater.

Andy (andyaiken) wrote :

OK, to set a random nfs4_unique_id you can just cat/echo a string like this:

cat /proc/sys/kernel/random/uuid |tr -d '\n' > /sys/module/nfs/parameters/nfs4_unique_id

so long as you do this before your first NFS mount, which in my case is not a problem.
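
The one-liner above could be wrapped in a small boot-time helper along these lines. This is a hedged sketch: the function name set_random_nfs4_unique_id is made up, and the parameter path is passed as an argument only so the function can be exercised against an ordinary file. It assumes the nfs module is already loaded (otherwise the sysfs file does not exist) and that it runs as root before the first NFS mount.

```shell
#!/bin/sh
# Sketch: write a random UUID (without trailing newline) into the
# nfs4_unique_id module parameter. Must run before the first NFS mount.
set_random_nfs4_unique_id() {
    param="${1:-/sys/module/nfs/parameters/nfs4_unique_id}"

    # The sysfs file only exists once the nfs module is loaded.
    if [ ! -w "$param" ]; then
        echo "cannot write $param (nfs module not loaded, or not root?)" >&2
        return 1
    fi

    # The kernel generates a fresh UUID on every read of this file.
    tr -d '\n' < /proc/sys/kernel/random/uuid > "$param"
}

# Example (as root, e.g. from an early boot script):
#   modprobe nfs && set_random_nfs4_unique_id
```

Because the UUID is regenerated on every boot, the server will presumably see a rebooted client as a new client each time, which should be harmless for the diskless setup described here but is worth keeping in mind.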

Andy (andyaiken) wrote :

So as not to duplicate effort: I see there is a request for enhancement over at Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1801326
