Comment 0 for bug 1957986

dann frazier (dannf) wrote :

[Impact]
An NFSv4 client that does a lot of opens/closes can overwhelm an NFSv4 server, causing a significant drop in performance. In my testing, I've seen performance drop from ~700MiB/s down to < 10MiB/s. The same workload using NFSv3 does not have this problem.

[Test Case]
This can be demonstrated using the elbencho benchmark from https://github.com/breuner/elbencho:
 $ elbencho -t 40 -r -n 10 -N 5000 -s 128k -b 128k /mnt/nfs/ubuntu

You'll notice the nfsd threads on the server (I stuck with the default of 4) start to consume 100% CPU, and the throughput reported by elbencho will slow to a trickle.
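
One rough way to put a number on the nfsd CPU usage, as a sketch rather than part of the original test procedure: sum the %CPU column that ps reports for threads named nfsd on the server. The helper name sum_nfsd_cpu is made up for illustration. Note that ps reports CPU usage averaged over each thread's lifetime, so top may show the spike sooner.

```shell
# Hypothetical helper (not part of elbencho or nfs-utils): reads
# "<comm> <pcpu>" pairs on stdin and prints the summed %CPU of every
# entry whose command name is exactly "nfsd".
sum_nfsd_cpu() {
    awk '$1 == "nfsd" { total += $2 } END { printf "%.1f\n", total + 0 }'
}
```

On the server it could be fed from ps while the benchmark runs, for example:
 $ ps -eo comm=,pcpu= | sum_nfsd_cpu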

[Fix]
The following commit fixes the problem, but a number of prerequisite patches are required before it will apply to focal:

commit 10717f45639f6c1bc27b56405252c3a027406d92 (refs/bisect/bad)
Author: Trond Myklebust <email address hidden>
Date: Mon Jan 27 09:58:19 2020 -0500

    NFSv4: Limit the total number of cached delegations

    Delegations can be expensive to return, and can cause scalability issues
    for the server. Let's therefore try to limit the number of inactive
    delegations we hold.
    Once the number of delegations is above a certain threshold, start
    to return them on close.

    Signed-off-by: Trond Myklebust <email address hidden>
    Signed-off-by: Anna Schumaker <email address hidden>

[What could go wrong]
The fixes are restricted to NFS code, so problems should be limited to NFS users. They could include performance issues, crashes, etc.