2019-05-14 09:07:51 |
Frank Burkhardt |
bug |
|
|
added bug |
2019-05-14 23:38:48 |
Terry Rudd |
bug |
|
|
added subscriber Terry Rudd |
2019-05-27 14:02:21 |
Launchpad Janitor |
linux-meta-hwe (Ubuntu): status |
New |
Confirmed |
|
2019-05-27 15:46:35 |
Leandro Piccilli |
bug |
|
|
added subscriber Leandro Piccilli |
2019-10-02 04:04:16 |
Matthew Ruffell |
bug task added |
|
linux (Ubuntu) |
|
2019-10-02 04:04:35 |
Matthew Ruffell |
bug task deleted |
linux-meta-hwe (Ubuntu) |
|
|
2019-10-02 04:04:52 |
Matthew Ruffell |
nominated for series |
|
Ubuntu Disco |
|
2019-10-02 04:04:52 |
Matthew Ruffell |
bug task added |
|
linux (Ubuntu Disco) |
|
2019-10-02 04:05:05 |
Matthew Ruffell |
linux (Ubuntu Disco): importance |
Undecided |
Medium |
|
2019-10-02 04:05:18 |
Matthew Ruffell |
linux (Ubuntu): status |
New |
Fix Released |
|
2019-10-02 04:05:21 |
Matthew Ruffell |
linux (Ubuntu Disco): status |
New |
In Progress |
|
2019-10-02 04:05:26 |
Matthew Ruffell |
linux (Ubuntu Disco): assignee |
|
Matthew Ruffell (mruffell) |
|
2019-10-29 23:02:57 |
Matthew Ruffell |
summary |
NFS connections block while causing a high-bandwidth RPC-pingpong between client and server |
NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong between client and server |
|
2019-10-29 23:04:38 |
Matthew Ruffell |
description |
There's a bug in kernels before Linux 5.0 that affects NFS 4.1 connections. The bug presents itself like this:
* On NFS clients: Attempts to access mounted NFS shares associated with the affected server
block indefinitely.
* On the network: A storm of repeated RPCs between NFS client and server uses a lot
of bandwidth. Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.
* Other NFS clients connected to the same NFS server: Performance drops dramatically.
A patch is available to fix this problem:
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3453d5708b33efe76f40eca1c0ed60923094b971>
Is it possible to integrate the patch into the 4.18 kernel series?
I'm using Ubuntu 18.04.2 LTS as NFS client and server.
Thank you.
Best regards,
Frank Burkhardt |
BugLink: https://bugs.launchpad.net/bugs/1828978
[Impact]
There is a bug in NFS v4.1 that causes a large number of RPC calls between a client and server when a previous RPC call is interrupted. This uses a large amount of bandwidth and can saturate the network.
The symptoms are as follows:
* On NFS clients:
Attempts to access mounted NFS shares associated with the affected server block indefinitely.
* On the network:
A storm of repeated RPCs between NFS client and server uses a lot of bandwidth. Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.
* Other NFS clients connected to the same NFS server:
Performance drops dramatically.
This occurs during a "false retry", when a client attempts to make a new RPC call using a slot+sequence number that references an older, cached call. This happens when a user process interrupts an RPC call that is in progress.
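As a rough illustration of the slot and sequence number check, here is a toy Python model of one server-side session slot. It follows the general rule from the NFSv4.1 specification (a new request must use the previous sequence number plus one, and a repeat of the previous number is treated as a retry). The class name ServerSlot, the handle method and the string operation names are hypothetical and are not kernel or server code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerSlot:
    """Toy model of one server-side session slot (hypothetical, not kernel code)."""
    last_seq: int = 0                # sequence number of the last accepted request
    cached_op: Optional[str] = None  # the request cached for that sequence number

    def handle(self, seq: int, op: str) -> str:
        if seq == self.last_seq + 1:
            # New request: accept it and cache it for possible retries.
            self.last_seq = seq
            self.cached_op = op
            return "OK"
        if seq == self.last_seq:
            # Same sequence number as the cached call: a retry if the request
            # matches, a false retry if it is actually a different request.
            if op == self.cached_op:
                return "REPLAY"
            return "NFS4ERR_SEQ_FALSE_RETRY"
        # Anything else is out of order.
        return "NFS4ERR_SEQ_MISORDERED"


if __name__ == "__main__":
    slot = ServerSlot()
    print(slot.handle(1, "READ a"))   # first call on the slot
    # The client was interrupted before seeing the reply and reuses
    # sequence number 1 for a different request.
    print(slot.handle(1, "WRITE b"))
    # A client that has lost track and skips ahead is rejected as well.
    print(slot.handle(3, "READ c"))
```

In this model, a client that loses track of the slot state after an interruption keeps receiving errors for as long as it resends the same sequence number.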
[Fix]
This was fixed in 5.1 upstream with the below commit:
commit 3453d5708b33efe76f40eca1c0ed60923094b971
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Wed Jun 20 17:53:34 2018 -0400
Subject: NFSv4.1: Avoid false retries when RPC calls are interrupted
The fix is to pre-emptively increment the sequence number if an RPC call is interrupted. To address corner cases, the client interprets the NFS4ERR_SEQ_MISORDERED error as a sign that it needs to locate an appropriate sequence number between the value it sent and the last successfully acked SEQUENCE call.
Commit 3453d5708b33efe76f40eca1c0ed60923094b971 is a clean cherry-pick to disco.
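The approach described in this section can be sketched with a small Python simulation. This is a simplified model under stated assumptions, not the code from the commit: ToyServerSlot, ToyClientSlot and their methods are hypothetical names, and the server is reduced to a single slot that only checks sequence numbers.

```python
class ToyServerSlot:
    """Server side of one slot: accepts only last_seq + 1 as a new request."""

    def __init__(self):
        self.last_seq = 0

    def handle(self, seq):
        if seq == self.last_seq + 1:
            self.last_seq = seq
            return "OK"
        if seq == self.last_seq:
            return "REPLAY"
        return "NFS4ERR_SEQ_MISORDERED"


class ToyClientSlot:
    """Client side of one slot (hypothetical model of the described fix)."""

    def __init__(self):
        self.seq_nr = 1      # sequence number to use for the next call
        self.last_acked = 0  # last sequence number the server acknowledged

    def call_interrupted(self, server, delivered):
        """A call is interrupted, so the client never sees a reply."""
        if delivered:
            server.handle(self.seq_nr)  # the server processed it anyway
        # Pre-emptively bump the number so the next call on this slot
        # cannot be mistaken for a retry of the interrupted one.
        self.seq_nr += 1

    def call(self, server):
        """Send a call, stepping the sequence number down on MISORDERED."""
        attempts = []
        seq = self.seq_nr
        while True:
            status = server.handle(seq)
            attempts.append((seq, status))
            if status != "NFS4ERR_SEQ_MISORDERED" or seq <= self.last_acked + 1:
                break
            seq -= 1  # try a value between what was sent and last_acked
        if status == "OK":
            self.last_acked = seq
            self.seq_nr = seq + 1
        return attempts


if __name__ == "__main__":
    # Corner case: the interrupted call never reached the server.
    server, client = ToyServerSlot(), ToyClientSlot()
    client.call_interrupted(server, delivered=False)
    print(client.call(server))
```

In the corner case shown, the pre-emptive increment overshoots by one, the server answers with NFS4ERR_SEQ_MISORDERED once, and the client recovers by retrying with the next lower number instead of resending the same value indefinitely.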
[Testcase]
This is difficult to reproduce on test systems, and has instead been verified on a production NFS v4.1 system in a customer environment. This server is heavily trafficked and has a large number of different NFS clients connected to it.
I have built a test kernel that contains the above patch, and also patches for Bug 1842037. It is available here:
https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
Note that the above kernel is for bionic HWE, and not explicitly disco.
Discussion about the patch validation can be found at the bottom of Bug 1842037.
On unpatched kernels, expect to see the symptoms mentioned in Impact, and on patched systems, everything working as intended.
[Regression Potential]
The changes are localised to NFS v4.1 only, and other versions of NFS are not affected. If a regression occurs, users can downgrade NFS versions to v4.0 or v3.x until a fix is made.
The changes only take effect when RPC calls are interrupted, and would not be invoked under typical blue sky scenarios.
There have been no fixup commits or commits near the requested commit in newer kernels, which suggests that this commit fixes the issue and has been accepted by the community. |
|
2019-10-29 23:05:19 |
Matthew Ruffell |
tags |
bionic |
bionic disco sts |
|
2019-11-08 17:28:35 |
Khaled El Mously |
linux (Ubuntu Disco): status |
In Progress |
Fix Committed |
|
2019-11-14 18:46:29 |
Ubuntu Kernel Bot |
tags |
bionic disco sts |
bionic disco sts verification-needed-disco |
|
2019-11-18 06:29:24 |
Matthew Ruffell |
tags |
bionic disco sts verification-needed-disco |
bionic disco sts verification-done-disco |
|
2019-12-02 12:43:50 |
Launchpad Janitor |
linux (Ubuntu Disco): status |
Fix Committed |
Fix Released |
|
2019-12-02 12:43:50 |
Launchpad Janitor |
cve linked |
|
2019-15794 |
|