nfsd hangs and never recovers after NFS4ERR_DELAY and a connection loss
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
| linux (Ubuntu Jammy) | Fix Released | Medium | Matthew Ruffell | |
| linux (Ubuntu Noble) | Fix Released | Medium | Matthew Ruffell | |
| linux-gke (Ubuntu) | Fix Released | Undecided | Unassigned | |
| linux-gke (Ubuntu Jammy) | Fix Released | Undecided | Tim Whisonant | |
Bug Description
BugLink: https:/
[Impact]
nfsd loops forever in nfsd4_cb_
and the connection is subsequently lost.
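When the client returns NFS4ERR_DELAY, cb->cb_seq_status is set to -10008
(-NFS4ERR_DELAY), but it is never set back to 1. Once the connection is lost,
no retry receives a reply that would overwrite that value, so nfsd keeps acting
on the stale NFS4ERR_DELAY status and re-sends the callback indefinitely.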
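To make the failure mode concrete, here is a tiny userspace model in C. It is not nfsd code: struct model_cb, model_sequence_done_buggy(), model_send(), run_buggy() and MODEL_NO_REPLY are hypothetical names, and only the cb_seq_status values (1 initially, -10008 for NFS4ERR_DELAY) come from this report. The model assumes that the NFS4ERR_DELAY branch restarts the call without touching cb_seq_status, and that after the connection is lost no retry ever decodes a reply.

```c
#include <assert.h>
#include <stdbool.h>

/* NFS4ERR_DELAY is 10008 in the NFSv4 protocol; the callback status is
 * stored negated, which is where the -10008 in cb_seq_status comes from. */
#define NFS4ERR_DELAY 10008

/* Hypothetical name for the initial cb_seq_status value (1), meaning
 * "no CB_SEQUENCE reply has been decoded yet". */
#define MODEL_NO_REPLY 1

/* Hypothetical, heavily simplified stand-in for an nfsd callback. */
struct model_cb {
	int cb_seq_status;
};

enum model_outcome { MODEL_DONE, MODEL_RESTART };

/* Buggy model of the CB_SEQUENCE result handling: the DELAY branch asks
 * for a restart but leaves cb_seq_status untouched. */
static enum model_outcome model_sequence_done_buggy(struct model_cb *cb)
{
	if (cb->cb_seq_status == -NFS4ERR_DELAY)
		return MODEL_RESTART; /* retry later, status unchanged */
	return MODEL_DONE;
}

/* One send attempt. With the connection down no reply is decoded, so
 * cb_seq_status keeps whatever value it had before. */
static void model_send(struct model_cb *cb, bool connected, int reply_status)
{
	if (connected)
		cb->cb_seq_status = reply_status;
}

/* Returns the attempt on which the callback finished, or -1 if it was
 * still restarting after max_attempts (the real code has no such cap). */
int run_buggy(int max_attempts)
{
	struct model_cb cb = { .cb_seq_status = MODEL_NO_REPLY };

	assert(max_attempts > 0);
	/* The client answers the first CB_SEQUENCE with NFS4ERR_DELAY. */
	model_send(&cb, true, -NFS4ERR_DELAY);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (model_sequence_done_buggy(&cb) == MODEL_DONE)
			return attempt;
		/* The connection is now lost: no retry ever gets a reply. */
		model_send(&cb, false, 0);
	}
	return -1;
}
```

Calling run_buggy(1000) returns -1: the status stays at -10008, so every attempt is handled as another NFS4ERR_DELAY. The attempt cap exists only so that the model itself terminates.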
The stack trace looks like:
watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [kworker/
Kernel panic - not syncing: softlockup: hung tasks
CPU: 33 PID: 1520679 Comm: kworker/u120:29 Tainted: G L 5.15.0-1069-gke #75-Ubuntu
Workqueue: rpciod rpc_async_schedule [sunrpc]
Call Trace:
RIP: 0010:__
Code: 0f b6 f9 66 90 44 89 fa 48 89 de 4d 8d 7e 50 4c 89 f7 e8 c8 fb ff ff 4c 89 6b 28 49 8b 46 50 49 39 c7 74 5a 4d 3b 6e 60 78 54 <49> 8b 56 50 48 8d 43 60 48 89 42 08 48 89 53 60 4c 89 7b 68 49 89
...
rpc_sleep_
rpc_delay+
nfsd4_
nfsd4_
pc_exit_
? __rpc_sleep_
__rpc_
rpc_async_
process_
worker_
? process_
kthread+
? set_kthread_
ret_from_
</TASK>
There is no workaround.
[Fix]
This was fixed in 6.9-rc1 by:
commit 961b4b5e86bf56a
From: Chuck Lever <email address hidden>
Date: Fri, 26 Jan 2024 12:45:17 -0500
Subject: NFSD: Reset cb_seq_status after NFS4ERR_DELAY
Link: https:/
This fix is also present in the 5.15.179 and 6.6.76 upstream stable releases.
[Testcase]
There is no known synthetic reproducer available.
We currently see it in production workloads on Google Kubernetes Engine, and
we have successfully deployed and run a test kernel in production with no
further incidents. Before that, the system would lock up once a day.
The test kernel is available in the following PPA:
https:/
With the kernel from this PPA installed, the issue no longer occurs.
[Where problems can occur]
We reset cb->cb_seq_status back to 1 after NFS4ERR_DELAY so that the callback
can get out of its state machine and actually make progress, instead of being
trapped at NFS4ERR_DELAY.
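The effect of the reset can be sketched with a small userspace model in C. Every identifier here (struct model_cb, cb_fault, model_sequence_done_fixed(), model_send(), run_fixed(), MODEL_NO_REPLY) is hypothetical; only resetting cb_seq_status to 1 in the NFS4ERR_DELAY branch reflects the change described in this report. How the model reacts to a status of 1 (stop retrying and flag the callback channel) is an assumption of this sketch, not a transcription of nfsd.

```c
#include <assert.h>
#include <stdbool.h>

#define NFS4ERR_DELAY 10008 /* stored negated in cb_seq_status */
#define MODEL_NO_REPLY 1    /* hypothetical name: no reply decoded yet */

/* Hypothetical, heavily simplified stand-in for an nfsd callback. */
struct model_cb {
	int cb_seq_status;
	bool cb_fault; /* hypothetical flag: callback channel marked faulty */
};

enum model_outcome { MODEL_DONE, MODEL_RESTART };

static enum model_outcome model_sequence_done_fixed(struct model_cb *cb)
{
	switch (cb->cb_seq_status) {
	case -NFS4ERR_DELAY:
		/* The fix: forget the stale DELAY before retrying, so a retry
		 * that never gets a reply is not mistaken for another DELAY. */
		cb->cb_seq_status = MODEL_NO_REPLY;
		return MODEL_RESTART;
	case MODEL_NO_REPLY:
		/* Assumption: a missing reply ends the retries and flags the
		 * callback channel instead of looping. */
		cb->cb_fault = true;
		return MODEL_DONE;
	default:
		return MODEL_DONE;
	}
}

/* One send attempt; a lost connection decodes no reply. */
static void model_send(struct model_cb *cb, bool connected, int reply_status)
{
	if (connected)
		cb->cb_seq_status = reply_status;
}

/* Returns the attempt on which the callback finished (or -1 after
 * max_attempts) and reports through *faulted whether it was flagged. */
int run_fixed(int max_attempts, bool *faulted)
{
	struct model_cb cb = { .cb_seq_status = MODEL_NO_REPLY, .cb_fault = false };

	assert(max_attempts > 0);
	/* The client answers the first CB_SEQUENCE with NFS4ERR_DELAY. */
	model_send(&cb, true, -NFS4ERR_DELAY);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (model_sequence_done_fixed(&cb) == MODEL_DONE) {
			*faulted = cb.cb_fault;
			return attempt;
		}
		/* The connection is now lost: no retry ever gets a reply. */
		model_send(&cb, false, 0);
	}
	*faulted = cb.cb_fault;
	return -1;
}
```

With run_fixed(1000, &faulted), the second attempt finds cb_seq_status back at 1 and stops, returning 2 with faulted set. Resetting to 1 is what lets a lost reply be told apart from a genuine repeat of NFS4ERR_DELAY.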
If a regression were to occur, it would affect NFS v4.x systems. It is unlikely
to cause any real issues; the most likely symptom would be some flapping between
NFS4ERR_DELAY and sending callbacks.
CVE References
- 2023-52664
- 2023-52927
- 2024-26837
- 2024-26982
- 2024-36476
- 2024-39282
- 2024-41013
- 2024-47408
- 2024-47726
- 2024-47736
- 2024-49568
- 2024-49571
- 2024-53125
- 2024-53179
- 2024-53685
- 2024-53687
- 2024-53690
- 2024-54193
- 2024-54455
- 2024-54460
- 2024-54683
- 2024-55639
- 2024-55881
- 2024-55916
- 2024-56369
- 2024-56372
- 2024-56599
- 2024-56652
- 2024-56653
- 2024-56654
- 2024-56656
- 2024-56657
- 2024-56659
- 2024-56660
- 2024-56662
- 2024-56664
- 2024-56667
- 2024-56670
- 2024-56675
- 2024-56709
- 2024-56710
- 2024-56715
- 2024-56716
- 2024-56717
- 2024-56718
- 2024-56721
- 2024-56758
- 2024-56759
- 2024-56760
- 2024-56761
- 2024-56763
- 2024-56764
- 2024-56767
- 2024-56769
- 2024-56770
- 2024-57791
- 2024-57792
- 2024-57793
- 2024-57801
- 2024-57802
- 2024-57804
- 2024-57806
- 2024-57807
- 2024-57834
- 2024-57841
- 2024-57879
- 2024-57882
- 2024-57883
- 2024-57884
- 2024-57885
- 2024-57887
- 2024-57888
- 2024-57889
- 2024-57890
- 2024-57892
- 2024-57893
- 2024-57895
- 2024-57896
- 2024-57897
- 2024-57898
- 2024-57899
- 2024-57900
- 2024-57901
- 2024-57902
- 2024-57903
- 2024-57904
- 2024-57906
- 2024-57907
- 2024-57908
- 2024-57910
- 2024-57911
- 2024-57912
- 2024-57913
- 2024-57916
- 2024-57917
- 2024-57925
- 2024-57926
- 2024-57929
- 2024-57931
- 2024-57932
- 2024-57933
- 2024-57938
- 2024-57939
- 2024-57940
- 2024-57945
- 2024-57946
- 2024-57973
- 2024-57977
- 2024-57978
- 2024-57979
- 2024-57980
- 2024-57981
- 2024-57986
- 2024-58001
- 2024-58002
- 2024-58005
- 2024-58007
- 2024-58010
- 2024-58014
- 2024-58016
- 2024-58017
- 2024-58020
- 2024-58034
- 2024-58051
- 2024-58052
- 2024-58055
- 2024-58058
- 2024-58063
- 2024-58069
- 2024-58071
- 2024-58072
- 2024-58076
- 2024-58079
- 2024-58083
- 2024-58085
- 2024-58086
- 2024-58090
- 2025-21631
- 2025-21632
- 2025-21634
- 2025-21635
- 2025-21636
- 2025-21637
- 2025-21638
- 2025-21639
- 2025-21640
- 2025-21642
- 2025-21643
- 2025-21645
- 2025-21646
- 2025-21647
- 2025-21648
- 2025-21649
- 2025-21650
- 2025-21651
- 2025-21652
- 2025-21653
- 2025-21654
- 2025-21655
- 2025-21656
- 2025-21658
- 2025-21659
- 2025-21660
- 2025-21662
- 2025-21663
- 2025-21664
- 2025-21684
- 2025-21704
- 2025-21707
- 2025-21708
- 2025-21711
- 2025-21715
- 2025-21718
- 2025-21719
- 2025-21721
- 2025-21722
- 2025-21726
- 2025-21727
- 2025-21728
- 2025-21731
- 2025-21735
- 2025-21736
- 2025-21744
- 2025-21745
- 2025-21748
- 2025-21749
- 2025-21753
- 2025-21758
- 2025-21760
- 2025-21761
- 2025-21762
- 2025-21763
- 2025-21764
- 2025-21765
- 2025-21766
- 2025-21767
- 2025-21772
- 2025-21776
- 2025-21779
- 2025-21781
- 2025-21782
- 2025-21785
- 2025-21787
- 2025-21791
- 2025-21795
- 2025-21796
- 2025-21799
- 2025-21802
- 2025-21804
- 2025-21806
- 2025-21811
- 2025-21814
- 2025-21820
- 2025-21823
- 2025-21826
- 2025-21830
- 2025-21835
- 2025-21844
- 2025-21846
- 2025-21848
- 2025-21858
- 2025-21859
- 2025-21862
- 2025-21865
- 2025-21866
- 2025-21871
- 2025-21875
- 2025-21877
- 2025-21878
- 2025-21887
- 2025-21971
| description: | updated |
| Changed in linux (Ubuntu): | |
| status: | New → Fix Released |
| Changed in linux (Ubuntu Jammy): | |
| status: | New → In Progress |
| Changed in linux (Ubuntu Noble): | |
| status: | New → In Progress |
| Changed in linux (Ubuntu Jammy): | |
| importance: | Undecided → Medium |
| Changed in linux (Ubuntu Noble): | |
| importance: | Undecided → Medium |
| Changed in linux (Ubuntu Jammy): | |
| assignee: | nobody → Matthew Ruffell (mruffell) |
| Changed in linux (Ubuntu Noble): | |
| assignee: | nobody → Matthew Ruffell (mruffell) |
| tags: | added: sts |
| no longer affects: | linux-gke (Ubuntu Noble) |
| Changed in linux-gke (Ubuntu Jammy): | |
| assignee: | nobody → Tim Whisonant (tswhison) |
| Changed in linux (Ubuntu Jammy): | |
| status: | In Progress → Fix Committed |
| Changed in linux (Ubuntu Noble): | |
| status: | In Progress → Fix Committed |
| tags: | added: kernel-daily-bug |
| tags: | added: verification-done-jammy-linux-gke removed: verification-needed-jammy-linux-gke |

Patches submitted to the Kernel Team Mailing List
Cover Letter:
https://lists.ubuntu.com/archives/kernel-team/2025-March/158128.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2025-March/158129.html