Activity log for bug #2012020

Date Who What changed Old value New value Message
2023-03-17 03:34:23 AzT3K bug added bug
2023-03-17 04:00:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2023-03-17 04:09:02 AzT3K tags apport-collected bionic ec2-images
2023-03-17 04:09:04 AzT3K description We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ``` CALL [ operations: 4 tag: void minor version: 1 [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ] [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ] [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] REPLY [ operations: 4 status: OK tag: void [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ] [ PUTFH(22) [ status: OK ] [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below; Server 1: checked mount options - using relatime checked nfs stats - 460.982 ops/s updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.400 ops/s rebooted checked nfs stats - 1.316 ops/s Server 2: checked mount options - using relatime checked nfs stats - 390.998 ops/s did not updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.750 ops/s rebooted checked nfs stats - 531.932 ops/s Conclusion from this is that; - Unmounting and remounting the disk stops the issue temporarily - `lazytime` option seems to prevent the issue from happening - Issue may be boot related I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies: adduser 3.116ubuntu1 apt 1.6.14 apt-utils 1.6.14 busybox-initramfs 1:1.27.2-2ubuntu3.4 ca-certificates 20211016ubuntu0.18.04.1 coreutils 8.28-1ubuntu1 cpio 2.12+dfsg-6ubuntu0.18.04.4 debconf 1.5.66ubuntu1 debconf-i18n 1.5.66ubuntu1 dmsetup 2:1.02.145-4.1ubuntu3.18.04.3 dpkg 1.19.0.5ubuntu2.4 fdisk 2.31.1-0.4ubuntu3.7 gcc-8-base 8.4.0-1ubuntu1~18.04 gettext-base 0.19.8.1-6ubuntu0.3 gpgv 2.2.4-1ubuntu1.6 grub-common 2.02-2ubuntu8.25 grub-gfxpayload-lists 0.7 grub-pc 2.02-2ubuntu8.25 grub-pc-bin 2.02-2ubuntu8.25 grub2-common 2.02-2ubuntu8.25 init-system-helpers 1.51 initramfs-tools 0.130ubuntu3.13 initramfs-tools-bin 0.130ubuntu3.13 initramfs-tools-core 0.130ubuntu3.13 klibc-utils 2.0.4-9ubuntu2.2 kmod 24-1ubuntu3.5 libacl1 2.2.52-3build1 libapt-inst2.0 1.6.14 libapt-pkg5.0 1.6.14 libattr1 1:2.4.47-2build1 libaudit-common 1:2.8.2-1ubuntu1.1 libaudit1 1:2.8.2-1ubuntu1.1 libblkid1 2.31.1-0.4ubuntu3.7 libbz2-1.0 1.0.6-8.1ubuntu0.2 libc6 2.27-3ubuntu1.6 libcap-ng0 0.7.7-3.1 libdb5.3 5.3.28-13.1ubuntu1.1 libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3 libfdisk1 2.31.1-0.4ubuntu3.7 libffi6 3.2.1-8 libfreetype6 2.8.1-2ubuntu2.2 libfuse2 2.9.7-1ubuntu1 libgcc1 1:8.4.0-1ubuntu1~18.04 libgcrypt20 1.8.1-4ubuntu1.3 libgmp10 2:6.1.2+dfsg-2ubuntu0.1 libgnutls30 3.5.18-1ubuntu1.6 libgpg-error0 1.27-6 libgpm2 1.20.7-5 libhogweed4 3.4.1-0ubuntu0.18.04.1 libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php] libklibc 2.0.4-9ubuntu2.2 libkmod2 24-1ubuntu3.5 liblocale-gettext-perl 1.07-3build2 liblz4-1 0.0~r131-2ubuntu3.1 liblzma5 5.2.2-1.3ubuntu0.1 libmount1 2.31.1-0.4ubuntu3.7 libncurses5 6.1-1ubuntu1.18.04 libncursesw5 6.1-1ubuntu1.18.04 libnettle6 3.4.1-0ubuntu0.18.04.1 libp11-kit0 0.23.9-2ubuntu0.1 libpam-modules 1.1.8-3.6ubuntu2.18.04.6 libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6 libpam0g 1.1.8-3.6ubuntu2.18.04.6 libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php] libpng16-16 1.6.34-1ubuntu0.18.04.2 libprocps6 2:3.3.12-3ubuntu1.2 libseccomp2 2.5.1-1ubuntu1~18.04.2 libselinux1 2.7-2build2 libsemanage-common 2.7-2build2 libsemanage1 2.7-2build2 libsepol1 2.7-1ubuntu0.1 libsmartcols1 2.31.1-0.4ubuntu3.7 libssl1.1 1.1.1-1ubuntu2.1~18.04.21 libstdc++6 8.4.0-1ubuntu1~18.04 libsystemd0 237-3ubuntu10.57 libtasn1-6 4.13-2 libtext-charwidth-perl 0.04-7.1 libtext-iconv-perl 1.7-5build6 libtext-wrapi18n-perl 0.06-7.1 libtinfo5 6.1-1ubuntu1.18.04 libudev1 237-3ubuntu10.57 libunistring2 0.9.9-0ubuntu2 libuuid1 2.31.1-0.4ubuntu3.7 libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php] linux-base 4.5ubuntu1.7 linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 lsb-base 9.20170808ubuntu1 openssl 1.1.1-1ubuntu2.1~18.04.21 os-prober 1.74ubuntu1 passwd 1:4.5-1ubuntu2.5 perl-base 5.26.1-6ubuntu0.6 procps 2:3.3.12-3ubuntu1.2 psmisc 23.1-1ubuntu0.1 sensible-utils 0.0.12 tar 1.29b-2ubuntu0.4 ubuntu-keyring 2018.09.18.1~18.04.2 ucf 3.0038 udev 237-3ubuntu10.57 util-linux 2.31.1-0.4ubuntu3.7 uuid-runtime 2.31.1-0.4ubuntu3.7 zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 79 model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz stepping : 1 microcode : 0xb000040 cpu MHz : 2299.922 cache size : 46080 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data bogomips : 4600.03 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws-5.4-tools-5.4.0-1097: Installed: (none) Candidate: 5.4.0-1097.105~18.04.1 Version table: 5.4.0-1097.105~18.04.1 500 500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages 500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ``` CALL [ operations: 4 tag: void minor version: 1 [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ] [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ] [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] REPLY [ operations: 4 status: OK tag: void [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ] [ PUTFH(22) [ status: OK ] [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below; Server 1: checked mount options - using relatime checked nfs stats - 460.982 ops/s updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.400 ops/s rebooted checked nfs stats - 1.316 ops/s Server 2: checked mount options - using relatime checked nfs stats - 390.998 ops/s did not updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.750 ops/s rebooted checked nfs stats - 531.932 ops/s Conclusion from this is that; - Unmounting and remounting the disk stops the issue temporarily - `lazytime` option seems to prevent the issue from happening - Issue may be boot related I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies: adduser 3.116ubuntu1 apt 1.6.14 apt-utils 1.6.14 busybox-initramfs 1:1.27.2-2ubuntu3.4 ca-certificates 20211016ubuntu0.18.04.1 coreutils 8.28-1ubuntu1 cpio 2.12+dfsg-6ubuntu0.18.04.4 debconf 1.5.66ubuntu1 debconf-i18n 1.5.66ubuntu1 dmsetup 2:1.02.145-4.1ubuntu3.18.04.3 dpkg 1.19.0.5ubuntu2.4 fdisk 2.31.1-0.4ubuntu3.7 gcc-8-base 8.4.0-1ubuntu1~18.04 gettext-base 0.19.8.1-6ubuntu0.3 gpgv 2.2.4-1ubuntu1.6 grub-common 2.02-2ubuntu8.25 grub-gfxpayload-lists 0.7 grub-pc 2.02-2ubuntu8.25 grub-pc-bin 2.02-2ubuntu8.25 grub2-common 2.02-2ubuntu8.25 init-system-helpers 1.51 initramfs-tools 0.130ubuntu3.13 initramfs-tools-bin 0.130ubuntu3.13 initramfs-tools-core 0.130ubuntu3.13 klibc-utils 2.0.4-9ubuntu2.2 kmod 24-1ubuntu3.5 libacl1 2.2.52-3build1 libapt-inst2.0 1.6.14 libapt-pkg5.0 1.6.14 libattr1 1:2.4.47-2build1 libaudit-common 1:2.8.2-1ubuntu1.1 libaudit1 1:2.8.2-1ubuntu1.1 libblkid1 2.31.1-0.4ubuntu3.7 libbz2-1.0 1.0.6-8.1ubuntu0.2 libc6 2.27-3ubuntu1.6 libcap-ng0 0.7.7-3.1 libdb5.3 5.3.28-13.1ubuntu1.1 libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3 libfdisk1 2.31.1-0.4ubuntu3.7 libffi6 3.2.1-8 libfreetype6 2.8.1-2ubuntu2.2 libfuse2 2.9.7-1ubuntu1 libgcc1 1:8.4.0-1ubuntu1~18.04 libgcrypt20 1.8.1-4ubuntu1.3 libgmp10 2:6.1.2+dfsg-2ubuntu0.1 libgnutls30 3.5.18-1ubuntu1.6 libgpg-error0 1.27-6 libgpm2 1.20.7-5 libhogweed4 3.4.1-0ubuntu0.18.04.1 libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php] libklibc 2.0.4-9ubuntu2.2 libkmod2 24-1ubuntu3.5 liblocale-gettext-perl 1.07-3build2 liblz4-1 0.0~r131-2ubuntu3.1 liblzma5 5.2.2-1.3ubuntu0.1 libmount1 2.31.1-0.4ubuntu3.7 libncurses5 6.1-1ubuntu1.18.04 libncursesw5 6.1-1ubuntu1.18.04 libnettle6 3.4.1-0ubuntu0.18.04.1 libp11-kit0 0.23.9-2ubuntu0.1 libpam-modules 1.1.8-3.6ubuntu2.18.04.6 libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6 libpam0g 1.1.8-3.6ubuntu2.18.04.6 libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php] libpng16-16 1.6.34-1ubuntu0.18.04.2 libprocps6 2:3.3.12-3ubuntu1.2 libseccomp2 2.5.1-1ubuntu1~18.04.2 libselinux1 2.7-2build2 libsemanage-common 2.7-2build2 libsemanage1 2.7-2build2 libsepol1 2.7-1ubuntu0.1 libsmartcols1 2.31.1-0.4ubuntu3.7 libssl1.1 1.1.1-1ubuntu2.1~18.04.21 libstdc++6 8.4.0-1ubuntu1~18.04 libsystemd0 237-3ubuntu10.57 libtasn1-6 4.13-2 libtext-charwidth-perl 0.04-7.1 libtext-iconv-perl 1.7-5build6 libtext-wrapi18n-perl 0.06-7.1 libtinfo5 6.1-1ubuntu1.18.04 libudev1 237-3ubuntu10.57 libunistring2 0.9.9-0ubuntu2 libuuid1 2.31.1-0.4ubuntu3.7 libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php] linux-base 4.5ubuntu1.7 linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 lsb-base 9.20170808ubuntu1 openssl 1.1.1-1ubuntu2.1~18.04.21 os-prober 1.74ubuntu1 passwd 1:4.5-1ubuntu2.5 perl-base 5.26.1-6ubuntu0.6 procps 2:3.3.12-3ubuntu1.2 psmisc 23.1-1ubuntu0.1 sensible-utils 0.0.12 tar 1.29b-2ubuntu0.4 ubuntu-keyring 2018.09.18.1~18.04.2 ucf 3.0038 udev 237-3ubuntu10.57 util-linux 2.31.1-0.4ubuntu3.7 uuid-runtime 2.31.1-0.4ubuntu3.7 zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 79 model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz stepping : 1 microcode : 0xb000040 cpu MHz : 2299.922 cache size : 46080 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data bogomips : 4600.03 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws-5.4-tools-5.4.0-1097: Installed: (none) Candidate: 5.4.0-1097.105~18.04.1 Version table: 5.4.0-1097.105~18.04.1 500 500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages 500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage --- ProblemType: Bug AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: N/A DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0b76c3b150c6b1423 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: MachineType: Xen HVM domU Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 RelatedPackageVersions: linux-restricted-modules-5.4.0-1097-aws N/A linux-backports-modules-5.4.0-1097-aws N/A linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 08/24/2006 dmi.bios.vendor: Xen dmi.bios.version: 4.11.amazon dmi.chassis.type: 1 dmi.chassis.vendor: Xen dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr: dmi.product.name: HVM domU dmi.product.version: 4.11.amazon dmi.sys.vendor: Xen
2023-03-17 04:09:05 AzT3K attachment added AudioDevicesInUse.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655003/+files/AudioDevicesInUse.txt
2023-03-17 04:09:08 AzT3K attachment added CurrentDmesg.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655004/+files/CurrentDmesg.txt
2023-03-17 04:09:09 AzT3K attachment added Lspci.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655005/+files/Lspci.txt
2023-03-17 04:09:11 AzT3K attachment added ProcCpuinfo.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655006/+files/ProcCpuinfo.txt
2023-03-17 04:09:12 AzT3K attachment added ProcCpuinfoMinimal.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655007/+files/ProcCpuinfoMinimal.txt
2023-03-17 04:09:14 AzT3K attachment added ProcInterrupts.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655008/+files/ProcInterrupts.txt
2023-03-17 04:09:16 AzT3K attachment added ProcModules.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655009/+files/ProcModules.txt
2023-03-17 04:09:19 AzT3K attachment added UdevDb.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655010/+files/UdevDb.txt
2023-03-17 04:09:22 AzT3K attachment added WifiSyslog.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655011/+files/WifiSyslog.txt
2023-03-17 04:10:43 AzT3K linux (Ubuntu): status Incomplete Confirmed
2023-03-17 04:12:50 AzT3K description We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ``` CALL [ operations: 4 tag: void minor version: 1 [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ] [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ] [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] REPLY [ operations: 4 status: OK tag: void [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ] [ PUTFH(22) [ status: OK ] [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ] [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below; Server 1: checked mount options - using relatime checked nfs stats - 460.982 ops/s updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.400 ops/s rebooted checked nfs stats - 1.316 ops/s Server 2: checked mount options - using relatime checked nfs stats - 390.998 ops/s did not updated mount options unmounted and mounted the NFS drive checked nfs stats - 1.750 ops/s rebooted checked nfs stats - 531.932 ops/s Conclusion from this is that; - Unmounting and remounting the disk stops the issue temporarily - `lazytime` option seems to prevent the issue from happening - Issue may be boot related I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies: adduser 3.116ubuntu1 apt 1.6.14 apt-utils 1.6.14 busybox-initramfs 1:1.27.2-2ubuntu3.4 ca-certificates 20211016ubuntu0.18.04.1 coreutils 8.28-1ubuntu1 cpio 2.12+dfsg-6ubuntu0.18.04.4 debconf 1.5.66ubuntu1 debconf-i18n 1.5.66ubuntu1 dmsetup 2:1.02.145-4.1ubuntu3.18.04.3 dpkg 1.19.0.5ubuntu2.4 fdisk 2.31.1-0.4ubuntu3.7 gcc-8-base 8.4.0-1ubuntu1~18.04 gettext-base 0.19.8.1-6ubuntu0.3 gpgv 2.2.4-1ubuntu1.6 grub-common 2.02-2ubuntu8.25 grub-gfxpayload-lists 0.7 grub-pc 2.02-2ubuntu8.25 grub-pc-bin 2.02-2ubuntu8.25 grub2-common 2.02-2ubuntu8.25 init-system-helpers 1.51 initramfs-tools 0.130ubuntu3.13 initramfs-tools-bin 0.130ubuntu3.13 initramfs-tools-core 0.130ubuntu3.13 klibc-utils 2.0.4-9ubuntu2.2 kmod 24-1ubuntu3.5 libacl1 2.2.52-3build1 libapt-inst2.0 1.6.14 libapt-pkg5.0 1.6.14 libattr1 1:2.4.47-2build1 libaudit-common 1:2.8.2-1ubuntu1.1 libaudit1 1:2.8.2-1ubuntu1.1 libblkid1 2.31.1-0.4ubuntu3.7 libbz2-1.0 1.0.6-8.1ubuntu0.2 libc6 2.27-3ubuntu1.6 libcap-ng0 0.7.7-3.1 libdb5.3 5.3.28-13.1ubuntu1.1 libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3 libfdisk1 2.31.1-0.4ubuntu3.7 libffi6 3.2.1-8 libfreetype6 2.8.1-2ubuntu2.2 libfuse2 2.9.7-1ubuntu1 libgcc1 1:8.4.0-1ubuntu1~18.04 libgcrypt20 1.8.1-4ubuntu1.3 libgmp10 2:6.1.2+dfsg-2ubuntu0.1 libgnutls30 3.5.18-1ubuntu1.6 libgpg-error0 1.27-6 libgpm2 1.20.7-5 libhogweed4 3.4.1-0ubuntu0.18.04.1 libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php] libklibc 2.0.4-9ubuntu2.2 libkmod2 24-1ubuntu3.5 liblocale-gettext-perl 1.07-3build2 liblz4-1 0.0~r131-2ubuntu3.1 liblzma5 5.2.2-1.3ubuntu0.1 libmount1 2.31.1-0.4ubuntu3.7 libncurses5 6.1-1ubuntu1.18.04 libncursesw5 6.1-1ubuntu1.18.04 libnettle6 3.4.1-0ubuntu0.18.04.1 libp11-kit0 0.23.9-2ubuntu0.1 libpam-modules 1.1.8-3.6ubuntu2.18.04.6 libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6 libpam0g 1.1.8-3.6ubuntu2.18.04.6 libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php] libpng16-16 1.6.34-1ubuntu0.18.04.2 libprocps6 2:3.3.12-3ubuntu1.2 libseccomp2 2.5.1-1ubuntu1~18.04.2 libselinux1 2.7-2build2 libsemanage-common 2.7-2build2 libsemanage1 2.7-2build2 libsepol1 2.7-1ubuntu0.1 libsmartcols1 2.31.1-0.4ubuntu3.7 libssl1.1 1.1.1-1ubuntu2.1~18.04.21 libstdc++6 8.4.0-1ubuntu1~18.04 libsystemd0 237-3ubuntu10.57 libtasn1-6 4.13-2 libtext-charwidth-perl 0.04-7.1 libtext-iconv-perl 1.7-5build6 libtext-wrapi18n-perl 0.06-7.1 libtinfo5 6.1-1ubuntu1.18.04 libudev1 237-3ubuntu10.57 libunistring2 0.9.9-0ubuntu2 libuuid1 2.31.1-0.4ubuntu3.7 libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php] linux-base 4.5ubuntu1.7 linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 lsb-base 9.20170808ubuntu1 openssl 1.1.1-1ubuntu2.1~18.04.21 os-prober 1.74ubuntu1 passwd 1:4.5-1ubuntu2.5 perl-base 5.26.1-6ubuntu0.6 procps 2:3.3.12-3ubuntu1.2 psmisc 23.1-1ubuntu0.1 sensible-utils 0.0.12 tar 1.29b-2ubuntu0.4 ubuntu-keyring 2018.09.18.1~18.04.2 ucf 3.0038 udev 237-3ubuntu10.57 util-linux 2.31.1-0.4ubuntu3.7 uuid-runtime 2.31.1-0.4ubuntu3.7 zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 79 model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz stepping : 1 microcode : 0xb000040 cpu MHz : 2299.922 cache size : 46080 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data bogomips : 4600.03 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws-5.4-tools-5.4.0-1097: Installed: (none) Candidate: 5.4.0-1097.105~18.04.1 Version table: 5.4.0-1097.105~18.04.1 500 500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages 500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage --- ProblemType: Bug AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: N/A DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0b76c3b150c6b1423 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: MachineType: Xen HVM domU Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=C.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 RelatedPackageVersions: linux-restricted-modules-5.4.0-1097-aws N/A linux-backports-modules-5.4.0-1097-aws N/A linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 08/24/2006 dmi.bios.vendor: Xen dmi.bios.version: 4.11.amazon dmi.chassis.type: 1 dmi.chassis.vendor: Xen dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr: dmi.product.name: HVM domU dmi.product.version: 4.11.amazon dmi.sys.vendor: Xen We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ```    CALL [ operations: 4 tag: void minor version: 1        [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]        [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]        [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]    REPLY [ operations: 4 status: OK tag: void        [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]        [ PUTFH(22) [ status: OK ]        [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. Unmounting and remounting the disk stops the issue temporarily I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies:  adduser 3.116ubuntu1  apt 1.6.14  apt-utils 1.6.14  busybox-initramfs 1:1.27.2-2ubuntu3.4  ca-certificates 20211016ubuntu0.18.04.1  coreutils 8.28-1ubuntu1  cpio 2.12+dfsg-6ubuntu0.18.04.4  debconf 1.5.66ubuntu1  debconf-i18n 1.5.66ubuntu1  dmsetup 2:1.02.145-4.1ubuntu3.18.04.3  dpkg 1.19.0.5ubuntu2.4  fdisk 2.31.1-0.4ubuntu3.7  gcc-8-base 8.4.0-1ubuntu1~18.04  gettext-base 0.19.8.1-6ubuntu0.3  gpgv 2.2.4-1ubuntu1.6  grub-common 2.02-2ubuntu8.25  grub-gfxpayload-lists 0.7  grub-pc 2.02-2ubuntu8.25  grub-pc-bin 2.02-2ubuntu8.25  grub2-common 2.02-2ubuntu8.25  init-system-helpers 1.51  initramfs-tools 0.130ubuntu3.13  initramfs-tools-bin 0.130ubuntu3.13  initramfs-tools-core 0.130ubuntu3.13  klibc-utils 2.0.4-9ubuntu2.2  kmod 24-1ubuntu3.5  libacl1 2.2.52-3build1  libapt-inst2.0 1.6.14  libapt-pkg5.0 1.6.14  libattr1 1:2.4.47-2build1  libaudit-common 1:2.8.2-1ubuntu1.1  libaudit1 1:2.8.2-1ubuntu1.1  libblkid1 2.31.1-0.4ubuntu3.7  libbz2-1.0 1.0.6-8.1ubuntu0.2  libc6 2.27-3ubuntu1.6  libcap-ng0 0.7.7-3.1  libdb5.3 5.3.28-13.1ubuntu1.1  libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3  libfdisk1 2.31.1-0.4ubuntu3.7  libffi6 3.2.1-8  libfreetype6 2.8.1-2ubuntu2.2  libfuse2 2.9.7-1ubuntu1  libgcc1 1:8.4.0-1ubuntu1~18.04  libgcrypt20 1.8.1-4ubuntu1.3  libgmp10 2:6.1.2+dfsg-2ubuntu0.1  libgnutls30 3.5.18-1ubuntu1.6  libgpg-error0 1.27-6  libgpm2 1.20.7-5  libhogweed4 3.4.1-0ubuntu0.18.04.1  libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]  libklibc 2.0.4-9ubuntu2.2  libkmod2 24-1ubuntu3.5  liblocale-gettext-perl 1.07-3build2  liblz4-1 0.0~r131-2ubuntu3.1  liblzma5 5.2.2-1.3ubuntu0.1  libmount1 2.31.1-0.4ubuntu3.7  libncurses5 6.1-1ubuntu1.18.04  libncursesw5 6.1-1ubuntu1.18.04  libnettle6 3.4.1-0ubuntu0.18.04.1  libp11-kit0 0.23.9-2ubuntu0.1  libpam-modules 1.1.8-3.6ubuntu2.18.04.6  libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6  libpam0g 1.1.8-3.6ubuntu2.18.04.6  libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]  libpng16-16 1.6.34-1ubuntu0.18.04.2  libprocps6 2:3.3.12-3ubuntu1.2  libseccomp2 2.5.1-1ubuntu1~18.04.2  libselinux1 2.7-2build2  libsemanage-common 2.7-2build2  libsemanage1 2.7-2build2  libsepol1 2.7-1ubuntu0.1  libsmartcols1 2.31.1-0.4ubuntu3.7  libssl1.1 1.1.1-1ubuntu2.1~18.04.21  libstdc++6 8.4.0-1ubuntu1~18.04  libsystemd0 237-3ubuntu10.57  libtasn1-6 4.13-2  libtext-charwidth-perl 0.04-7.1  libtext-iconv-perl 1.7-5build6  libtext-wrapi18n-perl 0.06-7.1  libtinfo5 6.1-1ubuntu1.18.04  libudev1 237-3ubuntu10.57  libunistring2 0.9.9-0ubuntu2  libuuid1 2.31.1-0.4ubuntu3.7  libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]  linux-base 4.5ubuntu1.7  linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1  lsb-base 9.20170808ubuntu1  openssl 1.1.1-1ubuntu2.1~18.04.21  os-prober 1.74ubuntu1  passwd 1:4.5-1ubuntu2.5  perl-base 5.26.1-6ubuntu0.6  procps 2:3.3.12-3ubuntu1.2  psmisc 23.1-1ubuntu0.1  sensible-utils 0.0.12  tar 1.29b-2ubuntu0.4  ubuntu-keyring 2018.09.18.1~18.04.2  ucf 3.0038  udev 237-3ubuntu10.57  util-linux 2.31.1-0.4ubuntu3.7  uuid-runtime 2.31.1-0.4ubuntu3.7  zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal:  processor : 1  vendor_id : GenuineIntel  cpu family : 6  model : 79  model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz  stepping : 1  microcode : 0xb000040  cpu MHz : 2299.922  cache size : 46080 KB  physical id : 0  siblings : 2  core id : 1  cpu cores : 2  apicid : 2  initial apicid : 2  fpu : yes  fpu_exception : yes  cpuid level : 13  wp : yes  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt  bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data  bogomips : 4600.03  clflush size : 64  cache_alignment : 64  address sizes : 46 bits physical, 48 bits virtual  power management: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws-5.4-tools-5.4.0-1097:   Installed: (none)   Candidate: 5.4.0-1097.105~18.04.1   Version table:      5.4.0-1097.105~18.04.1 500         500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages         500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage --- ProblemType: Bug AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: N/A DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0b76c3b150c6b1423 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: MachineType: Xen HVM domU Package: linux (not installed) PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 RelatedPackageVersions:  linux-restricted-modules-5.4.0-1097-aws N/A  linux-backports-modules-5.4.0-1097-aws N/A  linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 08/24/2006 dmi.bios.vendor: Xen dmi.bios.version: 4.11.amazon dmi.chassis.type: 1 dmi.chassis.vendor: Xen dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr: dmi.product.name: HVM domU dmi.product.version: 4.11.amazon dmi.sys.vendor: Xen
2023-03-19 19:31:46 AzT3K description We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ```    CALL [ operations: 4 tag: void minor version: 1        [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]        [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]        [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]    REPLY [ operations: 4 status: OK tag: void        [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]        [ PUTFH(22) [ status: OK ]        [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. Unmounting and remounting the disk stops the issue temporarily I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies:  adduser 3.116ubuntu1  apt 1.6.14  apt-utils 1.6.14  busybox-initramfs 1:1.27.2-2ubuntu3.4  ca-certificates 20211016ubuntu0.18.04.1  coreutils 8.28-1ubuntu1  cpio 2.12+dfsg-6ubuntu0.18.04.4  debconf 1.5.66ubuntu1  debconf-i18n 1.5.66ubuntu1  dmsetup 2:1.02.145-4.1ubuntu3.18.04.3  dpkg 1.19.0.5ubuntu2.4  fdisk 2.31.1-0.4ubuntu3.7  gcc-8-base 8.4.0-1ubuntu1~18.04  gettext-base 0.19.8.1-6ubuntu0.3  gpgv 2.2.4-1ubuntu1.6  grub-common 2.02-2ubuntu8.25  grub-gfxpayload-lists 0.7  grub-pc 2.02-2ubuntu8.25  grub-pc-bin 2.02-2ubuntu8.25  grub2-common 2.02-2ubuntu8.25  init-system-helpers 1.51  initramfs-tools 0.130ubuntu3.13  initramfs-tools-bin 0.130ubuntu3.13  initramfs-tools-core 0.130ubuntu3.13  klibc-utils 2.0.4-9ubuntu2.2  kmod 24-1ubuntu3.5  libacl1 2.2.52-3build1  libapt-inst2.0 1.6.14  libapt-pkg5.0 1.6.14  libattr1 1:2.4.47-2build1  libaudit-common 1:2.8.2-1ubuntu1.1  libaudit1 1:2.8.2-1ubuntu1.1  libblkid1 2.31.1-0.4ubuntu3.7  libbz2-1.0 1.0.6-8.1ubuntu0.2  libc6 2.27-3ubuntu1.6  libcap-ng0 0.7.7-3.1  libdb5.3 5.3.28-13.1ubuntu1.1  libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3  libfdisk1 2.31.1-0.4ubuntu3.7  libffi6 3.2.1-8  libfreetype6 2.8.1-2ubuntu2.2  libfuse2 2.9.7-1ubuntu1  libgcc1 1:8.4.0-1ubuntu1~18.04  libgcrypt20 1.8.1-4ubuntu1.3  libgmp10 2:6.1.2+dfsg-2ubuntu0.1  libgnutls30 3.5.18-1ubuntu1.6  libgpg-error0 1.27-6  libgpm2 1.20.7-5  libhogweed4 3.4.1-0ubuntu0.18.04.1  libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]  libklibc 2.0.4-9ubuntu2.2  libkmod2 24-1ubuntu3.5  liblocale-gettext-perl 1.07-3build2  liblz4-1 0.0~r131-2ubuntu3.1  liblzma5 5.2.2-1.3ubuntu0.1  libmount1 2.31.1-0.4ubuntu3.7  libncurses5 6.1-1ubuntu1.18.04  libncursesw5 6.1-1ubuntu1.18.04  libnettle6 3.4.1-0ubuntu0.18.04.1  libp11-kit0 0.23.9-2ubuntu0.1  libpam-modules 1.1.8-3.6ubuntu2.18.04.6  libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6  libpam0g 1.1.8-3.6ubuntu2.18.04.6  libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]  libpng16-16 1.6.34-1ubuntu0.18.04.2  libprocps6 2:3.3.12-3ubuntu1.2  libseccomp2 2.5.1-1ubuntu1~18.04.2  libselinux1 2.7-2build2  libsemanage-common 2.7-2build2  libsemanage1 2.7-2build2  libsepol1 2.7-1ubuntu0.1  libsmartcols1 2.31.1-0.4ubuntu3.7  libssl1.1 1.1.1-1ubuntu2.1~18.04.21  libstdc++6 8.4.0-1ubuntu1~18.04  libsystemd0 237-3ubuntu10.57  libtasn1-6 4.13-2  libtext-charwidth-perl 0.04-7.1  libtext-iconv-perl 1.7-5build6  libtext-wrapi18n-perl 0.06-7.1  libtinfo5 6.1-1ubuntu1.18.04  libudev1 237-3ubuntu10.57  libunistring2 0.9.9-0ubuntu2  libuuid1 2.31.1-0.4ubuntu3.7  libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]  linux-base 4.5ubuntu1.7  linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1  lsb-base 9.20170808ubuntu1  openssl 1.1.1-1ubuntu2.1~18.04.21  os-prober 1.74ubuntu1  passwd 1:4.5-1ubuntu2.5  perl-base 5.26.1-6ubuntu0.6  procps 2:3.3.12-3ubuntu1.2  psmisc 23.1-1ubuntu0.1  sensible-utils 0.0.12  tar 1.29b-2ubuntu0.4  ubuntu-keyring 2018.09.18.1~18.04.2  ucf 3.0038  udev 237-3ubuntu10.57  util-linux 2.31.1-0.4ubuntu3.7  uuid-runtime 2.31.1-0.4ubuntu3.7  zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal:  processor : 1  vendor_id : GenuineIntel  cpu family : 6  model : 79  model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz  stepping : 1  microcode : 0xb000040  cpu MHz : 2299.922  cache size : 46080 KB  physical id : 0  siblings : 2  core id : 1  cpu cores : 2  apicid : 2  initial apicid : 2  fpu : yes  fpu_exception : yes  cpuid level : 13  wp : yes  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt  bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data  bogomips : 4600.03  clflush size : 64  cache_alignment : 64  address sizes : 46 bits physical, 48 bits virtual  power management: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws-5.4-tools-5.4.0-1097:   Installed: (none)   Candidate: 5.4.0-1097.105~18.04.1   Version table:      5.4.0-1097.105~18.04.1 500         500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages         500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage --- ProblemType: Bug AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: N/A DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0b76c3b150c6b1423 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: MachineType: Xen HVM domU Package: linux (not installed) PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 RelatedPackageVersions:  linux-restricted-modules-5.4.0-1097-aws N/A  linux-backports-modules-5.4.0-1097-aws N/A  linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 08/24/2006 dmi.bios.vendor: Xen dmi.bios.version: 4.11.amazon dmi.chassis.type: 1 dmi.chassis.vendor: Xen dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr: dmi.product.name: HVM domU dmi.product.version: 4.11.amazon dmi.sys.vendor: Xen We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage. Diagnostics... `nfsiostat` shows 50-600 ops/s on the servers where it is happening. `nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y` `nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles; ```    CALL [ operations: 4 tag: void minor version: 1        [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]        [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]        [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]    REPLY [ operations: 4 status: OK tag: void        [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]        [ PUTFH(22) [ status: OK ]        [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]        [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ] ``` `lsof -N` shows nothing using files from the NFS drive. Unmounting and remounting the disk stops the issue temporarily I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below; ``` Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0) Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0 Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot ``` This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`. I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount. Output from `ubuntu-bug linux` ``` ProblemType: Bug ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 Date: Fri Mar 17 16:29:14 2023 Dependencies:  adduser 3.116ubuntu1  apt 1.6.14  apt-utils 1.6.14  busybox-initramfs 1:1.27.2-2ubuntu3.4  ca-certificates 20211016ubuntu0.18.04.1  coreutils 8.28-1ubuntu1  cpio 2.12+dfsg-6ubuntu0.18.04.4  debconf 1.5.66ubuntu1  debconf-i18n 1.5.66ubuntu1  dmsetup 2:1.02.145-4.1ubuntu3.18.04.3  dpkg 1.19.0.5ubuntu2.4  fdisk 2.31.1-0.4ubuntu3.7  gcc-8-base 8.4.0-1ubuntu1~18.04  gettext-base 0.19.8.1-6ubuntu0.3  gpgv 2.2.4-1ubuntu1.6  grub-common 2.02-2ubuntu8.25  grub-gfxpayload-lists 0.7  grub-pc 2.02-2ubuntu8.25  grub-pc-bin 2.02-2ubuntu8.25  grub2-common 2.02-2ubuntu8.25  init-system-helpers 1.51  initramfs-tools 0.130ubuntu3.13  initramfs-tools-bin 0.130ubuntu3.13  initramfs-tools-core 0.130ubuntu3.13  klibc-utils 2.0.4-9ubuntu2.2  kmod 24-1ubuntu3.5  libacl1 2.2.52-3build1  libapt-inst2.0 1.6.14  libapt-pkg5.0 1.6.14  libattr1 1:2.4.47-2build1  libaudit-common 1:2.8.2-1ubuntu1.1  libaudit1 1:2.8.2-1ubuntu1.1  libblkid1 2.31.1-0.4ubuntu3.7  libbz2-1.0 1.0.6-8.1ubuntu0.2  libc6 2.27-3ubuntu1.6  libcap-ng0 0.7.7-3.1  libdb5.3 5.3.28-13.1ubuntu1.1  libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3  libfdisk1 2.31.1-0.4ubuntu3.7  libffi6 3.2.1-8  libfreetype6 2.8.1-2ubuntu2.2  libfuse2 2.9.7-1ubuntu1  libgcc1 1:8.4.0-1ubuntu1~18.04  libgcrypt20 1.8.1-4ubuntu1.3  libgmp10 2:6.1.2+dfsg-2ubuntu0.1  libgnutls30 3.5.18-1ubuntu1.6  libgpg-error0 1.27-6  libgpm2 1.20.7-5  libhogweed4 3.4.1-0ubuntu0.18.04.1  libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]  libklibc 2.0.4-9ubuntu2.2  libkmod2 24-1ubuntu3.5  liblocale-gettext-perl 1.07-3build2  liblz4-1 0.0~r131-2ubuntu3.1  liblzma5 5.2.2-1.3ubuntu0.1  libmount1 2.31.1-0.4ubuntu3.7  libncurses5 6.1-1ubuntu1.18.04  libncursesw5 6.1-1ubuntu1.18.04  libnettle6 3.4.1-0ubuntu0.18.04.1  libp11-kit0 0.23.9-2ubuntu0.1  libpam-modules 1.1.8-3.6ubuntu2.18.04.6  libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6  libpam0g 1.1.8-3.6ubuntu2.18.04.6  libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]  libpng16-16 1.6.34-1ubuntu0.18.04.2  libprocps6 2:3.3.12-3ubuntu1.2  libseccomp2 2.5.1-1ubuntu1~18.04.2  libselinux1 2.7-2build2  libsemanage-common 2.7-2build2  libsemanage1 2.7-2build2  libsepol1 2.7-1ubuntu0.1  libsmartcols1 2.31.1-0.4ubuntu3.7  libssl1.1 1.1.1-1ubuntu2.1~18.04.21  libstdc++6 8.4.0-1ubuntu1~18.04  libsystemd0 237-3ubuntu10.57  libtasn1-6 4.13-2  libtext-charwidth-perl 0.04-7.1  libtext-iconv-perl 1.7-5build6  libtext-wrapi18n-perl 0.06-7.1  libtinfo5 6.1-1ubuntu1.18.04  libudev1 237-3ubuntu10.57  libunistring2 0.9.9-0ubuntu2  libuuid1 2.31.1-0.4ubuntu3.7  libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]  linux-base 4.5ubuntu1.7  linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1  lsb-base 9.20170808ubuntu1  openssl 1.1.1-1ubuntu2.1~18.04.21  os-prober 1.74ubuntu1  passwd 1:4.5-1ubuntu2.5  perl-base 5.26.1-6ubuntu0.6  procps 2:3.3.12-3ubuntu1.2  psmisc 23.1-1ubuntu0.1  sensible-utils 0.0.12  tar 1.29b-2ubuntu0.4  ubuntu-keyring 2018.09.18.1~18.04.2  ucf 3.0038  udev 237-3ubuntu10.57  util-linux 2.31.1-0.4ubuntu3.7  uuid-runtime 2.31.1-0.4ubuntu3.7  zlib1g 1:1.2.11.dfsg-0ubuntu2.2 DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0479a9a73c32d2448 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1 PackageArchitecture: amd64 ProcCpuinfoMinimal:  processor : 1  vendor_id : GenuineIntel  cpu family : 6  model : 79  model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz  stepping : 1  microcode : 0xb000040  cpu MHz : 2299.922  cache size : 46080 KB  physical id : 0  siblings : 2  core id : 1  cpu cores : 2  apicid : 2  initial apicid : 2  fpu : yes  fpu_exception : yes  cpuid level : 13  wp : yes  flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt  bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data  bogomips : 4600.03  clflush size : 64  cache_alignment : 64  address sizes : 46 bits physical, 48 bits virtual  power management: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 SourcePackage: linux-signed-aws-5.4 Tags: third-party-packages bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) _MarkForUpload: True ``` We also need: 1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu Description: Ubuntu 18.04.6 LTS Release: 18.04 2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center linux-aws: Installed: 5.4.0.1097.75 Candidate: 5.4.0.1097.75 Version table: *** 5.4.0.1097.75 500 500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages 500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages 100 /var/lib/dpkg/status 4.15.0.1007.7 500 500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages 3) What you expected to happen The OS to not hose the NFS server with metadata requests until something breaks 4) What happened instead The OS hosed the NFS server with metadata requests until it caused a production outage --- ProblemType: Bug AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.28 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: N/A DistroRelease: Ubuntu 18.04 Ec2AMI: ami-0b76c3b150c6b1423 Ec2AMIManifest: (unknown) Ec2AvailabilityZone: ap-southeast-2c Ec2InstanceType: t2.medium Ec2Kernel: unavailable Ec2Ramdisk: unavailable IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Error: command ['lsusb'] failed with exit code 1: MachineType: Xen HVM domU Package: linux (not installed) PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  XDG_RUNTIME_DIR=<set>  LANG=C.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229 RelatedPackageVersions:  linux-restricted-modules-5.4.0-1097-aws N/A  linux-backports-modules-5.4.0-1097-aws N/A  linux-firmware N/A RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic ec2-images Uname: Linux 5.4.0-1097-aws x86_64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 08/24/2006 dmi.bios.vendor: Xen dmi.bios.version: 4.11.amazon dmi.chassis.type: 1 dmi.chassis.vendor: Xen dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr: dmi.product.name: HVM domU dmi.product.version: 4.11.amazon dmi.sys.vendor: Xen
2023-03-21 04:00:22 AzT3K bug task added linux-aws-5.4 (Ubuntu)
2023-03-21 04:00:42 AzT3K bug task added linux-aws (Ubuntu)
2023-03-22 02:35:22 AzT3K linux-aws (Ubuntu): status New Confirmed
2023-03-22 02:35:24 AzT3K linux-aws-5.4 (Ubuntu): status New Confirmed
2023-03-22 02:57:43 AzT3K bug task added linux-signed-aws-5.4 (Ubuntu)
2023-03-22 02:57:50 AzT3K linux-signed-aws-5.4 (Ubuntu): status New Confirmed
2023-03-23 03:13:11 AzT3K marked as duplicate 2009325