2023-03-17 03:34:23 |
AzT3K |
bug |
|
|
added bug |
2023-03-17 04:00:06 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2023-03-17 04:09:02 |
AzT3K |
tags |
|
apport-collected bionic ec2-images |
|
2023-03-17 04:09:04 |
AzT3K |
description |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below;
Server 1:
checked mount options - using relatime
checked nfs stats - 460.982 ops/s
updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.400 ops/s
rebooted
checked nfs stats - 1.316 ops/s
Server 2:
checked mount options - using relatime
checked nfs stats - 390.998 ops/s
did not updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.750 ops/s
rebooted
checked nfs stats - 531.932 ops/s
Conclusion from this is that;
- Unmounting and remounting the disk stops the issue temporarily
- `lazytime` option seems to prevent the issue from happening
- Issue may be boot related
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws-5.4-tools-5.4.0-1097:
Installed: (none)
Candidate: 5.4.0-1097.105~18.04.1
Version table:
5.4.0-1097.105~18.04.1 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below;
Server 1:
checked mount options - using relatime
checked nfs stats - 460.982 ops/s
updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.400 ops/s
rebooted
checked nfs stats - 1.316 ops/s
Server 2:
checked mount options - using relatime
checked nfs stats - 390.998 ops/s
did not updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.750 ops/s
rebooted
checked nfs stats - 531.932 ops/s
Conclusion from this is that;
- Unmounting and remounting the disk stops the issue temporarily
- `lazytime` option seems to prevent the issue from happening
- Issue may be boot related
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws-5.4-tools-5.4.0-1097:
Installed: (none)
Candidate: 5.4.0-1097.105~18.04.1
Version table:
5.4.0-1097.105~18.04.1 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0b76c3b150c6b1423
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
RelatedPackageVersions:
linux-restricted-modules-5.4.0-1097-aws N/A
linux-backports-modules-5.4.0-1097-aws N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 08/24/2006
dmi.bios.vendor: Xen
dmi.bios.version: 4.11.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.11.amazon
dmi.sys.vendor: Xen |
|
2023-03-17 04:09:05 |
AzT3K |
attachment added |
|
AudioDevicesInUse.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655003/+files/AudioDevicesInUse.txt |
|
2023-03-17 04:09:08 |
AzT3K |
attachment added |
|
CurrentDmesg.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655004/+files/CurrentDmesg.txt |
|
2023-03-17 04:09:09 |
AzT3K |
attachment added |
|
Lspci.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655005/+files/Lspci.txt |
|
2023-03-17 04:09:11 |
AzT3K |
attachment added |
|
ProcCpuinfo.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655006/+files/ProcCpuinfo.txt |
|
2023-03-17 04:09:12 |
AzT3K |
attachment added |
|
ProcCpuinfoMinimal.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655007/+files/ProcCpuinfoMinimal.txt |
|
2023-03-17 04:09:14 |
AzT3K |
attachment added |
|
ProcInterrupts.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655008/+files/ProcInterrupts.txt |
|
2023-03-17 04:09:16 |
AzT3K |
attachment added |
|
ProcModules.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655009/+files/ProcModules.txt |
|
2023-03-17 04:09:19 |
AzT3K |
attachment added |
|
UdevDb.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655010/+files/UdevDb.txt |
|
2023-03-17 04:09:22 |
AzT3K |
attachment added |
|
WifiSyslog.txt https://bugs.launchpad.net/bugs/2012020/+attachment/5655011/+files/WifiSyslog.txt |
|
2023-03-17 04:10:43 |
AzT3K |
linux (Ubuntu): status |
Incomplete |
Confirmed |
|
2023-03-17 04:12:50 |
AzT3K |
description |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
I tried changing some of the mount options to reduce the amount of metadata requests going through to the NFS server and found that `lazytime` did help - observations are as below;
Server 1:
checked mount options - using relatime
checked nfs stats - 460.982 ops/s
updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.400 ops/s
rebooted
checked nfs stats - 1.316 ops/s
Server 2:
checked mount options - using relatime
checked nfs stats - 390.998 ops/s
did not updated mount options
unmounted and mounted the NFS drive
checked nfs stats - 1.750 ops/s
rebooted
checked nfs stats - 531.932 ops/s
Conclusion from this is that;
- Unmounting and remounting the disk stops the issue temporarily
- `lazytime` option seems to prevent the issue from happening
- Issue may be boot related
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws-5.4-tools-5.4.0-1097:
Installed: (none)
Candidate: 5.4.0-1097.105~18.04.1
Version table:
5.4.0-1097.105~18.04.1 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0b76c3b150c6b1423
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
RelatedPackageVersions:
linux-restricted-modules-5.4.0-1097-aws N/A
linux-backports-modules-5.4.0-1097-aws N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 08/24/2006
dmi.bios.vendor: Xen
dmi.bios.version: 4.11.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.11.amazon
dmi.sys.vendor: Xen |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
Unmounting and remounting the disk stops the issue temporarily
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws-5.4-tools-5.4.0-1097:
Installed: (none)
Candidate: 5.4.0-1097.105~18.04.1
Version table:
5.4.0-1097.105~18.04.1 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0b76c3b150c6b1423
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
RelatedPackageVersions:
linux-restricted-modules-5.4.0-1097-aws N/A
linux-backports-modules-5.4.0-1097-aws N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 08/24/2006
dmi.bios.vendor: Xen
dmi.bios.version: 4.11.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.11.amazon
dmi.sys.vendor: Xen |
|
2023-03-19 19:31:46 |
AzT3K |
description |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
Unmounting and remounting the disk stops the issue temporarily
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws-5.4-tools-5.4.0-1097:
Installed: (none)
Candidate: 5.4.0-1097.105~18.04.1
Version table:
5.4.0-1097.105~18.04.1 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0b76c3b150c6b1423
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
RelatedPackageVersions:
linux-restricted-modules-5.4.0-1097-aws N/A
linux-backports-modules-5.4.0-1097-aws N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 08/24/2006
dmi.bios.vendor: Xen
dmi.bios.version: 4.11.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.11.amazon
dmi.sys.vendor: Xen |
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is linked to a one or more web servers. This started happening about a week ago across a number of stacks on AWS. This issue ended up exhausting all the burst credits we had accrued and caused a production outage.
Diagnostics...
`nfsiostat` shows 50-600 ops/s on the servers where it is happening.
`nfsstat -m` shows these mount flags: `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,fsc,local_lock=none,addr=y.y.y.y`
`nfstrace --mode=live --verbose=2` shows that the same operations are happening repeatedly on 2-5 filehandles;
```
CALL [ operations: 4 tag: void minor version: 1
[ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
[ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
[ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
REPLY [ operations: 4 status: OK tag: void
[ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
[ PUTFH(22) [ status: OK ]
[ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE access: READ LOOKUP MODIFY EXTEND DELETE ]
[ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ] ]
```
`lsof -N` shows nothing using files from the NFS drive.
Unmounting and remounting the disk stops the issue temporarily
I also enabled rpcdebugging via `rpcdebug -m nfs -s all` and a snippet of the output is below;
```
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
```
This loops repeatedly at a rate roughly consistent with ops/s output from `nfsiostat`.
I tried rolling back to `5.4.0-1096-aws`, updating grub and rebooting and the issue no longer occurs. I see the issue again once rolling forward to `5.4.0-1097-aws`, updating grub and rebooting as soon as the web server loads anything from the NFS mount.
Output from `ubuntu-bug linux`
```
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
Date: Fri Mar 17 16:29:14 2023
Dependencies:
adduser 3.116ubuntu1
apt 1.6.14
apt-utils 1.6.14
busybox-initramfs 1:1.27.2-2ubuntu3.4
ca-certificates 20211016ubuntu0.18.04.1
coreutils 8.28-1ubuntu1
cpio 2.12+dfsg-6ubuntu0.18.04.4
debconf 1.5.66ubuntu1
debconf-i18n 1.5.66ubuntu1
dmsetup 2:1.02.145-4.1ubuntu3.18.04.3
dpkg 1.19.0.5ubuntu2.4
fdisk 2.31.1-0.4ubuntu3.7
gcc-8-base 8.4.0-1ubuntu1~18.04
gettext-base 0.19.8.1-6ubuntu0.3
gpgv 2.2.4-1ubuntu1.6
grub-common 2.02-2ubuntu8.25
grub-gfxpayload-lists 0.7
grub-pc 2.02-2ubuntu8.25
grub-pc-bin 2.02-2ubuntu8.25
grub2-common 2.02-2ubuntu8.25
init-system-helpers 1.51
initramfs-tools 0.130ubuntu3.13
initramfs-tools-bin 0.130ubuntu3.13
initramfs-tools-core 0.130ubuntu3.13
klibc-utils 2.0.4-9ubuntu2.2
kmod 24-1ubuntu3.5
libacl1 2.2.52-3build1
libapt-inst2.0 1.6.14
libapt-pkg5.0 1.6.14
libattr1 1:2.4.47-2build1
libaudit-common 1:2.8.2-1ubuntu1.1
libaudit1 1:2.8.2-1ubuntu1.1
libblkid1 2.31.1-0.4ubuntu3.7
libbz2-1.0 1.0.6-8.1ubuntu0.2
libc6 2.27-3ubuntu1.6
libcap-ng0 0.7.7-3.1
libdb5.3 5.3.28-13.1ubuntu1.1
libdevmapper1.02.1 2:1.02.145-4.1ubuntu3.18.04.3
libfdisk1 2.31.1-0.4ubuntu3.7
libffi6 3.2.1-8
libfreetype6 2.8.1-2ubuntu2.2
libfuse2 2.9.7-1ubuntu1
libgcc1 1:8.4.0-1ubuntu1~18.04
libgcrypt20 1.8.1-4ubuntu1.3
libgmp10 2:6.1.2+dfsg-2ubuntu0.1
libgnutls30 3.5.18-1ubuntu1.6
libgpg-error0 1.27-6
libgpm2 1.20.7-5
libhogweed4 3.4.1-0ubuntu0.18.04.1
libidn2-0 2.3.0-1+ubuntu18.04.1+deb.sury.org+2 [origin: LP-PPA-ondrej-php]
libklibc 2.0.4-9ubuntu2.2
libkmod2 24-1ubuntu3.5
liblocale-gettext-perl 1.07-3build2
liblz4-1 0.0~r131-2ubuntu3.1
liblzma5 5.2.2-1.3ubuntu0.1
libmount1 2.31.1-0.4ubuntu3.7
libncurses5 6.1-1ubuntu1.18.04
libncursesw5 6.1-1ubuntu1.18.04
libnettle6 3.4.1-0ubuntu0.18.04.1
libp11-kit0 0.23.9-2ubuntu0.1
libpam-modules 1.1.8-3.6ubuntu2.18.04.6
libpam-modules-bin 1.1.8-3.6ubuntu2.18.04.6
libpam0g 1.1.8-3.6ubuntu2.18.04.6
libpcre3 2:8.44-2+ubuntu18.04.1+deb.sury.org+1 [origin: LP-PPA-ondrej-php]
libpng16-16 1.6.34-1ubuntu0.18.04.2
libprocps6 2:3.3.12-3ubuntu1.2
libseccomp2 2.5.1-1ubuntu1~18.04.2
libselinux1 2.7-2build2
libsemanage-common 2.7-2build2
libsemanage1 2.7-2build2
libsepol1 2.7-1ubuntu0.1
libsmartcols1 2.31.1-0.4ubuntu3.7
libssl1.1 1.1.1-1ubuntu2.1~18.04.21
libstdc++6 8.4.0-1ubuntu1~18.04
libsystemd0 237-3ubuntu10.57
libtasn1-6 4.13-2
libtext-charwidth-perl 0.04-7.1
libtext-iconv-perl 1.7-5build6
libtext-wrapi18n-perl 0.06-7.1
libtinfo5 6.1-1ubuntu1.18.04
libudev1 237-3ubuntu10.57
libunistring2 0.9.9-0ubuntu2
libuuid1 2.31.1-0.4ubuntu3.7
libzstd1 1.4.8+dfsg-2+ubuntu18.04.1+deb.sury.org+4 [origin: LP-PPA-ondrej-php]
linux-base 4.5ubuntu1.7
linux-modules-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
lsb-base 9.20170808ubuntu1
openssl 1.1.1-1ubuntu2.1~18.04.21
os-prober 1.74ubuntu1
passwd 1:4.5-1ubuntu2.5
perl-base 5.26.1-6ubuntu0.6
procps 2:3.3.12-3ubuntu1.2
psmisc 23.1-1ubuntu0.1
sensible-utils 0.0.12
tar 1.29b-2ubuntu0.4
ubuntu-keyring 2018.09.18.1~18.04.2
ucf 3.0038
udev 237-3ubuntu10.57
util-linux 2.31.1-0.4ubuntu3.7
uuid-runtime 2.31.1-0.4ubuntu3.7
zlib1g 1:1.2.11.dfsg-0ubuntu2.2
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0479a9a73c32d2448
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-image-5.4.0-1097-aws 5.4.0-1097.105~18.04.1
PackageArchitecture: amd64
ProcCpuinfoMinimal:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping : 1
microcode : 0xb000040
cpu MHz : 2299.922
cache size : 46080 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips : 4600.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
SourcePackage: linux-signed-aws-5.4
Tags: third-party-packages bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
```
We also need:
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
linux-aws:
Installed: 5.4.0.1097.75
Candidate: 5.4.0.1097.75
Version table:
*** 5.4.0.1097.75 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
100 /var/lib/dpkg/status
4.15.0.1007.7 500
500 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
3) What you expected to happen
The OS to not hose the NFS server with metadata requests until something breaks
4) What happened instead
The OS hosed the NFS server with metadata requests until it caused a production outage
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.28
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0b76c3b150c6b1423
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2c
Ec2InstanceType: t2.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1097-aws root=UUID=cff99495-4abf-46b6-af32-df1478a88ffe ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
ProcVersionSignature: Ubuntu 5.4.0-1097.105~18.04.1-aws 5.4.229
RelatedPackageVersions:
linux-restricted-modules-5.4.0-1097-aws N/A
linux-backports-modules-5.4.0-1097-aws N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 5.4.0-1097-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 08/24/2006
dmi.bios.vendor: Xen
dmi.bios.version: 4.11.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.11.amazon:bd08/24/2006:svnXen:pnHVMdomU:pvr4.11.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.11.amazon
dmi.sys.vendor: Xen |
|
2023-03-21 04:00:22 |
AzT3K |
bug task added |
|
linux-aws-5.4 (Ubuntu) |
|
2023-03-21 04:00:42 |
AzT3K |
bug task added |
|
linux-aws (Ubuntu) |
|
2023-03-22 02:35:22 |
AzT3K |
linux-aws (Ubuntu): status |
New |
Confirmed |
|
2023-03-22 02:35:24 |
AzT3K |
linux-aws-5.4 (Ubuntu): status |
New |
Confirmed |
|
2023-03-22 02:57:43 |
AzT3K |
bug task added |
|
linux-signed-aws-5.4 (Ubuntu) |
|
2023-03-22 02:57:50 |
AzT3K |
linux-signed-aws-5.4 (Ubuntu): status |
New |
Confirmed |
|
2023-03-23 03:13:11 |
AzT3K |
marked as duplicate |
|
2009325 |
|