NFSv4 does not invalidate cached information about deleted files

Bug #1641049 reported by Arseny Tolmachev
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (not set)

Bug Description

When using the NFSv4 client in Ubuntu 16.04, the following sequence of actions fails.

You need an NFS server (S) and two clients (A and B).

Let the NFS share be mounted at /nfsdata

A: echo test > /nfsdata/file
B: cat /nfsdata/file ===> test
A: rm /nfsdata/file
B: cat /nfsdata/file ===> cat: No such file or directory
A: echo test > /nfsdata/file
B: cat /nfsdata/file ===> cat: No such file or directory (!)

Running echo 3 > /proc/sys/vm/drop_caches as root on B makes the file visible on B.
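
The drop_caches workaround above can be wrapped in a small helper for client B. This is only a sketch: the function name cat_with_cache_drop and the DROP_CACHES_FILE override are made up for illustration, and the real control file can only be written by root.

```shell
# Kernel control file for dropping caches; overridable so the helper
# can be tried without root (hypothetical convenience, not from the report).
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# Print a file; if it cannot be read, drop caches once and retry.
cat_with_cache_drop() {
    if cat -- "$1" 2>/dev/null; then
        return 0
    fi
    # Writing 3 frees the page cache plus dentries and inodes.
    echo 3 > "$DROP_CACHES_FILE" || return 1
    cat -- "$1"
}
```

On client B this would be called as cat_with_cache_drop /nfsdata/file. It hides the symptom only; it does not address the stale lookup cache itself.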

14.04 works without any problem.

I have attached a log from one of our servers. The 4.4.0-47 kernel does not fix this issue.
The syslog contains rpcdebug -m nfs -s all output for the scenario (with different filenames).
I can do a full packet capture of the NFS traffic if you want.
Mounting NFS with lookupcache=positive does not fix this issue.

The NFSv4 server is CentOS 6.7 in our case.
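
For anyone who wants to capture the same rpcdebug output, a minimal sketch follows. It enables all NFS client debug flags, runs one command, and clears the flags again so the kernel log is not flooded afterwards. The wrapper name with_nfs_debug and the RPCDEBUG variable are assumptions for illustration; rpcdebug itself needs root.

```shell
# Name of the rpcdebug binary; overridable (hypothetical convenience).
RPCDEBUG="${RPCDEBUG:-rpcdebug}"

# Run a command with NFS client debugging enabled, then switch it off.
with_nfs_debug() {
    local status
    # -m selects the module, -s sets the given flags.
    "$RPCDEBUG" -m nfs -s all >/dev/null || return 1
    "$@"
    status=$?
    # -c clears the flags again, even if the command failed.
    "$RPCDEBUG" -m nfs -c all >/dev/null
    return "$status"
}
```

Example on client B: with_nfs_debug cat /nfsdata/file. The debug messages end up in the kernel log, not on the terminal.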

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-45-generic 4.4.0-45.66
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 8 03:13 seq
 crw-rw---- 1 root audio 116, 33 Nov 8 03:13 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Fri Nov 11 17:06:51 2016
HibernationDevice: RESUME=UUID=7b92d2e2-e481-471a-bc38-e178a9418aa1
InstallationDate: Installed on 2016-11-07 (3 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:8002 Intel Corp.
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:800a Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R630
PciMultimedia:

ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=screen-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-45-generic.efi.signed root=UUID=11e697e5-38f8-41ae-863c-2755793044a6 ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-45-generic N/A
 linux-backports-modules-4.4.0-45-generic N/A
 linux-firmware 1.157.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/12/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.0.1
dmi.board.name: 02C2CP
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.0.1:bd02/12/2016:svnDellInc.:pnPowerEdgeR630:pvr:rvnDellInc.:rn02C2CP:rvrA01:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R630
dmi.sys.vendor: Dell Inc.

description: updated
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1641049

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Arseny Tolmachev (eiennohito) wrote : CRDA.txt

apport information

tags: added: apport-collected
description: updated
Arseny Tolmachev (eiennohito) wrote : CurrentDmesg.txt

apport information

Arseny Tolmachev (eiennohito) wrote : JournalErrors.txt

apport information

Arseny Tolmachev (eiennohito) wrote : Lspci.txt

apport information

Arseny Tolmachev (eiennohito) wrote : ProcCpuinfo.txt

apport information

Arseny Tolmachev (eiennohito) wrote : ProcInterrupts.txt

apport information

Arseny Tolmachev (eiennohito) wrote : ProcModules.txt

apport information

Arseny Tolmachev (eiennohito) wrote : UdevDb.txt

apport information

Arseny Tolmachev (eiennohito) wrote : WifiSyslog.txt

apport information

Arseny Tolmachev (eiennohito) wrote :

All information should be added now.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
tags: added: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Arseny Tolmachev (eiennohito) wrote :

We are testing this with different kernel versions right now.

I would like to report that the NFSv3 workaround does not work with symlinks (the bug still appears), but it seems to work with regular files.
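
When comparing NFSv3 and NFSv4 mounts, it helps to confirm which options a client actually ended up with. The sketch below reads a mounts table in /proc/mounts format and prints the value of one option for a mount point. The function name nfs_mount_option is made up for illustration, and which options the kernel lists there (for example vers or lookupcache) may vary between kernel versions.

```shell
# Print the value of one mount option for a mount point.
#   $1 = mount point, $2 = option name,
#   $3 = mounts table to read (defaults to /proc/mounts)
nfs_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            # Field 4 holds the comma-separated option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Flag-style option such as "rw": print its name.
                if (opts[i] == opt) { print opt; exit }
                # key=value option: print only the value.
                if (index(opts[i], opt "=") == 1) {
                    print substr(opts[i], length(opt) + 2)
                    exit
                }
            }
        }
    ' "${3:-/proc/mounts}"
}
```

For example, nfs_mount_option /nfsdata vers prints the negotiated NFS version if the kernel reports one, and prints nothing if the option is absent.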

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Arseny Tolmachev (eiennohito) wrote :

I was able to reproduce this bug in a slightly different manner in a Vagrant VM environment.

We used 16.04 as the NFS server, and 14.04, 16.04 (with the latest kernel and with 4.9-rc6) and 16.10 as clients.

For testing we used the script at the end of this post.

The scenario is:
* A creates 1k symlinks on the share
* B runs ls on that directory
* A deletes the symlinks
* A creates them again
* B runs ls on the directory again
* A deletes everything

This is repeated 10 times.

Kernels 4.4 (from 16.04) and 4.8 (from 16.10) produce errors like

ls: cannot read symbolic link '/mnt/ubuntu/347': Input/output error
ls: cannot read symbolic link '/mnt/ubuntu/891': Input/output error
ls: cannot read symbolic link '/mnt/ubuntu/248': Input/output error
ls: cannot read symbolic link '/mnt/ubuntu/872': Input/output error

starting from the second ls.

Kernels 3.13 (from 14.04) and upstream 4.9-rc6 do not have these errors.
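
To classify a kernel as good or bad without reading the ls output by eye, the errors can be counted. This is a minimal sketch; the function name count_readlink_errors is made up, and the matched text is the English message quoted above, so it assumes an English locale on the client.

```shell
# Count "cannot read symbolic link" errors on stdin (or in the given files).
count_readlink_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'cannot read symbolic link' "$@" || true
}
```

On client B it could be used as: ls -li /mnt/ubuntu 2>&1 >/dev/null | count_readlink_errors. A non-zero count marks the kernel as affected.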

If you want the Vagrant files for the VMs, please let me know.

Testing script:

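#!/bin/zsh

# Vagrant identifiers of the two client VMs (A modifies the share, B lists it).
a=9e0147f
b=7329576

# Run a command line on client A through bash.
# ${(q)*} quotes the arguments so they survive the remote shell.
function exec_a {
    vagrant ssh -c "/bin/bash -c ${(q)*}" $a -- -q
}

# Same as exec_a, but on client B.
function exec_b {
    vagrant ssh -c "/bin/bash -c ${(q)*}" $b -- -q
}

# Mount point of the NFS share on both clients.
SHARE_ROOT=/mnt/ubuntu

function iteration {
    # A creates the symlinks (\$i is expanded by the remote bash, not locally).
    exec_a "for i in {0..1000}; do ln -s /dev/null $SHARE_ROOT/\$i ; done"
    # B lists the directory and prints the number of output lines.
    exec_b "ls -li $SHARE_ROOT | wc -l | paste <(echo \"first ls\") - "
    # A deletes the symlinks and creates them again.
    exec_a "find $SHARE_ROOT -type l -delete"
    exec_a "for i in {0..1000}; do ln -s /dev/null $SHARE_ROOT/\$i ; done"
    # B lists again; on affected kernels the I/O errors show up here.
    exec_b "ls -li $SHARE_ROOT | wc -l | paste <(echo \"second ls\") - "
    # Clean up for the next round.
    exec_a "find $SHARE_ROOT -type l -delete"
}

# Start from a share without leftover symlinks.
exec_a "find $SHARE_ROOT -type l -delete"

for i in {0..10}; do iteration; done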

Joseph Salisbury (jsalisbury) wrote :

We can perform a "Reverse" bisect to identify the commit that fixes this in v4.9-rc6. We first need to identify the last bad kernel version and the first good one. Does v4.9-rc5 exhibit the bug?
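
A reverse bisect like the one described above can be driven with git's custom bisect terms, so that the older revision is marked "broken" and the newer one "fixed". The following is a sketch only: the function name start_reverse_bisect is made up, and the revisions passed to it are whatever last bad and first good versions testing identifies.

```shell
# Start a reverse bisect that searches for the commit which FIXED a bug.
#   $1 = path to the kernel git tree
#   $2 = last revision known to show the bug
#   $3 = first revision known to be fixed
start_reverse_bisect() {
    local repo="$1" broken_rev="$2" fixed_rev="$3"
    # Rename git's default good/bad terms to match the reversed search.
    git -C "$repo" bisect start --term-old broken --term-new fixed &&
    git -C "$repo" bisect broken "$broken_rev" &&
    git -C "$repo" bisect fixed "$fixed_rev"
}
```

For example, start_reverse_bisect ~/linux v4.9-rc5 v4.9-rc6 would apply if v4.9-rc5 turns out to exhibit the bug, which this thread has not yet established. After each test kernel, git bisect broken or git bisect fixed records the result until git names the fixing commit.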

Petr gregor (gregy.cz) wrote :