NFS client reports a 'readdir loop' with a corrupt name

Bug #1240143 reported by Justin Fletcher on 2013-10-15
This bug affects 4 people
Affects          Status         Importance   Assigned to
linux (Debian)   Fix Released   Unknown
linux (Fedora)   Invalid        Critical
linux (Ubuntu)   Fix Released   Medium       Unassigned

Bug Description

We have an NFS server running on a RedHat system. One particular directory contains many, many RPMs (96850). It reports that there is a 'readdir loop', and the loop in question contains corrupted names. I assume the name corruption is happening on the Linux kernel end, not the server end:

"NFS: directory Development/rpms contains a readdir loop.Please contact your server vendor. The file: foo-bar-11.0flange-12345.AB5.x86_64.rpmmpmpmmT53 has duplicate cookie 1110018804"
"NFS: directory Development/rpms contains a readdir loop.Please contact your server vendor. The file: widget-wiggle-11.0-12356.AB5.x86_64.rpmpm.AB5.x86_64.rpm\xffffffffm has duplicate cookie 353422206"

Since the corrupted names are never displayed in an 'ls' of the directory (even whilst the problem is occurring), I assume that this is a presentation problem in the warning message.
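To illustrate the assumption above (that the names are fine on the wire and only the warning prints them badly), here is a minimal Python sketch. It is not the kernel's code. It assumes, hypothetically, that the name is stored with an explicit length in a buffer that still holds stale bytes, and that the warning reads up to the first NUL instead of honouring the length. The buffer contents and helper names are invented for the example. The `\xffffffff` form in the log is consistent with a byte >= 0x80 being treated as a signed char and printed in hex.

```python
def read_by_length(buf: bytes, length: int) -> str:
    """Decode exactly `length` bytes: the intended file name."""
    return buf[:length].decode("ascii", errors="replace")


def read_until_nul(buf: bytes) -> bytes:
    """Return everything up to the first NUL byte, as a C "%s" would."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


def escape_like_signed_char(raw: bytes) -> str:
    """Keep printable ASCII; escape other bytes as a sign-extended
    32-bit hex value, so 0xff is rendered as \\xffffffff."""
    out = []
    for b in raw:
        if 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            signed = b - 256 if b >= 0x80 else b
            out.append("\\x%02x" % (signed & 0xFFFFFFFF))
    return "".join(out)


# Hypothetical buffer: a 10-byte name followed by stale bytes and a NUL.
buf = b"widget.rpm" + b"pm\xffm" + b"\x00"
name_len = 10

print(read_by_length(buf, name_len))
print(escape_like_signed_char(read_until_nul(buf)))
```

Under these assumptions, the first line prints the clean name and the second prints the name with trailing debris of the same shape as in the warnings. This would also explain why 'ls' never shows the corruption.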

Unfortunately the problem had gone away by the time I tried using tcpdump to capture the on-the-wire data.

jfletcher@gromit:~$ cat /proc/version
Linux version 3.2.0-29-generic (buildd@allspice) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012
jfletcher@gromit:~$ lsb_release -rd
Description: Ubuntu 12.04.3 LTS
Release: 12.04

The lspci information would not be useful - the system was running under KVM, with a single interface.

Justin Fletcher (jfletcher) wrote :

"We have an NFS server running on a RedHat system."
... which we access through an Ubuntu 12.04 LTS system. It is on this system that the NFS client problems occur.

Sorry, that wasn't especially clear :-(

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1240143

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.12 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-rc5-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
Justin Fletcher (jfletcher) wrote :

Missing log files were intentional; these are company systems and I am not allowed by policy to upload arbitrary files without review.

Testing to follow, but as the problem is sporadic, I'm not sure that we can say categorically whether it is fixed or not.

Justin Fletcher (jfletcher) wrote :

Tested with kernel 3.12 as advised and we still see the problem.

tags: added: kernel-bug-exists-upstream
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Justin Fletcher (jfletcher) wrote :

Kernel bug remains. Realised that jsalisbury had said that I should mark it as confirmed, and I hadn't.

Changed in linux (Ubuntu):
status: Expired → Confirmed
David Hedberg (david-hedberg-t) wrote :

We have an NFS server running on Ubuntu 12.04, and after upgrading one client from 10.04 to 12.04 the other day we are hitting a similar (possibly the same) problem. The server setup has not been touched for months.

We have a directory with a lot of .xml files (~1009700 of them). Running an ls on this directory from another client running 12.04 initially produced this message:

[423354.265296] NFS: directory xxx/OLD contains a readdir loop.Please contact your server vendor. The file: 900015.xml\xffffffa1;s0z\xffffffda\xffffffa0\xffffffa0\xffffff91]c\x03\xffffff88\xffffffff\xffffffffml\xffffffa3\xffffffa3\x1b\xfffffff1' \xffffffb0\xffffff91]c\x03\xffffff88\xffffffff\xffffffffml#q%G\xffffff8c\xffffffa0\xffffffc0\xffffff91]c\x03\xffffff88\xffffffff\xffffffffxml\xffffffc4\xffffffe3>\xffffff9f\xffffffa8\xffffffd0\xffffff91\xffffff91]c\x03\xffff\x0f\xffffffbf\xfffffff0\xffffff91]c\x03\xffffff88\xffffffff\xffffffffxml}\xffffff9e\xffffff88\xffffffc3P has duplicate cookie 514419709fml\xffffffbb\xffffffb6\xfffffff2

Doing a cp -a on this file, removing the original file and moving the copy back in place fixes the corrupted filename, but the duplicate cookie problem remains.

Running find | sort on the server and on the clients and diffing the output shows no difference for the 10.04 clients, but the 12.04 client (with the problematic file moved away) reports ~10 duplicate entries.
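The duplicate check described above can also be done without diffing against the server. The following is a rough Python sketch, not something that was run as part of this report; the function names are made up for the example, and the path in the usage note is hypothetical.

```python
import os
from collections import Counter
from typing import Dict, Iterable, List


def duplicate_entries(names: Iterable[str]) -> Dict[str, int]:
    """Map each name that occurs more than once to its occurrence count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def list_directory(path: str) -> List[str]:
    """Return the entry names of `path` in the order the OS reports them."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]


# Self-contained demonstration with an in-memory listing.
sample = ["900014.xml", "900015.xml", "900016.xml", "900015.xml"]
print(duplicate_entries(sample))
```

On an affected client one would call `duplicate_entries(list_directory("/the-path/OLD"))`. Note that the listing itself could fail with an OSError if the client aborts the readdir, as rsync's readdir did.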

Our 10.04 clients seem unaffected. I've tried a 12.04 client with kernel 3.8, which shows the same problem.

I've tried mounting with different NFS versions; the only change was that with nfsvers=2 I managed to list ~700k files before it broke (as opposed to ~300k files otherwise).

It also breaks rsync with

rsync: readdir("/the-path/OLD"): Too many levels of symbolic links (40)

Server information:
---
Linux xxx 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

/dev/drbd0 on /data type ext3 (rw,noatime)

ii nfs-common 1:1.2.5-3ubuntu3.1 NFS support files common to client and server
ii nfs-kernel-server 1:1.2.5-3ubuntu3.1 support for NFS kernel server

David Hedberg (david-hedberg-t) wrote :

The "readdir loop" problem seems to be fairly widely known ( http://lwn.net/Articles/544520/ ), and an upgrade to the latest 12.04 kernel (3.2.0-60-generic) on the nfs server seems to have fixed the problem for us.
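For readers unfamiliar with the term: an NFS READDIR reply attaches a cookie to each directory entry, and the client uses that cookie to resume the listing where it left off. If two entries carry the same cookie, the client cannot tell where to resume, which is what the warning reports. The Python sketch below is a deliberately simplified illustration of spotting such a duplicate. It is not the kernel's actual detection logic, and the entry names and cookie values are invented.

```python
from typing import Dict, Iterable, Optional, Tuple


def find_duplicate_cookie(
    entries: Iterable[Tuple[str, int]]
) -> Optional[Tuple[str, str, int]]:
    """Return (first_name, second_name, cookie) for the first cookie that
    is seen twice, or None if every cookie in the listing is unique."""
    seen: Dict[int, str] = {}
    for name, cookie in entries:
        if cookie in seen:
            return (seen[cookie], name, cookie)
        seen[cookie] = name
    return None


# Hypothetical listing in which two different files share a cookie.
listing = [("a.rpm", 11), ("b.rpm", 22), ("c.rpm", 11)]
print(find_duplicate_cookie(listing))
```

A real client has to cope with directories far too large to track every cookie this way, so this is only a conceptual model.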

Changed in linux (Debian):
status: Unknown → Fix Released

I am closing this bug because it is fixed in recent kernel versions.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Fedora):
importance: Unknown → Critical
status: Unknown → Invalid