du crashes when traversing nfs mounted .snapshot directories

Bug #506798 reported by Tim Nicholas on 2010-01-13
This bug affects 1 person
Affects              Status       Importance  Assigned to  Milestone
coreutils (Fedora)   In Progress  Medium
coreutils (Ubuntu)                High        Unassigned
findutils (Ubuntu)                Medium      Unassigned
linux (Fedora)       Won't Fix    Medium
linux (Ubuntu)                    Medium      Unassigned

Bug Description

Binary package hint: coreutils

I'm getting a problem where du errors (and exits) with "du: fts_read failed: no such file or directory" when traversing a directory with a NetApp ".snapshot" directory.

My understanding (clarified by the discussions linked below) is that:

1) The device ID/inode of a directory is recorded before the submount is made.
2) The device ID of the directory changes after the directory has been read (via readdir(), which triggers the submount).
3) After examining the contents of the directory, du goes back up the tree (via '..'), finds that the device ID doesn't match what it recorded, assumes things have been moved around underneath it, and bails out for safety reasons.
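The three steps above can be sketched in Python. This is a simplified model of the consistency check fts performs, not the actual gnulib code; `check_traversal` is an illustrative name, and re-stat'ing the same path stands in for the real check done via '..':

```python
import os

def check_traversal(root):
    """Recursively walk `root`, recording each directory's (st_dev, st_ino)
    before descending and re-checking it on the way back up, the way fts
    verifies that nothing moved underneath it.  Returns the paths whose
    device/inode changed mid-traversal: empty on a quiescent local file
    system, non-empty when e.g. an NFS submount is triggered by the
    descent itself (the .snapshot case described above)."""
    mismatches = []

    def walk(path):
        before = os.lstat(path)
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            if os.path.isdir(child) and not os.path.islink(child):
                walk(child)
        after = os.lstat(path)  # models the re-check fts does via ".."
        if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
            mismatches.append(path)  # du aborts with ENOENT at this point

    walk(root)
    return mismatches
```

On a local file system this returns an empty list; over the NetApp mount in the report, the .snapshot directory's st_dev changes between the two stats, which is exactly what makes fts_read fail.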

I've researched this online and it is an upstream bug. We're using Ubuntu 9.10, so I feel there should be a bug filed in the Ubuntu tracker as well.

The best information I've found is within Redhat's bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=501848
https://bugzilla.redhat.com/show_bug.cgi?id=533569

This bug has also been discussed on the coreutils mailing list:

http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html
http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00032.html

and LKML:

http://lkml.org/lkml/2009/11/4/451

Unfortunately none of these discussions has resulted in a widely accepted solution.

We use NetApp .snapshots very extensively and can't afford for du to be unreliable. At the moment we will either have to patch du or downgrade all of coreutils to an older version.

For comparison, we are upgrading from Ubuntu 7.04, which works perfectly.

There is a similar problem with find, but it has a --without-fts build option which 'fixes' it.

Escalated to Bugzilla from IssueTracker

Description of problem:
If you run du -h on a directory with .snapshot subdirectories using coreutils-6.10+ (could be lower, but >5.97-20), you will get an fts_read error:
du: fts_read failed: No such file or directory

How reproducible:
Every time.

Steps to Reproduce:
1. Use F10, or anything with a newer version of coreutils, on a machine with .snapshot directories created by a NetApp filer.
2. Run du -h.
3. Wait.

Actual results:
du: fts_read failed: No such file or directory

Expected results:
The size listing of all files and/or directories

Additional info:
This event sent from IssueTracker by cwyse [Pixar Animation Studios - Fedora Queue]
 issue 298936

I guess I spoke too soon. With coreutils-6.9-2 the problem is less
noticeable. On smaller directories it doesn't show up at all, but with
larger directories (directories with many subdirectories) it is still
there, so some users will notice it and some will not. The problem is now
between coreutils-5.97-19 and coreutils-6.9-2.

I tried compiling the 6.7.* coreutils package but it keeps failing on
build and isn't saying what or why. I will look more into this and
update with what I find.

This event sent from IssueTracker by cwyse
 issue 298936

It can be caused by on-the-fly changes within the directory: du tries to traverse a directory (or file?) which no longer exists. I am pretty sure you won't see the errors if you mount the file system read-only.

But there is no doubt the error message could be more verbose; it is listed as a FIXME in du.c.

Additionally, I guess the Fedora version should be changed to something that is not EOL (F-8 is EOL and F-9 will be EOL in ~2 months). From the comments I think the version should be changed to F-10, correct? Or some RHEL version?

Additionally, an strace of the failure could be useful to better analyze what the culprit is...

Created attachment 344994
strace of failure

I agree, changing version to F10, I originally just set it for the first version I noticed this problem in. Also, here is an strace of the failure on a F10 machine. I'm gonna try some of the 6.7 packages again and see if I can narrow down the window in which this fails.

Finally got 6.7-1 compiled. It shows the same fts_read issue, so the
regression window is between 5.97-22 and 6.7-1.
This is about as narrow as I can get it; I'm gonna try diff'ing up a
patch between du.c and... see what happens.

This event sent from IssueTracker by cwyse
 issue 298936

Patching fts.c failed to compile. There appears to be an fts.c.du file in
6.7 and an fts.c.inaccessibledirs in 5.7. Since the files do not exist in
both trees, I'm not sure how to test patching that.

This event sent from IssueTracker by cwyse
 issue 298936

Could you please try the following on the same directory?

$ find -printf %b\\n

Does it give the same errors? Different errors? No errors?

Ran "find -printf %b\n". I didn't get any errors; it took over an hour
and ran my CPU at 121.6%, but no errors. Running it again to verify.

This event sent from IssueTracker by cwyse
 issue 298936

Does 'du' print just one error message and then die? Is the output obviously incomplete? Or is the problem only the error message and return code?

Something interesting to read (about the same issue and how to reduce impact):
http://www.unixtutorial.org/2009/02/troubleshooting-du-fts_read-no-such-file-or-directory-error/

From what I have quickly checked, if find's fts_read() returns NULL, it just closes the FTS structure and moves on to the next argument. If du's fts_read() returns NULL, it checks errno and emits the corresponding diagnostic. The difference is in the checking function: find has a somewhat more complex checking function, consider_visiting(); maybe some parts of it should be adapted for du's process_file() function.
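The behavioral difference can be modeled roughly like this. It is a sketch only: `du_style` and `find_style` are illustrative names, and Python's os.walk stands in for the fts_read loop in the real tools:

```python
import os

def _raise(err):
    raise err

def du_style(paths):
    """du-like behavior: a failure while reading any directory aborts the
    whole run (du prints "fts_read failed: ..." and exits non-zero)."""
    total = 0
    for p in paths:
        for dirpath, _dirnames, filenames in os.walk(p, onerror=_raise):
            for f in filenames:
                total += os.lstat(os.path.join(dirpath, f)).st_size
    return total

def find_style(paths):
    """find-like behavior: diagnose the error for this argument, then
    carry on with the next argument instead of dying."""
    seen = []
    for p in paths:
        try:
            for dirpath, _dirnames, filenames in os.walk(p, onerror=_raise):
                seen.extend(os.path.join(dirpath, f) for f in filenames)
        except OSError as err:
            print("find: %s" % err)  # diagnose and continue
    return seen
```

With a vanishing or mutating directory in the tree, the first function dies mid-run while the second reports the problem and keeps going, which matches the observation that find -printf completed where du failed.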

Played a bit with that bz again: the fts_read error is being set at lib/fts.c:2000, which hardcodes ENOENT into errno. The error occurs when the ".." entry is not cached yet, so by repeating the run after mounting, it seems possible to get rid of those errors and obtain a correct result. It seems the check at fts.c:1997/1998 has to be extended to properly handle the situation with the NetApp .snapshot dir. Using du -Lsh also helps in some cases.

Created attachment 354463
Workaround for ".." directories and ?race conditions?

Played a bit more with that fts_read failure; the attached patch works around the issue. It seems that, due to what may be a caching race condition, after an fstat on the ".." fts entry it sometimes has the device number of the parent directory (on the first run after mount).

e.g. (variable: fts_value : fstat_value):
devicenum: 25 : 33
inode: 8217100 : 8217100
The next run over the same place correctly has the same values for fts_value and fstat_value, and it looks like:
devicenum: 33 : 33
inode: 8217100 : 8217100

I'm quite sure that the patch is NOT the correct way to solve the issue; the race condition itself should be eliminated, but I'm not really sure where. The filesystem? Kamil, any idea?

Created attachment 354464
Better one ;) workaround for ".." directories and ?race conditions?

Damn, the previous one was obviously not correct... this one should be better...

I added the patch to the latest coreutils package and I haven't seen the error yet. I ran a du -h over my lunch break. I'm letting my customer try it out and give it his stamp of approval. But so far it looks like it resolves the issue. I'll let you know if anything changes.

I've narrowed down the strange behavior to a sort of minimal example (/mnt/archive is a NetApp mount point):

umount /mnt/archive && mount /mnt/archive \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot/hourly.0 \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot

The output is the following:

    20 67 /mnt/archive/.snapshot
    26 222 /mnt/archive/.snapshot/hourly.0
    26 67 /mnt/archive/.snapshot

The device number is being changed on the fly while the inode number stays unchanged. It sounds like a file system bug to me. It's 100% reproducible on my box.
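The same anomaly can be probed with a few lines of Python mirroring the stat/stat/stat shell reproducer above. This is a sketch (`dev_changes_on_entry` is an illustrative name): on a healthy local file system the device number is stable and the function returns False; only at a misbehaving submount boundary like the one shown would it return True.

```python
import os

def dev_changes_on_entry(path):
    """stat the directory, walk into it (stat'ing its children corresponds
    to the path walk that triggers an NFS submount), then stat it again.
    Report whether st_dev changed while st_ino stayed the same -- the
    exact anomaly shown by the shell reproducer above."""
    before = os.stat(path)
    for name in os.listdir(path):
        try:
            os.stat(os.path.join(path, name))  # entering can trigger mount
        except OSError:
            pass
    after = os.stat(path)
    return before.st_dev != after.st_dev and before.st_ino == after.st_ino
```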

I ran the patched coreutils package on my .snapshot directory 3 times and didn't see a single error. It takes about 30 minutes to go through the .snapshot directory. Before Ovasik's patch it would run for about 30 seconds then fail. Kdudka, are you using the patch and still noticing this?

Event posted on 07-22-2009 02:51pm EDT by cwyse

Customer just got back to me with some comments. This new package creates
.snapshot directories on his desktop. This was a problem in F10 which
went away with F11. So it looks like a slight regression?

This event sent from IssueTracker by cwyse
 issue 298936

Here are the previous bugs that were related to the .snapshots showing up on the desktop. Just posting them here in case they help.

As noted in https://bugzilla.redhat.com/show_bug.cgi?id=472778 and https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb44598, NetApp filers use different FSIDs for the hidden snapshot directories they provide.

(In reply to comment #21)
> Kdudka, are you using the patch and still noticing this?

The patch is only a workaround for the 'du' utility. It works for me, too, but it does not fix the file system bug. The minimal example uses 'stat', so it has nothing to do with the patch.

Comment #22 is missing some context here. Which package creates the .snapshot directories on the customer's desktop? I am quite sure that coreutils does not.

The problem persists with the latest rawhide kernel:
Linux 2.6.31-0.122.rc5.git2.fc12.x86_64 #1 SMP Mon Aug 3 12:58:47 EDT 2009 x86_64

/etc/fstab:
filer-eng.brq.redhat.com:/vol/engineering/share /mnt/archive nfs ro 0 0

Looking at the capture, it doesn't appear that the server is returning inconsistent inode info. However Kamil's reproducer seems to indicate that the client is changing the device number after it traverses into the directory.

I suspect that this means that the client isn't doing the shrinkable mount before returning the info on the first stat call.

Confirmed...same behavior in rawhide too. I can also reproduce this with a non-netapp server simply by exporting a filesystem and then exporting another filesystem mounted onto a subdir of the first fs. Nothing netapp-specific here.

There's also a somewhat related problem...if a submount is done and then gets automatically unmounted, then the device numbers can change and even be reused for a completely different submount.

This is a bit tricky. On the one hand, the device number seems to change and that's probably bad for some apps. On the other hand, do we really want to trigger a mount just because someone did a stat() on the directory where we would eventually do a submount?

If I have a ton of exports that are subdirs of another exported filesystem I don't think I really want to do submounts of all of those filesystems just because someone did a "ls -l" in that directory.

Unfortunately, the device numbers for NFS are allocated on the fly during mount. So we can't easily "fake up" the device numbers and expect them to remain consistent without actually triggering a mount. The device number may be different once the submount gets done.

I suspect that the best we can probably do is to just make sure the device number is different from that of the parent filesystem, but we probably won't be able to make it consistent. That is, it'll change as soon as you walk into the dir...

I'll plan to do a writeup of this problem in the near future and post it to the upstream mailing list.

This problem is really no different than how autofs works. When you run stat on an autofs mountpoint, you'll just get the directory until you walk into that directory.

That's actually correct behavior, since you're adding a new mount when that occurs. This is almost completely the same thing; it's just that the kernel does a new mount without needing autofs.

I'm not sure this is actually a bug; rather, you're just seeing expected results when the kernel adds a new mount on the fly.

Jeff, thanks for the analysis. I'll look at the fts code again and possibly reassign back to coreutils. Good to know it's reproducible independently of the NetApp mount point.

Sounds good. I'll reassign this back to you for now.

Let me know if you need further clarification.

making the bug public...

Hello,

Is this happening because the device number is assigned first to one value initially, and later to another value -- all during a single hierarchy traversal?

If so, I'll have to push this back into the kernel/file-system court.
I think we'll have to make the file system present a consistent device and inode number for any file it serves.

(In reply to comment #40)
> Is this happening because the device number is assigned first to one value
> initially, and later to another value -- all during a single hierarchy
> traversal?

It looks like a sort of expected behavior to me. If the file system is not mounted, the device number describes the directory which belongs to the surrounding file system. Once you trigger the mount, the same path (directory) belongs to the newly mounted file system, thus gets a new device number.

In fact, I was actually more surprised that the inode number stays consistent across the mounts.

> If so, I'll have to push this back into the kernel/file-system court.
> I think we'll have to make the file system present a consistent device and
> inode number for any file it serves.

Well, I'll try to prepare a complete client/server reproducer first, since the one from comment #20 uses our internal server, which is not available to others for testing.

What event triggers the mount?

(In reply to comment #42)
> What event triggers the mount?

From my observation with gdb:
1. calling fstatat() with AT_SYMLINK_NOFOLLOW does NOT trigger the mount.
2. calling fstatat() without AT_SYMLINK_NOFOLLOW triggers the mount; opening the directory does as well.

If you are asking which events are guaranteed to trigger the mount and/or which events are guaranteed to NOT trigger the mount, kernel guys might give you a reliable answer.

Jeff, any idea?

Submounts are triggered via the follow_link inode operation, so in some ways these are treated like symlinks...

The short answer is that the mount will be triggered whenever you walk a path in such a way that, if this component were a symlink it would be resolved to its target.

Longer answer:

If the place where you transition into a new filesystem is in the middle of a path, then generally the path will be resolved. If it's the last component of the path, then it depends on whether the LOOKUP_FOLLOW link flag is set in nameidata in the kernel. That varies with the type of operation -- for instance, lstat() won't have that set, but a "normal" stat() generally will.
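In Python terms the two fstatat() variants from the observation above map onto os.stat's follow_symlinks flag (follow_symlinks=False corresponds to AT_SYMLINK_NOFOLLOW / lstat). This is a sketch, not kernel code; `stat_dev_both_ways` is an illustrative name:

```python
import os

def stat_dev_both_ways(path):
    """Return (st_dev without following, st_dev following).  The first
    call is lstat-style and does not trigger a submount; the second is a
    plain stat, which does.  On a local file system the two values are
    identical; at an NFS submount boundary, per the discussion above,
    they can differ because only the second call crosses into the
    freshly mounted file system."""
    no_follow = os.stat(path, follow_symlinks=False)
    follow = os.stat(path, follow_symlinks=True)
    return no_follow.st_dev, follow.st_dev
```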

Minimal example which works reliably on my Fedora 11 installation:

# mount | grep ^/
/dev/sda1 on / type ext3 (rw)
/dev/sda3 on /home type ext4 (rw)

# ls -d /home/test
/home/test

# printf "/ *(fsid=0,crossmnt)\n/home *(crossmnt)\n" \
    > /etc/exports

# service nfs restart
# mkdir /tmp/mnt
# mount -t nfs4 localhost:/ /tmp/mnt \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home/test \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home

29 2 /tmp/mnt/home
30 12 /tmp/mnt/home/test
30 2 /tmp/mnt/home

(In reply to comment #46)
> A patch for gnulib proposed upstream:
>
> http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html

The patch has been rejected by upstream because of its performance impact in some obscure situations (namely, traversing a tree of 200000 directories nested inside each other):

http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00032.html

As a solution, it was proposed to find (or perhaps implement?) a low-cost way of recognizing a mount point during traversal; "low cost" here means cheaper than a stat call.

Since there seems to be nothing I can do with this bug at the moment, I am reassigning it back to kernel.

Hi Kamil,

Using your reproducer (above, thanks!) let's print one more dev/ino pair
(this is on F12):

$ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt
24 2 /tmp/mnt/home
24 2 /tmp/mnt

That shows a big problem: two distinct directories have the same dev/ino pair,
and fts rightly objects, returning FTS_DC to indicate a directory cycle.
When fts encounters the same dev/ino pair twice in a traversal, and it is not traversing symlinks, that represents a hard-linked directory cycle, which is usually a big problem. [Note that currently du does not diagnose this problem, but I'll fix that shortly.]
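The cycle check described here can be sketched as follows. It is an illustrative model, not the gnulib implementation; `find_dir_cycles` is a hypothetical name:

```python
import os

def find_dir_cycles(root):
    """Walk `root` without following symlinks, remembering each
    directory's (st_dev, st_ino) pair.  Seeing a pair twice normally
    means a hard-linked directory cycle; with the NFS behavior above,
    two genuinely distinct directories (/tmp/mnt and /tmp/mnt/home)
    share one pair and trip the same check (fts returns FTS_DC)."""
    seen = set()
    cycles = []
    for dirpath, dirnames, _files in os.walk(root, followlinks=False):
        st = os.lstat(dirpath)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            cycles.append(dirpath)  # fts would report FTS_DC here
            dirnames[:] = []        # stop descending into the "cycle"
        else:
            seen.add(key)
    return cycles
```

On a sane file system this returns an empty list; run over the /tmp/mnt reproducer it would flag /tmp/mnt/home, matching du's "Circular directory structure" warning quoted later in this report.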

Even if the above kernel/nfs bug is fixed, I am becoming more and more convinced that this varying-device-number problem is something that must be addressed in the kernel, and not in every single application that must perform dev/ino checks for security. Thanks for reassigning to the kernel.

(In reply to comment #48)
> $ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt
> 24 2 /tmp/mnt/home
> 24 2 /tmp/mnt

Good catch! Though I don't think you've hit the cause of the original bug report, this does indeed look broken. The dev/ino pair should be unique across the whole VFS, or am I wrong?

Jeff, what do you think about the example?

I'd have to look at the example more closely, but it's likely that the kernel code is picking up the inode number of the root inode of the underlying filesystem.

I think what's happening is that the server sends the inode number of /tmp/mnt/home and a new fsid, but the client doesn't actually spawn a new submount there. So the device ID ends up the same. In fact, all of my ext3/4 filesystems seem to give the root inode st_ino == 2, so that's probably what's happening.

The trivial workaround here is probably to use stat() instead of lstat() (the -L option to the stat program), but I imagine that won't be suitable?

How to fix this? I don't think there is a way to do so without triggering a submount even when we don't want to follow symlinks.

That's going to be very costly for performance in many cases (if it's even reasonably doable). Imagine cd'ing into a directory that has 1000 exported filesystems under it. Simply doing a readdir() in there is going to make the client spawn 1000 new mounts.

(In reply to comment #50)
> The trivial workaround here is to probably use stat() instead of lstat() here
> (-L option to the stat program), but I imagine that won't be suitable?

Yep, this suppresses the bug, as does du -L in the original bug report. But we get a different result, so it's really not suitable.

> How to fix this? I don't think there is a way to do so without triggering a
> submount even when we don't want to follow symlinks.

I think this *should* be fixed since it breaks one of the basic axioms about VFS.

> That's going to be very costly for performance in many cases (if it's even
> reasonably doable). Imagine cd'ing into a directory that has a 1000 exported
> filesystems under it. Simply doing a readdir() in there is going to make the
> client spawn 1000 new mounts.

No chance to get unique dev/ino pairs without triggering the mount first?

No, sorry, no way to determine what the ino is for the new file system
without talking to the server.

Doing an ls in a directory full of many autofs mounted file systems
should not trigger mounts for all of those file systems. This will
cause a bigger performance problem than the original perceived
problem ever did.

Perhaps the right way to address this is to flag the returned
directory entries to user level with something which indicates
that the metadata for that entry will change once the file system
which would be mounted there is actually mounted. This would
eliminate most of the extra stat calls that Jim Meyering is
worried about.

FYI, I've (re)raised the issue on LKML:

    http://lkml.org/lkml/2009/11/4/451

Minor nit...we get the correct st_ino for the directory. The problem is that we don't have accurate st_dev info at that point since the mount hasn't occurred yet.

That said...it would be nice to be able to flag the entries in the way that Peter suggests. The question is how to do that in a way that's compatible with POSIX here.

Maybe we could declare a new S_IF* value for st_mode:

S_IFXDEV 020000

That should allow us to leave the S_IFDIR bit set and it employs a bit that's outside of __S_IFMT. The kernel could set this bit in the statbuf when it detects that the fsid on the inode is not the same as that of the parent directory.

The big question is whether and if someone wants to implement this and then sell it upstream :)

Another question is how coreutils would detect that the running kernel has the ability to indicate mount points, and thus decide whether to use the optimization.

With an approach similar to what Jeff has suggested, it won't matter:
if the kernel sets S_IFXDEV, then coreutils can use the optimization;
if it doesn't, then it won't?

Nope, if I understand it correctly, the semantics of the S_IFXDEV bit are exactly the opposite. If the bit is set, we need to call stat again after opening the directory. But if it's not set, and we don't know whether the kernel provides this feature, we can't use the optimization and need to call stat anyway. Or am I wrong?

Yes, sorry, I was looking at it the other way around.

I think we need either a bit with exactly the inverse meaning, or some other mechanism indicating that the kernel is able to set the S_IFXDEV bit reliably.
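To make the capability problem concrete, here is a sketch of how a traversal tool would have to consume the proposed bit. S_IFXDEV is purely hypothetical (it comes from the proposal in this thread and was never merged into the kernel), and `needs_second_stat` is an illustrative name:

```python
# S_IFXDEV is only a proposal from this thread; the value below is the
# one suggested in the comment above, not a real kernel constant.
S_IFXDEV = 0o020000

def needs_second_stat(st_mode, kernel_sets_xdev):
    """Model of the optimization debated above: only when the kernel is
    known to set S_IFXDEV reliably can a clear bit be trusted to mean
    "no submount will appear here", letting the traversal skip the
    re-stat after entering the directory.  Without that knowledge the
    tool must fall back to re-stat'ing every directory."""
    if not kernel_sets_xdev:
        return True                      # capability unknown: always re-stat
    return bool(st_mode & S_IFXDEV)      # re-stat only flagged directories
```

This is why the discussion turns to detecting kernel support: an unset bit is only meaningful once the kernel is known to set the bit at all.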

Description of problem: in the vicinity of a mount point directory, two directories may have the same device and inode number. This is a serious problem because many tools treat the condition as indicating a hard directory cycle, which usually indicates file system corruption.

Version-Release number of selected component (if applicable):
2.6.31.5-122.fc12.x86_64

How reproducible: every time

Steps to Reproduce:
Based on the set-up from Kamil Dudka in https://bugzilla.redhat.com/show_bug.cgi?id=501848#c45

# mount | grep ^/
...
/dev/sda8 on /home type ext4 (rw,noatime)
...
# top=/home
# cat /etc/exports
# printf "/ *(fsid=0,crossmnt)\n$top *(crossmnt)\n" >> /etc/exports
# service nfs restart
...
# mkdir /tmp/mnt
# mount -t nfs4 localhost:/ /tmp/mnt
# stat --printf "%d %i %n\n" /tmp/mnt{,$top}
22 2 /tmp/mnt
22 2 /tmp/mnt/home

Then, using the very latest du from upstream coreutils.git,
I see this:

    $ du /tmp/mnt > /dev/null
    du: WARNING: Circular directory structure.
    This almost certainly means that you have a corrupted file system.
    NOTIFY YOUR SYSTEM MANAGER.
    The following directory is part of the cycle:
      `/tmp/mnt/home'

Actual results: above

Expected results: different dev and/or inode, no du failure

Additional info:

(In reply to comment #48)
> Using your reproducer (above, thanks!) let's print one more dev/ino pair
> (this is on F12):
>
> $ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt
> 24 2 /tmp/mnt/home
> 24 2 /tmp/mnt
>
> That shows a big problem: two distinct directories have the same dev/ino pair,

FYI, I've opened a new BZ to track this separate problem:

https://bugzilla.redhat.com/show_bug.cgi?id=533569

> # stat --printf "%d %i %n\n" /tmp/mnt{,$top}
> 22 2 /tmp/mnt
> 22 2 /tmp/mnt/home
I do see this... but

> $ du /tmp/mnt > /dev/null
> du: WARNING: Circular directory structure.
> This almost certainly means that you have a corrupted file system.
> NOTIFY YOUR SYSTEM MANAGER.
> The following directory is part of the cycle:
> `/tmp/mnt/home'

What kernel and nfs-utils are you using?

I meant to say... I don't see the du error... what kernel/nfs-utils are
you using..

(In reply to comment #2)
> I meant to say... I don't see the du error... what kernel/nfs-utils are
> you using..

You need to compile GNU coreutils from git to see the error.

Hi Steve, kernel version is listed above.
nfs-utils-1.2.0-18.fc12.x86_64

I think I understand what the issue is here. I just don't think that there's much we can do about it...

The stat program is doing an lstat(), and that doesn't trigger a submount (LOOKUP_FOLLOW isn't set). So we end up doing a GETATTR call that returns info on the root inode of the /home mount: the syscall gets the "real" st_ino of /tmp/mnt/home, but the st_dev is still that of the parent (/tmp/mnt).

This is particularly evident here because the root of any ext3/4 filesystem has an st_ino of 2.

I think our options are:

1) fix the kernel to trigger a submount even when LOOKUP_FOLLOW isn't set (quite possibly very hard on performance)

2) fix the kernel to return a bit more info when we have a "potential mountpoint" like this. My suggestion on LKML was to coopt a new st_mode/i_mode bit and use that to indicate that a directory is potentially a new mountpoint if someone were to walk into it

So far, my suggestion hasn't received any feedback upstream.

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

C de-Avillez (hggdh2) wrote :

Thank you for opening this bug and helping make Ubuntu better. Even more, thank you for your background work -- it is really appreciated.

I have added bug watches for both Fedora bugs you gave us (kernel -- which is 'linux' for us --, and coreutils), and I am also adding an empty bug watch for upstream coreutils.

As you point out, there is no clear path here. It looks like upstream tends to consider this more of a kernel-addressable issue (rather than imposing the patch's additional time penalty, of up to ~10%, on everybody). I guess we will have to wait and see.

Changed in coreutils (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Tim Nicholas (tjn) wrote :

I haven't filed a bug against findutils, but the same problem affects modern find as well, which luckily can be worked around with './configure --without-fts'.

I'd be happy enough (in the short term) to have a compile time option like that to work around the issue.

For my own edification, has the adoption of fts been driven by speed, security, or something else? It seems like this issue will affect more and more people with the increasing adoption of snapshot-capable filesystems such as NetApp WAFL, ZFS, btrfs, etc.

I'm going to see if the problem happens with an OpenSolaris live CD... If the bug exists across multiple operating systems, fixing the way the fts functions work would really be preferable to a kernel fix.

Tim Nicholas (tjn) wrote :

I tested with coreutils 7.4 on Mac OS X and it functioned correctly. I tried checking with OpenSolaris, but the live CD doesn't have gcc and runs an older coreutils version (6.7), which didn't error either.

I guess a kernel fix is the right choice... It's a shame it'll probably take longer to be implemented.

Changed in coreutils (Ubuntu):
assignee: nobody → Dustin Kirkland (kirkland)
importance: Medium → High
Dustin Kirkland  (kirkland) wrote :

So FYI, the coreutils in both Ubuntu 9.10 (Karmic) and 10.04 (Lucid), currently under development, is version 7.4-2ubuntu1, probably closer to your Mac OS X version.

Any chance you could try to reproduce it with one of those?

Changed in findutils (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Dustin Kirkland  (kirkland) wrote :

No need to open a new bug against findutils. I just marked this bug as affecting findutils too. Let me see if there's any controversy in adding that configure option to find...

Dustin Kirkland  (kirkland) wrote :

Tim, can you characterize any disadvantages or limitations placed on other users by configuring find --without-fts ?

Dustin Kirkland  (kirkland) wrote :

Andy, can you take a look at this from our kernel's perspective? Sounds like there was some discussion, but a lack of consensus, in the LKML thread referenced above. Is there anything we can do here to help our Ubuntu kernel users?

Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Andy Whitcroft (apw)
Dustin Kirkland  (kirkland) wrote :

Subscribing Colin too. Colin, would you care to comment on what we could (and perhaps shouldn't) do to fix this in userspace in coreutils and findutils?

Tim Nicholas (tjn) wrote :

Thanks Dustin,

I'm not entirely sure what the motivations were for fts, but I think it has to do with sanity checking, to ensure that nothing has been moved around underneath a file system traversal... But again, I'm not entirely sure about that.

One thing that does worry me is that I have no idea where else fts is used... coreutils and findutils so far, but it could well be in other places, which I guess would be the good thing about a kernel fix.

When I tested on Mac OS X, I also tested on Ubuntu 9.10 with the same version (7.4). I compiled it with the default configure options (apart from the install location).

C de-Avillez (hggdh2) wrote :

FWIW, I have started to provide a daily build of coreutils git on my PPA, right now only for Lucid (although I can do it for other versions if needed). There have been a few changes around fts since 7.4, so there.

If needed, you can get it at https://edge.launchpad.net/~hggdh2/+archive/ppa

Tim Nicholas (tjn) wrote :

Thanks,

I built the package for 9.10 and it seems to be passing the simple test case I had for it... I'm at linux.conf.au at the moment, so I won't be able to do more testing right now, but it seems to have helped.

Still not sure if this is something that _should_ be fixed in userspace but I'm happy to have a du that works!

C de-Avillez (hggdh2) wrote :

So, I understand you are running with Kamil's patch, correct?

Dustin Kirkland  (kirkland) wrote :

Tim-

Are you building the 9.10 package with any patches? If so, could you attach them here?

C de-Avillez (hggdh2) wrote :

Dustin, I am (still) guessing Tim used the patch Kamil proposed upstream, which started the whole discussion on where to patch this (given the worst-case scenario of ~10%-17% overhead).

I am attaching the patch here for completeness.

Tim Nicholas (tjn) wrote :

Hi C,

I have not patched core utils myself.

The only modern version of coreutils which has functioned correctly (not heavily tested) was from the PPA you linked to above; I don't know what patches have been added to that. The version I have is coreutils_20100119~8.4.5-e489f~ppa1_amd64.deb

We're just testing out Ubuntu 9.10, so I haven't been forced to roll out a fix yet, and I'm hoping for a better option than Kamil's patch (which I don't expect to be maintained upstream).

Cheers,
Tim

C de-Avillez (hggdh2) wrote :

This is somewhat puzzling, given that I did *not* include Kamil's patch in it. The PPA carries the patches we have from Debian (61_whoips, 63_dd-appenderrors, 72_id_checkngroups, the standard patches on Debian/Ubuntu coreutils 7.4, plus I think one local one), and two additional patches:

77_ls_colour, 80_fedora_sysinfo. 77_ls_colour is a proposed patch for bug 494663, and may be added upstream to a new release later on (the author is still considering). 80_fedora_sysinfo is a patch I took from Fedora for bug 470550. I have no idea if it will be accepted or not.

Finally, patch 78_kfbsd_tab.dpatch has been dropped, since it is present on 8.4.5 upstream.

Tim Nicholas (tjn) wrote :

Hmm. OK. I may well have done insufficient and ineffective testing. When I get a chance I'll do it a bit more carefully - probably in the next couple of days.

Changed in coreutils (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody

AFAIK, nothing has changed, so I've reset "Version:" to rawhide.

This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Still affects rawhide, too.

This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 11 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Andy Whitcroft (apw) on 2010-06-18
Changed in linux (Ubuntu):
assignee: Andy Whitcroft (apw) → nobody

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

I wish it could be closed...
Still afflicts rawhide.

This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Changing version back to 'rawhide'.

still affects rawhide.

(In reply to comment #45)
> # mount -t nfs4 localhost:/ /tmp/mnt \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home/test \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home
>
> 29 2 /tmp/mnt/home
> 30 12 /tmp/mnt/home/test
> 30 2 /tmp/mnt/home

FYI, I tried the same example on my RHEL-5 machine and, surprisingly, there seems to be no such optimization. The first lstat() syscall on /tmp/mnt/home triggers the mount of /tmp/mnt/home and picks up the final dev/ino pair.

... but it is still reproducible with autofs mount points even on RHEL-5.
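For context on the mechanism being discussed here: fts records each directory's (st_dev, st_ino) pair on the way down the tree and re-checks it when it comes back up via '..'. If readdir() triggered a submount in between, the device ID has changed, and du aborts with "fts_read failed". A minimal Python sketch of that kind of check follows (the function name is mine, not gnulib's, and this is an illustration of the idea, not the actual fts code):

```python
import os
import tempfile

def dir_identity_stable(path):
    """Record a directory's (st_dev, st_ino), read its entries
    (which on NFS/autofs can trigger a submount and change st_dev),
    then re-stat and compare. fts performs an equivalent check when
    walking back up via '..', and du bails out when it fails."""
    before = os.lstat(path)
    os.listdir(path)  # readdir(); on an automount point this triggers the submount
    after = os.lstat(path)
    return (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)

# On an ordinary local directory the identity never changes:
local = tempfile.mkdtemp()
print(dir_identity_stable(local))  # True on a local fs; would be False
                                   # across a freshly triggered submount
```

Run against the /tmp/mnt/home reproducer above, the before/after device IDs would be 29 and 30, which is exactly the mismatch du trips over.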

I concur. I can't reproduce this any more either on nfsv4:

# mount /mnt/dantu \
&& stat --printf "%d\t%i\t%n\n" /mnt/dantu \
&& stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3 \
&& stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3/testfile \
&& stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3
24 2 /mnt/dantu
25 2 /mnt/dantu/ext3
25 49153 /mnt/dantu/ext3/testfile
25 2 /mnt/dantu/ext3

...in my setup the host exports a filesystem, and "ext3" is a mounted and exported filesystem under that. It seems like something has changed and lstat() calls are now triggering the mount. I'm going back through the changelogs to see why it behaves differently.

I should point out that those last results were with my latest RHEL5 test kernels.

Jeff, sorry if my comment was confusing, but I think we both have exactly the same results. This bug (501848) is against Fedora. RHEL-5 didn't reproduce the bug with nfsv4 for me, but I am still able to reproduce it on RHEL-5 with autofs. I wrote the comment here only as an auxiliary observation while investigating bug 537463, which is against RHEL-5.

No problem. It wasn't confusing. Steve asked me to have a look at this and I was just surprised that I was unable to reproduce this on recent RHEL5 kernels with NFSv4. Not sure why that is so far...

Is this something that we can change in upstream or should we close this out?

Not much we can do, I don't think...

If anything, the automount semantics are even less likely to trigger a mount these days. I think the only hope for this problem is the xstat() work that dhowells was working on, but that has sort of died upstream.

I'll go ahead and close this WONTFIX for now. Please reopen it if you want to discuss it further.

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Is this still a problem with 3.9 based F19 kernels?

This bug is being closed with INSUFFICIENT_DATA as there has not been a
response in 2 weeks. If you are still experiencing this issue,
please reopen and attach the relevant data from the latest kernel you are
running and any data that might have been requested previously.

The problem still exists in kernel-3.9.0-0.rc7.git3.1.fc20.x86_64. The reproducer from comment #45 works for me:

[root@f20 ~]# mount -t nfs4 localhost:/ /tmp/mnt \
&& stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot \
&& stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot/grub2 \
&& stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot
36 2 /tmp/mnt/boot
37 65025 /tmp/mnt/boot/grub2
37 2 /tmp/mnt/boot

Changed in linux (Fedora):
importance: Unknown → Medium
status: Unknown → Won't Fix
Changed in coreutils (Fedora):
importance: Unknown → Medium
status: Unknown → In Progress