os.lstat != os.fstat on Windows, possible dirstate caching issue

Bug #478023 reported by John A Meinel
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Medium
Unassigned

Bug Description

In running the test suite, I found that

 os.lstat('file')
versus
 os.fstat(open('file', 'rb').fileno())

gives slightly different results.

Namely, the latter will have a value for 'st.st_ino' while the former will have 0 stored.

I haven't investigated, but potentially this is a performance issue for Windows and dirstate. If dirstate ends up caching the result of fstat() but then when iterating we compare that to lstat() we will think the content has changed, and re-read and re-sha the file. If we then stored the value from fstat() again, we would have done all that work and then store bogus data again.

I don't think we are being that bad, or we would have noticed earlier. But it is something we should be careful about trusting fstat versus lstat, and I haven't had a chance to audit the code.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 478023] [NEW] os.lstat != os.fstat on Windows, possible dirstate caching issue

On Sun, 2009-11-08 at 03:46 +0000, John A Meinel wrote:
>
>
> In running the test suite, I found that
>
> os.lstat('file')
> versus
> os.fstat(open('file', 'rb').fileno())
>
> gives slightly different results.
>
> Namely, the latter will have a value for 'st.st_ino' while the former
> will have 0 stored.
>
> I haven't investigated, but potentially this is a performance issue
> for
> Windows and dirstate. If dirstate ends up caching the result of
> fstat()
> but then when iterating we compare that to lstat() we will think the
> content has changed, and re-read and re-sha the file. If we then
> stored
> the value from fstat() again, we would have done all that work and
> then
> store bogus data again.
>
> I don't think we are being that bad, or we would have noticed earlier.
> But it is something we should be careful about trusting fstat versus
> lstat, and I haven't had a chance to audit the code.

Cygwin or win32?

-Rob

Revision history for this message
Martin Packman (gz) wrote :

Are you sure it's not ST_DEV that differs between stat and fstat? That's what it seems to be in my testing. That said, ST_INO has no meaning under windows, per the definition in the documentation:

<http://msdn.microsoft.com/en-us/library/14h5k7ff(VS.71).aspx>
"Number of the information node (the inode) for the file (UNIX-specific). On UNIX file systems, the inode describes the file date and time stamps, permissions, and content. When files are hard-linked to one another, they share the same inode. The inode, and therefore st_ino, has no meaning in the FAT, HPFS, or NTFS file systems."

I see John is fixing some tests involving this in ~jameinel/bzr/2.1.0b4-win32-get-file-with-stat - I had a different version in my nocpython branch for a bit but pulled it out as it wasn't nonc- specific.

=== modified file 'bzrlib/tests/__init__.py'
--- bzrlib/tests/__init__.py 2009-08-16 00:52:53 +0000
+++ bzrlib/tests/__init__.py 2009-08-16 15:11:33 +0000
@@ -985,12 +985,13 @@
         :raises AssertionError: If the expected and actual stat values differ
             other than by atime.
         """
- self.assertEqual(expected.st_size, actual.st_size)
- self.assertEqual(expected.st_mtime, actual.st_mtime)
- self.assertEqual(expected.st_ctime, actual.st_ctime)
- self.assertEqual(expected.st_dev, actual.st_dev)
- self.assertEqual(expected.st_ino, actual.st_ino)
- self.assertEqual(expected.st_mode, actual.st_mode)
+ self.assertEqual(expected.st_size, actual.st_size, "ST_SIZE differs")
+ self.assertEqual(expected.st_mtime, actual.st_mtime, "ST_MTIME differs")
+ self.assertEqual(expected.st_ctime, actual.st_ctime, "ST_CTIME differs")
+ if not osutils.is_windows:
+ self.assertEqual(expected.st_dev, actual.st_dev, "ST_DEV differs")
+ self.assertEqual(expected.st_ino, actual.st_ino, "ST_INO differs")
+ self.assertEqual(expected.st_mode, actual.st_mode, "ST_MODE differs")

     def assertLength(self, length, obj_with_len):
         """Assert that obj_with_len is of length length."""

I think a fix at this level is more correct.

Revision history for this message
Martin Packman (gz) wrote :

Ow, launchpad eats spaces (and doesn't have a preview option...), should have posted that diff hunk as an attachment. The point of it is, st_dev and st_ino are not checked for equality on windows. This may be what is wanted in other code expecting to compare stat results.

Revision history for this message
Martin Packman (gz) wrote :

Okay, it *is* ST_DEV and the reason for the variation is given in the MSDN _fstat documentation:

<http://msdn.microsoft.com/en-us/library/221w8e43(VS.71).aspx>
"If a device, fd; otherwise 0."

It's always 0 for a file, whereas for a stat of a filename it's {"A":0, "B":1, "C":2 ...}[driveletter]

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 478023] Re: os.lstat != os.fstat on Windows, possible dirstate caching issue

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin wrote:
> Okay, it *is* ST_DEV and the reason for the variation is given in the
> MSDN _fstat documentation:
>
> <http://msdn.microsoft.com/en-us/library/221w8e43(VS.71).aspx>
> "If a device, fd; otherwise 0."
>
> It's always 0 for a file, whereas for a stat of a filename it's {"A":0,
> "B":1, "C":2 ...}[driveletter]
>

>>> st = os.fstat(f.fileno())
>>> st
nt.stat_result(st_mode=33206, st_ino=5629499534396349L, st_dev=0,
st_nlink=1, st_uid=0, st_gid=0, st_size=5811L, st_atime=1257365979L,
st_mtime=1257365979L, st_ctime=1252003844L)

It is pretty clear that regardless what docs we find, python itself is
storing the value as "st_ino".

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkr7NDMACgkQJdeBCYSNAAO+jgCcDBDg940YPn+6VEwl0uK0SYyr
zJoAoKJvPP57PCWxVHl/v8JrtGAPzuHx
=OWuC
-----END PGP SIGNATURE-----

Revision history for this message
Martin Packman (gz) wrote :

Okay, here's why we're seeing different things, for 2.5 MvL did an end run around the CRT:

<http://svn.python.org/view/python/trunk/Modules/posixmodule.c?r1=42093&r2=42230>

In that implementation, os.fstat uses GetFileInformationByHandle and os.stat uses GetFileAttributesExW (or FindFirstFile in a later patch). For some reason the fstat path doesn't set ST_DEV from dwVolumeSerialNumber, but does set ST_INO and ST_NLINK which aren't available on the stat path.

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 478023] [NEW] os.lstat != os.fstat on Windows, possible dirstate caching issue

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Sun, 2009-11-08 at 03:46 +0000, John A Meinel wrote:
>>
>> In running the test suite, I found that
>>
>> os.lstat('file')
>> versus
>> os.fstat(open('file', 'rb').fileno())
>>
>> gives slightly different results.
>>
>> Namely, the latter will have a value for 'st.st_ino' while the former
>> will have 0 stored.
>>
>> I haven't investigated, but potentially this is a performance issue
>> for
>> Windows and dirstate. If dirstate ends up caching the result of
>> fstat()
>> but then when iterating we compare that to lstat() we will think the
>> content has changed, and re-read and re-sha the file. If we then
>> stored
>> the value from fstat() again, we would have done all that work and
>> then
>> store bogus data again.
>>
>> I don't think we are being that bad, or we would have noticed earlier.
>> But it is something we should be careful about trusting fstat versus
>> lstat, and I haven't had a chance to audit the code.
>
> Cygwin or win32?
>
> -Rob
>

win32

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksBgM0ACgkQJdeBCYSNAANXHgCbBumkN/q1hg3+6u2syfZqMyC0
18IAn1ZKG8AUgxsWZO3zQyUpjLSDMIw4
=8kBk
-----END PGP SIGNATURE-----

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.