bzr add allows versioning of same files twice due to case-insensitive HFS+ filesystem on mac

Bug #120542 reported by Stuart Colville on 2007-06-15
This bug affects 1 person
Affects   Importance   Assigned to
Bazaar    High         Unassigned
Breezy    Medium       Unassigned

Bug Description

HFS+ on Macs is case-insensitive, so it is possible for someone to run "bzr add foo" when Foo is already versioned. Because "foo" is reported as a valid file, bzr tries to version it as well. The problem occurs when someone tries to update, because the filesystem cannot tell foo and Foo apart: it treats them as the same file.

This manifests itself as the following error when running "bzr update":
bzr: ERROR: [Errno 66] Directory not empty

The proposed solution is for bzr to do a case-insensitive check on what is versioned before allowing a file to be added. This could be applied only on Macs or, better still, be made available as a configuration option that is "on" by default on Macs. The option would cover other Mac developers who are using bzr on an HFS+ partition with case-sensitivity turned on.
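
A minimal sketch of such a check, written in Python 3 and not taken from bzr itself: before a path is added, compare it case-insensitively against what is already versioned. The name find_case_conflicts and the plain list of versioned paths are hypothetical, and str.casefold() is only an approximation of the case folding HFS+ really performs.

```python
def find_case_conflicts(versioned_paths, new_path):
    """Return versioned paths that differ from new_path only by case."""
    folded = new_path.casefold()
    return [p for p in versioned_paths
            if p != new_path and p.casefold() == folded]


# Hypothetical inventory of already-versioned paths.
versioned = ["Foo", "docs/README", "src/main.c"]
print(find_case_conflicts(versioned, "foo"))         # ['Foo']
print(find_case_conflicts(versioned, "src/util.c"))  # []
```

An "add" command could refuse the path, or only warn, whenever the returned list is non-empty, depending on the configuration switch.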

Andrew Bennetts (spiv) wrote :

We could perhaps try to autodetect if a HFS+ partition is case-insensitive by calling getattrlist(2) and looking for ATTR_CMN_NAME, as suggested by Martin v. Löwis at http://www.thescripts.com/forum/thread441007.html. It's not wrapped, but we could probably use ctypes for it if it's available.

Andrew Bennetts wrote:
> We could perhaps try to autodetect if a HFS+ partition is case-
> insensitive by calling getattrlist(2) and looking for ATTR_CMN_NAME, as
> suggested by Martin v. Löwis at
> http://www.thescripts.com/forum/thread441007.html. It's not wrapped,
> but we could probably use ctypes for it if it's available.

For fixing merge (and TreeTransform generally), we just need to adjust
the duplicate-file detection to optionally do case-insensitive comparisons.

For prevention, I'm not sure what the best approach is.

Aaron

Matthew Fuller (fullermd) wrote :

We could store device/inode for the files along with the other stat info, and catch attempted dupes that way. Of course, that also wouldn't let us version more than one link to the same file, and Murphy dictates somebody will want to...

Does add another couple hunks of info to the struct that we'd only need for this one case, though. Maybe it's cheap enough to generate and check on the fly for case-differing-only files at 'add' time?
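
As a rough illustration of the device/inode idea (Python 3, with hypothetical helper names, not bzr code), two paths can be treated as the same file when their (st_dev, st_ino) pairs match. The example also shows the caveat mentioned above: a hard link is reported as a duplicate too.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode) for path without following symlinks."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def is_same_file(path_a, path_b):
    """True when both paths name the same on-disk file."""
    return file_identity(path_a) == file_identity(path_b)


with tempfile.TemporaryDirectory() as tree:
    original = os.path.join(tree, "Foo")
    alias = os.path.join(tree, "Foo-hardlink")
    other = os.path.join(tree, "Bar")
    open(original, "w").close()
    open(other, "w").close()
    os.link(original, alias)  # assumes the filesystem supports hard links
    print(is_same_file(original, alias))  # True: a hard link shares the inode
    print(is_same_file(original, other))  # False
```

On a case-insensitive filesystem, "foo" and "Foo" would compare equal in the same way, which is what would let an 'add' catch the duplicate.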

Vincent Ladeuil (vila) wrote :

Can we agree to call HFS+ either case-sensitive (when the option is activated) or *case-preserving* in its default configuration ?

case-insensitive is when you create foo.txt and 'dir' will display 'FOO.TXT'.

case-preserving is when you do 'echo Bar >Foo', ls lists 'Foo' but 'cat fOO' displays 'Bar'

i.e. whatever case you use at creation is preserved but from that point you can refer to the file in any case you want.

And don't forget that getattrlist(2) is BSD specific, so it's available under OS X but will not solve the problem for OS X filesystems mounted under linux or windows via NFS.

Vincent Ladeuil (vila) wrote :

<complement to previous comment>

Said otherwise: the difference between case-insensitive and case-preserving is that if you copy a tree from a case-sensitive fs to a case-insensitive fs and back, the result is a mess. If you copy a tree from a case-sensitive fs to a case-preserving fs and back, the result is the same tree (iff no files differ in case only of course).

Aaron Bentley (abentley) wrote :

vila wrote:
> Can we agree to call HFS+ either case-sensitive (when the option is
> activated) or *case-preserving* in its default configuration ?

I find that isn't a helpful description, because it doesn't make the
defect clear enough. The problem isn't that HFS+ preserves case, it's
that it is *insensitive to case* in important areas.

Aaron

Hi all,

If I could fix this I would; it is just important to me. I will keep
trying to see if I can fix what is wrong. I know that all of you have
been working to help me and many more, and they will keep coming.

Best Regards.

flintt
On 6/16/07, vila <email address hidden> wrote:
>
> <complement to previous comment>
>
> Said otherwise: the difference between case-insensitive and case-
> preserving is that if you copy a tree from a case-sensitive fs to a
> case-insensitive fs and back, the result is a mess. If you copy a tree
> from a case-sensitive fs to a case-preserving fs and back, the result is
> the same tree (iff no files differ in case only of course).

Also to note the same problems are caused if a user renames a directory to the same name with a case change. Presumably this could be avoided with bzr move.

In terms of fixing this, I'm thinking there needs to be a configurable switch to turn on the fix. In a Linux dev environment you may want to keep the ability to have Foo and foo in a branch. However, when developers use a mix of operating systems, you want Bazaar to always check that directories/files don't already exist.

Just out of interest is anyone working on a fix for this already?

John A Meinel (jameinel) wrote :

I'm pretty sure there are duplicates out there, but this is still an important issue.
I haven't found a way on case-preserving but case-insensitive filesystems to find out the real filename. 'stat(foo)' doesn't give back any indication of the real filename.
On Windows, you could use the FindFiles api. Something like:
>>> import win32file
>>> win32file.FindFilesW('BZR')
[(32, <PyTime:1/16/2008 19:09:47>, <PyTime:8/12/2008 20:25:35>, <PyTime:8/12/2008 20:25:35>, 0L, 5418L, 0L, 0L, u'bzr', u'')]

*note that all the way down at the end is u'bzr' which is the win32 unicode filename for the file that matches 'BZR'.

I would guess that OS X would have a similar function, but someone would need to track it down.

At that point, I would just use a function like this to validate all user's input names. So when they do "bzr add foo" we just translate it to Foo. (Everything else added will be found using os.listdir(), which gives us the proper case for all files.)

So in 'bzrlib/mutabletree.py' in the 'smart_add' function:
# validate user file paths and convert all paths to tree
# relative : it's cheaper to make a tree relative path an abspath
# than to convert an abspath to tree relative.
for filepath in file_list:
    rf = _FastPath(self.relpath(filepath))

Just add another entry for:
filepath = osutils.proper_cased_path(filepath)

And then on win32 it uses FindFiles (either via ctypes or via pywin32) and on Mac it uses the appropriate function, and on Linux, it can just be a no-op function.
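
For illustration only, here is a portable fallback for such a function that relies on os.listdir() rather than FindFiles or a Mac-specific call. It is a Python 3 sketch under stated assumptions: proper_cased_path is the name proposed above, not an existing osutils function, and the body is hypothetical. It does not handle Unicode normalization differences.

```python
import os
import tempfile


def proper_cased_path(path):
    """Return path with each existing component spelled as stored on disk.

    Components that do not exist are returned exactly as typed, so on a
    case-sensitive filesystem the function is effectively a no-op.
    """
    drive, tail = os.path.splitdrive(os.path.normpath(path))
    current = drive + (os.sep if tail.startswith(os.sep) else "")
    parts = [p for p in tail.split(os.sep) if p]
    for i, part in enumerate(parts):
        if not os.path.lexists(os.path.join(current, part)):
            # Nothing on disk under this spelling: keep the rest as typed.
            return os.path.join(current, *parts[i:])
        try:
            entries = os.listdir(current or os.curdir)
        except OSError:
            entries = []
        if part in entries:
            actual = part
        else:
            # The name exists but is not listed under this spelling, so
            # look for an entry that differs only by case.
            matches = [e for e in entries if e.casefold() == part.casefold()]
            actual = matches[0] if matches else part
        current = os.path.join(current, actual)
    return current


with tempfile.TemporaryDirectory() as tree:
    open(os.path.join(tree, "Foo"), "w").close()
    # Ends in "Foo" on a case-insensitive filesystem; on a case-sensitive
    # one "foo" does not exist, so the path comes back as typed.
    print(proper_cased_path(os.path.join(tree, "foo")))
```

Listing a directory per component is slower than a single native call, so this would only make sense for the handful of paths the user types on the command line.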

Changed in bzr:
importance: Undecided → High
status: New → Triaged

Per Johansson (per.j) wrote :

I believe the libc function (for OS X) you're looking for is realpath(3), though it returns the full path beginning with /.

2009/12/31 Per Johansson <email address hidden>:
> I believe the libc function (for OS X) you're looking for is
> realpath(3), though it returns the full path beginning with /.

That could be really useful. Does it in fact do this normalization,
or does it just dereference symbolic links? The manpage
<http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/realpath.3.html>
seems to imply the latter, but it might just be out of date.

--
Martin <http://launchpad.net/~mbp/>

Martin Pool (mbp) wrote :

... according to
<http://qt.gitorious.org/qt/qt/commit/6135da7c97830ea46ca807b7cf5944dc74fdb960?diffmode=sidebyside>
it does handle case. If it also did unicode normalization that would
be really nice.

--
Martin <http://launchpad.net/~mbp/>

Per Johansson (per.j) wrote :

It handles case and also makes sure to return the file as UTF-8 NFD (as they are stored on disk), according to my testing. I don't really know how to access it from python though, but I guess some of you guys do (os.path.realpath does not use this function).

The limitation is that it only works on files that actually exist.
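
One possible way to reach the libc realpath(3) from Python is ctypes. The following is a Python 3 sketch for POSIX systems only; native_realpath and the 4096-byte buffer are assumptions made for this example, and whether the result is case-corrected or NFD-normalized depends on the platform's libc, as discussed above (on Linux it only resolves symlinks and relative components).

```python
import ctypes
import ctypes.util
import os

# Load the C library; on POSIX, CDLL(None) falls back to the running process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.realpath.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.realpath.restype = ctypes.c_char_p

# Assumed to be at least PATH_MAX (1024 on OS X, 4096 on Linux).
_BUFFER_SIZE = 4096


def native_realpath(path):
    """Call libc realpath(3) on path and return the resolved path as str."""
    buf = ctypes.create_string_buffer(_BUFFER_SIZE)
    if _libc.realpath(os.fsencode(path), buf) is None:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return os.fsdecode(buf.value)


print(native_realpath(os.getcwd()))
```

As noted, the call fails for paths that do not exist; here that surfaces as an OSError carrying the errno set by libc.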

Martin Pool (mbp) on 2010-03-17
Changed in bzr:
status: Triaged → Confirmed
Martin Pool (mbp) on 2010-03-17
tags: added: case-sensitivity

Michael J. Vinca (michaelj) wrote :

We have had a similar problem on Windows, where MAIN.C becomes main.c but is still the same file to Windows, because Windows is currently set up to be case-insensitive. Looking at the ideas above, most of them seem platform-specific. I would suggest that Bazaar check more generically whether the underlying filesystem is currently case-sensitive and behave accordingly. (If the repo had two files that would become the same, I guess you'd have to broadcast an error.)
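
A generic check along those lines could probe the filesystem directly instead of asking the OS what it is. This is a hedged Python 3 sketch, not Bazaar code: is_case_sensitive is a hypothetical name, and the probe needs write access to the directory being tested.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Probe whether directory lives on a case-sensitive filesystem."""
    # The prefix guarantees the name contains letters whose case can flip.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        head, name = os.path.split(probe.name)
        swapped = os.path.join(head, name.swapcase())
        # If the case-swapped name resolves to a file, the filesystem
        # ignores case when looking names up.
        return not os.path.exists(swapped)


print(is_case_sensitive(tempfile.gettempdir()))
```

Because the answer depends on the volume, not the operating system, the probe would have to be run against the working tree's own directory, which also covers HFS+ volumes formatted as case-sensitive and network mounts.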

Samuel Bronson (naesten) on 2012-06-18
tags: added: mac
Jelmer Vernooij (jelmer) on 2017-11-08
tags: added: check-for-breezy
Jelmer Vernooij (jelmer) on 2017-11-11
tags: removed: check-for-breezy
Jelmer Vernooij (jelmer) on 2017-11-11
Changed in brz:
status: New → Triaged
importance: Undecided → Medium