Ubuntu One should normalize names

Bug #692241 reported by Denis Moyogo Jacquerye on 2010-12-19
This bug affects 1 person
Affects: Ubuntu One Client
Importance: Medium
Assigned to: Ubuntu One Foundations+ team

Bug Description

Currently, if a user creates or adds a file or directory with the same name as one already present, there is a name conflict.
However, canonically equivalent names do not conflict, although they should.

For example, a user can have a file "é" <U+00E9> and create or add another with the name "é" <U+0065 U+0301> without a name conflict. This is wrong. Both strings are equivalent and should be conflicting.
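As a minimal illustration of the example above (a sketch using Python's standard unicodedata module, not code from Ubuntu One), the two spellings compare as different strings until they are normalized to the same form:

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A plain comparison sees two different code point sequences.
print(precomposed == decomposed)  # False

# After normalizing both to NFC, the names are identical.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)  # True
```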

See Unicode Normalization Forms http://unicode.org/reports/tr15/
The W3C recommends using NFC forms http://www.w3.org/International/questions/qa-html-css-normalization
On Mac OS, HFS+ normalizes names to a variant of NFD.
Most keyboard layouts already use NFC by default, but that's not true for all keyboard layouts.

Since not everything out there is in one normalization form, Ubuntu One should normalize names to one form before comparison and for storage.
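A rough sketch of what "normalize before comparison" could look like, assuming NFC as the chosen form. The function names normalize_name and find_conflict are hypothetical and are not part of any Ubuntu One API:

```python
import unicodedata


def normalize_name(name, form="NFC"):
    """Return the name in one normalization form (NFC assumed here)."""
    return unicodedata.normalize(form, name)


def find_conflict(new_name, existing_names):
    """Return the existing name canonically equivalent to new_name, or None.

    Hypothetical helper: compares normalized forms instead of raw strings.
    """
    key = normalize_name(new_name)
    for existing in existing_names:
        if normalize_name(existing) == key:
            return existing
    return None


# The decomposed spelling now conflicts with the precomposed one.
existing = ["\u00e9", "notes.txt"]
print(find_conflict("e\u0301", existing) is not None)  # True
print(find_conflict("other.txt", existing))            # None
```

Storing the result of normalize_name rather than the raw name would keep the stored names in a single form as well.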

affects: bindwood → ubuntuone-storage-protocol
Chad Miller (cmiller) on 2010-12-20
Changed in ubuntuone-storage-protocol:
status: New → Confirmed
importance: Undecided → Medium
Chad Miller (cmiller) wrote :

I suspect, but do not know, that we decode filenames to Unicode only for ordering them, and we accept whatever is stored in the filesystem as a collision or not. That is to say, we do not originate many files in the storage system. If there's a filename on disk with an unnormalized name, then we'd interpret and transmit it as-is.

The only alternatives, I think, are to either 1) keep track of the local name (and silently reinterpret outgoing and incoming names) or 2) rename the local file out from under the user when it's discovered. Number 1 will present a problem if the user creates more filenames that map to the same normalized form. Number 2 is horrendous. I don't think either of these alternatives is good.
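To make the problem with number 1 concrete, here is a minimal sketch of such a local-name mapping. The class and method names are hypothetical and NFC is an assumed choice; this is not how the Ubuntu One client works:

```python
import unicodedata


class NameCollision(Exception):
    """Two distinct local names map to the same normalized name."""


class LocalNameMap:
    """Hypothetical table translating local names to normalized ones and back."""

    def __init__(self):
        self._local_by_normalized = {}

    def register(self, local_name):
        """Record a local name and return the normalized name to transmit."""
        normalized = unicodedata.normalize("NFC", local_name)
        known = self._local_by_normalized.get(normalized)
        if known is not None and known != local_name:
            # The case described above: a second local file whose name
            # normalizes to the same form as one already tracked.
            raise NameCollision(f"{local_name!r} collides with {known!r}")
        self._local_by_normalized[normalized] = local_name
        return normalized

    def to_local(self, incoming_name):
        """Translate an incoming name back to the spelling used on disk."""
        normalized = unicodedata.normalize("NFC", incoming_name)
        return self._local_by_normalized.get(normalized, normalized)


names = LocalNameMap()
names.register("e\u0301")          # decomposed name found on disk
try:
    names.register("\u00e9")       # user later creates the precomposed name
except NameCollision as exc:
    print("collision:", exc)
```

The mapping itself is simple; the open question is what to do when the collision is raised, since both files legitimately exist on the local filesystem.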

I suspect this should only be addressed by us in the place we do originate files: The web interface ("ubuntuone-servers" project). That should avoid creating new entries that are not normalized.

To get this done right, the bug should be fixed in the filesystem implementations or the VFS. I hesitate to link it to the kernel project yet, though, and I would like input from others.

Denis Moyogo Jacquerye (moyogo) wrote :

Chad: Unfortunately this is not a kernel or file system issue, see https://bugzilla.kernel.org/show_bug.cgi?id=8289
It's up to libraries and apps to do Unicode normalization.

Chad Miller (cmiller) on 2010-12-20
Changed in ubuntuone-storage-protocol:
assignee: nobody → Facundo Batista (facundo)
tags: added: chicharra chicharra-natty foundations+
Changed in ubuntuone-storage-protocol:
assignee: Facundo Batista (facundo) → Ubuntu One Foundations+ team (ubuntuone-foundations+)
Facundo Batista (facundo) wrote :

Put it back in "New", as we still haven't decided whether to actually do this.

Changed in ubuntuone-storage-protocol:
status: Confirmed → New
affects: ubuntuone-storage-protocol → ubuntuone-client
tags: added: chicharra-oneiric
removed: chicharra-natty
Rick McBride (rmcbride) on 2012-02-06
Changed in ubuntuone-client:
status: New → Confirmed
Changed in ubuntuone-client:
status: Confirmed → Won't Fix