Comment 1 for bug 666763

Johannes Sasongko (sjohannes) wrote :

This is the other side of the problem I documented in metadata/_matroska.py:

    if sys.platform == 'win32' and '://' not in location:
        # XXX: This is most likely a bug in the Win32 GIO port; it converts
        # paths into UTF-8 and requires them to be specified in UTF-8 as well.
        # Here we decode the path according to the FS encoding to get the
        # Unicode representation first. If the path is in a different encoding,
        # this step will fail.
        location = location.decode(sys.getfilesystemencoding()).encode('utf-8')

(I'm not sure this is exclusive to win32, but I have no easy way to check other systems where the filesystem encoding is not UTF-8.)

Anyway, back to this particular case: we want the original URI/path, but all we have is a URI that was percent-encoded from the UTF-8 form of the path. For example, a file named ú (FA in my encoding, C3 BA in UTF-8) will have %C3%BA in the URI, which is wrong; in the filesystem encoding it would be %FA. How did we get %C3%BA in the first place? I'm not sure at the moment; possibly from gio.File.enumerate_children.
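
To make the mismatch concrete, here is a small illustration of the two percent-encodings of ú. It is written in Python 3 syntax and assumes the filesystem encoding is cp1252, which is only a guess at "my encoding" (Latin-1 gives the same byte for this character).

```python
from urllib.parse import quote

name = "\u00fa"  # the character ú

# Assumption: cp1252 stands in for the local filesystem encoding.
legacy = quote(name.encode("cp1252"))
# What ends up in the URI when the path has been converted to UTF-8 first.
utf8 = quote(name.encode("utf-8"))

print(legacy)  # %FA
print(utf8)    # %C3%BA
```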

The crazy thing is, this only affects interaction with non-GIO stuff (namely, Mutagen). GIO itself happily lives with the problem (for example, GStreamer plays the URI just fine), so I don't know if this is actually a bug or if it's intended.

I'm going to try working around the problem by re-encoding the URI/path with sys.getfilesystemencoding().
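
A rough sketch of what that re-encoding could look like, in Python 3 syntax. The helper name reencode_uri_path and its fs_encoding parameter are made up for illustration; this is not the actual fix, just the idea of undoing the UTF-8 step and redoing it with the filesystem encoding.

```python
import sys
from urllib.parse import quote, unquote_to_bytes


def reencode_uri_path(uri_path, fs_encoding=None):
    """Hypothetical helper: re-express a UTF-8 percent-encoded path
    in the filesystem encoding."""
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # Undo the percent-encoding to recover the raw UTF-8 bytes.
    raw = unquote_to_bytes(uri_path)
    # Decode to text; raises UnicodeDecodeError if the bytes are not UTF-8.
    text = raw.decode("utf-8")
    # Encode with the filesystem encoding and percent-encode again.
    # Raises UnicodeEncodeError if a character has no representation there.
    return quote(text.encode(fs_encoding), safe="/")


print(reencode_uri_path("/music/%C3%BA.ogg", "cp1252"))  # /music/%FA.ogg
```

Like the _matroska.py snippet, this fails if the path is not in the encoding we assume, so the caller would need to catch the Unicode errors and fall back to the original string.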