UnicodeDecodeError: problem with funny character in filename/track title

Bug #404444 reported by David Wagner
This bug affects 3 people
Affects: Exaile
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: 0.3.1

Bug Description

Exaile was unable to play a track with a non-7-bit ASCII character in its track title and its filename.

In particular, the tracks on the album are something like this:

    ...
    Iron Gods
    Ragnarök
    ...

I added the entire album to the playlist, and started listening to it. When "Iron Gods" finished playing, Exaile was unable to start playing "Ragnarök". All audio stopped, an error appeared in ~/.xsession-errors, and Exaile entered a funny state. In other words, Exaile seems to be unable to play the track "Ragnarök". Note that both the track title and the filename contain a special character, so I'm not sure which causes the problem.

Here's an excerpt of what I got in my ~/.xsession-errors:

INFO : Playing file:///home/daw/more/music/taverner/music/stormwarrior/heading_northe/iron_gods.mp3
Traceback (most recent call last):
  File "/usr/local/lib/exaile/xlgui/main.py", line 563, in <lambda>
    lambda *e: self.queue.next(),
  File "/usr/local/lib/exaile/xl/player/queue.py", line 89, in next
    self.player.play(track)
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 192, in play
    uri = self._get_track_uri(track)
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 157, in _get_track_uri
    path = common.local_file_from_url(uri).encode()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 70-73: invalid data

The Python traceback appeared as soon as Exaile tried to play "Ragnarök". That track is stored in the following file:

/home/daw/more/music/taverner/music/stormwarrior/heading_northe/Ragnar�k.mp3

The funny character in the file name is 0xF6:

echo ~/more/music/taverner/music/stormwarrior/heading_northe/Ragnar$'\366'k.mp3 | hexdump -C

00000000 2f 68 6f 6d 65 2f 64 61 77 2f 6d 6f 72 65 2f 6d |/home/daw/more/m|
00000010 75 73 69 63 2f 74 61 76 65 72 6e 65 72 2f 6d 75 |usic/taverner/mu|
00000020 73 69 63 2f 73 74 6f 72 6d 77 61 72 72 69 6f 72 |sic/stormwarrior|
00000030 2f 68 65 61 64 69 6e 67 5f 6e 6f 72 74 68 65 2f |/heading_northe/|
00000040 52 61 67 6e 61 72 f6 6b 2e 6d 70 33 0a |Ragnar.k.mp3.|
0000004d
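
The decode failure can be reproduced outside Exaile from the bytes in the hexdump alone. This is a minimal sketch in Python 3 (the report itself involves Python 2.6); the variable names are illustrative and none of this is Exaile code.

```python
# Raw filename bytes as shown in the hexdump above (without the trailing newline).
raw_path = (
    b"/home/daw/more/music/taverner/music/stormwarrior/heading_northe/"
    b"Ragnar\xf6k.mp3"
)

try:
    raw_path.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# A lone 0xF6 byte is not valid UTF-8, so decoding stops at that byte.
print(error.start)                 # 70
print(hex(raw_path[error.start]))  # 0xf6
```

The offset 70 agrees with the start of the range "70-73" reported in the traceback.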

I believe the ID3 tag in the file containing the title of the track also has a 0xF6 byte after "Ragnar" and before "k", just as in the filename. I show my detailed reasoning below, so that you can check my work. Let me know if you'd like me to upload the file, or try to come up with a minimized test case.

This is repeatable.

I'm using Exaile compiled from the latest bzr head (bzr version 2190), on Fedora 11 x86_64 with python-2.6-9.fc11.x86_64, in case that's relevant.

Here's some more info about that file:

$ mutagen-inspect ~/more/music/taverner/music/stormwarrior/heading_northe/Ragnar$'\366'k.mp3
-- /home/daw/more/music/taverner/music/stormwarrior/heading_northe/Ragnar�k.mp3
- MPEG 1 layer 3, 284239 bps, 44100 Hz, 255.00 seconds (audio/mp3)
TDRC=2008
TIT2=Ragnarök
TRCK=6
TPE1=Stormwarrior
TALB=Heading Northe
COMM=ID3v1 Comment='eng'=Created by Grip
TCON=Metal

$ mutagen-inspect ~/more/music/taverner/music/stormwarrior/heading_northe/Ragnar$'\366'k.mp3 | grep TIT2 | hexdump -C
00000000 54 49 54 32 3d 52 61 67 6e 61 72 c3 b6 6b 0a |TIT2=Ragnar..k.|
0000000f

$ mp3info ~/more/music/taverner/music/stormwarrior/heading_northe/Ragnar$'\366'k.mp3
File: /home/daw/more/music/taverner/music/stormwarrior/heading_northe/Ragnar�k.mp3
Title: Ragnar�k Track: 6
Artist: Stormwarrior
Album: Heading Northe Year: 2008
Comment: Created by Grip Genre: Metal [9]

$ mp3info ~/more/music/taverner/music/stormwarrior/heading_northe/Ragnar$'\366'k.mp3 | grep Title | hexdump -C
00000000 54 69 74 6c 65 3a 20 20 20 52 61 67 6e 61 72 f6 |Title:   Ragnar.|
00000010 6b 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |k               |
00000020 20 20 20 20 20 20 20 20 54 72 61 63 6b 3a 20 36 |        Track: 6|
00000030 0a |.|
00000031

So in mutagen-inspect, the character after "Ragnar" and before "k" (the ö character, we hope) is showing up as the two bytes 0xC3 0xB6, whereas in mp3info, that character is showing up as the single byte 0xF6.
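
The two tools print the same character under two different encodings. A short Python 3 sketch (illustrative names only) shows the two serializations of ö and checks them against the byte sequences in the hexdumps above:

```python
o_umlaut = "\u00f6"  # ö, LATIN SMALL LETTER O WITH DIAERESIS

utf8_bytes = o_umlaut.encode("utf-8")      # two bytes
latin1_bytes = o_umlaut.encode("latin-1")  # one byte

print(utf8_bytes.hex(), latin1_bytes.hex())  # c3b6 f6

# "Ragnar" + character + "k" as printed by mutagen-inspect (valid UTF-8) ...
from_mutagen = bytes.fromhex("5261676e6172c3b66b").decode("utf-8")
# ... and as printed by mp3info (only meaningful as a single-byte encoding).
from_mp3info = bytes.fromhex("5261676e6172f66b").decode("latin-1")

print(from_mutagen == from_mp3info)  # True
```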

Looking at the file with "view -b", I see the following contents:

... <aa><aa>TAGRagnar<f6>k^@^@ ...

and with "hexdump -C":

008a3ef0 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa |................|
*
008a3f20 aa aa aa aa aa aa aa aa aa aa aa 54 41 47 52 61 |...........TAGRa|
008a3f30 67 6e 61 72 f6 6b 00 00 00 00 00 00 00 00 00 00 |gnar.k..........|
008a3f40 00 00 00 00 00 00 00 00 00 00 00 00 53 74 6f 72 |............Stor|
008a3f50 6d 77 61 72 72 69 6f 72 00 00 00 00 00 00 00 00 |mwarrior........|
008a3f60 00 00 00 00 00 00 00 00 00 00 48 65 61 64 69 6e |..........Headin|
008a3f70 67 20 4e 6f 72 74 68 65 00 00 00 00 00 00 00 00 |g Northe........|
008a3f80 00 00 00 00 00 00 00 00 32 30 30 38 43 72 65 61 |........2008Crea|
008a3f90 74 65 64 20 62 79 20 47 72 69 70 00 00 00 00 00 |ted by Grip.....|
008a3fa0 00 00 00 00 00 00 00 00 00 06 09 |...........|
008a3fab
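
The last 128 bytes of that dump follow the fixed ID3v1.1 layout: "TAG", then 30-byte title, artist and album fields, a 4-byte year, a 28-byte comment, a zero byte, the track number and a genre index. As a sketch in Python 3, the tag can be rebuilt from the values visible in the dump and unpacked again. Decoding the text fields as Latin-1 is an assumption here (ID3v1 text is commonly treated that way), and the names are illustrative.

```python
import struct

# Rebuild the 128-byte tag from the fields visible in the hexdump above.
tag = (
    b"TAG"
    + b"Ragnar\xf6k".ljust(30, b"\x00")      # title
    + b"Stormwarrior".ljust(30, b"\x00")     # artist
    + b"Heading Northe".ljust(30, b"\x00")   # album
    + b"2008"                                # year
    + b"Created by Grip".ljust(28, b"\x00")  # comment (28 bytes in ID3v1.1)
    + b"\x00\x06\x09"                        # zero byte, track number, genre index
)

magic, title, artist, album, year, comment, zero, track, genre = struct.unpack(
    "3s30s30s30s4s28sBBB", tag
)


def text(field):
    """Strip NUL padding and decode, assuming Latin-1 text."""
    return field.rstrip(b"\x00").decode("latin-1")


print(text(title), track, genre)
```

Track 6 and genre index 9 match the "Track: 6" and "Genre: Metal [9]" values that mp3info printed.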

Adam Olsen (arolsen) wrote :

Do you know what encoding the filename is in? Exaile is complaining that it can't decode the filename as UTF-8, which is your default filesystem encoding.

The problem here is that the filename itself is not in UTF-8, and there is no way to tell Exaile what the encoding actually is so that it can "do the right thing".

David Wagner (daw-bugzilla) wrote :

Good question! I'm a bit ignorant about encodings/charsets. Can you give me any idea how I would determine what encoding the filename is in? The raw bytes of the filename are listed above; to repeat, in hex they are ... 6e 61 72 f6 6b ... (...nar�k...), so the funny character is represented as the one-byte sequence 0xF6.

reacocard (reacocard) wrote :

>Good question! I'm a bit ignorant about encodings/charsets. Can you give me
>any idea how I would determine what encoding the filename is in? The raw bytes
>of the filename are listed above; to repeat, in hex they are ... 6e 61 72 f6 6b ...
>(...nar�k...), so the funny character is represented as the one-byte sequence 0xF6.

It appears to be Latin-1: 0xF6 is ö (o with umlaut) in Latin-1.
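
One rough way to answer the "which encoding?" question is to try UTF-8 first and fall back to Latin-1. The helper below is a hypothetical Python 3 sketch, not something Exaile provides. Note that Latin-1 accepts every possible byte, so a Latin-1 result is only a guess, not proof.

```python
def decode_filename(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# The bytes from the report are rejected by UTF-8 and accepted by Latin-1.
guessed = decode_filename(b"Ragnar\xf6k.mp3")
# A name that really is UTF-8 is recognized as such.
already_utf8 = decode_filename("Ragnar\u00f6k.mp3".encode("utf-8"))

print(guessed[1], already_utf8[1])
```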

Changed in exaile:
importance: Undecided → Medium
milestone: none → 0.3.0
status: New → Confirmed

Steve Dodier-Lazaro (sidi) wrote :

We need to identify the consequences of such files on library and playback, and to make sure Exaile is robust to such errors.
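
For reference, one way to stay robust to such names is to carry the undecodable bytes through text handling and restore them when talking to the filesystem. The sketch below uses Python 3's `surrogateescape` error handler, which is not available in the Python 2.6 used in this report, so it illustrates the idea and is not a patch for Exaile.

```python
raw_name = b"Ragnar\xf6k.mp3"

# Undecodable bytes become lone surrogates instead of raising an error.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text_name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, so the marker cannot leak
# into output that is supposed to be clean UTF-8.
try:
    text_name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(restored == raw_name, strict_ok)  # True False
```

On POSIX systems, `os.fsdecode` and `os.fsencode` apply this handler using the filesystem encoding.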

Changed in exaile:
assignee: nobody → Exaile Bug Day Events (exaile-bugday)

Johannes Sasongko (sjohannes) wrote :

Committed code that should allow such files to play. Tags are still not displayed, though.

David Wagner (daw-bugzilla) wrote :

I'm afraid it doesn't seem to work for me. With the latest revision from bzr (r2288), I still can't play the track that caused the problems listed above, and I get this traceback instead:

Traceback (most recent call last):
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 84, in on_message
    self.eof_func()
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 68, in eof_func
    self._queue.next()
  File "/usr/local/lib/exaile/xl/player/queue.py", line 88, in next
    self.player.play(track)
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 192, in play
    uri = self._get_track_uri(track)
  File "/usr/local/lib/exaile/xl/player/engine_normal.py", line 157, in _get_track_uri
    path = common.local_file_from_url(uri).encode()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 70-73: invalid data
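
The failing line turns a file URI into a local path. As a sketch of how that step can keep the original bytes, the Python 3 standard library can percent-decode a URI straight to bytes without assuming any text encoding. The short path below is hypothetical, and this does not reproduce Exaile's `common.local_file_from_url`.

```python
from urllib.parse import quote, unquote, unquote_to_bytes, urlsplit

raw_path = b"/music/Ragnar\xf6k.mp3"  # hypothetical path containing a 0xF6 byte

# Percent-encoding works on bytes, so no text encoding has to be guessed.
uri = "file://" + quote(raw_path)
print(uri)  # file:///music/Ragnar%F6k.mp3

# Decoding to bytes recovers the exact on-disk name.
recovered = unquote_to_bytes(urlsplit(uri).path)

# Decoding to text assumes UTF-8 by default and replaces the bad byte.
lossy = unquote(urlsplit(uri).path)

print(recovered == raw_path, lossy != "/music/Ragnar\u00f6k.mp3")  # True True
```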

Skye (sberghel+launchpad) wrote :

I am also having this problem (using the version in portage in Gentoo). The file in question is called "Déjà vu". A copy of the file with the name "Deja vu" instead works. I'm getting a slightly different traceback though:

Traceback (most recent call last):
  File "/usr/lib64/exaile/xlgui/menu.py", line 204, in <lambda>
    self.on_append_items(), 'gtk-add')
  File "/usr/lib64/exaile/xlgui/menu.py", line 219, in on_append_items
    selected = self.widget.get_selected_tracks()
  File "/usr/lib64/exaile/xlgui/panel/files.py", line 304, in get_selected_tracks
    self.append_recursive(tracks, value)
  File "/usr/lib64/exaile/xlgui/panel/files.py", line 320, in append_recursive
    if os.path.isdir(value):
  File "/usr/lib64/python2.6/genericpath.py", line 41, in isdir
    st = os.stat(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 119: ordinal not in range(128)
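
This traceback is an encode failure, not a decode failure: a Unicode path is being converted to bytes with the ASCII codec before `os.stat` is called. That suggests the filesystem encoding in effect was ASCII, though the report does not say so. A Python 3 sketch of the same failure, with illustrative names:

```python
title = "D\u00e9j\u00e0 vu"  # "Déjà vu"

try:
    title.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The first character outside ASCII is é (U+00E9), as in the traceback above.
print(error.start, repr(title[error.start]))  # 1 'é'

# Encoding explicitly avoids depending on a process-wide default. UTF-8 is an
# assumption here: the report does not say how the file is named on disk.
utf8_name = title.encode("utf-8")
print(utf8_name)
```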

reacocard (reacocard) wrote :

Moving to gio.File might fix this, so I'm targeting this for 0.3.1, since we'll begin our migration to gio then.

Changed in exaile:
assignee: Exaile Bug Day Events (exaile-bugday) → nobody
importance: Medium → High
milestone: 0.3.0 → 0.3.1

reacocard (reacocard) wrote :

We now use gio for scanning in trunk, and initial testing shows good results on odd pathname encodings. If you could try out the latest bzr version and confirm that it fixes the issue, that would be great. (Note that you should back up your ~/.local/share/exaile/music.db first; otherwise you won't be able to return to stable.)

David Wagner (daw-bugzilla) wrote :

Cool. This bug is fixed for me in the latest bzr head (revision 2486). The song Ragnarök now plays fine. Thanks!

Changed in exaile:
status: Confirmed → Fix Committed
reacocard (reacocard)
Changed in exaile:
status: Fix Committed → Fix Released