UnicodeDecodeError: problem with funny character in filename/track title
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Exaile | Fix Released | High | Unassigned |
Bug Description
Exaile was unable to play a track with a non-7-bit ASCII character in its track title and its filename.
In particular, the tracks on the album are something like this:
...
Iron Gods
Ragnarök
...
I added the entire album to the playlist, and started listening to it. When "Iron Gods" finished playing, Exaile was unable to start playing "Ragnarök". All audio stopped, an error appeared in ~/.xsession-errors, and Exaile entered a funny state. In other words, Exaile seems to be unable to play the track "Ragnarök". Note that both the track title and the filename contain a special character, so I'm not sure which causes the problem.
Here's an excerpt of what I got in my ~/.xsession-errors:
INFO : Playing file://
Traceback (most recent call last):
File "/usr/local/
lambda *e: self.queue.next(),
File "/usr/local/
self.
File "/usr/local/
uri = self._get_
File "/usr/local/
path = common.
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 70-73: invalid data
The Python Traceback appeared as soon as it started trying to play "Ragnarök". The latter is located at the following file:
/home/daw/
The funny character in the file name is 0xF6:
echo ~/more/
00000000 2f 68 6f 6d 65 2f 64 61 77 2f 6d 6f 72 65 2f 6d |/home/daw/more/m|
00000010 75 73 69 63 2f 74 61 76 65 72 6e 65 72 2f 6d 75 |usic/taverner/mu|
00000020 73 69 63 2f 73 74 6f 72 6d 77 61 72 72 69 6f 72 |sic/stormwarrior|
00000030 2f 68 65 61 64 69 6e 67 5f 6e 6f 72 74 68 65 2f |/heading_northe/|
00000040 52 61 67 6e 61 72 f6 6b 2e 6d 70 33 0a |Ragnar.k.mp3.|
0000004d
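To make the failure easier to check, here is a minimal sketch in Python 3 (Exaile at the time ran on Python 2, so the exact error message differs). It rebuilds the path from the hexdump bytes above, dropping the trailing newline that echo adds, and tries to decode it. The variable names are only illustrative. The offset of the failure should match the start of the "position 70-73" range in the traceback.

```python
# Bytes copied from the hexdump above, without the final 0x0a added by echo.
path_bytes = bytes.fromhex(
    "2f686f6d652f6461772f6d6f72652f6d"
    "757369632f74617665726e65722f6d75"
    "7369632f73746f726d77617272696f72"
    "2f68656164696e675f6e6f727468652f"
    "5261676e6172f66b2e6d7033"
)

# Decoding as UTF-8 fails, because 0xF6 cannot start a valid UTF-8 sequence.
try:
    path_bytes.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as Latin-1 (ISO-8859-1) never fails and maps 0xF6 to "ö".
latin1_name = path_bytes.decode("latin-1").rsplit("/", 1)[-1]

print("UTF-8 failure offset:", utf8_error.start if utf8_error else None)
print("Latin-1 file name:", latin1_name)
```

If the offset printed is 70, the byte that Exaile choked on is the 0xF6 in the file name rather than anything in the ID3 tag.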
I believe the ID3 tag in the file containing the title of the track also has a 0xF6 byte after "Ragnar" and before "k", just as in the filename. I show my detailed reasoning below, so that you can check my work. Let me know if you'd like me to upload the file, or try to come up with a minimized test case.
This is repeatable.
I'm using Exaile compiled from the latest bzr head (bzr version 2190), on Fedora 11 x86_64 with python-
Here's some more info about that file:
$ mutagen-inspect ~/more/
-- /home/daw/
- MPEG 1 layer 3, 284239 bps, 44100 Hz, 255.00 seconds (audio/mp3)
TDRC=2008
TIT2=Ragnarök
TRCK=6
TPE1=Stormwarrior
TALB=Heading Northe
COMM=ID3v1 Comment=
TCON=Metal
$ mutagen-inspect ~/more/
00000000 54 49 54 32 3d 52 61 67 6e 61 72 c3 b6 6b 0a |TIT2=Ragnar..k.|
0000000f
$ mp3info ~/more/
File: /home/daw/
Title: Ragnar�k Track: 6
Artist: Stormwarrior
Album: Heading Northe Year: 2008
Comment: Created by Grip Genre: Metal [9]
$ mp3info ~/more/
00000000 54 69 74 6c 65 3a 20 20 20 52 61 67 6e 61 72 f6 |Title: Ragnar.|
00000010 6b 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |k |
00000020 20 20 20 20 20 20 20 20 54 72 61 63 6b 3a 20 36 | Track: 6|
00000030 0a |.|
00000031
So in mutagen-inspect, the character after "Ragnar" and before "k" (the ö character, we hope) shows up as the two bytes 0xC3 0xB6, which is the UTF-8 encoding of ö, whereas in mp3info that character shows up as the single byte 0xF6, which is its Latin-1 (ISO-8859-1) encoding.
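The difference between the two dumps can be reproduced with a few lines of Python 3. This is only a sketch to check the byte values; it assumes the tag text is Latin-1, which the 0xF6 byte suggests but does not prove.

```python
title = "Ragnarök"

# UTF-8 encodes "ö" as two bytes, Latin-1 as one.
utf8_bytes = title.encode("utf-8")
latin1_bytes = title.encode("latin-1")

print(utf8_bytes.hex())    # ends in c3 b6 6b, as in the mutagen-inspect dump
print(latin1_bytes.hex())  # ends in f6 6b, as in the mp3info dump

# Reading the Latin-1 bytes as UTF-8 yields a replacement character,
# which is what a UTF-8 terminal shows for the mp3info title.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown)
```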
Looking at the file with "view -b", I see the following contents:
... <aa><aa>
and with "hexdump -C":
008a3ef0 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa |................|
*
008a3f20 aa aa aa aa aa aa aa aa aa aa aa 54 41 47 52 61 |...........TAGRa|
008a3f30 67 6e 61 72 f6 6b 00 00 00 00 00 00 00 00 00 00 |gnar.k..........|
008a3f40 00 00 00 00 00 00 00 00 00 00 00 00 53 74 6f 72 |............Stor|
008a3f50 6d 77 61 72 72 69 6f 72 00 00 00 00 00 00 00 00 |mwarrior........|
008a3f60 00 00 00 00 00 00 00 00 00 00 48 65 61 64 69 6e |..........Headin|
008a3f70 67 20 4e 6f 72 74 68 65 00 00 00 00 00 00 00 00 |g Northe........|
008a3f80 00 00 00 00 00 00 00 00 32 30 30 38 43 72 65 61 |........2008Crea|
008a3f90 74 65 64 20 62 79 20 47 72 69 70 00 00 00 00 00 |ted by Grip.....|
008a3fa0 00 00 00 00 00 00 00 00 00 06 09 |...........|
008a3fab
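The last 128 bytes in that hexdump are an ID3v1 tag ("TAG", then fixed-width title, artist, album, year, comment and genre fields). Here is a rough Python 3 sketch that parses such a block, in case it helps to check my reading of the dump. The function name parse_id3v1 is made up for this example, the sample bytes are retyped from the dump rather than read from the file, and decoding the text as Latin-1 is an assumption, since ID3v1 does not record an encoding.

```python
import struct

def parse_id3v1(tag: bytes) -> dict:
    """Parse a 128-byte ID3v1 / ID3v1.1 block (hypothetical helper)."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")
    _, title, artist, album, year, comment, genre = struct.unpack(
        "3s30s30s30s4s30sB", tag
    )
    # ID3v1.1: a zero byte at comment[28] means comment[29] is the track number.
    track = None
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    def text(raw: bytes) -> str:
        # Fields are NUL-padded; Latin-1 is assumed, not guaranteed.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title), "artist": text(artist), "album": text(album),
        "year": text(year), "comment": text(comment),
        "track": track, "genre": genre,
    }

# Sample tag retyped from the hexdump above.
sample = (
    b"TAG"
    + b"Ragnar\xf6k".ljust(30, b"\x00")
    + b"Stormwarrior".ljust(30, b"\x00")
    + b"Heading Northe".ljust(30, b"\x00")
    + b"2008"
    + b"Created by Grip".ljust(28, b"\x00") + b"\x00\x06"
    + b"\x09"
)
info = parse_id3v1(sample)
print(info)
```

Under these assumptions the title comes out as "Ragnarök", with track 6 and genre 9, which agrees with the mp3info output.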
Changed in exaile: status: Fix Committed → Fix Released
Do you know what encoding the filename is in? Exaile is complaining that it can't decode the filename as UTF-8, which is your default filesystem encoding.
The problem here is that the filename itself is not in UTF-8, and there is no way to tell Exaile what the encoding actually is so that it can "do the right thing".
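One way to sidestep the unknown encoding is to never decode the filename at all when building the URI, and to percent-encode the raw bytes instead. The following Python 3 sketch illustrates that idea only; it is not the fix that was committed to Exaile, the helper names are hypothetical, and the short path is a stand-in for the real one.

```python
from urllib.parse import quote, unquote_to_bytes

def path_to_file_uri(path_bytes: bytes) -> str:
    """Build a file:// URI from raw path bytes without decoding them."""
    return "file://" + quote(path_bytes)

def file_uri_to_path(uri: str) -> bytes:
    """Recover the exact original bytes from a URI built above."""
    return unquote_to_bytes(uri[len("file://"):])

raw = b"/music/Ragnar\xf6k.mp3"  # hypothetical path with a Latin-1 "ö"
uri = path_to_file_uri(raw)
print(uri)

# For display, undecodable bytes can be carried through as surrogate
# escapes and turned back into the same bytes later.
display = raw.decode("utf-8", errors="surrogateescape")
round_trip = display.encode("utf-8", errors="surrogateescape")
```

The byte 0xF6 ends up as %F6 in the URI, so the file can still be opened even though the name is not valid UTF-8.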