Activity log for bug #135985

Date Who What changed Old value New value Message
2007-08-30 19:55:59 Adam Olsen bug added bug
2007-08-31 14:19:36 Adam Olsen exaile: importance Undecided High
2007-08-31 14:19:36 Adam Olsen exaile: status New Confirmed
2007-09-21 13:05:05 Mathias Brodala description I see that there have been a number of bugs opened and fixed recently that deal with character encoding ([ticket:90], [ticket:165], [ticket:247], [ticket:292]). It's really a pain for a variety of historical reasons, but Exaile is going to have to deal with it. I'm going to try to explain the problems as I see them, and present the design for a solution. It's going to be more complicated than you want...

The two major problems are with '''MP3 tags''' and with '''filenames'''.

First, filenames. UNIX never had any encoding definition for filenames. The only restriction was that the name couldn't contain any slash (/). But it meant that people who wanted to save files with non-English names had to make something up. It's a long story, but fairly recently the GNOME/GTK+ folks decided to assume UTF-8 and allow users to override this if needed. Check out the [http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html GLib docs] for a little explanation. Look especially at the '''Checklist for Application Writers''' section.

To handle '''filename encoding''' robustly, Exaile will have to allow for a per-file option for which encoding to use. With nothing set, use UTF-8, but it would be possible for someone to have a French song in Latin-1 and a Polish song in Latin-2. It might be nice to have a per-directory setting. But then again, these files are non-standard, so perhaps just an easy way to set the filename encoding for a lot of files at once. Yes, both filenames should be in UTF-8 (and a conversion option would be nice), but Exaile should be able to open the file no matter what. A simpler option would be to not care, but I think there are problems with SQLite and non-UTF-8 strings. Something else to consider is external album art.

For '''media tags''', the problem is more complicated. For id3v1 and v1.1, the tags are "supposed" to be in ISO-8859-1 (Latin-1), but they are often not. id3v2.0, v2.1, v2.2 and v2.3 "should" be in ISO-8859-1, while the uncommon id3v2.4 should be in UTF-8. But lots of these text strings are not encoded correctly. Check out [http://en.wikipedia.org/wiki/ID3 the Wikipedia entry] for more. One option would be to disallow these files, but that wouldn't be very nice. The other would be to have a field for each track like above, but this would have to be distinct from the filename field above! It is likely that someone has renamed files to comply with GNOME standards but left the contents of the files alone.

APEv1 tags are rare (and ASCII only), and APEv2 and Vorbis comments (for ogg, flac and Speex) are all UTF-8 all the time. WMA and AAC have their own tagging standard (see the [http://www.id3.org/FAQ id3 FAQ]).

For '''radio streams''' the filename option is unneeded, but the tag option should be kept.

So, in short:

|| Type || Default Encoding || Overridable? ||
|| audio filenames || UTF-8 || per file, per directory(?) ||
|| coverart filenames || UTF-8 || per file, per directory(?) ||
|| mp3 id3v1 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v1.1 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2.3 || ISO-8859-1 || per track, per directory, global for tag type(?) ||
|| mp3 id3v2.4 || UTF-8 || per track, per directory, global for tag type(?) ||
|| APE & Vorbis || UTF-8 || per track? ||
|| WMA || Unknown || per track? ||
|| AAC || Unknown || per track? ||
|| Radio Streams || From above tag type || per track, global for tag type(?) ||

Yes, this is a pain. But either you have to make it easy for users to use their existing data, or make it easy to change it. I hope it's not too discouraging!

This ticket was migrated from the old trac: re #293
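The override scheme in the table above can be sketched roughly as follows. This is a minimal illustration, not Exaile's actual code: the names `DEFAULT_ENCODINGS`, `resolve_encoding` and `decode_tag` are hypothetical, the lookup order (track, then directory, then global per tag type, then default) is one reading of the "Overridable?" column, and the UTF-8 fallback for tag types listed as "Unknown" is an assumption.

```python
# Default encodings per tag type, copied from the table in the description.
# All names in this sketch are hypothetical.
DEFAULT_ENCODINGS = {
    "id3v1": "iso-8859-1",
    "id3v1.1": "iso-8859-1",
    "id3v2": "iso-8859-1",
    "id3v2.3": "iso-8859-1",
    "id3v2.4": "utf-8",
    "ape": "utf-8",
    "vorbis": "utf-8",
}


def resolve_encoding(tag_type, track_override=None,
                     directory_override=None, global_overrides=None):
    """Return the most specific encoding configured for a tag."""
    if track_override:
        return track_override
    if directory_override:
        return directory_override
    if global_overrides and tag_type in global_overrides:
        return global_overrides[tag_type]
    # Assumption: use UTF-8 for tag types the table lists as "Unknown".
    return DEFAULT_ENCODINGS.get(tag_type, "utf-8")


def decode_tag(raw, tag_type, **overrides):
    """Decode raw tag bytes to text using the resolved encoding."""
    encoding = resolve_encoding(tag_type, **overrides)
    # errors="replace" keeps the track loadable even when the guess is wrong.
    return raw.decode(encoding, errors="replace")
```

With no override, `decode_tag(b"\xe9", "id3v1")` is read as Latin-1; a Polish track tagged in Latin-2 would instead be decoded with `track_override="iso-8859-2"`. Filename encoding would need its own, separate setting, as the description points out.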
2009-10-09 11:44:22 Jiahua Huang attachment added converts ID3 tags from legacy encodings to Unicode http://launchpadlibrarian.net/33352388/legacy-encodings-support-exaile-r2551.diff
2009-10-09 12:01:01 Jiahua Huang attachment added converts ID3 tags from legacy encodings to Unicode http://launchpadlibrarian.net/33352998/legacy-encodings-support-exaile-r2551.diff
2010-01-09 18:04:43 Aron Xu attachment added legacy-encodings-support.patch http://launchpadlibrarian.net/37650459/legacy-encodings-support.patch
2010-01-09 18:14:12 Aron Xu branch linked lp:exaile
2010-01-09 19:04:45 reacocard branch unlinked lp:exaile
2010-01-10 01:57:06 Aron Xu attachment added screenshots.tar.gz http://launchpadlibrarian.net/37664544/screenshots.tar.gz
2011-02-07 05:59:07 muzuiget attachment added exaile_tag_encode.patch https://bugs.launchpad.net/exaile/+bug/135985/+attachment/1835177/+files/exaile_tag_encode.patch
2013-02-06 07:29:29 muzuiget attachment added Screenshot from 2013-02-06 15:15:32.png https://bugs.launchpad.net/exaile/+bug/135985/+attachment/3516706/+files/Screenshot%20from%202013-02-06%2015%3A15%3A32.png
2013-02-07 05:22:28 Chen Tao bug added subscriber Chen Tao
2013-02-11 21:27:10 Chen Tao removed subscriber Chen Tao