eyeD3 doesn't parse certain id3 tags

Bug #507132 reported by David Ashford on 2010-01-13
This bug affects 3 people
Affects: eyed3 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: eyed3

Ubuntu release: 9.10 (Karmic)
Package: eyeD3 0.6.17-1

Recently I retagged all of my music with MusicBrainz's Picard. Where possible, Picard includes 'Album Artist', 'Album Artist Sort Order', 'Artist', and 'Artist Sort Order' in the id3 tags. I use eyeD3 in a bash script that parses each music file and collects certain information into a text file. I ran into problems when eyeD3 failed to parse particular files, giving an error message along the lines of "'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)"

The only files that produced this error were files containing non-English characters. Some example artists: múm, Sigur Rós, Stafrænn Hákon. When I looked into it further, I found that if I removed the 'Album Artist Sort Order' tag and left all other tags intact, eyeD3 parsed every file correctly, including the files that had failed before and even artists tagged entirely in a different script, e.g. Пётр Ильич Чайковский

Steps to reproduce:
1) Add an 'Album Artist Sort Order' tag (some taggers use a different name for this tag) to a music file and give it a value containing any non-English character, e.g. æ, á
2) Scan the file with eyeD3, which should produce an error similar to the one mentioned above

ProblemType: Bug
Architecture: i386
CheckboxSubmission: 57367fa8b4b12332b40a8bb2eba1be33
CheckboxSystem: d00f84de8a555815fa1c4660280da308
Date: Wed Jan 13 18:26:17 2010
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release Candidate i386 (20091020.3)
Package: eyed3 0.6.17-1
PackageArchitecture: all
ProcEnviron:
 LANG=en_IE.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
SourcePackage: eyed3
Uname: Linux 2.6.31-17-generic i686

David Ashford (ashford-david) wrote :
description: updated
Joe Bormolini (j-bormolini) wrote :

I had the same problem with special characters in a UserTextFrame (ALBUMARTISTSORT in my case). I verified that ENCODING is correctly inherited from my environment ("UTF-8"), yet for some reason the string convert() is still trying to use the ascii codec.

Inexplicably I fixed it by separating the printout of the lines into two calls to print instead of one with a "\n" in the middle. I would love it if someone explained why the heck it works!

I only tested UserTextFrame but I assume comments and lyrics would have the same problem so I changed those too.

Joe Bormolini (j-bormolini) wrote :

Oops I said "string convert()" but I meant "string encode()"

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in eyed3 (Ubuntu):
status: New → Confirmed
Guillaume Millet (guimillet) wrote :

The problem appears with UserTextFrame and, when the --strict option is on, also with LyricsFrame and CommentFrame. I had a hard time finding the reason. Here is the explanation, taking CommentFrame as the example.
Actually, the error is raised not by encode() but by decode(), which seems to be called by sys.stdout.write (itself called by printMsg, a function whose advantage over print I don't see) in eyeD3, line 995:
    printMsg("%s: [Description: %s] [Lang: %s]\n%s" %\
                     (boldText("Comment"), cDesc, cLang,
                      cText.encode(ENCODING,"replace")));
with printMsg(s) = sys.stdout.write(s + '\n').

The problem is linked to cDesc. The strings cDesc and cText are set as Unicode strings in frames.py, line 1076:
    self.description = unicode(d, id3EncodingToString(self.encoding));
    self.comment = unicode(c, id3EncodingToString(self.encoding));
but then,
    if not strictID3():
        self.description = cleanNulls(self.description)
        self.comment = cleanNulls(self.comment)
with cleanNulls(s) = "/".join([x for x in s.split('\x00') if x]), which does not return a Unicode string. Therefore, with the --strict option, at print time cDesc is a Unicode string but cText.encode(ENCODING,"replace") is a byte string. A sample command showing the error is
    >>> print "%s %s" %(u'', (u'é').encode("utf-8","replace"))
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
whereas
    >>> print "%s %s" %('', (u'é').encode("utf-8","replace"))
     é
    >>> (u'é').encode("utf-8","replace") # returns a byte string
    '\xc3\xa9'

In Python 2.x (maybe different in 3.x with the new str type), if there is at least one Unicode string, the print formatting apparently tries to convert all the byte strings, if any, to Unicode with decode() which by default uses 'ascii' encoding, hence the UnicodeDecodeError.
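
The implicit conversion described above can be made explicit in a short sketch. This is written for Python 3, where mixing text and byte strings no longer triggers a silent decode, so the decode step that Python 2 apparently performed is spelled out by hand. The helper name implicit_py2_format is hypothetical and is not part of eyeD3.

```python
# Python 3 sketch: spell out the ASCII decode that Python 2 performed
# implicitly when a format string mixed Unicode and byte-string arguments.

description = ""                                # text string, stands in for cDesc (u'')
comment_bytes = "é".encode("utf-8", "replace")  # byte string b'\xc3\xa9', stands in for the encoded cText


def implicit_py2_format(text, raw):
    """Hypothetical helper: decode the byte string as ASCII before formatting,
    which is what Python 2 did once any argument was Unicode."""
    return "%s %s" % (text, raw.decode("ascii"))


try:
    implicit_py2_format(description, comment_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    # Same family of error as the one reported in this bug.
    error_message = str(exc)

print(error_message)
```

The first byte of the UTF-8 encoding of é is 0xc3, which is outside the ASCII range, so the decode fails on it.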

I see two (explainable ;) ) ways out of the bug: either modify cleanNulls(s) to return a Unicode string (maybe contrary to the purpose of cleanNulls(s), I don't know), or encode cDesc at print time with cDesc.encode(ENCODING,"replace"), which is what the attached patch does.
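
As a rough illustration of the idea behind both options (never mix text and byte strings in one format operation), here is a minimal Python 3 sketch. It is not the attached patch and not eyeD3's actual code: clean_nulls and format_comment are hypothetical stand-ins, and ENCODING is assumed to be "utf-8" as in the reporter's environment. Every value stays a text string until one encode at the very end.

```python
ENCODING = "utf-8"  # assumption: output encoding taken from the environment


def clean_nulls(text):
    """Hypothetical stand-in for cleanNulls: join NUL-separated parts with '/',
    always returning a text string (even when the input is empty)."""
    return "/".join(part for part in text.split("\x00") if part)


def format_comment(description, lang, text):
    """Build the whole output line as text, then encode exactly once."""
    line = "Comment: [Description: %s] [Lang: %s]\n%s" % (description, lang, text)
    return line.encode(ENCODING, "replace")


output = format_comment(clean_nulls("Sigur\x00Rós"), "eng", "múm")
print(output.decode(ENCODING))
```

Encoding once at the output boundary means no argument is ever a byte string while formatting, so no implicit decode can happen.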

For UserTextFrame, the bug always appears because description is not processed through cleanNulls() regardless of --strict, which seems to be another inconsistency compared to the behavior chosen for LyricsFrame and CommentFrame.

The attachment "fix-unexplained-ascii-codec-conversion.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch

Please try again with the latest version, which uses Python 3.
