Comment 1 for bug 1416339

scoder (scoder) wrote :

My guess is that this is a Py2.x problem. IIRC, repr() is expected to return a byte string in Py2.x, but lxml returns a unicode string, which Python then fails to encode to a byte string. So the error happens outside of lxml, inside Python itself. This has been fixed in Python 3.x, which properly supports (and in fact requires) a unicode text string as the result of repr().
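To illustrate the failing step, here is a rough sketch. It assumes that Py2.x coerces a unicode repr() result using the default ASCII codec; the helper name simulate_py2_repr_coercion is made up for this example and is not part of Python or lxml.

```python
def simulate_py2_repr_coercion(text):
    # Hypothetical helper: mimics the implicit step Py2.x is assumed to
    # perform when __repr__ returns a unicode string instead of bytes.
    return text.encode("ascii")

try:
    # A repr-like string containing a non-ASCII tag name.
    simulate_py2_repr_coercion(u"<Element caf\xe9>")
    failed = False
except UnicodeEncodeError:
    # The exception comes from the encoding step, not from the code
    # that built the string.
    failed = True
```

With a plain ASCII tag name the same call goes through, which would explain why the problem only shows up for some documents.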

Given that the tag name may not be representable as an ASCII-encoded byte string (and clearly is not in this case), there isn't really a correct way to do this. I mean, lxml could return something like "unprintable tag name" for non-ASCII tag names in repr(), but that wouldn't really be satisfactory... Although it would still be better than letting Python raise an exception.
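One possible shape for such a fallback, as a sketch only: instead of a fixed placeholder, escape the non-ASCII characters so that repr() always yields ASCII-safe text. The Element class below is a hypothetical stand-in, not lxml's actual implementation, and the output format is an assumption.

```python
class Element(object):
    """Hypothetical stand-in for an element class; not lxml's own code."""

    def __init__(self, tag):
        self.tag = tag

    def __repr__(self):
        # Replace non-ASCII characters with backslash escapes so the
        # result can always be encoded as ASCII.
        safe_tag = self.tag.encode("ascii", "backslashreplace").decode("ascii")
        return "<Element %s at 0x%x>" % (safe_tag, id(self))


r = repr(Element(u"caf\xe9"))
```

This loses the readable tag name but keeps it recoverable from the escapes, and it never raises.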