Comment 12 for bug 605543

Captain Chaos (launchpad-chaos) wrote :

The plot thickens. I've been looking at the code, and this happens while decoding an HTML entity (such as &aacute;). Apparently the problem is not that the contents of the tweet are being decoded with the wrong character encoding, but that the tweet contains HTML entities, and Python's htmllib is failing to convert them.

The line that fails is:

self.savedata = self.savedata + data

Here savedata is the content of the tweet so far, and data is the character that corresponds to the entity, for instance an ë. I wonder why Python feels the need to perform a UTF-8 conversion just to do that concatenation. Any Python experts care to comment?
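
One possible explanation, offered as a guess rather than a diagnosis: in Python 2, concatenating a unicode object with a byte string makes Python decode the byte string implicitly using the default encoding. If savedata is unicode and data is a single Latin-1 byte (which is what I assume htmllib produces for the entity), that implicit decode would fail. The sketch below emulates the behaviour in Python 3, where the decode has to be written out. The helper concat_like_python2, the sample values, and the UTF-8 default encoding are all assumptions for illustration, not code from the application.

```python
def concat_like_python2(savedata, data, default_encoding="utf-8"):
    """Emulate Python 2's `unicode + str` (hypothetical helper).

    Python 2 decoded the byte string implicitly with the default
    encoding before concatenating; here the decode is explicit.
    """
    if isinstance(data, bytes):
        data = data.decode(default_encoding)
    return savedata + data


savedata = "ol"   # tweet text so far, already text (assumed)
data = b"\xe1"    # Latin-1 byte for "á" (assumed output of the entity lookup)

error = None
try:
    concat_like_python2(savedata, data)
except UnicodeDecodeError as exc:
    # A lone 0xE1 byte is not valid UTF-8, so the implicit decode fails.
    error = exc

# Decoding the byte with the encoding it is really in avoids the error.
fixed = concat_like_python2(savedata, data.decode("latin-1"))
```

If that guess is right, decoding data explicitly as Latin-1 before the concatenation (or making sure both operands are the same string type) would sidestep the implicit conversion.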