Comment 18 for bug 605543

Foppe Hemminga (foppe) wrote :

The 'latin1' in my proposed code (/usr/lib/python2.6/dist-packages/gwibber/microblog/twitter.py, line 64) should be 'utf8':

   m["text"] = unescape(data["text"].encode("utf8"))

The rationale is as follows: htmllib.HTMLParser, which the unescape function uses (/usr/lib/python2.6/dist-packages/gwibber/microblog/twitter.py, line 48), assumes unicode strings and won't guess the character encoding if they're not.
The Twitter API supports UTF-8 [1], so unless the text strings are manipulated along the way, they are still in UTF-8.
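
To illustrate why the codec name matters, here is a minimal sketch. It is written for Python 3 rather than the Python 2.6 that gwibber runs on, it is not gwibber's code, and the sample text and the try_encode helper are made up for the example. It only shows that 'latin1' cannot represent every character a tweet may contain, while 'utf8' can.

```python
# Python 3 sketch; the sample text and helper name are hypothetical.
sample = "Price: 5 \u20ac at the caf\u00e9"  # euro sign and e-acute


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


utf8_bytes = try_encode(sample, "utf8")      # succeeds: UTF-8 covers all of Unicode
latin1_bytes = try_encode(sample, "latin1")  # None: the euro sign is not in Latin-1

# Decoding the UTF-8 bytes gives back the original text unchanged.
round_trip = utf8_bytes.decode("utf8")
```

With 'latin1' the euro sign makes the encode step fail, whereas the UTF-8 round trip preserves the text.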

[1] http://apiwiki.twitter.com/Things-Every-Developer-Should-Know#7Encodingaffectsstatuscharactercount