python-twitter should count characters not bytes.

Bug #337671 reported by Rory McCann
Affects                  Status        Importance  Assigned to  Milestone
python-twitter (Debian)  New           Undecided   Unassigned
python-twitter (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

I wanted to post this as a tweet: "Had lighttpd installed ∴ it was running ∴ apache couldn't restart ∴ roundcube couln't be upgraded ∴ software upgrade keep complaining". Notice the ∴ Unicode characters.

The tweet command complained that the tweet must be under 140 characters. However, it *is* under 140 characters; it is just not under 140 bytes. Each Unicode ∴ is 1 character but 3 bytes in UTF-8.

echo -n "Had lighttpd installed ∴ it was running ∴ apache couldn't restart ∴ roundcube couln't be upgraded ∴ software upgrade keep complaining" | wc -m
133

echo -n "Had lighttpd installed ∴ it was running ∴ apache couldn't restart ∴ roundcube couln't be upgraded ∴ software upgrade keep complaining" | wc -c
141
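
The same comparison can be sketched in Python. This is only an illustration written for Python 3 (the traceback below comes from Python 2.5, where a plain str holds bytes), and the variable names are made up for the example:

```python
# -*- coding: utf-8 -*-
# Illustration only: compare the character count with the UTF-8 byte count.
message = ("Had lighttpd installed ∴ it was running ∴ apache couldn't restart "
           "∴ roundcube couln't be upgraded ∴ software upgrade keep complaining")

char_count = len(message)                  # counts Unicode code points
byte_count = len(message.encode("utf-8"))  # counts bytes after UTF-8 encoding

print(char_count, byte_count)  # prints: 133 141
```

The two numbers match the wc -m and wc -c results above, so a length check done on the undecoded bytes would reject this message.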

As proof, the twitter web interface does support this as a tweet:
http://twitter.com/lalonde/status/1278018325

The error message I was getting was:

 tweet "Had lighttpd installed ∴ it was running ∴ apache couldn't restart ∴ roundcube couln't be upgraded ∴ software upgrade keep complaining"
Traceback (most recent call last):
  File "/usr/bin/tweet", line 116, in <module>
    main()
  File "/usr/bin/tweet", line 108, in main
    status = api.PostUpdate(message)
  File "/var/lib/python-support/python2.5/twitter.py", line 1049, in PostUpdate
    raise TwitterError("Text must be less than or equal to 140 characters.")
twitter.TwitterError: Text must be less than or equal to 140 characters.

Michael McKinley (m-mckinley) wrote :

The problem seems to be deeper than that. In Jaunty (python-twitter 0.5), it balks at any non-ASCII characters.

Changed in python-twitter:
status: New → Confirmed

Michael McKinley (m-mckinley) wrote :

Found the bug. examples/tweet.py wasn't decoding the message to be posted as utf-8. This patch should fix it.
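
The attached patch is not reproduced here. As a rough sketch of the idea only (decode the raw bytes first, then count characters), assuming Python 3 and a hypothetical helper named within_limit that is not part of python-twitter:

```python
# Sketch only: within_limit is a hypothetical helper, not python-twitter code.
LIMIT = 140

def within_limit(raw, encoding="utf-8"):
    """Return True if the message has at most LIMIT characters.

    Bytes are decoded first, so multi-byte characters count once each.
    """
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return len(text) <= LIMIT

raw_message = ("Had lighttpd installed ∴ it was running ∴ apache couldn't "
               "restart ∴ roundcube couln't be upgraded ∴ software upgrade "
               "keep complaining").encode("utf-8")

print(len(raw_message) <= LIMIT)  # byte count check: False
print(within_limit(raw_message))  # character count check: True
```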

Rory McCann (rorymcc) wrote :

Interesting. However, do we know for sure that people will be using UTF-8? Is there some sort of 'Right Way' to detect character encoding?

Having said that, UTF-8 is probably a much better default than ASCII.
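
One possible approach, offered only as a sketch: ask the terminal stream or the locale which encoding is in use and fall back to UTF-8. The helper name guess_encoding is made up for this example, and the code assumes Python 3; it is not what python-twitter does.

```python
import locale
import sys

def guess_encoding(default="utf-8"):
    """Best-effort guess at the encoding of command-line input.

    Tries the encoding reported for stdin, then the locale's preferred
    encoding, then falls back to the given default.
    """
    stdin_encoding = getattr(sys.stdin, "encoding", None)
    return stdin_encoding or locale.getpreferredencoding(False) or default

print(guess_encoding())
```

This is still a guess: the locale describes how the terminal is configured, not necessarily how the bytes were actually produced.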

Michael McKinley (m-mckinley) wrote :

Since tweet.py just contains the code for handling the tweet command (all the logic of posting to twitter is handled by twitter.py), I think it's safe to assume that all input to it will be in UTF-8. Somebody more knowledgeable should definitely confirm this.

Andreas Moog (ampelbein) wrote :

I'm going to mark this Fix Released for Ubuntu. python-twitter 0.6 has the "--encoding" option to specify the encoding of the message. A quick test showed it's working as expected now.
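
To illustrate why such an option helps, here is a minimal sketch of a command-line parser with an "--encoding" flag. It is written for Python 3 with argparse and is not the actual tweet script from python-twitter 0.6; build_parser and decode_message are hypothetical names.

```python
import argparse

def build_parser():
    # Hypothetical parser; only the --encoding flag name comes from the report.
    parser = argparse.ArgumentParser(description="hypothetical tweet-like CLI")
    parser.add_argument("--encoding", default="utf-8",
                        help="encoding of the raw message bytes")
    return parser

def decode_message(raw, argv):
    """Decode raw message bytes using the encoding named on the command line."""
    args = build_parser().parse_args(argv)
    return raw.decode(args.encoding)

# A message produced by a Latin-1 terminal: é is one byte here, not two.
raw = "caf\u00e9 ouvert".encode("latin-1")
text = decode_message(raw, ["--encoding", "latin-1"])
print(text, len(text))
```

Decoding the same bytes with the UTF-8 default raises UnicodeDecodeError, which is the case an explicit encoding option is meant to cover.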

Changed in python-twitter (Ubuntu):
status: Confirmed → Fix Released