
[patch] Python-feedparser does not parse http://www.democracynow.org/podcast.xml correctly

Bug #179208 reported by Thomas Perl on 2007-12-29

Affects              Status        Importance  Assigned to   Milestone
feedparser (Ubuntu)  Fix Released  Medium      Emmet Hikory

Bug Description

This patch fixes two issues in upstream's bug tracker:
http://code.google.com/p/feedparser/issues/detail?id=28 and
http://code.google.com/p/feedparser/issues/detail?id=80

This feed doesn't get parsed correctly: http://www.democracynow.org/podcast.xml
What doesn't work: The titles for all Thursday episodes are wrong

You can reproduce this by parsing the feed directly: feedparser does not return the title of any episode whose title contains the word "Thursday". Looking at feedparser's code and the RSS file, I see that the feed uses type="plain" and mapContentType() doesn't map this value correctly, so I've added a mapping from "plain" to "text/plain".
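The idea behind the mapping can be sketched as a small lookup table. This is a stand-alone illustration, not feedparser's actual source: the function name map_content_type and the table _SHORT_TYPES are hypothetical, and the entries other than "plain" are an assumption based on the short type names that Atom defines.

```python
# Hypothetical stand-alone sketch; feedparser's real mapContentType()
# is a method on the parser and may differ in detail.
_SHORT_TYPES = {
    'text': 'text/plain',
    'plain': 'text/plain',            # the mapping this report adds
    'html': 'text/html',
    'xhtml': 'application/xhtml+xml',
}

def map_content_type(content_type):
    """Expand a short type name to a MIME type; pass anything else through."""
    content_type = content_type.lower()
    return _SHORT_TYPES.get(content_type, content_type)
```

With such a mapping, type="plain" is treated as ordinary text, so the title never reaches the base64 decoding path.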

Other feeds are not affected because the base64 decoder normally fails on an ordinary string that isn't base64-encoded (it raises binascii.Error), but it succeeds when the string contains "Thursday":

Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> base64.decodestring('Feedparser does not work')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
>>> base64.decodestring('Thursday')
'N\x1b\xab\xb1\xd6\xb2'

Please include the patch or, as an alternative, validate the output of the base64 decoder to check whether the string was really base64-encoded. Or is this a bug in the base64 module?
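As far as I can tell the base64 module is behaving correctly: "Thursday" is eight characters long and every character belongs to the base64 alphabet, so it is a well-formed (if meaningless) base64 string. A minimal sketch of the validation alternative follows, written for Python 3 rather than the Python 2.5 session above. Both helper names are hypothetical, and the check assumes that genuine base64 content in a feed decodes to UTF-8 text, which is a heuristic and not something feedparser guarantees.

```python
import base64
import binascii
import re

# Hypothetical helpers; not part of feedparser or the base64 module.
_BASE64_RE = re.compile(r'[A-Za-z0-9+/]*={0,2}')

def is_wellformed_base64(text):
    """True if text uses only the base64 alphabet and its length is a multiple of 4."""
    return len(text) % 4 == 0 and _BASE64_RE.fullmatch(text) is not None

def decode_if_plausible_base64(text):
    """Return the decoded text if it looks like base64-encoded UTF-8, else text unchanged."""
    try:
        raw = base64.b64decode(text.encode('ascii'), validate=True)
        return raw.decode('utf-8')
    except (binascii.Error, UnicodeError):
        # Not valid base64, or the decoded bytes are not text: keep the original.
        return text

print(is_wellformed_base64('Thursday'))                  # well-formed by shape alone
print(is_wellformed_base64('Feedparser does not work'))  # spaces are outside the alphabet
print(repr(decode_if_plausible_base64('Thursday')))      # left unchanged
```

A shape check alone cannot tell "Thursday" apart from real base64, which is why the second helper also inspects the decoded bytes. That check can still misfire on a plain word that happens to decode to valid UTF-8, so mapping type="plain" explicitly remains the more robust fix.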