[patch] Python-feedparser does not parse http://www.democracynow.org/podcast.xml correctly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
feedparser (Ubuntu) | Fix Released | Medium | Emmet Hikory |
Bug Description
This patch fixes two issues in upstream's bug tracker:
http://
http://
This feed doesn't get parsed correctly: http://
What doesn't work: The titles for all Thursday episodes are wrong
You can reproduce this by parsing the feed directly: feedparser will not display titles that contain the word "Thursday". Looking at feedparser's code and the RSS file, I see that the feed uses type="plain", which mapContentType() doesn't map correctly, so I've added a mapping from "plain" to "text/plain".
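The mapping fix can be sketched as a small standalone function. This is only an illustration, not feedparser's actual code: the name `map_content_type` and the table entries other than "plain" are assumptions made for the example.

```python
# Hypothetical stand-in for the mapping step; not copied from feedparser.
SHORT_TYPE_TO_MIME = {
    "text": "text/plain",
    "plain": "text/plain",  # the mapping this report adds
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
}


def map_content_type(content_type):
    """Return a full MIME type for a short type name, else the lowercased input."""
    content_type = content_type.lower()
    return SHORT_TYPE_TO_MIME.get(content_type, content_type)


print(map_content_type("plain"))
```

With "plain" resolved to a text type, the title is treated as ordinary text rather than as content that might need decoding.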
The bug does not show up for other feeds because the base64 decoder normally rejects a plain string that isn't base64-encoded (it raises binascii.Error), but it returns a result when the string contains "Thursday":
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> base64.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/
return binascii.
binascii.Error: Incorrect padding
>>> base64.
'N\x1b\
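The same behaviour can be reproduced on a current Python 3 interpreter. The sketch below uses `base64.b64decode`, which may not be the call used in the truncated session above, and the sample strings other than "Thursday" are made up for illustration.

```python
import base64
import binascii


def try_b64decode(text):
    """Return the decoded bytes, or None if the base64 decoder rejects the text."""
    try:
        return base64.b64decode(text)
    except binascii.Error:
        return None


# "Thursday" is 8 characters, all from the base64 alphabet, so it decodes
# without error into 6 bytes that have nothing to do with the original text.
print(try_b64decode("Thursday"))

# These illustrative strings have lengths that are not a multiple of 4,
# so the decoder raises binascii.Error and the helper returns None.
print(try_b64decode("Friday"))
print(try_b64decode("Wednesday"))
```

Any string whose length is a multiple of 4 and that uses only base64 alphabet characters decodes cleanly in this way, so "Thursday" is an unlucky case rather than a fault in the base64 module.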
Please include the patch or (as an alternative) validate the output of the base64 decoded string, to see if the string is really base64-encoded. Or is this a bug in the base64 module?
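One possible reading of the "validate the output" alternative is to accept the decoded bytes only when they form valid UTF-8 text. This is a heuristic sketch under that assumption, not feedparser's behaviour: the helper name `decode_base64_text` is hypothetical, and the rule only suits fields that are expected to hold text.

```python
import base64


def decode_base64_text(text):
    """Return the base64-decoded text, or the input unchanged if decoding looks wrong."""
    try:
        raw = base64.b64decode(text)
        return raw.decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses:
        # the input was not base64, or it decoded to bytes that are not text.
        return text


print(decode_base64_text("Thursday"))
print(decode_base64_text("aGVsbG8="))
```

The check can still misfire on a plain string that happens to decode to valid UTF-8, which is why mapping the content type correctly is the more reliable fix.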
Changed in feedparser:
importance: Undecided → Medium
status: New → Confirmed
Changed in feedparser:
assignee: nobody → persia
status: Confirmed → In Progress
Providing a debdiff for the patch.