Planet Venus fails to parse some feeds

Bug #475961 reported by andrey i. mavlyanov
This bug affects 2 people
Affects                  Status  Importance  Assigned to  Milestone
planet-venus (Debian)    New     Undecided   Unassigned
planet-venus (Ubuntu)    New     Undecided   Unassigned

Bug Description

Binary package hint: planet-venus

Ubuntu 9.10

planet-venus installed from the Ubuntu package

Failing feed: http://iimci.blogspot.com/feeds/posts/default (a copy is attached)

Error:
===
ERROR:planet.runner:Error processing http://iimci.blogspot.com/feeds/posts/default
ERROR:planet.runner:HTMLParseError: malformed start tag, at line 1, column 1971
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/spider.py", line 441, in spiderPlanet
    data = feedparser.parse(feed, **options)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 3525, in parse
    feedparser.feed(data)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 1662, in feed
    sgmllib.SGMLParser.feed(self, data)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 104, in feed
    self.goahead(0)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 143, in goahead
    k = self.parse_endtag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 320, in parse_endtag
    self.finish_endtag(tag)
ERROR:planet.runner: File "/usr/lib/python2.6/sgmllib.py", line 360, in finish_endtag
    self.unknown_endtag(tag)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 569, in unknown_endtag
    method()
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 1512, in _end_content
    value = self.popContent('content')
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 849, in popContent
    value = self.pop(tag)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 764, in pop
    mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 2218, in _parseMicroformats
    p = _MicroformatsParser(htmlSource, baseURI, encoding)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/planet/vendor/feedparser.py", line 1823, in __init__
    self.document = BeautifulSoup.BeautifulSoup(data)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
ERROR:planet.runner: File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
ERROR:planet.runner: File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
===
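The traceback shows the error being raised while BeautifulSoup parses the HTML content of a feed entry (handed over from feedparser's _parseMicroformats), so "line 1, column 1971" most likely points into that entry's content rather than into the raw feed file. The helper below is a minimal sketch for showing the text around such a position; the name context_at is hypothetical, and it assumes the reported line and column are both 1-based, as in the message format of Python 2's HTMLParseError. The position is only approximate, since BeautifulSoup may preprocess the markup before feeding it to HTMLParser.

```python
def context_at(text, line, column, width=40):
    """Return the characters around a 1-based (line, column) position in text.

    Assumes the 'at line L, column C' values from an HTMLParseError message
    are both 1-based, so they can be passed in unchanged.
    """
    target = text.splitlines()[line - 1]  # line is 1-based
    index = column - 1                    # convert column to a 0-based index
    start = max(0, index - width)
    return target[start:index + width + 1]
```

For example, with the HTML content of the failing entry in a string, context_at(content, 1, 1971) shows the neighbourhood of the tag HTMLParser rejected.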

andrey i. mavlyanov (andrey-mavlyanov) wrote :
Serge Matveenko (lig) wrote :

It seems that HTMLParser tries to parse the content of the tag as if it were part of the markup.
And I see no XML CDATA section in the feed.
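Whether or not the feed wraps it in CDATA, once the XML layer has decoded an Atom content element of type="html", what remains is a string of HTML markup, and that string is what gets handed to an HTML parser. The sketch below illustrates this with Python's standard xml.etree.ElementTree (not feedparser, which uses its own sgmllib-based parser); the sample entry and the content_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# The same HTML fragment carried two ways inside <content type="html">:
# entity-escaped, and wrapped in a CDATA section. Both are hypothetical samples.
escaped = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html">&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</content>'
    '</entry>'
)
cdata = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html"><![CDATA[<p>Hello <b>world</b></p>]]></content>'
    '</entry>'
)


def content_text(entry_xml):
    """Return the text of <content> after the XML parser has decoded it."""
    entry = ET.fromstring(entry_xml)
    return entry.find(ATOM + "content").text


print(content_text(escaped))  # <p>Hello <b>world</b></p>
print(content_text(cdata))    # <p>Hello <b>world</b></p>
```

Either way the downstream HTML parser sees real tags, so a malformed tag inside the entry's HTML can trip it even when the feed itself is well-formed XML.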

Serge Matveenko (lig) wrote :

Couldn't reproduce using lp:ubuntu/karmic/planet-venus on either python2.5 or python2.6.

Changed in planet-venus (Ubuntu):
status: New → Invalid
andrey i. mavlyanov (andrey-mavlyanov) wrote :

#3: Have you tried the packaged version? Why are you setting the status of a bug against the _package_ based on the branch working for you?!

Changed in planet-venus (Ubuntu):
status: Invalid → New
Mattias Holmlund (u219) wrote :

I was hit by this bug when I tried to move from planet to planet-venus. Based on a hint in http://www.jpichon.net/blog/2010/6/installing-planet-venus/ I removed the python-beautifulsoup package, and now all my feeds are parsed without errors. I have no idea what python-beautifulsoup does; it is listed under Recommends: for planet-venus, but for me planet-venus works better without it.
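The traceback shows the failing code path entering BeautifulSoup from feedparser's _parseMicroformats, which is presumably why removing python-beautifulsoup makes the errors go away. To check whether the Python interpreter that runs Planet Venus can see the BeautifulSoup 3.x module at all (it is imported as BeautifulSoup, unlike the later bs4 package), a minimal sketch follows; the function name is hypothetical, and it only reports importability, not how feedparser uses the module.

```python
def beautifulsoup3_available():
    """Return True if the BeautifulSoup 3.x module can be imported.

    BeautifulSoup 3.x is a single module named 'BeautifulSoup'; the
    python-beautifulsoup package discussed here provides it.
    """
    try:
        import BeautifulSoup  # noqa: F401  (only checking that it imports)
    except ImportError:
        return False
    return True


print(beautifulsoup3_available())
```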

nutznboltz (nutznboltz-deactivatedaccount) wrote :

I got around this on Ubuntu 10.10 by upgrading to Beautiful Soup 3.2.0.
Fetch the tarball from
http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.2.0.tar.gz

Extract it and copy these files over the broken 3.1 version that ships with Ubuntu 10.10:
sudo cp BeautifulSoup.py /usr/share/pyshared/BeautifulSoup.py
sudo cp BeautifulSoupTests.py /usr/share/pyshared/BeautifulSoupTests.py

Presto: Planet Venus now works like a charm on feeds that used to make it croak.
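This comment identifies the BeautifulSoup 3.1 series as the broken one and 3.2.0 as working. A minimal sketch for telling the two apart from a version string follows; the helper names are hypothetical, and it only flags 3.1.x without making any claim about other versions.

```python
def parse_version(version):
    """Turn a version string such as '3.1.0.1' into a tuple of ints.

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_affected_version(version):
    """True for the BeautifulSoup 3.1.x series reported as broken here."""
    return parse_version(version)[:2] == (3, 1)


print(is_affected_version("3.1.0.1"))  # True
print(is_affected_version("3.2.0"))    # False
```

BeautifulSoup 3.x records its version in the module attribute __version__, so after `import BeautifulSoup` the check is is_affected_version(BeautifulSoup.__version__); it would be expected to turn False once the 3.2.0 files have been copied in.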
