lxml.html.HTMLParser doesn't like html with frameset

Bug #599318 reported by sergiomb on 2010-06-28
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Stefan Seelmann
Milestone: 3.2

Bug Description

In [4]: print("%-20s: %s" % ('Python', sys.version_info))
Python              : (2, 6, 4, 'final', 0)

In [5]: print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (2, 2, 6, 0)

In [6]: print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 7, 7)

In [7]: print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 7, 6)

In [8]: print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 26)

In [9]: print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 26)

import lxml.html
hparser = lxml.html.HTMLParser(encoding='utf-8', remove_comments=True)

content="""<frameset>

        <frame src="main.php" name="srcpg"
id="srcpg"
        frameborder="0" scrolling="Auto" marginwidth=""
marginheight="0">

</frameset>"""

etree_document = lxml.html.fromstring(content, parser=hparser)

TypeError Traceback (most recent call last)

/home/sergio/<ipython console> in <module>()

/usr/lib/python2.6/site-packages/lxml/html/__init__.pyc in fromstring(html, base_url, parser, **kw)
    634 other_head.drop_tree()
    635 return doc
--> 636 if (len(body) == 1 and (not body.text or not body.text.strip())
    637 and (not body[-1].tail or not body[-1].tail.strip())):
    638 # The body has just one element, so it was probably a single

TypeError: object of type 'NoneType' has no len()
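
The traceback suggests that `body` was `None` when `len(body)` was evaluated, which is what one would expect for a frameset document that has no `<body>` element. The sketch below illustrates that failure mode and a possible guard. It uses the standard library's `xml.etree.ElementTree` as a stand-in and is not lxml's actual fix; the helper name `body_or_document` is hypothetical.

```python
import xml.etree.ElementTree as ET


def body_or_document(doc):
    """Return the <body> element if there is one, else the whole document."""
    body = doc.find("body")
    if body is None:
        # A frameset page has no <body>. Calling len(body) at this point
        # is what would raise "object of type 'NoneType' has no len()".
        return doc
    return body


# Well-formed stand-ins for the parsed documents (assumed shapes).
frameset_doc = ET.fromstring(
    '<html><frameset><frame src="main.php" name="srcpg"/></frameset></html>'
)
plain_doc = ET.fromstring("<html><body><p>hi</p></body></html>")

result_frameset = body_or_document(frameset_doc)
result_plain = body_or_document(plain_doc)
print(result_frameset.tag)  # html
print(result_plain.tag)     # body
```

On an affected lxml version, `lxml.html.document_fromstring` may sidestep this code path because it returns the whole document rather than inspecting `<body>`, though that is not verified in this report.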

Changed in lxml:
status: New → In Progress
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Stefan Seelmann (2-ubuntu-d)
importance: Undecided → Low
status: In Progress → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.2.0.
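
To check whether an installed lxml is at or past the fixed release, a version tuple can be compared against `(3, 2, 0)`. This is a minimal sketch; the names `FIXED_IN` and `has_frameset_fix` are hypothetical and the threshold is taken from the comment above.

```python
# Assumption: the fix shipped in lxml 3.2.0, as stated in the comment above.
FIXED_IN = (3, 2, 0)


def has_frameset_fix(lxml_version):
    """Return True if the version tuple is at or past the fixed release."""
    return tuple(lxml_version) >= FIXED_IN


reported = has_frameset_fix((2, 2, 6, 0))  # version from the bug report
fixed = has_frameset_fix((3, 2, 0, 0))
print(reported, fixed)  # False True
```

In a real session, the tuple would come from `etree.LXML_VERSION` after `from lxml import etree`, as printed in the bug description.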

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder) on 2013-04-28
Changed in lxml:
milestone: none → 3.2