lxml.html.HTMLParser doesn't like html with frameset

Bug #599318 reported by sergiomb
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Stefan Seelmann

Bug Description

In [4]: print("%-20s: %s" % ('Python', sys.version_info))
Python : (2, 6, 4, 'final', 0)

In [5]: print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree : (2, 2, 6, 0)

In [6]: print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used : (2, 7, 7)

In [7]: print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled : (2, 7, 6)

In [8]: print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used : (1, 1, 26)

In [9]: print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled : (1, 1, 26)

import lxml.html
hparser = lxml.html.HTMLParser(encoding='utf-8', remove_comments=True)

content="""<frameset>

        <frame src="main.php" name="srcpg"
id="srcpg"
        frameborder="0" scrolling="Auto" marginwidth=""
marginheight="0">

</frameset>"""

etree_document = lxml.html.fromstring(content, parser=hparser)

TypeError Traceback (most recent call last)

/home/sergio/<ipython console> in <module>()

/usr/lib/python2.6/site-packages/lxml/html/__init__.pyc in fromstring(html, base_url, parser, **kw)
    634 other_head.drop_tree()
    635 return doc
--> 636 if (len(body) == 1 and (not body.text or not body.text.strip())
    637 and (not body[-1].tail or not body[-1].tail.strip())):
    638 # The body has just one element, so it was probably a single

TypeError: object of type 'NoneType' has no len()
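
The traceback shows `fromstring()` calling `len(body)` when the parsed document has no `<body>` element, which is the case for a frameset-only page. A possible workaround on affected versions, sketched here under the assumption that `lxml.html.document_fromstring()` returns the parsed root without performing that body check, is to parse the page as a full document. The markup is the one from the report, passed as bytes so the parser's `encoding='utf-8'` setting applies; the `frames` variable is only for illustration.

```python
import lxml.html

# Same parser settings as in the report.
hparser = lxml.html.HTMLParser(encoding='utf-8', remove_comments=True)

# Bytes input, so the parser's encoding='utf-8' override applies directly.
content = b"""<frameset>
    <frame src="main.php" name="srcpg" id="srcpg"
           frameborder="0" scrolling="Auto" marginwidth="" marginheight="0">
</frameset>"""

# Assumption: document_fromstring() hands back the document root as parsed
# and does not look for a <body>, unlike fromstring() in the traceback above.
root = lxml.html.document_fromstring(content, parser=hparser)

# Collect the <frame> elements wherever the parser placed them in the tree.
frames = list(root.iter('frame'))
for frame in frames:
    print(frame.get('name'), frame.get('src'))
```

Code that relies on `fromstring()` returning a single fragment element would need adjusting, since this variant always returns the whole document root.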

Changed in lxml:
status: New → In Progress
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Stefan Seelmann (2-ubuntu-d)
importance: Undecided → Low
status: In Progress → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.2.0.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder)
Changed in lxml:
milestone: none → 3.2