TypeError with html5parser.tostring()

Bug #780642 reported by Drew Smathers
This bug affects 2 people
Affects: lxml
Status: Fix Released
Importance: Undecided
Assigned to: scoder
Milestone: 2.3.1

Bug Description

System info:

Python : (2, 6, 1, 'final', 0)
lxml.etree : (2, 3, 0, 0)
libxml used : (2, 7, 3)
libxml compiled : (2, 7, 3)
libxslt used : (1, 1, 24)
libxslt compiled : (1, 1, 24)
html5 used : 0.90
Operating System: Mac OS X 10.6.7

With this example, which is given in the documentation:

  from lxml.html import tostring, html5parser
  tostring(html5parser.fromstring("<table><td>foo"))

I get the following error:

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/html/__init__.pyc in Element(*args, **kw)
   1561 This can also be used for XHTML documents.
   1562 """
-> 1563 v = html_parser.makeelement(*args, **kw)
   1564 return v
   1565

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/etree.so in lxml.etree._BaseParser.makeelement (src/lxml/lxml.etree.c:74798)()

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/etree.so in lxml.etree._makeElement (src/lxml/lxml.etree.c:11828)()

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/etree.so in lxml.etree._getNsTag (src/lxml/lxml.etree.c:23247)()

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/etree.so in lxml.etree.__getNsTag (src/lxml/lxml.etree.c:23373)()

/Users/dsmathers/Envs/temp/lib/python2.6/site-packages/lxml/etree.so in lxml.etree._utf8 (src/lxml/lxml.etree.c:22190)()
TypeError: Argument must be bytes or unicode, got 'dict'
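
The reproduction above can be wrapped in a small guarded check, so the same snippet reports whether a given installation is affected. This is only a sketch: the function name check_html5_roundtrip and the status strings are made up for illustration, and it assumes that a missing lxml or html5lib shows up as an ImportError.

```python
def check_html5_roundtrip(markup="<table><td>foo"):
    """Parse markup with lxml's html5parser and serialize it again.

    Returns a (status, detail) tuple. The status strings are illustrative:
    "unavailable" -> lxml or html5lib could not be imported
    "broken"      -> a TypeError like the one in this report was raised
    "ok"          -> the round trip worked; detail holds the serialized markup
    """
    try:
        from lxml.html import tostring, html5parser
    except ImportError as exc:
        return "unavailable", str(exc)
    try:
        return "ok", tostring(html5parser.fromstring(markup))
    except TypeError as exc:
        return "broken", str(exc)


status, detail = check_html5_roundtrip()
```

On an affected setup such as the one described here, status would be expected to come back as "broken", with detail holding the TypeError message.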

Jimmy Yuen Ho Wong (wyuenho) wrote :

HTML5 parsing seems to be completely broken because of this bug.

Jimmy Yuen Ho Wong (wyuenho) wrote :

Just did some digging, and it seems that html5lib 0.9 has a treebuilder module called etree_lxml.py. Replacing the entire content of lxml/html/_html5builder.py with just this line, which delegates all the names to that module, seems to have fixed this bug:

from html5lib.treebuilders.etree_lxml import *
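
As a rough illustration of the module this workaround delegates to, html5lib's own lxml tree builder can also be called directly, without going through lxml.html.html5parser. This is a sketch of an alternative, not the change that went into lxml; the helper name parse_with_html5lib is hypothetical, and it assumes an html5lib version whose parse() accepts treebuilder="lxml".

```python
def parse_with_html5lib(markup="<table><td>foo"):
    """Parse markup via html5lib's lxml tree builder and serialize the result.

    Returns None when html5lib or lxml is not installed.
    """
    try:
        import html5lib
        from lxml import etree
    except ImportError:
        return None
    # treebuilder="lxml" selects html5lib.treebuilders.etree_lxml,
    # the module named in the comment above.
    document = html5lib.parse(markup, treebuilder="lxml")
    return etree.tostring(document)


serialized = parse_with_html5lib()
```

The result is produced by html5lib's tree builder alone, so it does not exercise lxml/html/_html5builder.py at all.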

scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
status: New → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 2.3.1.

Changed in lxml:
milestone: none → 2.3.1
status: Fix Committed → Fix Released