Obviously, this isn't valid HTML (and neither is the one that Johannes posted). I can reproduce this at the interactive prompt:
>>> from lxml.html import fromstring
>>> h = fromstring('<ZyNETERROR>Parsing form data failed.</ZyNETERROR>')
>>> h
<Element zyneterror at 0xb71caaac>
>>> from lxml.html import clean
>>> c = clean.Cleaner()
>>> c.clean_html(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 488, in clean_html
    self(doc)
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 390, in __call__
    el.drop_tag()
  File "/usr/lib/python2.6/dist-packages/lxml/html/__init__.py", line 191, in drop_tag
    assert parent is not None
AssertionError
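So this definitely looks like a case where lxml needs a better error message: the cleaner ends up calling drop_tag() on the top-level element, which has no parent, and all the user sees is a bare AssertionError.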
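To illustrate what a clearer failure could look like, here is a minimal sketch using only the standard library's xml.etree.ElementTree as a stand-in. This is not lxml's code: the drop_tag helper below is hypothetical, it takes the tree root explicitly because ElementTree has no getparent(), and the wording of the error message is only a suggestion.

```python
import xml.etree.ElementTree as ET


def drop_tag(root, el):
    """Remove el's tag but keep its text and children.

    Hypothetical helper in the spirit of lxml's drop_tag(); it raises a
    descriptive error instead of failing a bare assertion.
    """
    # ElementTree elements do not know their parent, so build a lookup.
    parents = {child: parent for parent in root.iter() for child in parent}
    parent = parents.get(el)
    if parent is None:
        raise ValueError(
            "cannot drop tag <%s>: it is the root element and has no parent"
            % el.tag
        )

    index = list(parent).index(el)
    children = list(el)

    # el.text moves to whatever precedes el inside the parent.
    if index == 0:
        parent.text = (parent.text or "") + (el.text or "")
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + (el.text or "")

    # el.tail moves to the end of whatever el leaves behind.
    if children:
        children[-1].tail = (children[-1].tail or "") + (el.tail or "")
    elif index == 0:
        parent.text = (parent.text or "") + (el.tail or "")
    else:
        prev.tail = (prev.tail or "") + (el.tail or "")

    # Replace el with its own children, preserving order.
    parent.remove(el)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)


# Same input as in the session above: a lone unknown element is the root.
root = ET.fromstring("<ZyNETERROR>Parsing form data failed.</ZyNETERROR>")
try:
    drop_tag(root, root)
except ValueError as exc:
    print(exc)
```

With a message like this, the user at least learns that the problem is the top-level element rather than having to read the library source to interpret the assertion.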
Here's a simple page that causes this error: http://nepis.epa.gov/Exe/ZyNET.exe/20017XVV.TXT