Comment 2 for bug 773715

Jason Baker (jbaker) wrote :

Here's a simple page that causes this error: http://nepis.epa.gov/Exe/ZyNET.exe/20017XVV.TXT

Obviously, this isn't valid HTML (and neither is the one that Johannes posted). I can reproduce this at the interactive prompt:

>>> from lxml.html import fromstring
>>> h = fromstring('<ZyNETERROR>Parsing form data failed.</ZyNETERROR>')
>>> h
<Element zyneterror at 0xb71caaac>
>>> from lxml.html import clean
>>> c = clean.Cleaner()
>>> c.clean_html(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 488, in clean_html
    self(doc)
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 390, in __call__
    el.drop_tag()
  File "/usr/lib/python2.6/dist-packages/lxml/html/__init__.py", line 191, in drop_tag
    assert parent is not None
AssertionError

So this definitely looks like a case where lxml needs a better error message: the bare AssertionError raised from drop_tag() says nothing about what is wrong with the input.
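
Until that is fixed, a caller could wrap the call and attach a more useful message itself. The following is only a rough sketch, not lxml's behaviour or a proposed patch: clean_or_explain is a hypothetical helper name, it uses Python 3 syntax (the session above is Python 2.6), and it assumes that a bare AssertionError from the cleaner means what the traceback above shows, namely an attempt to drop an element that has no parent.

```python
def clean_or_explain(clean, doc):
    """Call clean(doc), converting a bare AssertionError into a ValueError.

    `clean` is any callable that takes the document, for example the
    clean_html method of a Cleaner instance. This helper is hypothetical
    and is not part of lxml.
    """
    try:
        return clean(doc)
    except AssertionError as exc:
        # Assumption: the assertion is the "parent is not None" check seen
        # in the traceback, i.e. the cleaner tried to drop a parentless
        # element. Other assertions would be reported the same way.
        raise ValueError(
            "could not clean %r: the cleaner tried to drop an element "
            "that has no parent (is the input valid HTML?)" % (doc,)
        ) from exc
```

With the session above this would be called as clean_or_explain(c.clean_html, h). The original AssertionError stays attached as the cause, so the full traceback is still available for debugging.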