Assertion error when cleaning

Bug #773715 reported by Jason Baker
This bug affects 3 people
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 2.3.1

Bug Description

Python : (2, 6, 6, 'final', 0)
lxml.etree : (2, 3, -99, 0)
libxml used : (2, 6, 26)
libxml compiled : (2, 6, 26)
libxslt used : (1, 1, 17)
libxslt compiled : (1, 1, 17)

This is an error that's being sent out by our production process. Thus far, I haven't been able to reproduce it:

Traceback (most recent call last):
  File "/home/csar/current/apture/main/workers/base.py", line 144, in _wrappedMethod
    return method(logger, queueName, decoded)
  File "/home/csar/current/apture/main/workers/magiclinker.py", line 115, in execute
    cleaner.clean_html(html)
  File "/usr/local/lib/python2.6/site-packages/lxml/html/clean.py", line 488, in clean_html
    self(doc)
  File "/usr/local/lib/python2.6/site-packages/lxml/html/clean.py", line 390, in __call__
    el.drop_tag()
  File "/usr/local/lib/python2.6/site-packages/lxml/html/__init__.py", line 191, in drop_tag
    assert parent is not None
AssertionError

If nothing else, this belongs in the "needs a better error message" category.

Johannes (johtso) wrote :

I seem to be experiencing this bug.

Is this code reproducing the same problem?

Johannes (johtso)
Changed in lxml:
status: New → Confirmed
Jason Baker (jbaker) wrote :

Here's a simple page that causes this error: http://nepis.epa.gov/Exe/ZyNET.exe/20017XVV.TXT

Obviously, this isn't valid HTML (and neither is the one that Johannes posted). I can reproduce this at the interactive prompt:

>>> from lxml.html import fromstring
>>> h = fromstring('<ZyNETERROR>Parsing form data failed.</ZyNETERROR>')
>>> h
<Element zyneterror at 0xb71caaac>
>>> from lxml.html import clean
>>> c = clean.Cleaner()
>>> c.clean_html(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 488, in clean_html
    self(doc)
  File "/usr/lib/python2.6/dist-packages/lxml/html/clean.py", line 390, in __call__
    el.drop_tag()
  File "/usr/lib/python2.6/dist-packages/lxml/html/__init__.py", line 191, in drop_tag
    assert parent is not None
AssertionError

So it definitely seems that this is a case of needing better error messages.
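
The traceback shows drop_tag() asserting that the element has a parent, and in the example above the unknown tag is the root element, which has none. The sketch below illustrates that failure mode and one possible guard. It is not lxml's code and makes no claim to match the eventual fix: it uses the standard library's xml.etree.ElementTree, and ALLOWED_TAGS, drop_tag, and remove_unknown_tags are hypothetical names chosen for illustration. The guard renames a disallowed root to div instead of trying to drop it, and the helper raises a descriptive error rather than a bare AssertionError.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist, for illustration only.
ALLOWED_TAGS = {"div", "p", "b", "i", "span"}


def _append_text(parent, index, text):
    """Attach text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def drop_tag(el, parent_of):
    """Remove `el` but keep its text and children in its parent."""
    parent = parent_of.get(el)
    if parent is None:
        # A descriptive error instead of a bare assertion.
        raise ValueError(
            "cannot drop root element <%s>: it has no parent" % el.tag
        )
    index = list(parent).index(el)
    parent.remove(el)
    _append_text(parent, index, el.text)
    for offset, child in enumerate(list(el)):
        parent.insert(index + offset, child)
        parent_of[child] = parent
    _append_text(parent, index + len(el), el.tail)


def remove_unknown_tags(root, allowed=ALLOWED_TAGS):
    """Drop elements whose tag is not allowed, keeping their content."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    bad = [el for el in root.iter() if el.tag not in allowed]
    if bad and bad[0] is root:
        # The root cannot be dropped, so rewrite it as a neutral container.
        bad.pop(0)
        root.tag = "div"
        root.attrib.clear()
    for el in bad:
        drop_tag(el, parent_of)
    return root


root = ET.fromstring("<zyneterror>Parsing form data failed.</zyneterror>")
cleaned = remove_unknown_tags(root)
print(ET.tostring(cleaned, encoding="unicode"))
```

Until a fixed lxml is installed, a caller could apply the same idea on their side, for example by checking whether the parsed root has an unexpected tag before passing it to the cleaner.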

scoder (scoder) wrote :

Thanks for the report and the test case. Here's a fix:

https://github.com/lxml/lxml/commit/c531d3326cd1e6fd888299518d49acad5fd3b627

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Medium
status: Confirmed → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 2.3.1.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder)
Changed in lxml:
milestone: none → 2.3.1