lxml.html.clean.Cleaner crashes on some HTML documents

Bug #1838497 reported by Nikita Vostretsov
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
lxml      Confirmed   Undecided    Unassigned

Bug Description

When I run the clean_reproduce.py script, an exception is raised:

python ~/tmp/clean_reproduce.py
Executing python /home/nikita/tmp/clean_reproduce.py in /home/nikita/ves/lxml-4.4.0/
Python : sys.version_info(major=3, minor=6, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 4, 0, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)
Traceback (most recent call last):
  File "/home/nikita/tmp/clean_reproduce.py", line 30, in <module>
    main()
  File "/home/nikita/tmp/clean_reproduce.py", line 26, in main
    cleaner.clean_html(node)
  File "src/lxml/html/clean.py", line 520, in lxml.html.clean.Cleaner.clean_html
  File "src/lxml/html/clean.py", line 394, in lxml.html.clean.Cleaner.__call__
  File "/home/nikita/ves/lxml-4.4.0/lib/python3.6/site-packages/lxml/html/__init__.py", line 339, in drop_tree
    assert parent is not None
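
The reproduce script itself is an attachment and is not shown in this report, so the following is only a minimal sketch of the condition the traceback ends in, not the actual failing input. It assumes that the Cleaner ended up calling drop_tree() on an element that has no parent, such as the document root. The helper safe_drop_tree is hypothetical and is not part of lxml; it only illustrates the kind of guard that would avoid the assertion.

```python
import lxml.html

# Parse a complete document; the returned <html> root has no parent element.
root = lxml.html.document_fromstring("<html><body><p>text</p></body></html>")
assert root.getparent() is None


def safe_drop_tree(element):
    """Hypothetical helper (not lxml API): drop the element only if it has a parent.

    Returns True if the element was removed, False if it was skipped.
    """
    if element.getparent() is None:
        return False
    element.drop_tree()
    return True


# Calling drop_tree() directly on the root trips the same assertion as in the
# traceback (unless Python runs with -O, which strips assert statements).
try:
    root.drop_tree()
    raised = False
except AssertionError:
    raised = True

skipped = not safe_drop_tree(root)    # the root has no parent, so it is left alone
paragraph = root.find(".//p")
removed = safe_drop_tree(paragraph)   # <p> has a parent (<body>), so it is dropped
```

Whether skipping a parentless element is the right behaviour for the Cleaner is a design question for the fix; the sketch only shows that the assertion depends on getparent() returning None.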

Nikita Vostretsov (whalebot-helmsman) wrote :
scoder (scoder) wrote :

I agree that it shouldn't run into that assertion. PR welcome.

Changed in lxml:
status: New → Confirmed