lxml.html.clean: RuntimeError: dictionary changed size during iteration

Bug #1369362 reported by Milorad Pop-Tosic
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder

Bug Description

On Python 3, this error is raised for certain inputs to the clean_html method of the Cleaner class, while it iterates over an element's attribute dictionary to remove unsafe attributes. See the attached file for details and steps to reproduce.

lxml.etree: (3, 4, 0, 0)
libxml used: (2, 9, 1)
libxml compiled: (2, 9, 1)
libxslt used: (1, 1, 28)
libxslt compiled: (1, 1, 28)

Milorad Pop-Tosic (pop-0) wrote :
description: updated
scoder (scoder) wrote :

Works for me in latest lxml.

Changed in lxml:
status: New → Triaged
scoder (scoder)
Changed in lxml:
status: Triaged → Invalid
Milorad Pop-Tosic (pop-0) wrote :

I am still getting this error on the latest lxml 3.4.1 on Python 3.4.0.

lxml.etree: (3, 4, 1, 0)
libxml used: (2, 9, 1)
libxml compiled: (2, 9, 1)
libxslt used: (1, 1, 28)
libxslt compiled: (1, 1, 28)
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    cleaned = cleaner.clean_html(html_string)
  File "/home/milorad/envs/hiri-py3/lib/python3.4/site-packages/lxml/html/clean.py", line 504, in clean_html
    self(doc)
  File "/home/milorad/envs/hiri-py3/lib/python3.4/site-packages/lxml/html/clean.py", line 262, in __call__
    for aname in attrib.keys():
RuntimeError: dictionary changed size during iteration

This should simply be a matter of iterating over list(attrib.keys()) here:
https://github.com/lxml/lxml/blob/50a8cbbb4fb35a794977c3d8ecc43ddac4f4413e/src/lxml/html/clean.py#L271

And possibly here:
https://github.com/lxml/lxml/blob/50a8cbbb4fb35a794977c3d8ecc43ddac4f4413e/src/lxml/html/clean.py#L262
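
For illustration, the failure mode and the suggested list(attrib.keys()) workaround can be reproduced with a plain dict, without lxml. This is only a minimal sketch: the function names and the "on" prefix test are hypothetical stand-ins, not the actual code in clean.py.

```python
def drop_unsafe_attributes_buggy(attrib, is_unsafe):
    """Delete unsafe keys while iterating the live keys() view."""
    # On Python 3, keys() is a view over the dict itself, so deleting
    # an entry inside the loop raises RuntimeError on the next step.
    for name in attrib.keys():
        if is_unsafe(name):
            del attrib[name]
    return attrib


def drop_unsafe_attributes(attrib, is_unsafe):
    """Delete unsafe keys while iterating a snapshot of the keys."""
    # list(...) copies the keys first, so the dict can shrink safely
    # while the loop walks the copy.
    for name in list(attrib.keys()):
        if is_unsafe(name):
            del attrib[name]
    return attrib


def looks_like_event_handler(name):
    # Hypothetical rule for this sketch only: treat on* attributes as unsafe.
    return name.startswith("on")


if __name__ == "__main__":
    attrs = {"onclick": "evil()", "href": "/page", "onload": "evil()"}
    try:
        drop_unsafe_attributes_buggy(dict(attrs), looks_like_event_handler)
    except RuntimeError as exc:
        print("buggy version:", exc)
    print("fixed version:", drop_unsafe_attributes(dict(attrs), looks_like_event_handler))
```

On Python 2, keys() returned a list, which would explain why the loop only fails on Python 3.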

scoder (scoder) wrote :

Ah, right. It was a Py3-only bug related to pseudo-attributes on processing instructions. No need to clean those up.

https://github.com/lxml/lxml/commit/54a8bfedcd0f32274a4ebf9e2d8e391fe759aba5

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
milestone: none → 3.4
status: Invalid → Fix Committed
scoder (scoder) wrote :

Fix released in lxml 3.4.2.

Changed in lxml:
status: Fix Committed → Fix Released