lxml.html.clean: RuntimeError: dictionary changed size during iteration

Bug #1369362 reported by Milorad Pop-Tosic on 2014-09-15
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
lxml     Fix Released  Medium      scoder       3.4

Bug Description

On Python 3, the clean_html method of the Cleaner class raises this error for certain inputs while it iterates over an element's attribute dictionary to remove unsafe attributes. See the attached file for details and steps to reproduce.

lxml.etree: (3, 4, 0, 0)
libxml used: (2, 9, 1)
libxml compiled: (2, 9, 1)
libxslt used: (1, 1, 28)
libxslt compiled: (1, 1, 28)

Milorad Pop-Tosic (pop-0) wrote :
description: updated
scoder (scoder) wrote :

Works for me in latest lxml.

Changed in lxml:
status: New → Triaged
scoder (scoder) on 2014-12-06
Changed in lxml:
status: Triaged → Invalid
Milorad Pop-Tosic (pop-0) wrote :

I am still getting this error on the latest lxml 3.4.1 on Python 3.4.0.

lxml.etree: (3, 4, 1, 0)
libxml used: (2, 9, 1)
libxml compiled: (2, 9, 1)
libxslt used: (1, 1, 28)
libxslt compiled: (1, 1, 28)
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    cleaned = cleaner.clean_html(html_string)
  File "/home/milorad/envs/hiri-py3/lib/python3.4/site-packages/lxml/html/clean.py", line 504, in clean_html
    self(doc)
  File "/home/milorad/envs/hiri-py3/lib/python3.4/site-packages/lxml/html/clean.py", line 262, in __call__
    for aname in attrib.keys():
RuntimeError: dictionary changed size during iteration

This should simply be a matter of iterating over list(attrib.keys()) here:
https://github.com/lxml/lxml/blob/50a8cbbb4fb35a794977c3d8ecc43ddac4f4413e/src/lxml/html/clean.py#L271

And possibly here:
https://github.com/lxml/lxml/blob/50a8cbbb4fb35a794977c3d8ecc43ddac4f4413e/src/lxml/html/clean.py#L262
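
For illustration, the failure mode and the suggested list(attrib.keys()) workaround can be reproduced with a plain dict. This is only a sketch: the plain dict stands in for lxml's attrib object, and the helper names and the is_safe rule are hypothetical, not taken from clean.py.

```python
def strip_unsafe_attributes(attrib, is_safe):
    """Remove every key for which is_safe(key) is false, in place."""
    # list(...) takes a snapshot of the keys, so deleting entries inside
    # the loop cannot invalidate the iteration.
    for aname in list(attrib.keys()):
        if not is_safe(aname):
            del attrib[aname]
    return attrib


def strip_unsafe_attributes_buggy(attrib, is_safe):
    """Same loop without the snapshot; on Python 3, keys() is a live view."""
    for aname in attrib.keys():
        if not is_safe(aname):
            del attrib[aname]
    return attrib


def is_safe(name):
    # Hypothetical rule for the demo: treat on* event handlers as unsafe.
    return not name.startswith("on")


# A plain dict stands in for an element's attributes (an assumption).
attrs = {"href": "/home", "onclick": "evil()", "onload": "evil()"}

try:
    strip_unsafe_attributes_buggy(dict(attrs), is_safe)
    buggy_error = None
except RuntimeError as exc:
    buggy_error = str(exc)

cleaned = strip_unsafe_attributes(dict(attrs), is_safe)
print(buggy_error)
print(cleaned)
```

On Python 2, keys() returned a list copy, which would explain why the original loop only failed on Python 3.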

scoder (scoder) wrote :

Ah, right. It was a Py3-only bug related to pseudo-attributes on processing instructions. No need to clean those up.

https://github.com/lxml/lxml/commit/54a8bfedcd0f32274a4ebf9e2d8e391fe759aba5

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
milestone: none → 3.4
status: Invalid → Fix Committed
scoder (scoder) wrote :

Fix released in lxml 3.4.2.

Changed in lxml:
status: Fix Committed → Fix Released