Deepcopying a NavigableString in a large tree can exceed the recursion limit

Bug #1709837 reported by Ra on 2017-08-10
This bug affects 1 person
Affects: Beautiful Soup
Importance: Medium
Assigned to: Unassigned

Bug Description

I encountered a strange bug. Unfortunately, I was not able to build a minimal HTML document that reproduces it, so I am attaching the full HTML file that does.

In certain cases, calling copy.deepcopy() on a NavigableString raises "RecursionError: maximum recursion depth exceeded".
The same error occurs when serializing that string with the jsonpickle library, which is how I first encountered the bug. I use copy.deepcopy() here so that the MWE does not depend on another external library.

Python: 3.5.2 (Intel Distribution)
OS: Windows 10
BeautifulSoup: 4.5.3
lxml: 3.7.3

Here is the MWE. Please download the attached HTML file too.

import bs4
import copy

htmlfile = r"alinto_overview.html"

with open(htmlfile,"r",encoding="utf-8") as fileh:
    soup = bs4.BeautifulSoup(fileh, 'lxml')

pretty=soup.prettify()
with open("pretty.html","w",encoding="utf-8") as fileh:
    fileh.write(pretty)

dt = soup.find('dt',string="Acquisitions")
t = dt.find_next('dd').a.string
print("'"+t+"'")
print(t.__dict__)
print("Deep copying...")
t2 = copy.deepcopy(t)
print(t2)
print(type(t2))

Here is the output from the interpreter:

Deep copying...
Traceback (most recent call last):
  File "C:\data\progetti_miei\python\jsonpickle\bug_report2.py", line 18, in <module>
    t2 = copy.deepcopy(t)
  File "C:\IntelPython35\lib\copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "C:\IntelPython35\lib\copy.py", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "C:\IntelPython35\lib\copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "C:\IntelPython35\lib\copy.py", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "C:\IntelPython35\lib\copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
[....]
RecursionError: maximum recursion depth exceeded

Tags: bug
Ra (raffamaiden) wrote :
Leonard Richardson (leonardr) wrote :

I can duplicate this, but not reliably, so I think the problem isn't an infinite recursion, just a very large recursion. Making a deepcopy of an Element that's connected to a Beautiful Soup tree will make a deepcopy of every single element in the tree. I had assumed deepcopy memoized elements and reused the memos, but I don't think that's the case -- I would have to implement __deepcopy__ and take care of the memoization myself.
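
The mechanism described above can be illustrated without Beautiful Soup. The following is a minimal sketch, assuming a hypothetical Node class whose instances link to their neighbours through next_element and previous_element attributes. Node, DetachedCopyNode, build_chain and deepcopy_overflows are names invented for this illustration, and the __deepcopy__ shown is one possible approach, not necessarily the fix adopted by the library.

```python
import copy
import sys


class Node:
    """Hypothetical stand-in for a tree element linked to its neighbours."""

    def __init__(self, text):
        self.text = text
        self.next_element = None
        self.previous_element = None


class DetachedCopyNode(Node):
    """Same node, but deepcopy returns a copy of this node only."""

    def __deepcopy__(self, memo):
        # Copy the payload and deliberately leave the links unset, so the
        # copy is detached from the rest of the chain.
        clone = type(self)(self.text)
        # Record the copy so later references to this node reuse it.
        memo[id(self)] = clone
        return clone


def build_chain(cls, length):
    """Link `length` nodes in order and return the first one."""
    first = previous = cls("node 0")
    for i in range(1, length):
        node = cls("node %d" % i)
        previous.next_element = node
        node.previous_element = previous
        previous = node
    return first


def deepcopy_overflows(node):
    """Return True if copy.deepcopy(node) raises RecursionError."""
    try:
        copy.deepcopy(node)
    except RecursionError:
        return True
    return False


# A chain longer than the recursion limit: the default deepcopy follows
# next_element from node to node, nesting several frames per node.
length = sys.getrecursionlimit() * 2
plain_overflows = deepcopy_overflows(build_chain(Node, length))
detached_overflows = deepcopy_overflows(build_chain(DetachedCopyNode, length))
print(plain_overflows, detached_overflows)
```

With the default deepcopy, copying the first node walks the whole chain recursively, so a long enough chain exceeds the recursion limit even though the recursion is finite. The custom __deepcopy__ stops at the node itself, at the cost of returning a copy that is no longer connected to any tree.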

Changed in beautifulsoup:
status: New → Confirmed
summary: - Deppcopying a NavigableString enters an infinite loop
+ Deppcopying a NavigableString in a large tree can exceed the recursion
+ limit
tags: added: bug
Changed in beautifulsoup:
importance: Undecided → Critical
importance: Critical → High
importance: High → Medium