Deepcopying a NavigableString in a large tree can exceed the recursion limit

Bug #1709837 reported by Ra on 2017-08-10
This bug affects 1 person
Affects: Beautiful Soup

Bug Description

I encountered a strange bug but was unable to reduce it to a minimal HTML example, so I am attaching the full HTML file that reproduces it.

In certain cases, calling copy.deepcopy() on a NavigableString raises "RecursionError: maximum recursion depth exceeded".
The same error occurs when serializing that string with the jsonpickle library, which is how I first encountered the bug. The MWE below uses copy.deepcopy() instead so that it does not depend on another external library.

Python: 3.5.2 (Intel Distribution)
OS: Windows 10
BeautifulSoup: 4.5.3
lxml: 3.7.3

Here is the MWE; please download the attached HTML file as well:

import bs4
import copy

htmlfile = r"alinto_overview.html"

with open(htmlfile,"r",encoding="utf-8") as fileh:
    soup = bs4.BeautifulSoup(fileh, 'lxml')

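with open("pretty.html","w",encoding="utf-8") as fileh:
    # The body of this block is missing from the report; writing the
    # prettified tree is an assumption based on the output filename.
    fileh.write(soup.prettify())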

dt = soup.find('dt',string="Acquisitions")
t = dt.find_next('dd').a.string
print("Deep copying...")
t2 = copy.deepcopy(t)

Here is the output from the interpreter:

Deep copying...
Traceback (most recent call last):
  File "C:\data\progetti_miei\python\jsonpickle\", line 18, in <module>
    t2 = copy.deepcopy(t)
  File "C:\IntelPython35\lib\", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "C:\IntelPython35\lib\", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "C:\IntelPython35\lib\", line 155, in deepcopy
    y = copier(x, memo)
  File "C:\IntelPython35\lib\", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "C:\IntelPython35\lib\", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
RecursionError: maximum recursion depth exceeded

Tags: bug
Ra (raffamaiden) wrote :
Leonard Richardson (leonardr) wrote :

I can duplicate this, but not reliably, so I think the problem isn't an infinite recursion, just a very large recursion. Making a deepcopy of an Element that's connected to a Beautiful Soup tree will make a deepcopy of every single element in the tree. I had assumed deepcopy memoized elements and reused the memos, but I don't think that's the case -- I would have to implement __deepcopy__ and take care of the memoization myself.
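The "very large recursion" described above can be reproduced without Beautiful Soup. The sketch below is only an illustration: Node, build_chain, and deepcopy_overflows are hypothetical names, and the next_element attribute is merely modeled on the kind of neighbor link a parse tree element carries. Note that copy.deepcopy() does keep a memo dictionary, which stops cycles from looping forever, but the memo does not limit how deep the recursion goes when each object points to the next.

```python
import copy
import sys


class Node:
    """Hypothetical stand-in for a tree element linked to its neighbor."""

    def __init__(self, value):
        self.value = value
        self.next_element = None  # reference that deepcopy will follow


def build_chain(length):
    """Build a chain of `length` nodes, each pointing to the next one."""
    head = Node(0)
    current = head
    for i in range(1, length):
        current.next_element = Node(i)
        current = current.next_element
    return head


def deepcopy_overflows(length):
    """Return True if deep-copying a chain of this length hits the limit."""
    head = build_chain(length)
    try:
        copy.deepcopy(head)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    # A short chain copies fine; a chain as long as the recursion limit
    # does not, because every node costs several stack frames.
    print(deepcopy_overflows(10))
    print(deepcopy_overflows(sys.getrecursionlimit()))
```

Copying the head of the chain copies every node reachable from it, which is the same shape of problem as copying one element that is still attached to a large document.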

Changed in beautifulsoup:
status: New → Confirmed
summary: - Deepcopying a NavigableString enters an infinite loop
+ Deepcopying a NavigableString in a large tree can exceed the recursion
+ limit
tags: added: bug
Changed in beautifulsoup:
importance: Undecided → Critical
importance: Critical → High
importance: High → Medium
Agustin Barto (abarto) wrote :

Any updates on this? Is there a way to mitigate the issue at least?

Leonard Richardson (leonardr) wrote :

No, I haven't found a way around this short of implementing my own version of __deepcopy__. Mitigating the problem depends on your situation, but two solutions I can think of are to extract() the item you're about to copy (so that copying it doesn't copy the entire tree), or convert it to a string with decode() _instead_ of copying it.
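The two workarounds above might look roughly like the following. This is a sketch under assumptions, not a tested fix for the attached file: the inline HTML snippet is invented for illustration, and "html.parser" is used only so the example has no lxml dependency. str() is used for the NavigableString and decode() for the enclosing Tag.

```python
import copy

from bs4 import BeautifulSoup

# Invented sample markup, loosely shaped like the structure in the MWE.
html = "<dl><dt>Acquisitions</dt><dd><a href='#'>Some text</a></dd></dl>"
soup = BeautifulSoup(html, "html.parser")

link = soup.find("dt", string="Acquisitions").find_next("dd").a
text_node = link.string

# Workaround 1: convert to plain strings instead of copying.
# str() yields a plain str with no references back into the tree;
# Tag.decode() renders the tag and its contents as markup.
plain_text = str(text_node)
link_markup = link.decode()

# Workaround 2: detach the node first, then copy it.
# extract() removes the node from the tree (this mutates the document),
# so the copy no longer drags the rest of the tree along.
detached = text_node.extract()
duplicate = copy.deepcopy(detached)
```

Keep in mind that extract() changes the original document, so it only suits cases where the node is no longer needed in place.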

Agustin Barto (abarto) wrote :

That's the thing, I need to copy the entire document, and decode sometimes raises the exception as well.
