Add a clone method to Tag and NavigableString

Bug #1307490 reported by Martijn Pieters
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

BeautifulSoup does not support cloning an element (with all contained child page elements); use cases include creating multiple copies of a sub-tree to generate a larger document quickly.

You cannot use copy.deepcopy() for this task because elements maintain references to their tree parent and siblings; a deepcopy would needlessly copy those elements along with the target. NavigableString also implements an incorrect __copy__ method that returns `self` without taking the mutable parent and sibling references into account, which makes this approach unworkable.
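To illustrate the deepcopy problem without depending on Beautiful Soup itself, here is a minimal sketch using a hypothetical `Node` class (an assumption for illustration, not the library's `Tag`) that keeps a parent reference the way tree elements do. Deep-copying a single child follows that reference and duplicates the rest of the tree with it.

```python
import copy


class Node:
    """Hypothetical stand-in for a tree element; not Beautiful Soup's Tag."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        # Link the child back to its parent, as a document tree would.
        child.parent = self
        self.children.append(child)


root = Node("body")
first = Node("p")
second = Node("div")
root.append(first)
root.append(second)

# deepcopy follows the .parent reference, so the "copy of one element"
# arrives attached to a full copy of the tree it came from.
dup = copy.deepcopy(first)
copied_root = dup.parent
```

Here `copied_root` is a new `body` node with two children of its own, which is the needless extra copying described above.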

I've attached a patch that adds explicit cloning support: new `Tag.clone()` and `NavigableString.clone()` methods that create copies that are *not attached to the document*. With it you can repeatedly create copies of subtrees without affecting the original tree.
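The idea behind a detached clone can be sketched roughly as follows. This is not the attached patch; it uses a hypothetical `Node` class with an assumed `clone()` method that copies an element's own data and recurses into its children only, never upward or sideways.

```python
class Node:
    """Hypothetical tree element, used only for illustration."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def clone(self):
        # Copy this node's own data; the new node starts with no parent.
        duplicate = Node(self.name, self.attrs)
        # Recurse downward only, so parents and siblings are never touched.
        for child in self.children:
            duplicate.append(child.clone())
        return duplicate


body = Node("body")
row = Node("tr", {"class": "item"})
row.append(Node("td"))
body.append(row)

# Stamp out several independent copies of the same subtree.
copies = [row.clone() for _ in range(3)]
```

Each entry in `copies` is unattached and can be inserted wherever needed; the original `body` tree still holds only the one `row`.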

This patch assumes that the patch in bug #1307471 has been applied; in my testing (with lxml as the parser), a freshly created BeautifulSoup tree does not have `.builder` set on the created elements.

This code was initially written up as an answer to a question on Stack Overflow, see http://stackoverflow.com/a/23058678/100297

Martijn Pieters (mjpieters) wrote :

Leonard Richardson (leonardr) wrote :

I've adapted this patch to change the behavior of __copy__.
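The committed change itself is not shown in this report. As a rough sketch of what routing the cloning logic through `__copy__` could look like, again on a hypothetical `Node` class rather than the library's own code, `copy.copy()` can be made to return a detached copy of the subtree:

```python
import copy


class Node:
    """Hypothetical tree element; the real implementation may differ."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def __copy__(self):
        # Build a new, parentless node and copy the children recursively.
        duplicate = Node(self.name)
        for child in self.children:
            duplicate.append(copy.copy(child))
        return duplicate


root = Node("ul")
item = Node("li")
item.append(Node("a"))
root.append(item)

# copy.copy() now yields a subtree that is detached from `root`.
dup = copy.copy(item)
```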

Changed in beautifulsoup:
status: New → Fix Committed
Leonard Richardson (leonardr) wrote :

Revision number: 381.

Changed in beautifulsoup:
status: Fix Committed → Fix Released