Provide a public clone() method for elements
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | New | Undecided | Unassigned |
Bug Description
I needed a way to create a bare copy of a Tag, without its contents, and the private _clone() method worked perfectly!
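For readers on a version without `_clone()`, a bare copy can be approximated with public API only: `copy.copy()` duplicates a Tag together with its contents, and `Tag.clear()` then removes the copied children. This is a minimal sketch under that assumption; `bare_copy` is a hypothetical helper name, and unlike `_clone()` it copies the whole subtree before discarding it, so it does more work on large elements.

```python
import copy

from bs4 import BeautifulSoup


def bare_copy(tag):
    """Hypothetical helper: copy a Tag's name and attributes, but not its contents."""
    clone = copy.copy(tag)  # detached copy of the tag *and* its subtree
    clone.clear()           # drop the copied children, keeping name and attrs
    return clone


soup = BeautifulSoup('<div class="box" id="a"><p>hi</p></div>', "html.parser")
div = soup.div
empty = bare_copy(div)
print(empty)  # prints <div class="box" id="a"></div>
print(div)    # original untouched: <div class="box" id="a"><p>hi</p></div>
```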
My use case: I need to extract an arbitrary element from an HTML document, then replicate its enclosing hierarchy all the way up to the top. The _clone() method was perfect for creating the chain of parent elements and inserting each lower element into the next higher one.
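The hierarchy-replication step described above can be sketched roughly as follows. This is an illustration, not the reporter's actual code: `bare_copy` and `replicate_ancestors` are hypothetical names, and `bare_copy` uses the public `copy.copy()` plus `clear()` in place of `_clone()`. The walk stops at the `BeautifulSoup` object itself, which is also a Tag and is the last item yielded by `.parents`.

```python
import copy

from bs4 import BeautifulSoup


def bare_copy(tag):
    """Hypothetical stand-in for _clone(): same name and attrs, no contents."""
    clone = copy.copy(tag)
    clone.clear()
    return clone


def replicate_ancestors(element):
    """Copy `element` and wrap it in bare copies of each of its ancestors."""
    current = copy.copy(element)  # copy so the source document is not modified
    for ancestor in element.parents:
        if isinstance(ancestor, BeautifulSoup):
            break  # the document object itself is not part of the markup
        wrapper = bare_copy(ancestor)
        wrapper.append(current)  # insert the lower element into the higher one
        current = wrapper
    return current


html = '<html><body><div id="a"><p>x</p><p>y</p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")
target = soup.find_all("p")[1]
print(replicate_ancestors(target))
# prints <html><body><div id="a"><p>y</p></div></body></html>
```

Because `copy.copy()` on an ancestor such as `<html>` duplicates the entire document before `clear()` throws it away, a real bare-copy method like `_clone()` would be noticeably cheaper for this loop.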
I'm doing various other types of slicing and dicing of HTML content, and _clone() has been useful for that too.
Would you consider creating a public version of the _clone() method? It could be named clone() or something else you prefer. I'd be happy to take a shot at a merge request (including documentation) if you want.
I did run into one issue using _clone() in my own code: for some bizarre reason, code running inside a pytest test does not see the _clone() method.
For example, consider the following "test_clone.py" file:
====
#!/usr/bin/env python
import bs4

# my own copy of _clone()
def _myclone(self):
    clone = type(self)(
        None, None, self.name, self.namespace,
        self.prefix, self.attrs, is_xml=self._is_xml,
        sourceline=self.sourceline, sourcepos=self.sourcepos,
        can_be_empty_element=self.can_be_empty_element,
        cdata_list_attributes=self.cdata_list_attributes,
        preserve_whitespace_tags=self.preserve_whitespace_tags,
        interesting_string_types=self.interesting_string_types
    )
    for attr in ('can_be_empty_element', 'hidden'):
        setattr(clone, attr, getattr(self, attr))
    return clone

def test_foo():
    body = bs4.BeautifulSoup('<body foo="bar"/>', 'lxml').find("body").extract()
    print(f'1: {body}')
    print(f'2: {_myclone(body)}')
    print(f"3: {type(body._clone)}")
    print(f'4: {body._clone()}')

test_foo()
====
If I run this script manually, it works as expected:
====
$ test_clone.py
1: <body foo="bar"></body>
2: <body foo="bar"></body>
3: <class 'method'>
4: <body foo="bar"></body>
====
But if I run it via pytest, body._clone evaluates to None instead of a method:
====
============== ERRORS ==============
__ ERROR collecting test_clone.py __
test_clone.py:26: in <module>
test_foo()
test_clone.py:24: in test_foo
print(f'4: {body._clone()}')
E TypeError: 'NoneType' object is not callable
--------- Captured stdout ----------
1: <body foo="bar"></body>
2: <body foo="bar"></body>
3: <class 'NoneType'>
===== short test summary info ======
ERROR test_clone.py - TypeError: 'NoneType' object is not callable
====
It took me a while to figure this out, and I still don't understand why it happens.
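One possible explanation, offered as an assumption rather than a confirmed diagnosis: in bs4, attribute access on a Tag that finds no real attribute (and is not a dunder name) falls back to a child search, so `tag.foo` means `tag.find("foo")`. If the bs4 installation that pytest imports does not define `_clone()`, then `body._clone` quietly evaluates to `None` instead of raising `AttributeError`, which would match the `<class 'NoneType'>` line above. The sketch below reproduces that fallback with a deliberately nonexistent name, checks for a real method on the class (which bypasses the fallback), and prints which bs4 is in use. `has_real_method` is a hypothetical helper.

```python
import bs4
from bs4 import BeautifulSoup


def has_real_method(tag, name):
    """Hypothetical helper: look the name up on the class, which bypasses
    Tag.__getattr__ and therefore its find()-based fallback."""
    return callable(getattr(type(tag), name, None))


body = BeautifulSoup('<body foo="bar"></body>', "html.parser").body

# Instance attribute access falls back to body.find("_no_such_method").
print(type(body._no_such_method))  # prints <class 'NoneType'>
print(has_real_method(body, "_no_such_method"))  # prints False
print(has_real_method(body, "find"))  # prints True
print(has_real_method(body, "_clone"))  # depends on the installed bs4

# Run this both directly and under pytest to compare the two environments.
print(bs4.__version__, bs4.__file__)
```

If the version or path differs between the two runs, invoking pytest as `python -m pytest` makes it use the same interpreter (and therefore the same bs4) as `python`.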