Tag.interesting_string_types property is not being propagated when a tag is copied
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
Requirements
- Python 3.9
- beautifulsoup4 == "4.11.1"
- lxml == "4.9.1"
This should be a self-contained test that fails.
```python
import copy
from bs4 import BeautifulSoup
offending_html = """
<html>
<body>
<div class="animals">
<!-- These are animals -->
<p class="a">Cat</p>
</div>
</body>
</html>
"""
original_soup = BeautifulSoup(
dom = copy.copy(
assert "".join(
```
The advised way of copying a tag is copy.copy (as in https:/
Replacing copy.copy with copy.deepcopy solves the problem, but deepcopy comes with its own issues.
Is this a bug, or is it intended?
description: updated
summary: changed from "stripped_strings if the copy of a tag is not identical to the original" to "stripped_strings of the copy of a tag is not identical to the original"
description: updated
Thanks for filing this issue. The problem is fixed in revision 12ad184.
The issue is with the Tag.interesting_string_types property, which was not being propagated when a tag was copied. Normally a Comment is not considered an interesting part of the textual portion of the markup, but when a Tag was copied, the interesting_string_types of the new Tag object was not set, meaning that all strings within it were considered interesting.
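The mechanism described above can be illustrated with a small toy model. This is a sketch only and is not Beautiful Soup's code: the classes ToyTag, Text and Comment, and the method copy_without_filter, are hypothetical names invented for illustration. It assumes that an unset filter means "treat every string as interesting", which is how the comment above describes the buggy copy.

```python
import copy


class Text(str):
    """Stand-in for an ordinary text node (hypothetical, not a bs4 class)."""


class Comment(str):
    """Stand-in for a comment node (hypothetical, not bs4.Comment)."""


class ToyTag:
    """Toy container that filters its strings by type."""

    def __init__(self, children, interesting_string_types=(Text,)):
        self.children = list(children)
        # Assumption: None means "no filter", so every string is interesting.
        self.interesting_string_types = interesting_string_types

    @property
    def strings(self):
        types = self.interesting_string_types
        return [c for c in self.children if types is None or isinstance(c, types)]

    def copy_without_filter(self):
        # Models the reported behaviour: the filter is not carried over.
        return ToyTag(self.children, interesting_string_types=None)

    def __copy__(self):
        # Models the fixed behaviour: the filter is propagated to the copy.
        return ToyTag(self.children, self.interesting_string_types)


original = ToyTag([Comment(" These are animals "), Text("Cat")])
unfiltered_copy = original.copy_without_filter()
filtered_copy = copy.copy(original)

print(original.strings)         # ['Cat']
print(unfiltered_copy.strings)  # [' These are animals ', 'Cat']
print(filtered_copy.strings)    # ['Cat']
```

In this model the comment text leaks into the copy's strings only when the filter is dropped, which mirrors the mismatch between the original and the copy reported in this bug.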