Tag.interesting_string_types property is not being propagated when a tag is copied

Bug #1990400 reported by Nicolas Rolin
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Requirements
  - Python 3.9
  - beautifulsoup4 == "4.11.1"
  - lxml == "4.9.1"

This should be a self-contained test that fails.

```python
import copy
from bs4 import BeautifulSoup

offending_html = """
<html>
    <body>
        <div class="animals">
        <!-- These are animals -->
        <p class="a">Cat</p>
        </div>
    </body>
</html>
"""
original_soup = BeautifulSoup(offending_html, features="lxml").div
dom = copy.copy(original_soup)
assert "".join(original_soup.stripped_strings) == "".join(dom.stripped_strings)
```

The documented way to copy a tag is copy.copy (see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#copying-beautiful-soup-objects). In this case, however, the comment seems to lose its "comment" status in the copy: the commented-out part of the HTML shows up in the copy's stripped_strings, so the assertion fails.

Replacing copy.copy with copy.deepcopy solves the problem, but deepcopy comes with its own issues.

Is this a bug, or is it intended?

description: updated
summary: - stripped_strings if the copy of a tag is not identical to the original
+ stripped_strings of the copy of a tag is not identical to the original
description: updated
Leonard Richardson (leonardr) wrote : Re: stripped_strings of the copy of a tag is not identical to the original

Thanks for filing this issue. The problem is fixed in revision 12ad184.

The issue is with the Tag.interesting_string_types property, which was not being propagated when a tag was copied. Normally a Comment is not considered an interesting part of the textual portion of the markup, but when a Tag was copied, the interesting_string_types of the new Tag object was not set, meaning that all strings within it were considered interesting.
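The mechanism described above can be sketched with a toy model. This is not Beautiful Soup's code: the `Text`, `Comment`, and `Node` classes and the `buggy_copy`/`fixed_copy` methods are hypothetical stand-ins, and the model assumes that a missing filter (`None`) means "treat every string as interesting".

```python
import copy


class Text(str):
    """Hypothetical stand-in for an ordinary text node."""


class Comment(Text):
    """Hypothetical stand-in for a comment node."""


class Node:
    """Toy tag whose string iteration is filtered by interesting_string_types."""

    def __init__(self, children, interesting_string_types=None):
        self.children = list(children)
        # Assumption: None means "no filter", so every string is interesting.
        self.interesting_string_types = interesting_string_types

    def strings(self):
        wanted = self.interesting_string_types
        for child in self.children:
            # Exact type match, so a Comment is skipped when only Text is wanted.
            if wanted is None or type(child) in wanted:
                yield str(child)

    def buggy_copy(self):
        # Forgets to propagate the filter; the copy falls back to None.
        return Node(copy.copy(self.children))

    def fixed_copy(self):
        # Propagates the filter to the new object.
        return Node(copy.copy(self.children), self.interesting_string_types)


original = Node(
    [Comment("These are animals"), Text("Cat")],
    interesting_string_types=(Text,),
)

print(list(original.strings()))               # ['Cat']
print(list(original.buggy_copy().strings()))  # ['These are animals', 'Cat']
print(list(original.fixed_copy().strings()))  # ['Cat']
```

In this model the copied children are identical in both cases; only the filter attribute on the new object differs, which is why the comment text reappears.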

Changed in beautifulsoup:
status: New → Fix Committed
Leonard Richardson (leonardr) wrote :

Fix released in version 4.11.2.
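For anyone who cannot upgrade yet, one possible workaround is to filter out comments explicitly instead of relying on stripped_strings. The sketch below is only an illustration: `visible_strings` is a hypothetical helper, not part of Beautiful Soup, and it skips `Comment` nodes only, which is narrower than the library's default filter. It uses the built-in `html.parser` so that lxml is not required.

```python
import copy

from bs4 import BeautifulSoup, Comment, NavigableString


def visible_strings(tag):
    """Yield stripped, non-empty strings under `tag`, skipping comments.

    Hypothetical helper: it does not consult Tag.interesting_string_types,
    so it behaves the same on the original tag and on a copy.
    """
    for node in tag.descendants:
        if isinstance(node, NavigableString) and not isinstance(node, Comment):
            text = node.strip()
            if text:
                yield text


html = '<div class="animals"><!-- These are animals --><p class="a">Cat</p></div>'
original = BeautifulSoup(html, "html.parser").div
duplicate = copy.copy(original)

# Both the original and the shallow copy report the same visible text.
assert list(visible_strings(original)) == list(visible_strings(duplicate))
```

Because the helper checks each string's class directly, the result does not depend on whether the copy kept the original tag's settings.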

Changed in beautifulsoup:
status: Fix Committed → Fix Released
summary: - stripped_strings of the copy of a tag is not identical to the original
+ Tag.interesting_string_types property is not being propagated when a tag
+ is copied