find and find_all: string matching doesn't work when comments are present.

Bug #1713129 reported by Thomas Proctor
This bug affects 1 person
Affects: Beautiful Soup
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The attached file contains an example of the bug. When `find` or `find_all` is called with a string match (the `string` or `text` argument) together with the `name` argument, no match is returned for an element whose HTML contains a comment.

My example uses a generic regex (".*", which should match any text), but I believe the bug shows up with plain text matching as well.

See the attached "Bug example" file for a reproduction.

Versioning:
bs4: 4.6.0
python: 3.4.3
lxml parser

Thomas Proctor (theproctonator) wrote :
Leonard Richardson (leonardr) wrote :

Thanks for filing this issue. The behavior you're seeing is a side effect of the way .string works. This is close enough to issue 1698990 that I'm going to mark it as a duplicate.

Passing 'tag' and 'string' into a find() method makes it look for a tag whose .string value is that string. If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None. (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string)
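A minimal sketch of the .string behavior described above. The markup is made up for illustration (it is not the reporter's attachment), and it uses the built-in "html.parser" rather than lxml so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one <p> with a single string child,
# and one <p> with a string plus a comment.
html = "<p>only text</p><p>text<!-- note --></p>"
soup = BeautifulSoup(html, "html.parser")
single, mixed = soup.find_all("p")

single_string = single.string        # the tag's only child
mixed_string = mixed.string          # None, because the tag has two children
mixed_children = len(mixed.contents)  # the string and the comment

print(repr(single_string), repr(mixed_string), mixed_children)
```

The comment counts as a child of the tag, which is why the second <p> no longer has a single unambiguous .string.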

The <p> tag that contains both a string and a comment has its .string set to None, so your problem_soup.find('p', string=re.compile(r".*")) doesn't match anything -- there's a string inside the tag, but there's other stuff as well, so .string is undefined and an attempt to match on .string will match nothing.
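The following sketch shows the mismatch on made-up markup, then one possible way to match on a tag's own text while skipping comments. The helpers `own_text` and `p_with_matching_text` are hypothetical names introduced here, not part of Beautiful Soup, and this is only one approach under those assumptions, not the fix proposed in the linked issues.

```python
import re
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup, not the reporter's attachment.
html = "<div><p>plain</p><p>mixed<!-- note --></p></div>"
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r".*")

# Name + string: only tags whose .string is not None can match.
by_string = soup.find_all("p", string=pattern)

def own_text(tag):
    """Join a tag's direct string children, skipping comments (hypothetical helper)."""
    return "".join(
        child for child in tag.children
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    )

def p_with_matching_text(tag):
    """Function filter: a <p> whose non-comment text matches the pattern."""
    return tag.name == "p" and pattern.search(own_text(tag)) is not None

by_function = soup.find_all(p_with_matching_text)

print(len(by_string), len(by_function))
```

Passing a function to find_all() sidesteps .string entirely, so the caller decides what "the tag's text" means when a tag has several children.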

You might be interested in issue 1645513, a proposal to change the behavior of find() when given both a tag and a string. You're not the first person to expect find() to behave differently than it does, but it seems like the solution I propose in https://bugs.launchpad.net/beautifulsoup/+bug/1645513/comments/3 would also not behave the way you expect it to.
