Comment 2 for bug 1713129

Leonard Richardson (leonardr) wrote :

Thanks for filing this issue. The behavior you're seeing is a side effect of the way .string works. This is close enough to issue 1698990 that I'm going to mark it as a duplicate.

Passing 'tag' and 'string' into a find() method makes it look for a tag with that name whose .string value matches that string. If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None. (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string)

The <p> tag that contains both a string and a comment has its .string set to None, so your problem_soup.find('p', string=re.compile(r".*")) doesn't match anything -- there's a string inside the tag, but there's other stuff as well, so .string is None and an attempt to match on .string will match nothing.
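
Here is a minimal sketch of that behavior, for the Beautiful Soup versions this comment describes. The markup is an assumption standing in for the reporter's document, which isn't shown here, and plain_soup is a hypothetical comparison case with no comment inside the <p> tag. It uses the built-in "html.parser" so no extra parser is needed.

```python
import re
from bs4 import BeautifulSoup

# Assumed markup: one <p> holding only a string, one holding a string plus a comment.
plain_soup = BeautifulSoup("<p>text</p>", "html.parser")
problem_soup = BeautifulSoup("<p>text<!--a comment--></p>", "html.parser")

pattern = re.compile(r".*")

# A tag with a single child string exposes that string as .string.
plain_string = plain_soup.p.string

# A tag with two children (a string and a comment) has .string set to None.
problem_string = problem_soup.p.string

# find() with both a tag name and string= compares the pattern against .string.
plain_match = plain_soup.find("p", string=pattern)
problem_match = problem_soup.find("p", string=pattern)

print(repr(plain_string), repr(problem_string))
print(plain_match, problem_match)
```

The first find() call returns the <p> tag, while the second returns None even though the pattern would match the text inside the tag, because the comparison is made against .string rather than against the individual strings the tag contains.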

You might be interested in issue 1645513, a proposal to change the behavior of find() when given both a tag and a string. You're not the first person to expect find() to behave differently than it does, but it seems like the solution I propose in https://bugs.launchpad.net/beautifulsoup/+bug/1645513/comments/3 would also not behave the way you expect it to.