Change find() behavior when searching for both a tag and a string
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Confirmed | Wishlist | Unassigned |
Bug Description
Requirements
- Python 3.5.2
- beautifulsoup4=
I'm trying to call find_all with both name and string arguments, and I'm seeing what I believe are inconsistent results. Can someone please verify whether this is a bug, or clarify whether I'm misunderstanding the functionality?
In the examples below, I expect each find_all call to return one match. The first example does not.
html_var1 = "<a href="https:/
html_var2 = "<a class="nav-opener" href="#" id="showMenu"
BeautifulSoup.
name='a'
string=
)
Output []
BeautifulSoup(
name='a',
string='menu'
)
Output ["menu"]
summary: changed from "Strange Behavior w/ find_all (name=str, string=str)" to "Change find() behavior when searching for both a tag and a string"
tags: added: feature
Changed in beautifulsoup: importance: Undecided → Wishlist
The behavior you're seeing is by design.
"Although string is for finding strings, you can combine it with arguments that find tags: Beautiful Soup will find all tags whose .string matches your value for string."
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-string-argument
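A minimal sketch of this rule, assuming bs4 is installed and using the built-in "html.parser" backend. The markup is made up for illustration; it is not the reporter's HTML. It contrasts string= on its own (which matches text nodes) with string= combined with a tag name (which matches tags by their .string).

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration (not the reporter's HTML).
html = "<p><b>one</b></p><p>two <i>three</i></p><div>one</div>"
soup = BeautifulSoup(html, "html.parser")

# string= on its own matches text nodes (NavigableString objects), not tags.
strings = soup.find_all(string="one")
print([str(s) for s in strings])  # ['one', 'one']

# name= plus string= matches <p> tags whose .string equals "one".
# The first <p> qualifies because its only child <b> has .string "one";
# the second <p> has two children, so its .string is None.
tags = soup.find_all("p", string="one")
print(tags)  # [<p><b>one</b></p>]
```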
So you're searching for a tag with a particular .string. How does .string work?
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string
"If a tag’s only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child."
"If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None."
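To make the recursion in these two rules explicit, here is a small pure-Python model. The `Node` class and `effective_string` function are hypothetical illustrations, not part of Beautiful Soup (the real property is `Tag.string`). Returning None for a tag with no children is this sketch's own choice, since the quoted rules do not mention empty tags.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a bs4 Tag: a name plus child nodes/strings."""
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)


def effective_string(node: Union[Node, str]) -> Optional[str]:
    """Model of the two .string rules quoted above (not bs4's own code)."""
    if isinstance(node, str):
        # A text node is its own string.
        return node
    if len(node.children) != 1:
        # Several children (or none): no single answer, so None.
        return None
    # Exactly one child: inherit that child's .string, recursing through
    # any chain of single-child tags.
    return effective_string(node.children[0])


print(effective_string(Node("p", [Node("b", ["one"])])))            # one
print(effective_string(Node("p", ["two ", Node("i", ["three"])])))  # None
```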
The first <a> tag contains more than one thing ("Juggernaut" and a <span> tag that contains "Store"), so its .string is defined to be None.
The second <a> tag contains one thing, a <span> tag, which contains one thing, "menu", so its .string is defined to be "menu".
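A sketch reproducing both cases with Beautiful Soup itself, assuming bs4 and the built-in "html.parser" backend. Because the reporter's HTML is truncated, the markup below is hypothetical, shaped after the description above, and the search value "Juggernaut Store" is an assumption. It also contrasts .string with get_text(), which concatenates every string inside the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the description above; the
# reporter's real HTML is truncated in the report.
soup1 = BeautifulSoup('<a href="#">Juggernaut <span>Store</span></a>', "html.parser")
soup2 = BeautifulSoup('<a class="nav-opener" href="#"><span>menu</span></a>', "html.parser")
a1, a2 = soup1.a, soup2.a

# This <a> has two children ("Juggernaut " and <span>), so .string is None,
# even though its visible text is "Juggernaut Store".
print(a1.string)      # None
print(a1.get_text())  # Juggernaut Store

# <a> -> <span> -> "menu": each level has one child, so .string is "menu".
print(a2.string)      # menu

# find_all(name, string) compares the value against each tag's .string.
# "Juggernaut Store" is an assumed search value.
matches1 = soup1.find_all("a", string="Juggernaut Store")
matches2 = soup2.find_all("a", string="menu")
print(len(matches1), len(matches2))  # 0 1
```

Note that the match in the second case is the <a> tag itself, not the "menu" string, because the name argument restricts results to tags.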