passing a function to the class_ attribute uses only first class of the tag
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
Environment:
- Python 3.6
- bs4 4.4
- Windows 7 x64
Expected behavior
- When a function is passed as the class_ argument, it should receive the Tag element's full CSS class value.
Actual behavior
- When a function is passed as the class_ argument, only the first class of the Tag element is used.
Steps to reproduce
- Load a page and put the contents in a variable "dom".
- Define a function as per documentation, which accepts the css class of a tag
```
def exclude_class(css_class):
    print(css_class)
    return True  # this is just to test
```
- Use the find_all method: `div = dom.find_all('li', class_=exclude_class)`
- Run the code to see that only one class per Tag is output from the exclude_class function
Thanks for filing this report. This is a reasonable interpretation of what should happen to a function that's matching against a multi-valued attribute, but I think the current implementation is more consistent with the rest of the API, and there's another way to do what you want, so for reasons of backwards compatibility I'm not going to make this change.
The current implementation calls the match function once for *each* value of a multi-valued attribute. This is analogous to the other types of matches: soup.find(class_=["a", "b"]) finds tags that have *either* "a" or "b" in their class list, not tags whose class list is "a", "b", and nothing else. This, in turn, is analogous to soup.find(["a", "b"]), which looks for tags that are either <a> or <b>.
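To make the per-value behavior concrete, here is a minimal sketch. The markup, the `seen` list, and the `is_b` function are made up for illustration and do not come from the report. It records every value the match function receives and also shows the "either" semantics of a list match. The exact sequence of calls may differ between Beautiful Soup versions, so no printed output is shown.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the report.
html = '<ul><li class="a b">one</li><li class="c">two</li></ul>'
soup = BeautifulSoup(html, "html.parser")

seen = []  # every value the match function is called with


def is_b(css_class):
    # Called with individual class values such as "a" or "b",
    # not with the tag's whole class list.
    seen.append(css_class)
    return css_class == "b"


# Matches the first <li> because one of its classes is "b".
matches = soup.find_all("li", class_=is_b)
print(seen)
print([li.get_text() for li in matches])

# A list matches tags that have *either* class, so both <li> tags match.
either = soup.find_all("li", class_=["a", "c"])
print([li.get_text() for li in either])
```

With a function that always returns True, as in the report, matching can stop at the first class value, which is consistent with only one class per tag being printed.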
This logic won't help in your situation, because you want to look at the entire list of values at once. You can do this by writing a function that takes a Tag object and completely overrides the "does this tag match?" check. Pass it as the first argument to find_all():
###
from bs4 import BeautifulSoup
soup = BeautifulSoup(
    '<p class="include exclude">some text<p class="include"><div></div><p>'
)
def exclude_class(tag):
    return tag.name == 'p' and 'exclude' not in tag.get('class', [])
print(soup.find_all(exclude_class))
###
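The same approach can be adapted to the `li` search from the report. The sketch below is only an illustration: the markup, the class names, and the helper `li_without_classes` are hypothetical and are not part of Beautiful Soup. The helper builds a tag-matching function that sees the tag's full class list at once.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the reporter's page.
html = (
    "<ul>"
    '<li class="item featured">one</li>'
    '<li class="item hidden">two</li>'
    "<li>three</li>"
    "</ul>"
)
dom = BeautifulSoup(html, "html.parser")


def li_without_classes(*excluded):
    """Build a tag-matching function (hypothetical helper, not part of bs4)."""
    banned = set(excluded)

    def match(tag):
        # tag.get("class", []) is the full list of classes, or [] if absent.
        classes = tag.get("class", [])
        return tag.name == "li" and banned.isdisjoint(classes)

    return match


items = dom.find_all(li_without_classes("hidden"))
print([li.get_text() for li in items])  # ['one', 'three']
```

Because the function receives the Tag itself, it can apply any rule to the complete class list, including tags that have no class attribute at all.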