passing a function to the class_ attribute uses only first class of the tag

Bug #1774156 reported by Boris Rizov
This bug affects 1 person
Affects: Beautiful Soup
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Environment:
 - Python 3.6
 - bs4 4.4
 - Windows 7 64-bit

Expected behavior
 - When a function is passed as the class_ argument, it should receive the tag's entire CSS class value (all of its classes).

Actual behavior
 - When a function is passed as the class_ argument, it receives only the first class of the tag.

Steps to reproduce
 - Load a page and put the contents in a variable "dom".
 - Define a function, as per the documentation, that accepts the CSS class of a tag:
```
def exclude_class(selector):
    print(selector)  # <- here it will output only the first class
    return True  # this is just to test
```
 - Use the find_all method: ```div = dom.find_all('li', class_=exclude_class)```
 - Run the code to see that only one class per Tag is output from the exclude_class function
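
A self-contained version of these steps might look like the sketch below. The inline HTML and the `seen` list are illustrative stand-ins for the loaded page and the `print` call. Because the function always returns True, matching stops at the first class value it is given.

```python
from bs4 import BeautifulSoup

# Stand-in for the page loaded into "dom" (illustrative markup).
dom = BeautifulSoup(
    '<ul><li class="first second">item</li></ul>',
    "html.parser",
)

seen = []  # every value the filter function is called with

def exclude_class(selector):
    seen.append(selector)
    return True  # this is just to test

matches = dom.find_all("li", class_=exclude_class)

# The function receives individual class values, not the full class string.
print(seen)
```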

Leonard Richardson (leonardr) wrote :

Thanks for filing this report. This is a reasonable interpretation of what should happen when a function is matched against a multi-valued attribute. However, I think the current implementation is more consistent with the rest of the API, and there's another way to do what you want, so for reasons of backwards compatibility I'm not going to make this change.

The current implementation calls the match function once for *each* value of a multi-valued attribute. This is analogous to the other types of matches: soup.find(class_=["a", "b"]) finds tags that have *either* "a" or "b" in their class list, not tags whose class list is "a", "b", and nothing else. This, in turn, is analogous to soup.find(["a", "b"]), which looks for tags that are either <a> or <b>.
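
As a rough illustration of the "either/or" matching described above, the following sketch runs the three kinds of match against a small made-up document. The markup and variable names are assumptions chosen for the example.

```python
from bs4 import BeautifulSoup

# Illustrative markup: three <p> tags with different class lists,
# plus one <a> and one <b> tag.
html = (
    '<p class="a">1</p>'
    '<p class="b c">2</p>'
    '<p class="d">3</p>'
    '<a href="#">4</a>'
    '<b>5</b>'
)
soup = BeautifulSoup(html, "html.parser")

# A list of classes matches tags that have *either* class.
either_class = soup.find_all(class_=["a", "b"])

# A list of names matches tags that are *either* <a> or <b>.
either_name = soup.find_all(["a", "b"])

# A function is tried against each class value in turn, so a tag
# matches as soon as one of its classes satisfies the function.
has_c = soup.find_all(class_=lambda value: value == "c")

print([tag.get_text() for tag in either_class])
print([tag.name for tag in either_name])
print([tag.get_text() for tag in has_c])
```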

This logic won't help in your situation, because you want to look at the entire list of values at once. You can do this by writing a function that takes a Tag object and completely overrides the "does this tag match?" check. Pass it as the first argument to find_all():

###
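from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(
    '<p class="include exclude">some text<p class="include"><div></div><p>',
    'html.parser',
)

def exclude_class(tag):
    # Match <p> tags whose class list does not contain "exclude";
    # tags with no class attribute fall back to an empty list.
    return tag.name == 'p' and 'exclude' not in tag.get('class', [])

print(soup.find_all(exclude_class))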
###
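
If the same check is needed for several tag names or classes, the function could be produced by a small factory. This is only a sketch: `lacks_class` is a hypothetical helper, not part of Beautiful Soup, and the markup is made up for the example.

```python
from bs4 import BeautifulSoup

def lacks_class(tag_name, unwanted):
    """Build a tag filter (hypothetical helper, not a bs4 API).

    The returned function matches tags named tag_name whose full
    class list does not contain unwanted.
    """
    def matcher(tag):
        return tag.name == tag_name and unwanted not in tag.get("class", [])
    return matcher

# Illustrative markup: one hidden item, one visible item, one with no class.
soup = BeautifulSoup(
    '<ul>'
    '<li class="item hidden">a</li>'
    '<li class="item">b</li>'
    '<li>c</li>'
    '</ul>',
    "html.parser",
)

visible = soup.find_all(lacks_class("li", "hidden"))
print([li.get_text() for li in visible])
```

Because the function receives the whole Tag, it sees the complete class list at once instead of one value per call.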

Changed in beautifulsoup:
status: New → Won't Fix
Boris Rizov (borisuu) wrote :

Thank you for the great response, I will use the suggested method.

Regards
