Beautiful Soup

passing a function to the class_ attribute uses only first class of the tag

Bug #1774156 reported by Boris Rizov on 2018-05-30

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Beautiful Soup	Won't Fix	Undecided	Unassigned

Bug Description

Environment:
- Python 3.6
- bs4 4.4
- Windows 7 x64

Expected behavior
- When a function is passed as the class_ argument, it should receive the whole class attribute of a Tag element (all of its CSS classes).

Actual behavior
- When a function is passed as the class_ argument, it receives only the first class of the Tag element.

Steps to reproduce
- Load a page and put the contents in a variable "dom".
- Define a function as per the documentation, which accepts the CSS class of a tag:
```
def exclude_class(selector):
    print(selector)  # <- here it will output only the first class
    return True  # this is just to test
```
- Use the find_all method: ```div = dom.find_all('li', class_=exclude_class)```
- Run the code to see that only one class per Tag is output from the exclude_class function
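
For anyone who wants to try the steps above without loading a real page, here is a minimal self-contained sketch. The markup and the `seen` list are made up for illustration, and `html.parser` is just an assumed parser choice; the original report does not say which page or parser was used.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page contents mentioned in the steps.
html = '<ul><li class="first second">one</li><li class="third">two</li></ul>'
dom = BeautifulSoup(html, "html.parser")

seen = []  # every value the match function receives


def exclude_class(selector):
    seen.append(selector)  # record instead of print, so the calls can be inspected
    return True  # this is just to test, as in the report


found = dom.find_all("li", class_=exclude_class)
print(seen)
print(len(found))
```

Because the function returns True on its first call for each tag, matching stops there and the remaining classes of that tag are never passed in.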

Leonard Richardson (leonardr) wrote on 2018-07-19:

Thanks for filing this report. This is a reasonable interpretation of what should happen to a function that's matching against a multi-valued attribute, but I think the current implementation is more consistent with the rest of the API, and there's another way to do what you want, so for reasons of backwards compatibility I'm not going to make this change.

The current implementation calls the match function once for *each* value of a multi-valued attribute. This is analogous to the other types of matches: soup.find(class_=["a", "b"]) finds tags that have *either* "a" or "b" in their class list, not tags whose class list is exactly "a" and "b" and nothing else. This, in turn, is analogous to soup.find(["a", "b"]), which looks for tags that are either <a> or <b>.
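
A small sketch of the three analogous cases described above may help. The markup, the class names, and the `is_exclude` helper are invented for illustration, and `html.parser` is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for demonstrating the matching rules.
html = (
    '<a class="x">link</a>'
    '<b class="y">bold</b>'
    '<p class="include exclude">first</p>'
    '<p class="include">second</p>'
)
soup = BeautifulSoup(html, "html.parser")

# A list of tag names matches tags that are either <a> or <b>.
by_name = [tag.name for tag in soup.find_all(["a", "b"])]

# A list of classes matches tags that have either class in their class list.
by_class_list = [
    p.get_text() for p in soup.find_all("p", class_=["exclude", "missing"])
]


def is_exclude(value):
    # Receives one class value at a time, never the whole list.
    return value == "exclude"


# The function is tried against each class in turn, so it still matches
# a tag whose second class is "exclude".
by_function = [p.get_text() for p in soup.find_all("p", class_=is_exclude)]

print(by_name, by_class_list, by_function)
```

In all three cases a single matching value is enough for the tag to be returned, which is why a class_ function cannot express "none of the classes is X".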

This logic won't help in your situation, because you want to look at the entire list of values at once. You can do this by writing a function that takes a Tag object and completely overrides the "does this tag match?" check. Pass it as the first argument to find_all():

###
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="include exclude">some text<p class="include"><div></div><p>'
)

def exclude_class(tag):
    # Match <p> tags whose class list does not contain 'exclude'.
    return tag.name == 'p' and 'exclude' not in tag.get('class', [])

print(soup.find_all(exclude_class))
###

Changed in beautifulsoup:
status:	New → Won't Fix

Boris Rizov (borisuu) wrote on 2018-07-19:

Thank you for the great response, I will use the suggested method.

Regards
