Can't select by CSS class if element has more than one class

Reported by Endolith on 2009-08-07
This bug affects 1 person
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

If a page has <p class="class1">, then soup.findAll('p', 'class1') will find it.

If it has <p class="class1 class2">, though, it will not be found. BeautifulSoup treats this as a single class with a space in it, 'class1 class2', rather than as two classes, ['class1', 'class2'].
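To make the difference concrete, here is a minimal sketch of the two matching rules using only plain strings. The helper name has_class is hypothetical and is not part of Beautiful Soup; it only illustrates what "two classes" would mean for a search.

```python
def has_class(class_attr, wanted):
    """Return True if `wanted` is one of the whitespace-separated class tokens.

    Hypothetical helper for illustration only, not a Beautiful Soup function.
    """
    return wanted in class_attr.split()


single = "class1"
double = "class1 class2"

# Whole-string comparison, which is the behavior described in this report:
# the attribute value must equal the search string exactly.
print(single == "class1")  # True
print(double == "class1")  # False

# Token comparison, which is the behavior the report asks for:
# the attribute value is split on whitespace and each token is a class.
print(has_class(single, "class1"))  # True
print(has_class(double, "class1"))  # True
print(has_class(double, "class"))   # False
```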

A workaround is to use a regular expression to search for the class instead of a string:

soup.findAll('p', {'class': re.compile(r'\bclass1\b')})
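The workaround needs "import re" first. One caveat worth knowing: \b is a word boundary, and a hyphen is not a word character, so the pattern also matches a different class such as 'class1-wide'. The sketch below checks the pattern against a few sample attribute values using only the re module. The stricter whitespace-delimited pattern is an alternative suggested here, not something from the original report, and the sample values are made up.

```python
import re

# Pattern from the workaround above.
word_boundary = re.compile(r'\bclass1\b')

# Assumed stricter alternative: the class name must be delimited by
# whitespace or by the start/end of the attribute value.
whitespace_delimited = re.compile(r'(?<!\S)class1(?!\S)')

# Made-up attribute values for illustration.
samples = ["class1", "class1 class2", "class10", "class1-wide"]

for value in samples:
    print(
        repr(value),
        bool(word_boundary.search(value)),
        bool(whitespace_delimited.search(value)),
    )
```

Either compiled pattern can be passed as the 'class' value in the findAll call shown above.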

But I think it should understand that an element can have multiple classes.

MORITA Hajime (morrita) wrote :

Hi,
I made a patch to try to fix this problem.
In the patch, an attribute name prefixed with "@" triggers an HTML class-style search;
for example, findAll("span", {"@class": "foo"}) will match any span whose class list includes "foo".

MORITA Hajime (morrita) wrote :

Oops, the patch is totally irrelevant! I've removed that. Sorry to disturb you...

Endolith (endolith) wrote :

(I've just been using LXML instead.)

Leonard Richardson (leonardr) wrote :

Beautiful Soup 4 beta 5 deals with multi-valued attributes (of which 'class' is the most common) a lot better, and also improves searching by CSS class.

Leonard Richardson (leonardr) wrote :

I'm satisfied with the behavior of beta 6 w/r/t searching by CSS class. There's still a small hack, but I can get rid of the hack without changing the API.

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released