_findAll() ignores "text" keyword argument

Bug #493722 reported by Darcy Parks on 2009-12-07
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

Version: 3.0.8

_findAll has two special cases that avoid creating a SoupStrainer: findAll(True), where name is True, and findAll('tag-name'), where name is a string. Both cases check that "limit", "attrs" and "kwargs" are unspecified, but neither checks "text". So if "limit", "attrs" and "kwargs" are unspecified, "text" is silently ignored and the special case runs anyway, as if "text" had not been passed.

To fix:

Line 340 of BeautifulSoup.py:

Change
    elif not limit and name is True and not attrs and not kwargs:
To
    elif not limit and name is True and not attrs and not kwargs \
        and not text:

Similarly, on line 345:

Change
    elif not limit and isinstance(name, basestring) and not attrs \
        and not kwargs:
To
    elif not limit and isinstance(name, basestring) and not attrs \
        and not kwargs and not text:
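
The effect of the added "and not text" checks can be sketched with a small stand-in for the branch selection. This is only an illustration, not the code in BeautifulSoup.py: the function name choose_branch and the returned labels are hypothetical, and str is used in place of basestring so the sketch runs on Python 3.

```python
def choose_branch(name, attrs=None, text=None, limit=None, **kwargs):
    """Hypothetical model of which branch a _findAll-style method takes.

    Mirrors the two patched elif conditions from the report; the labels
    returned here are made up for illustration.
    """
    if not limit and name is True and not attrs and not kwargs \
            and not text:
        return "match-any-tag shortcut"
    elif not limit and isinstance(name, str) and not attrs \
            and not kwargs and not text:
        return "match-tag-name shortcut"
    # Anything else needs the general matcher (a SoupStrainer in the library).
    return "general strainer"


print(choose_branch("b"))              # shortcut is still used
print(choose_branch("b", text="foo"))  # text now forces the general path
print(choose_branch(True, text="foo"))
```

Without the two "and not text" clauses, the last two calls would take a shortcut and the text argument would never be consulted.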

Aaron DeVore (aaron-devore) wrote :

Good catch! This is a case where the user has made a mistake by using findAll('tag-name', text=...) or findAll(True, text=...). According to the API, text=... should override the tag searching.

I put a fix in a private branch. I also combined some of the checks in the two elif tests.

There was one possible issue with using 'not text': it is also true for an empty string (""), so text="" would still be treated as unspecified. That is fixed by using 'text is None' instead.
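
The difference between the two tests is plain Python behaviour and can be shown without the library. A minimal sketch; the helper names unspecified_by_truthiness and unspecified_by_identity are hypothetical and do not appear in BeautifulSoup.py.

```python
def unspecified_by_truthiness(text=None):
    # 'not text' treats every falsy value, including "", as "no text given".
    return not text


def unspecified_by_identity(text=None):
    # 'text is None' is true only when the default was left untouched.
    return text is None


for value in (None, "", "foo"):
    print(repr(value),
          unspecified_by_truthiness(value),
          unspecified_by_identity(value))
```

Only the identity test distinguishes an explicit text="" from an omitted argument.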

Leonard Richardson (leonardr) wrote :

Thanks to Aaron's branch, fixed in 3.0.8.1.

Changed in beautifulsoup:
status: New → Fix Released