Behavior change (regression?) when class_ provided as find argument contains trailing whitespace
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Beautiful Soup | New | Undecided | Unassigned | | |
Bug Description
On one of my projects, there is an <a> element that has the following class attribute:
'fa btn btn-outline-
and that I used to retrieve using:
soup.find('a', class_='fa btn btn-outline-
Recently, someone reported that it didn't work for them. After a bit of investigation, we tried:
soup.find('a', class_='fa btn btn-outline-
and it did work for them.
That behavior change does not seem expected to me.
I had a quick look at the source code/history for BeautifulSoup and here are my findings:
- could it be related to https:/
- I tried to add things in "test_multivalu
- would it make sense to "clean up" the argument provided as the class?
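To illustrate the "clean up" idea from the last point, here is a minimal sketch of a caller-side workaround. The helper name normalize_class is hypothetical and is not part of Beautiful Soup; it only collapses whitespace in the string before it would be passed as class_.

```python
def normalize_class(value: str) -> str:
    """Collapse runs of whitespace and strip both ends of a class string.

    Hypothetical helper for illustration only; not a Beautiful Soup API.
    """
    # str.split() with no argument splits on any whitespace and drops
    # empty pieces, so leading/trailing/repeated whitespace disappears.
    return " ".join(value.split())


print(repr(normalize_class(" foo bar ")))   # 'foo bar'
print(repr(normalize_class("foo\tbar  ")))  # 'foo bar'
```

With such a helper, a call could be written as soup.find('div', class_=normalize_class(" foo bar ")). Whether the library itself should do this normalization is the open question of this report.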
I am currently away from my dev computer but I can try to have a deeper look in the future to:
- pinpoint the exact commit that changed the behavior
- write corresponding unit test cases
- suggest a fix
Thanks again for the great work you did on this project :)
(More details about the original bug: https:/
After a deeper investigation, with the following unit test added to class HTMLTreeBuilderSmokeTest(object):
    # Based on test_multivalued_attribute_with_whitespace
    def test_find_class_with_whitespace(self):
        markup = '<div class=" foo bar "></a>'
        soup = self.soup(markup)
        self.assertEqual(soup.div, soup.find('div', class_=" foo bar "))
I got the following results on the different revisions:
497 KO
450 KO
425 KO
417 KO <- behavior change (+ test_copy_preserves_encoding error)
416 OK (except for test_copy_preserves_encoding)
414 OK
410 OK
408 OK
400 OK
This would correspond to https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/417 .
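For reference, the search over revisions can be sketched as a simple bisection. This is only an illustration: the function first_failing is a hypothetical name, and the OK/KO values are copied from the results listed above rather than produced by running the test suite.

```python
from typing import Callable, Sequence


def first_failing(revisions: Sequence[int], passes: Callable[[int], bool]) -> int:
    """Return the first revision for which passes() is False.

    Assumes the revisions are sorted, the first one passes, the last one
    fails, and there is a single OK -> KO transition in between.
    """
    lo, hi = 0, len(revisions) - 1
    # Invariant: revisions[lo] passes and revisions[hi] fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Results observed for the new test (True = OK, False = KO).
observed = {
    400: True, 408: True, 410: True, 414: True, 416: True,
    417: False, 425: False, 450: False, 497: False,
}
revisions = sorted(observed)
print(first_failing(revisions, observed.__getitem__))  # 417
```

In practice, passes would check out the given revision and run the new test instead of looking up a recorded result.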