find_all() and select() behave differently on markup containing duplicate elements

Bug #1770596 reported by BLKSerene
Affects         Status        Importance  Assigned to  Milestone
Beautiful Soup  Fix Released  Low         Unassigned

Bug Description

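find_all() returns every matching element, including duplicates, but select() with a comma-separated group of selectors drops an element when it duplicates one that has already been matched.
I'm using Python 3.6.2 with BeautifulSoup 4.6.2 and the lxml parser.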

The test snippet:

from bs4 import BeautifulSoup

# Four <span> elements; the first and the last are identical.
markup = '<span class="test1">test1</span><span class="test2">test2</span><span class="test3">test3</span><span class="test1">test1</span>'
soup = BeautifulSoup(markup, 'lxml')

# find_all() returns all 4 elements, including the duplicate.
found = soup.find_all(class_=['test1', 'test2', 'test3'])
print('find_all:')
for element in found:
    print(element)
print('Length of "find_all": ' + str(len(found)))

# On the affected version, select() returns only 3 elements.
selected = soup.select('.test1, .test2, .test3')
print('select:')
for element in selected:
    print(element)
print('Length of "select": ' + str(len(selected)))

Tags: css
xiaohu nian (xiaohunian)
tags: added: error
Leonard Richardson (leonardr) wrote :

Thanks for reporting this bug and providing an easy way to duplicate it. The CSS selector system is contributed code and for my own sanity I only add to it when a patch and test are contributed. I'm going to leave this issue open in a 'confirmed' state and if someone provides a patch or pull request I'll merge it.

Changed in beautifulsoup:
status: New → Confirmed
tags: added: css
removed: error
Changed in beautifulsoup:
importance: Undecided → Low
BLKSerene (blkserene-deactivatedaccount-deactivatedaccount) wrote :

I've found the problem.

In element.py, line 1357 is the check 'if candidate not in context:'.
After removing that line, I ran the test snippet again and select() returned all 4 elements, so the bug has disappeared.
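
A likely explanation for why that check drops the fourth span: the Beautiful Soup documentation says two Tag objects compare equal when they represent the same markup, and a list membership test such as 'candidate not in context' uses that equality. The sketch below reproduces the effect without bs4. FakeTag, merge_by_equality and merge_by_identity are hypothetical names made up for illustration, not Beautiful Soup code, and the merge loop is only an approximation of what the quoted line suggests.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed element (not a bs4 class).

    Assumption: like a bs4 Tag, two instances compare equal when they
    represent the same markup, even if they are distinct objects.
    """

    def __init__(self, name, css_class, text):
        self.name = name
        self.css_class = css_class
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, FakeTag):
            return NotImplemented
        return (self.name, self.css_class, self.text) == (
            other.name, other.css_class, other.text)

    def __repr__(self):
        return '<%s class="%s">%s</%s>' % (
            self.name, self.css_class, self.text, self.name)


def merge_by_equality(groups):
    """Mimic the reported check: skip a candidate equal to one already kept."""
    context = []
    for candidates in groups:
        for candidate in candidates:
            if candidate not in context:  # list membership falls back to ==
                context.append(candidate)
    return context


def merge_by_identity(groups):
    """Alternative sketch: skip a candidate only if it is the very same object."""
    context = []
    seen_ids = set()
    for candidates in groups:
        for candidate in candidates:
            if id(candidate) not in seen_ids:
                seen_ids.add(id(candidate))
                context.append(candidate)
    return context


first = FakeTag('span', 'test1', 'test1')
second = FakeTag('span', 'test2', 'test2')
third = FakeTag('span', 'test3', 'test3')
fourth = FakeTag('span', 'test1', 'test1')  # same markup as first, different object

# One candidate list per selector in the group '.test1, .test2, .test3'.
groups = [[first, fourth], [second], [third]]

print(len(merge_by_equality(groups)))  # 3: fourth is treated as a repeat of first
print(len(merge_by_identity(groups)))  # 4: fourth is kept
```

Note that removing the check outright would let an element matched by two selectors in the same group appear twice in the result. Merging by identity, as in merge_by_identity, avoids both problems; whether the released fix takes that approach is not stated in this report.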

Leonard Richardson (leonardr) wrote :

Thanks for isolating the part that needs to be changed. Revision 472 fixes this behavior.

Changed in beautifulsoup:
status: Confirmed → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released