Beautiful Soup

find_all() and select() behave differently on markup containing duplicate elements

Bug #1770596 reported by BLKSerene on 2018-05-11

This bug affects 1 person

Affects         Status        Importance  Assigned to  Milestone
Beautiful Soup  Fix Released  Low         Unassigned

Bug Description

find_all() returns every matching element, including duplicates, but select() does not.
I'm using Python 3.6.2 with BeautifulSoup 4.6.2 and the lxml parser.

The test snippet:

import os

from bs4 import BeautifulSoup

# The HTML tags were lost from the markup in this report; <p> elements are assumed here.
markup = ('<p class="test1">test1</p><p class="test2">test2</p>'
          '<p class="test3">test3</p><p class="test1">test1</p>')
soup = BeautifulSoup(markup, 'lxml')

print('find_all:')
for element in soup.find_all(class_=['test1', 'test2', 'test3']):
    print(element)  # Prints 4 elements, including the duplicate
print('Length of "find_all": ' + str(len(soup.find_all(class_=['test1', 'test2', 'test3']))))

print('select:')
for element in soup.select('.test1, .test2, .test3'):
    print(element)  # Prints only 3 elements
print('Length of "select": ' + str(len(soup.select('.test1, .test2, .test3'))))

os.system('pause')  # Windows only: keeps the console window open

xiaohu nian (xiaohunian) on 2018-05-21

tags:

added: error

Leonard Richardson (leonardr) wrote on 2018-07-16:

Thanks for reporting this bug and providing an easy way to duplicate it. The CSS selector system is contributed code and for my own sanity I only add to it when a patch and test are contributed. I'm going to leave this issue open in a 'confirmed' state and if someone provides a patch or pull request I'll merge it.

Changed in beautifulsoup:
status:	New → Confirmed

Leonard Richardson (leonardr) on 2018-07-19

tags:

added: css
removed: error

Leonard Richardson (leonardr) on 2018-07-21

Changed in beautifulsoup:
importance:	Undecided → Low

BLKSerene (blkserene-deactivatedaccount-deactivatedaccount) wrote on 2018-07-24:

I've found the problem.

In element.py, remove line 1357:
'if candidate not in context:'
After removing it I tested again, and the bug has disappeared.
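
To illustrate why a guard of that shape could drop duplicates, here is a minimal sketch that does not use Beautiful Soup at all. It assumes elements compare equal when their content is the same; Node, dedupe_by_equality and dedupe_by_identity are hypothetical names made up for this example, not code from element.py.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a parsed element; == compares field values."""
    name: str
    css_class: str
    text: str


def dedupe_by_equality(candidates):
    # Same shape as an `if candidate not in context` guard: list membership
    # falls back to ==, so nodes with identical content collapse into one.
    context = []
    for candidate in candidates:
        if candidate not in context:
            context.append(candidate)
    return context


def dedupe_by_identity(candidates):
    # Tracks object identity instead: distinct nodes with the same content
    # are both kept, while the very same object is never added twice.
    seen = set()
    context = []
    for candidate in candidates:
        if id(candidate) not in seen:
            seen.add(id(candidate))
            context.append(candidate)
    return context


nodes = [
    Node("p", "test1", "test1"),
    Node("p", "test2", "test2"),
    Node("p", "test3", "test3"),
    Node("p", "test1", "test1"),  # distinct object, same content as the first
]

by_equality = dedupe_by_equality(nodes)
by_identity = dedupe_by_identity(nodes)
print(len(by_equality), len(by_identity))
```

In this model the equality-based guard keeps 3 nodes and the identity-based one keeps all 4, which matches the select() versus find_all() counts in the report. Whether the actual fix drops the check entirely or replaces it is not shown here.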

Leonard Richardson (leonardr) wrote on 2018-07-28:

Thanks for isolating the part that needs to be changed. Revision 472 fixes this behavior.

Changed in beautifulsoup:
status:	Confirmed → Fix Committed

Leonard Richardson (leonardr) on 2018-07-28

Changed in beautifulsoup:
status:	Fix Committed → Fix Released
