Replace select() implementation with Soup Sieve dependency
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
I decided to create an issue to discuss the possibility of replacing the current "select" implementation.
Let me start off by saying there would be no hard feelings if the idea was rejected, I am more just making
a proposal in order to get an official answer on whether I should work towards this goal or not.
It seems that the current select implementation was user-contributed and doesn't get much support, as it is
fixed via pull requests only. I can understand that, as the code is a bit complex in its approach.
My proposal is to use an external library that I wrote called Soup Sieve. This would push support for CSS
selectors outside of Beautiful Soup, and bring in a number of new selectors and bug fixes.
I have all bs4 select tests passing with Soup Sieve 1.1 except `:nth-of-type(0)`, which bs4 expects to fail. In Soup Sieve this is completely valid and simply returns no tags. The only changes made to other tests were to catch SyntaxErrors instead of ValueErrors. In addition to passing all the tests, it also fixes a number of currently open CSS bugs in the bs4 issue tracker.
This is basically the change:
def select(self, selector, namespaces=None, limit=None):
    """Perform a CSS selection operation on the current element."""
    return soupsieve.
In order for Soup Sieve to be used though, I assume I would have to add Python 2 support. Soup Sieve is currently tested on Python 3.4+. If it is desired to have Beautiful Soup use Soup Sieve, I would add
Python 2.7 support up until the time that Beautiful Soup drops Python 2 (hopefully at Python EOL in 2020 :)).
If Beautiful Soup wanted more control over which version of the "select" code it uses, Soup Sieve could be vendored in the package.
I guess you could even make the selector logic an optional feature: if Soup Sieve is not installed, no select. You could even fall back to the old version.
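The optional-dependency idea could look roughly like the sketch below. It is only an illustration, not code from Beautiful Soup or Soup Sieve: `load_backend`, `select_with`, and the `fallback` callable (standing in for the old built-in implementation) are hypothetical names. The one real API it assumes is Soup Sieve's module-level `select(select, tag, namespaces, limit)`, where a limit of 0 means "no limit".

```python
import importlib


def load_backend(module_name="soupsieve"):
    """Return the selector module if it is installed, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def select_with(backend, selector, tag, namespaces=None, limit=None, fallback=None):
    """Run a CSS selection through `backend`, or through `fallback` if it is missing.

    `fallback` is a hypothetical callable taking (selector, tag); it stands in
    for the old built-in select code.
    """
    if backend is not None:
        # Assumption: the backend exposes select(select, tag, namespaces, limit)
        # and treats limit=0 as "no limit", as Soup Sieve does.
        return backend.select(selector, tag, namespaces, limit or 0)
    if fallback is not None:
        return fallback(selector, tag)
    raise NotImplementedError("CSS selectors require the soupsieve package.")
```

With this shape, `select()` inside Beautiful Soup would call `select_with(load_backend(), selector, self, ...)` and either delegate, use the legacy code, or fail with a clear message when neither is available.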
Soup Sieve doesn't require being implemented inside Beautiful Soup and can live on as a separate companion
package, but I think it could solve a number of CSS selector headaches here. This would allow you to just
send users upstream for CSS-related issues.
Related branches
- Leonard Richardson: Pending requested
  Diff: 87 lines (+19/-8), 2 files modified: bs4/formatter.py (+13/-8), bs4/tests/test_html5lib.py (+6/-0)
I like this idea a lot.
soupselect was incorporated into Beautiful Soup in 2012 so that a popular piece of functionality wouldn't be lost with the switch from BS3 to BS4. Since you've got a high-quality engine that provides the same functionality, and nearly all instances of Beautiful Soup are now installed through pip (with its easy dependency management), it makes sense to move that functionality out into an official dependency.
I will probably be maintaining BS4 on Python 2 past the official EOL, just because it's a piece of software designed for abnormal situations. But we can freeze the Python 2 releases of BS4 at the final version of Soup Sieve to support Soup Select.