Replace select() implementation with Soup Sieve dependency

Bug #1809035 reported by Isaac Muse
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
Beautiful Soup  Fix Released  Undecided   Unassigned

Bug Description

I decided to create an issue to discuss the possibility of replacing the current "select" implementation.

Let me start off by saying there would be no hard feelings if the idea was rejected. I am mainly making
a proposal in order to get an official answer on whether I should work towards this goal or not.

It seems that the current select implementation was user-contributed and doesn't get much support, as it is
fixed only via pull requests. I can understand that, as the code is a bit complex in its approach.

My proposal is to use an external library that I wrote called Soup Sieve. This would push support for CSS
selectors outside of Beautiful Soup, and bring in a number of new selectors and bug fixes.

I have all bs4 select tests passing with Soup Sieve 1.1 except `:nth-of-type(0)`, which bs4 expects to fail. In Soup Sieve this is completely valid and simply returns no tags. The only changes made to other tests were to catch SyntaxError instead of ValueError. In addition to passing all the tests, it also fixes a number of currently open CSS bugs in the bs4 issue tracker.

This is basically the change:

    def select(self, selector, namespaces=None, limit=None):
        """Perform a CSS selection operation on the current element."""
        # Soup Sieve uses limit=0 to mean "no limit", so map None to 0.
        return soupsieve.select(selector, self, namespaces, 0 if limit is None else limit)

In order for Soup Sieve to be used, though, I assume I would have to add Python 2 support. Soup Sieve is currently tested on Python 3.4+. If it is desired to have Beautiful Soup use Soup Sieve, I would add
Python 2.7 support up until the time that Beautiful Soup drops Python 2 (hopefully at Python 2 EOL in 2020 :)).

If Beautiful Soup wanted to have more control over the "select" versions, Soup Sieve could be vendored in the package. That would let you control exactly which version is used.

I guess you could even make selector logic an optional feature. If Soup Sieve is not installed, no select. You could even fall back to the old version.
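The optional-dependency idea might look roughly like the sketch below. This is only an illustration of the pattern, not what Beautiful Soup does: `legacy_select` and `SELECT_BACKEND` are hypothetical names, and the fallback body is a placeholder.

```python
try:
    import soupsieve  # optional: only needed for CSS selector support
except ImportError:
    soupsieve = None

# Hypothetical flag recording which implementation is active.
SELECT_BACKEND = "soupsieve" if soupsieve is not None else "legacy"


def legacy_select(tag, selector, limit=None):
    """Hypothetical stand-in for the old built-in implementation."""
    raise NotImplementedError("the legacy select is not part of this sketch")


def select(tag, selector, namespaces=None, limit=None):
    """Use Soup Sieve when it is installed, otherwise fall back."""
    if soupsieve is not None:
        # Soup Sieve takes the selector first and treats limit=0 as "no limit".
        return soupsieve.select(selector, tag, namespaces, 0 if limit is None else limit)
    return legacy_select(tag, selector, limit)
```

The trade-off is that two code paths then need testing, which is part of what a hard dependency avoids.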

Soup Sieve doesn't require being implemented inside Beautiful Soup and can live on as a separate companion
package, but I think it could solve a number of CSS selector headaches here. This would allow you to just
send users upstream for CSS related issues.

Leonard Richardson (leonardr) wrote :

I like this idea a lot.

soupselect was incorporated into Beautiful Soup in 2012 so that a popular piece of functionality wouldn't be lost with the switch from BS3 to BS4. Since you've got a high-quality engine that provides the same functionality, and nearly all instances of Beautiful Soup are now installed through pip (with its easy dependency management), it makes sense to move that functionality out into an official dependency.

I will probably be maintaining BS4 on Python 2 past the official EOL, just because it's a piece of software designed for abnormal situations. But we can freeze the Python 2 releases of BS4 at the final version of Soup Sieve to support Soup Select.

Leonard Richardson (leonardr) wrote :

s/to support Soup Select/to support Python 2/

Isaac Muse (facelessuser) wrote :

Great! I will work towards getting Python 2 support added, and then we can work towards getting it integrated.

I'll touch back once I have PY2/PY3 functionality.

Isaac Muse (facelessuser) wrote :

Soup Sieve 1.2 is released with Python 2 support. I've attached a patch that replaces the current select with SoupSieve's select. I made it from a local git repo of the recent revision. I'm happy to submit a pull request, but I'll need a little guidance as I'm not very familiar with how to properly fork/branch and submit. I've installed bzr on my mac, but I need to figure out how to use it. If you're able to give some basic commands so I can submit this patch formally, I'm happy to give it a go.

Isaac Muse (facelessuser) wrote :

This should close out the following issues. If for whatever reason the bugs still manifest, they can be referred to the Soup Sieve Issues page.

#1507608
#1664660
#1684968
#1692137
#1717850
#1717851
#1607476

I'm not really sure what this one is about: #1808797. If only one result is wanted, I figure "select_one" can be used, or "select(selector, limit=1)". And if limit == 1, a pattern such as "ol, ul" will not return one "ol" *and* one "ul"; it will return either "ol" or "ul", whichever comes first. It can probably be closed right along with all the others. They can be redirected to Soup Sieve's Issues if they'd like to discuss it, but I have no plans to alter the behavior. They can implement their own "select" with Soup Sieve, as I expose the "match" function in the API, which select is built on top of.
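To make the limit behavior concrete, here is a hedged sketch assuming Beautiful Soup 4.7.0 or later with Soup Sieve installed. The markup and the function name are invented for illustration, and the "first of each tag name" loop is just one way a custom select could be built on `soupsieve.match`, not something either library ships.

```python
def limit_demo():
    """Show that limit=1 yields one tag total, then build a custom variant."""
    # Imported inside the function so that loading this snippet does not
    # require bs4 or soupsieve; only calling the function does.
    from bs4 import BeautifulSoup
    import soupsieve

    html = "<div><ul><li>a</li></ul><ol><li>b</li></ol></div>"
    soup = BeautifulSoup(html, "html.parser")

    # limit=1 returns a single tag: whichever of "ol" or "ul" comes first.
    limited = [tag.name for tag in soup.select("ol, ul", limit=1)]
    single = soup.select_one("ol, ul").name

    # Custom selection on top of match(): keep the first tag of each name.
    first_of_each = {}
    for tag in soup.find_all(True):
        if soupsieve.match("ol, ul", tag) and tag.name not in first_of_each:
            first_of_each[tag.name] = tag

    return limited, single, sorted(first_of_each)
```

In this markup the `ul` precedes the `ol`, so both `limit=1` and `select_one` pick the `ul`, while the custom loop collects one of each.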

Isaac Muse (facelessuser) wrote :

I think I figured out setting up a merge request. Hopefully I did it correctly. Merge request available at https://code.launchpad.net/~facelessuser/beautifulsoup/beautifulsoup/+merge/361174.

Leonard Richardson (leonardr) wrote :

This will be in the 4.7.0 release. Isaac, should I refer to the project as Soup Sieve or SoupSieve? I've seen it both ways. (Beautiful Soup has the same problem -- most people call it BeautifulSoup.)

summary: - Possibility of replace current select implementation?
+ Replace select() implementation with Soup Sieve dependency
Changed in beautifulsoup:
status: New → Fix Committed
Isaac Muse (facelessuser) wrote :

I just saw this. I actually call it "Soup Sieve", but it's fine either way.

Leonard Richardson (leonardr) wrote :

In 4.7.0 release.

Changed in beautifulsoup:
status: Fix Committed → Fix Released