Selector grouping in select() does not meet the spec

Bug #1484543 reported by Orangain on 2015-08-13
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

Selector grouping in select(), introduced in Beautiful Soup 4.4.0, is a great improvement, but as far as I know, it does not meet the spec. For example, the selector "x y, z" means ("x y" OR "z"), but select() tries to find ("x y" OR "x z").

Some tests, e.g. [1], have a wrong expected value:

    def test_multiple_select(self):
        self.assertSelects('x, y > z', ['zida', 'zidb', 'zidab', 'zidac'])

This should be:

    def test_multiple_select(self):
        self.assertSelects('x, y > z', ['xid', 'zidb'])

Though I know Beautiful Soup does not aim to support the entire CSS Selector spec, this behavior is very confusing.

Compared with the CSS2 spec [2] referred to in #1191917, the CSS3 spec [3] provides a more detailed explanation, as follows:

> A comma-separated list of selectors represents the union of all elements selected by each of the individual selectors in the list.

A stricter syntax is defined in [4]; an excerpt follows. As defined there, COMMA has the lowest precedence: it separates whole selectors, and combinators bind only within a single selector.

> selectors_group
> : selector [ COMMA S* selector ]*
> ;
>
> selector
> : simple_selector_sequence [ combinator simple_selector_sequence ]*
> ;
>
> combinator
> /* combinators can be surrounded by whitespace */
> : PLUS S* | GREATER S* | TILDE S* | S+
> ;
>
> simple_selector_sequence
> : [ type_selector | universal ]
> [ HASH | class | attrib | pseudo | negation ]*
> | [ HASH | class | attrib | pseudo | negation ]+
> ;
>
> ...
>
> {w}"," return COMMA;
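
To illustrate the precedence described by the grammar, here is a minimal sketch that splits on COMMA first and only then handles combinators inside each selector. It is not Beautiful Soup's implementation and not the attached patch. The names `MARKUP`, `parse_selector`, `matches`, and `select` are hypothetical, the markup is assumed (its ids mirror those in the test above), and only type selectors with the descendant and child combinators are handled.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup; the ids mirror those used in the test above.
MARKUP = (
    '<div id="main">'
    '<x id="xid"><z id="zida"/><z id="zidab"/><z id="zidac"/></x>'
    '<y id="yid"><z id="zidb"/></y>'
    '</div>'
)

def parse_selector(selector):
    """Split ONE selector into [(combinator, tag), ...].

    The combinator is the one linking a step to the previous step:
    None for the first step, '>' for child, ' ' for descendant.
    """
    steps, combinator = [], None
    for token in re.findall(r'>|[^\s>]+', selector):
        if token == '>':
            combinator = '>'
        else:
            steps.append((combinator if steps else None, token))
            combinator = ' '  # default to descendant unless '>' follows
    return steps

def matches(element, steps, parents):
    """Check an element against the steps of one selector, right to left."""
    combinator, tag = steps[-1]
    if element.tag != tag:
        return False
    if len(steps) == 1:
        return True
    ancestor = parents.get(element)
    if combinator == '>':
        # Child combinator: only the direct parent may match the rest.
        return ancestor is not None and matches(ancestor, steps[:-1], parents)
    # Descendant combinator: any ancestor may match the rest.
    while ancestor is not None:
        if matches(ancestor, steps[:-1], parents):
            return True
        ancestor = parents.get(ancestor)
    return False

def select(root, selector_group):
    """Return ids, in document order, of the union of all selectors in the group."""
    parents = {child: parent for parent in root.iter() for child in parent}
    # COMMA has the lowest precedence: split the group before anything else.
    selectors = [parse_selector(s) for s in selector_group.split(',')]
    return [el.get('id') for el in root.iter()
            if any(matches(el, steps, parents) for steps in selectors)]

root = ET.fromstring(MARKUP)
print(select(root, 'x, y > z'))  # ['xid', 'zidb']
```

Because each comma-separated selector is evaluated on its own, 'x' selects the x element itself and 'y > z' selects only the z directly under y, which is the expected value proposed above.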

[1] http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/tests/test_tree.py#L1956
[2] http://www.w3.org/TR/CSS2/selector.html#grouping
[3] http://www.w3.org/TR/css3-selectors/#grouping
[4] http://www.w3.org/TR/css3-selectors/#w3cselgrammar

My environment:

* Beautiful Soup 4.4.0 with html.parser
* Python 3.4.2

Orangain (orangain) wrote :

The attached patch fixes the issue.
Although the number of affected lines is large, most of the diff is un-indentation.

Leonard Richardson (leonardr) wrote :

Patch applied in revision 394.

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released