Beautiful Soup

Bug #1828188
Comment #1

Leonard Richardson (leonardr) wrote on 2019-05-08:

Thanks for writing with this suggestion. I'm assuming you're not at a loss as to how to do this; you just think it could be easier. Things could always be easier, but I don't think this is the way to do it.

In Beautiful Soup, soup.a.href already has a meaning: it directs Beautiful Soup to find the first <href> tag inside the first <a> tag, for markup where an <a> tag contains a child tag named <href>.

That doesn't make sense for HTML markup, but that could be a valid XML document. That's the first issue -- the dot operator already means something, and changing what it means would break a lot of scripts.
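To make the existing meaning concrete, here is a minimal sketch. The markup is a hypothetical XML-style fragment (not taken from the bug report) in which <href> is a tag rather than an attribute. The dot operator is shorthand for find(), so soup.a.href returns that nested tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an <a> tag containing a child *tag* named <href>.
markup = "<a><href>http://example.com/</href></a>"
soup = BeautifulSoup(markup, "html.parser")

# soup.a.href is shorthand for soup.find("a").find("href"):
# "the first <href> tag inside the first <a> tag".
href_tag = soup.a.href

print(href_tag)         # <href>http://example.com/</href>
print(href_tag.string)  # http://example.com/
```

Redefining soup.a.href to mean "the href attribute" would silently change what this code returns.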

The other issue is that I think it's very important that all the Beautiful Soup operators and methods have a _consistent_ meaning. To make soup.a.href return the href _attributes_ of each <a> _tag_, the dot operator would have to change its meaning halfway through a line of code. It would start out meaning "get all the <a> tags" (which is itself different from its current meaning) and then start meaning "get all the 'href' values for each tag in this list".

I'd have to come up with rules about when the dot operator means 'find the first child tag' (as it does now), when it means 'find all the child tags', and when it means 'find the value of an attribute'. All of these things are already part of the Beautiful Soup API under separate names, so the library would get no new capabilities. There would be many disagreements about which meaning should apply in which case, which I'd have to judge.
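For comparison, here is a sketch of how each of those three meanings is spelled today with separate, existing API calls. The markup is a hypothetical example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links with href attributes, one without.
markup = """
<p>
  <a href="https://example.com/one">One</a>
  <a href="https://example.com/two">Two</a>
  <a name="anchor-without-href">Three</a>
</p>
"""
soup = BeautifulSoup(markup, "html.parser")

# 'Find the first matching tag': the dot operator (shorthand for find()).
first_link = soup.a

# 'Find all the matching tags': find_all() returns a list-like ResultSet.
all_links = soup.find_all("a")

# 'Find the value of an attribute': dictionary-style access on a Tag.
first_href = soup.a["href"]

# .get() returns None instead of raising KeyError when the attribute is absent.
missing = all_links[2].get("href")

print(first_link.string)  # One
print(len(all_links))     # 3
print(first_href)         # https://example.com/one
print(missing)            # None
```

Because each meaning has its own name, none of these lines is ambiguous about what it returns.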

There are similar cases, like bug #1768330, which deals with text extraction, where the potential payoff might be worth adding extra complexity. Even then I'm very reluctant to add that complexity. In this case, it's already pretty easy to find all the href attributes in all the <a> tags. Making it into a simple one-liner doesn't seem worth the disruption it would cause to the API as a whole.
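As an illustration of "already pretty easy", one way to collect every href value from every <a> tag with the current API is a list comprehension over find_all(). The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle <a> tag has no href attribute.
markup = (
    '<a href="/first">First</a>'
    '<a>No href here</a>'
    '<a href="/second">Second</a>'
)
soup = BeautifulSoup(markup, "html.parser")

# href=True restricts find_all() to <a> tags that actually have an href
# attribute, so a["href"] cannot raise KeyError inside the comprehension.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(hrefs)  # ['/first', '/second']
```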