BeautifulSoup incorrectly warns me that I'm an idiot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | New | Undecided | Unassigned |
Bug Description
"UserWarning: "http://
Um, no. That's not what happened, I just happened to pass in user-generated content that looks like a URL.
There's nothing strange at all about this - a URL is also a perfectly well-formed piece of XML content.
You are wasting my cpu time doing this test and clogging up my logs with incorrect trash because you're wrongly assuming I don't know what I'm doing.
A software library's job is to do as it's told and get out of my way, not to uselessly tell me about some incorrect assumption the developer has made.
This warning is a bug because it's surprise behaviour. If you want this feature in the library, it should have to be explicitly enabled with some setting, or at the very least there should be a very simple way to disable this useless test and warning.
I'd suggest "dumbass_mode" as a good setting name ;).
Sorry if my lecture-ish tone offends, but you thought it was perfectly fine to condescendingly lecture me about the difference between a URL and a piece of HTML, so I say it's fine ;)
Beautiful Soup generally tries to give "helpful" errors and warnings so that a user understands why things are not working the way they expect. Every developer may have a different opinion on how helpful errors and warnings should be; Beautiful Soup has taken the more ambitious approach.
> There's nothing strange at all about this - a URL is also a perfectly well-formed piece of XML content.
It is *only* well-formed if it is inside an XML tag. Running a URL through the XML parser will yield nothing if not provided as content of an actual tag:
>>> from bs4 import BeautifulSoup
>>> print(BeautifulSoup("http://example.com", 'lxml-xml'))
<?xml version="1.0" encoding="utf-8"?>
Now, if you are using an HTML parser, those are known to be quite forgiving and will often "correct" a user's HTML to make it valid. html5lib, for instance, is known to do this quite heavily and more closely matches how modern browsers behave.
Generally, Beautiful Soup expects you to feed it proper content within tags, not stray text fragments. That some of the HTML parsers will accept a stray fragment does not change this. As noted above, XML will do nothing with it. That is not Beautiful Soup's behavior, but that of the underlying lxml parser when in XML mode.
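The well-formedness point can also be checked without Beautiful Soup or lxml. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in, so it is not the parser Beautiful Soup uses: it is strict and raises `ParseError` on a bare URL, whereas lxml in XML mode (as shown above) quietly produces an empty document. The helper name `is_well_formed` and the `<link>` element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a complete XML document.

    Hypothetical helper for illustration; it is not part of Beautiful Soup.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


bare = "http://example.com"
wrapped = "<link>http://example.com</link>"

print(is_well_formed(bare))         # False: no root element
print(is_well_formed(wrapped))      # True: the URL is text inside an element
print(ET.fromstring(wrapped).text)  # http://example.com
```

A URL is therefore valid character data inside an element, but it is not an XML document on its own.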
So, personally, I think the warning is fine. While I am not the maintainer of Beautiful Soup, I do maintain a number of open-source libraries (such as the CSS selector library used by Beautiful Soup), and anything that helps me not answer the same question over and over again when someone uses the tool in an unintended way I view as a good thing.
It may annoy you that Beautiful Soup alerts you to something you think you know, but from the perspective of a maintainer who understands exactly why such a warning is there (no doubt because the same question gets asked over and over), it makes perfect sense to me. And anything that makes the maintainer's life easier is okay by me. If you do not like it, I imagine you can fork it and maintain a version that does not annoy you so much.
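Short of forking, Python's standard `warnings` module can silence a single warning category from the calling code. The sketch below is stdlib-only and uses hypothetical stand-ins: `LooksLikeUrlWarning` and `parse` are invented here to mimic a library that warns on URL-like input, and are not Beautiful Soup names. Recent Beautiful Soup releases appear to expose a dedicated category for this warning (`MarkupResemblesLocatorWarning`), but treat that as an assumption and check it against your installed version. On older versions the warning is a plain `UserWarning`, so a filter would have to match on the message or module instead.

```python
import warnings


class LooksLikeUrlWarning(UserWarning):
    """Hypothetical stand-in for a library-defined warning category."""


def parse(markup):
    """Hypothetical stand-in for a parser that warns on URL-like input."""
    if markup.startswith(("http://", "https://")):
        warnings.warn(f"{markup!r} looks like a URL, not markup.",
                      LooksLikeUrlWarning)
    return markup


# record=True collects warnings in a list instead of printing them,
# which makes it easy to see what got through the filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only the URL-lookalike category; everything else still fires.
    warnings.filterwarnings("ignore", category=LooksLikeUrlWarning)

    parse("http://example.com")                  # suppressed
    warnings.warn("something else", UserWarning)  # still reported

print([str(w.message) for w in caught])
```

Filtering by category rather than ignoring every `UserWarning` keeps unrelated warnings visible, and the `catch_warnings` block restores the previous filters on exit.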
Considering that open source maintainers are often doing this in their free time at no cost to you, I would consider checking your tone when making a request.