Comment 1 for bug 1955450

Isaac Muse (facelessuser) wrote :

Beautiful Soup generally tries to give "helpful" errors and warnings so that users understand why things are not working the way they expect. Developers may differ on how helpful errors and warnings should be, but Beautiful Soup has taken the more ambitious approach.

> There's nothing strange at all about this - a URL is also a perfectly well-formed piece of XML content.

It is *only* well-formed if it is inside an XML tag. A bare URL run through the XML parser yields nothing unless it is the content of an actual tag:

>>> from bs4 import BeautifulSoup
>>> print(BeautifulSoup("http://example.com", 'lxml-xml'))
<?xml version="1.0" encoding="utf-8"?>
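
To see the same point outside of Beautiful Soup, here is a minimal sketch using the standard library's `xml.etree.ElementTree` instead of lxml (that swap is my assumption, purely so the example runs without extra installs). The helper name `is_well_formed` and the `<link>` tag are made up for illustration. A strict parser rejects the bare URL outright, whereas lxml as used above quietly returns an empty document:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as a complete XML document.

    Hypothetical helper for illustration; not part of Beautiful Soup or lxml.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


bare = "http://example.com"
wrapped = "<link>http://example.com</link>"  # <link> is an arbitrary tag name

print(is_well_formed(bare))         # False: there is no root element
print(is_well_formed(wrapped))      # True: the URL is now element content
print(ET.fromstring(wrapped).text)  # http://example.com
```

The URL only survives parsing once it is the text of an element, which is the distinction being made above.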

Now, HTML parsers are known to be quite forgiving and will often "correct" a user's HTML to make it valid. html5lib, for instance, does this quite heavily and more closely matches how modern browsers work.

Generally, Beautiful Soup expects you to feed it proper content within tags, not stray text fragments. That some of the HTML parsers will accept a fragment does not change this. As noted above, the XML parser will do nothing with it. That comes not from Beautiful Soup itself but from the underlying lxml parser in XML mode.

So, personally, I think the warning is fine. While I am not the maintainer of Beautiful Soup, I do maintain a number of open-source libraries (such as the CSS selector library used by Beautiful Soup), and I view anything that saves me from answering the same question over and over when someone uses the tool in an unintended way as a good thing.

It may annoy you that Beautiful Soup alerts you to something you think you already know. But as a maintainer, I understand exactly why such a warning is there, no doubt because the same question kept being asked over and over, and it makes perfect sense to me. Anything that makes the maintainer's life easier is okay by me. If you do not like it, I imagine you can fork it and maintain a version that does not annoy you so much.

Considering that open-source maintainers are often doing this in their free time at no cost to you, I would consider checking your tone when making a request.