Beautiful Soup

Bug #1955450
Comment #3

Isaac Muse (facelessuser) wrote on 2021-12-21:

> Interestingly, the default parser would seem to disagree with your assessment on what is and isn't valid:

Sigh, I very clearly stated that HTML parsers are generally more forgiving. And yes, I know what a text node is: text nodes are wrapped in tags. That's also not the default parser; what you used is most likely lxml or html5lib. The default parser that ships with Python just gives you back the URL, which isn't even valid HTML.

>>> print(BeautifulSoup("http://example.com", 'html.parser'))
/usr/local/lib/python3.9/site-packages/bs4/__init__.py:431: MarkupResemblesLocatorWarning: "http://example.com" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  warnings.warn(
http://example.com

> A text node is perfectly valid.

Not in XML, which has a very strict spec. Text nodes are valid only within the context of a tag. This is also true in HTML, but most browsers are very forgiving, and some parsers mimic that forgiving behavior.
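To illustrate the difference, here is a minimal sketch using only the Python standard library rather than Beautiful Soup itself. The names TextCollector, html_text, and xml_is_well_formed are hypothetical helpers made up for this example. It feeds the same bare string to the forgiving html.parser and to the strict XML parser in xml.etree.ElementTree:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Hypothetical helper: records every text chunk html.parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text content, whether or not a tag surrounds it.
        self.chunks.append(data)


def html_text(markup):
    """Return all text that the lenient stdlib HTML parser hands back."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


def xml_is_well_formed(markup):
    """Return True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


bare = "http://example.com"
wrapped = "<a>http://example.com</a>"

print(html_text(bare))              # the lenient parser just echoes the text
print(xml_is_well_formed(bare))     # bare text has no root element
print(xml_is_well_formed(wrapped))  # the same text inside a tag is accepted
```

The HTML parser returns the bare text without complaint, while the XML parser rejects it until it is wrapped in an element.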

> Aah, the good old "you can always fork if you don't like it" panacea,

No, that is simply the response I give to entitled people who cannot communicate like adults. I'm not sure why you think people are going to be amenable to child-like rantings.

Again, I'm not the maintainer, so I'm moving on. Maybe Leonard will have more patience with you than me :).