Beautiful Soup

Bug #1955450
Comment #5

Leonard Richardson (leonardr) wrote on 2021-12-21 (last edit on 2021-12-21):

Dale,

Before instituting this warning, I got many support requests from people who didn't understand why passing a filename or URL into the BeautifulSoup constructor doesn't read the file or download the URL. I don't think these people are idiots, but there was one particular thing they didn't understand, and they couldn't continue their work until they understood it.

As a maintainer, I can't be there with everyone using the library, so to handle a large number of support requests on the same theme, I have to change the software's behavior for everyone. When I do, I have two choices: take it on myself to just make Beautiful Soup work in all situations, or add a warning that gives an explanation.

"Make it work in all situations" is a non-starter here because there _is_ no correct behavior for all situations. Most people who run BeautifulSoup("http://domain/") want, at a high level, to download the representation of that URL and parse it. But some people, like you, really do want to parse the URL string itself as markup.

That leaves the other option: add a warning giving an explanation. To quote the documentation of Python's 'warnings' module (https://docs.python.org/3/library/warnings.html):

"Warning messages are typically issued in situations where it is useful to alert the user of some condition in a program, where that condition (normally) doesn’t warrant raising an exception and terminating the program."

That fits the situation here. It's useful to alert the user as to precisely what will happen when the code they wrote is executed, because most users of Beautiful Soup don't intend that behavior, but it doesn't warrant raising an exception, because some users _do_ intend it.
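The difference between warning and raising can be sketched with the standard library alone. This is only an illustration, not Beautiful Soup's actual code: `LooksLikeLocatorWarning`, `parse` and `demo` are hypothetical names, and the URL check is a deliberately crude assumption.

```python
import warnings


class LooksLikeLocatorWarning(UserWarning):
    """Hypothetical stand-in for a dedicated warning category."""


def parse(markup):
    """Hypothetical entry point: alert the caller, then carry on anyway."""
    # Crude assumption: a string that starts like a URL and contains
    # no spaces was probably meant to be fetched, not parsed.
    if markup.startswith(("http:", "https:")) and " " not in markup:
        warnings.warn(
            "Input looks like a URL; it will be parsed as markup, not fetched.",
            LooksLikeLocatorWarning,
            stacklevel=2,
        )
    # Stand-in for the real parsing work. No exception is raised, so a
    # caller who intended this behavior still gets a result.
    return markup.strip()


def demo():
    """Call parse() with a URL-like string and record what was warned."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = parse("http://domain/")
    return result, [w.category for w in caught]
```

Calling `demo()` returns both the processed string and the list of recorded warning categories: the caller is told what happened, and the program keeps running.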

When the behavior is intentional, the warning is irrelevant and -- as you discovered -- can read as condescending. The last time someone brought this up was in bug 1873787 (https://bugs.launchpad.net/beautifulsoup/+bug/1873787). The case was very similar to yours: Beautiful Soup was being used to process text entered by users of another application, not text input by the programmer.

Bug 1873787 has a longer explanation of my thinking based on the "When to use logging" section of the Python documentation (https://docs.python.org/3/howto/logging.html#when-to-use-logging). In the end, I made warnings of this type instances of a distinct class, MarkupResemblesLocatorWarning. This allows you to use Python's standard mechanisms to filter out warnings you know to be irrelevant to your application:

---
from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
import warnings

warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
BeautifulSoup("http://domain/", "html.parser")  # no MarkupResemblesLocatorWarning
---

This meets your request for a simple way to disable the warning.
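If a process-wide filter is broader than you want, the same standard mechanism can be scoped to a single call with `warnings.catch_warnings()`. A minimal sketch, assuming a Beautiful Soup release that exports `MarkupResemblesLocatorWarning` and using the built-in "html.parser"; the helper name `parse_user_text` is hypothetical:

```python
import warnings

from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning


def parse_user_text(text):
    """Hypothetical helper: parse end-user text that may look like a URL."""
    with warnings.catch_warnings():
        # The filter applies only inside this block; the previous
        # filters are restored when the block exits.
        warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
        return BeautifulSoup(text, "html.parser")


soup = parse_user_text("http://domain/")
```

Elsewhere in the same program, code that passes a URL to the constructor by mistake would still see the warning.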

I'm not going to add an option to disable the test itself, because the time saved is not worth the additional API complexity. If your application is performance-sensitive to the point that this test is a serious issue for you, I recommend you write your application directly against lxml's HTML parser, which is much faster than lxml plus Beautiful Soup.
