Beautiful Soup

Bug #2052988
Comment #3

Matija Nalis (mnalis) wrote on 2024-02-13:

Thanks for that background, Leonard, it's much appreciated!

I can see why the change was made, although I probably would have done it differently (e.g. only apply the special handling if the string starts with the regex `^https?://` or `^.:\\`, or ends with `\.[a-z]{3,4}$`). But as you said, there would always be some false positives when trying to "automagically" handle such values.
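To make that heuristic concrete, here is a rough sketch of it using only the standard `re` module. The three patterns are the ones quoted above; the function name `looks_like_locator` is hypothetical and this is not how Beautiful Soup itself decides.

```python
import re

# Patterns quoted in the comment above. Everything else here is illustrative.
LOCATOR_PATTERNS = [
    re.compile(r"^https?://"),     # starts like an HTTP(S) URL
    re.compile(r"^.:\\"),          # starts like a Windows drive path, e.g. C:\
    re.compile(r"\.[a-z]{3,4}$"),  # ends with a 3-4 letter lowercase extension
]


def looks_like_locator(markup: str) -> bool:
    """Return True if the string matches any of the three patterns."""
    return any(pattern.search(markup) for pattern in LOCATOR_PATTERNS)
```

Even this narrower check still misfires on ordinary text: a string such as `"See example.com"` ends with `.com` and would be flagged, which is the kind of false positive mentioned above.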

However, I was somewhat surprised that `warnings.filterwarnings` is the officially recommended way to handle it. Personally, I would only consider ignoring warnings like that a quick kludge/workaround, to be revisited as soon as a properly fixed package is released. (IOW, IMHO a warning is something whose root cause one should find and fix, rather than ignore because it does not seem related to one's case.)

If one can get over the rudeness of the poster in the mentioned issue, I too feel that a much cleaner solution would be something akin to `BeautifulSoup("http://example.com", force_html=True)` or `BeautifulSoup("http://example.com", ignore_urls=False)` or similar, to allow the user to *explicitly* specify what handling they want.

While I get your concerns about documenting and supporting it, I'd find such a solution much cleaner and preferable. `filterwarnings()` sounds almost as *dirty* as the library dying in the middle of parsing and the caller having to handle it with try/except.
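Short of a dedicated keyword argument, a caller can at least keep the suppression explicit and local by scoping it to a single call instead of installing a process-wide filter. The following is a minimal sketch using only the standard `warnings` module. `call_ignoring`, `LocatorLikeWarning` and `noisy_parse` are hypothetical stand-ins so that the example runs on its own; in real use one would presumably pass `BeautifulSoup` and the warning category that bs4 actually emits.

```python
import warnings


def call_ignoring(category, func, *args, **kwargs):
    """Call func with one warning category silenced for this call only."""
    with warnings.catch_warnings():
        # The filter is discarded again when the with-block exits.
        warnings.simplefilter("ignore", category)
        return func(*args, **kwargs)


# Stand-ins for a library warning and a parser that emits it (hypothetical).
class LocatorLikeWarning(UserWarning):
    pass


def noisy_parse(markup):
    warnings.warn("markup resembles a locator", LocatorLikeWarning)
    return markup.upper()


# The caller states its intent at the call site; other warnings still surface.
result = call_ignoring(LocatorLikeWarning, noisy_parse, "http://example.com")
```

This is still a workaround rather than the explicit option argued for above, but it avoids hiding the same warning everywhere else in the program.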

But as the saying goes, "nothing is ever hard for the man who doesn't have to do it himself", so I'll leave the final decision to you.