BeautifulSoup4 not reading local addresses

Bug #1407988 reported by Michael Courtney
This bug affects 1 person
Affects         Status    Importance   Assigned to   Milestone
Beautiful Soup  Invalid   Undecided    Unassigned

Bug Description

When I try to create a soup from a locally stored HTML file with Beautiful Soup 4, I get a raft of error messages ending with 'maximum recursion depth exceeded'.

The command BeautifulSoup(open("http://www.nytimes.com/")) creates a soup of that page

whereas if I download the page and point BeautifulSoup at the local copy with

BeautifulSoup(open("file:///C:/The%20New%20York%20Times%20-%20Breaking%20News,%20World%20News%20%26%20Multimedia.html"))

or

BeautifulSoup(open("C:/The%20New%20York%20Times%20-%20Breaking%20News,%20World%20News%20%26%20Multimedia.html"))

I get the error messages.

I am using Python 3.4.2.

This error does not occur for Beautiful Soup 3.

Leonard Richardson (leonardr) wrote :

I can't duplicate this and I have a number of questions.

The core of the problem: I don't understand what your open() function does. You're using it like normal Python open() but it's acting more like urllib.urlopen() (urllib.request.urlopen() in Python 3). Normal Python open() can't open http: or file: URLs. I'm not a Python 3 expert but that doesn't seem to have changed. Maybe it's different on Windows?
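To illustrate the distinction between paths and URLs, here is a minimal sketch, assuming Python 3, of turning a file: URL into a path that the built-in open() accepts. The helper name local_path_from is hypothetical and not part of Beautiful Soup; urlparse and url2pathname come from the standard library.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def local_path_from(address):
    """Return a filesystem path for a file: URL or a plain path.

    Hypothetical helper for illustration only. http: and https:
    addresses are rejected because the built-in open() cannot fetch them.
    """
    parsed = urlparse(address)
    if parsed.scheme in ("http", "https"):
        raise ValueError("use urllib.request.urlopen() for " + address)
    if parsed.scheme == "file":
        # url2pathname also decodes escapes such as %20 and %26.
        return url2pathname(parsed.path)
    # Anything else (including Windows drive paths such as C:/...) is
    # treated as an ordinary path and returned unchanged.
    return address
```

The returned path can then be handed to open() as usual. Note that a plain path containing literal %20 sequences is not decoded by this sketch, since open() would look for a file with exactly that name.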

If you still have your saved copy of the file, please upload it as an attachment to this issue. The New York Times home page is one of the fastest-changing web pages in the world, and if the problem is caused by bad markup, that markup is probably long gone.

It would also be useful to see the raft of error messages you mentioned. This would help me determine whether the problem lies in the markup, in your open(), or in the Beautiful Soup constructor.
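One way to collect the full error output for attaching to a report is sketched below, using only the standard-library traceback module. The function name capture_traceback is hypothetical.

```python
import traceback


def capture_traceback(func, *args, **kwargs):
    """Call func and return the formatted traceback if it raises, else None."""
    try:
        func(*args, **kwargs)
    except Exception:
        # format_exc() returns the same text Python would print to stderr.
        return traceback.format_exc()
    return None
```

For instance, calling capture_traceback(BeautifulSoup, markup, "html.parser") and writing any returned text to a file would give something that can be uploaded alongside the saved page. Here markup stands for the contents of the saved file.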

I would also like to know which parser backend you are using. Try passing 'html.parser', 'lxml', and 'html5lib' as the second argument to the BeautifulSoup constructor, and tell me if they all have the same problem.
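The parser comparison could be run along the following lines. This is a sketch that assumes bs4 is installed. lxml and html5lib are optional extras, so a missing backend is reported rather than treated as fatal. The function name try_parsers is hypothetical.

```python
from bs4 import BeautifulSoup


def try_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse markup with each backend and record the outcome per parser."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except Exception as exc:
            # Covers both parse failures and backends that are not
            # installed (bs4 raises FeatureNotFound in that case).
            results[name] = "failed: {}: {}".format(type(exc).__name__, exc)
        else:
            title = soup.title.get_text() if soup.title else None
            results[name] = "ok, title={!r}".format(title)
    return results
```

Passing the text of the saved page as markup would show whether the recursion error appears with every backend or only with one of them.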

What does your open() call return? What happens if you call read() on its return value before passing it to Beautiful Soup?

Changed in beautifulsoup:
status: New → Incomplete
Leonard Richardson (leonardr) wrote :

I'm closing this issue because it's gone a long time without the information that would help me diagnose the problem. In the interim a number of tree-builder fixes have been made that could have conceivably resolved the problem.

Changed in beautifulsoup:
status: Incomplete → Invalid