BeautifulSoup4 not reading local addresses
Bug #1407988 reported by Michael Courtney
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Beautiful Soup | Invalid | Undecided | Unassigned | |
Bug Description
When I try to create a soup of a locally stored HTML file with Beautiful Soup 4, I get a raft of error messages ending with 'maximum recursion depth exceeded'.
The command BeautifulSoup(…) works, whereas if I download the file and point BeautifulSoup at the local address with BeautifulSoup(…) or BeautifulSoup(…), I get the error messages. (The arguments to the calls were lost from the report.)
I am using Python 3.4.2.
This error does not occur for Beautiful Soup 3.
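For reference, a minimal sketch of the intended workflow: parsing a locally saved HTML file with Beautiful Soup 4. The file name and contents here are stand-ins, since the reporter's actual saved page is not attached; the `html.parser` backend is assumed.

```python
from bs4 import BeautifulSoup

# Write a small stand-in for the saved page (the reporter's actual file is unknown).
with open("saved_page.html", "w", encoding="utf-8") as f:
    f.write("<html><head><title>Example</title></head><body><p>Hi</p></body></html>")

# Parse the local file by reading its contents; plain open() takes a
# filesystem path, not a URL.
with open("saved_page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

print(soup.title.string)  # -> Example
```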
I can't duplicate this and I have a number of questions.
The core of the problem: I don't understand what your open() function does. You're using it like normal Python open() but it's acting more like urllib.urlopen(). Normal Python open() can't open http: or file: URLs. I'm not a Python 3 expert but that doesn't seem to have changed. Maybe it's different on Windows?
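The distinction can be demonstrated directly. Built-in `open()` treats its argument as a filesystem path, so handing it an `http:` URL raises an `OSError` on every platform; fetching a URL requires `urllib.request.urlopen()` instead. A small sketch:

```python
import urllib.request

# Built-in open() expects a filesystem path. A URL is not a valid path,
# so this raises an OSError subclass (FileNotFoundError on POSIX,
# OSError for the invalid ':' on Windows).
try:
    open("http://www.nytimes.com/")
except OSError as exc:
    print("open() rejected the URL:", type(exc).__name__)

# urllib.request.urlopen() is the stdlib call that actually fetches URLs
# (not executed here, since it needs network access):
# page = urllib.request.urlopen("http://www.nytimes.com/")
```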
If you still have your saved copy of the file, please upload it as an attachment to this issue. The New York Times home page is one of the fastest-changing web pages in the world, and if the problem is caused by bad markup, that markup is probably long gone.
It would also be useful to see the raft of error messages you mentioned. That would help me determine whether the problem lies in the markup, in your open(), or in the Beautiful Soup constructor.
I would also like to know which parser backend you are using. Try passing 'html.parser', 'lxml', and 'html5lib' as the second argument to the BeautifulSoup constructor, and tell me if they all have the same problem.
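Trying all three backends might look like the sketch below; `lxml` and `html5lib` are optional third-party installs, so any that are missing are skipped rather than treated as failures. The sample markup is a placeholder.

```python
from bs4 import BeautifulSoup

markup = "<html><body><p>test</p></body></html>"

# Pass each backend name as the second constructor argument; a missing
# backend makes the constructor raise, which we report and move past.
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        soup = BeautifulSoup(markup, parser)
        print(parser, "->", soup.p.string)
    except Exception as exc:
        print(parser, "unavailable:", exc)
```

If only one backend misbehaves, the bug is in that parser rather than in Beautiful Soup itself.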
What does the open() method return? What if you call read() on the return value of open() before passing it into Beautiful Soup?
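Calling read() first hands the constructor a plain string, which rules out any oddity in the file-like object's behavior. A sketch, again with a hypothetical stand-in file:

```python
from bs4 import BeautifulSoup

# Stand-in for the saved page; the real file from the report is unknown.
with open("page.html", "w", encoding="utf-8") as f:
    f.write("<p>hello</p>")

# Read the whole file into a str before constructing the soup, instead of
# letting Beautiful Soup call read() on the file object itself.
with open("page.html", encoding="utf-8") as f:
    html = f.read()

soup = BeautifulSoup(html, "html.parser")
print(soup.p.string)  # -> hello
```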