Confusing exception when non-markup passed into BeautifulSoup constructor

Bug #2071530 reported by Thomas E Tresch
Affects: Beautiful Soup
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

D:\tresc\pyTest\Scraper>python scraper.py
<bound method HTTPResponse.read of <http.client.HTTPResponse object at 0x032758B0>>
Traceback (most recent call last):
  File "D:\tresc\pyTest\Scraper\scraper.py", line 22, in <module>
    Scraper(news).scrape()
  File "D:\tresc\pyTest\Scraper\scraper.py", line 13, in scrape
    sp = BeautifulSoup(xml, parser)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\tresc\pyTest\Lib\site-packages\bs4\__init__.py", line 315, in __init__
    elif len(markup) <= 256 and (
         ^^^^^^^^^^^
TypeError: object of type 'method' has no len()

Leonard Richardson (leonardr) wrote :

Thanks for taking the time to file this issue. Based on the first line of your program's output, you are passing a method (http.client.HTTPResponse.read) into the BeautifulSoup constructor as the `markup` argument. This won't work. The contract of the BeautifulSoup constructor is that `markup` can be "a string or an open filehandle." (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup)

Calling the read() method would work (it returns the response body as bytes, which the constructor accepts), and passing in the HTTPResponse object itself would work (it acts like a filehandle), but passing in the method isn't supported.
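To make the distinction concrete, here is a minimal sketch of the three cases. It assumes a hypothetical helper, `resolve_markup`, which is not part of Beautiful Soup, and uses `io.BytesIO` as a stand-in for the `HTTPResponse` object so that it runs without a network connection.

```python
import io


def resolve_markup(source):
    """Return str/bytes markup from a string, bytes, or file-like object.

    Hypothetical helper for illustration only; not part of Beautiful Soup.
    """
    # Case 1: already markup (str or bytes), so pass it through unchanged.
    if isinstance(source, (str, bytes)):
        return source
    # Case 2: a file-like object, so call its read() method.
    read = getattr(source, "read", None)
    if callable(read):
        return read()
    # Case 3: a bare callable such as response.read (missing parentheses).
    if callable(source):
        raise TypeError(
            f"got a callable ({source!r}); did you forget to call it?"
        )
    raise TypeError(f"unsupported markup type: {type(source).__name__}")


# Stand-in for the HTTPResponse from the report (assumption: any object
# with a read() method behaves the same way for this purpose).
response = io.BytesIO(b"<rss><title>News</title></rss>")
markup = resolve_markup(response)  # bytes, ready for BeautifulSoup(markup, parser)
```

Passing `response.read` (without parentheses) to this helper raises a `TypeError` that names the likely mistake, instead of failing later on `len(markup)`.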

For more information, see this Stack Overflow question:
https://stackoverflow.com/questions/57301959/beautiful-soup-error-object-of-type-method-has-no-len

I'm leaving this issue open until I decide whether or not to make a change improving the error message in situations where `markup` doesn't fulfill the necessary contract.

Leonard Richardson (leonardr) wrote :

I've improved the exception message (revision 8abc137 in the 4.13 branch).

summary: - bs4 __init__ doesn't like the len(markup) ; says module "markup" is a method
         + Confusing exception when non-markup passed into BeautifulSoup constructor
Changed in beautifulsoup:
status: New → Fix Committed

maverickwhites (maverickwhites) wrote :

Fix for BeautifulSoup Issue

Issue:
You passed `self.news.read` (a method) instead of calling it to get the content for BeautifulSoup.

Fix:
Call the method to get the content:

Before:
```python
xml = self.news.read # Incorrect
sp = BeautifulSoup(xml, parser)
```

After:
```python
xml = self.news.read() # Correct
sp = BeautifulSoup(xml, parser)
```

Improvement:
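Check that `xml` is a string or bytes before passing it (`HTTPResponse.read()` returns bytes, so checking for `str` alone would reject valid input):
```python
xml = self.news.read()
# BeautifulSoup accepts both str and bytes markup.
if not isinstance(xml, (str, bytes)):
    raise TypeError("Expected str or bytes for BeautifulSoup markup")
sp = BeautifulSoup(xml, parser)
```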

For more information on similar topics, check out Grass fed beef https://meathousegourmet.com/collections/beef.

Leonard Richardson (leonardr) wrote :

The previous comment is unhelpful spam generated by an LLM. I don't have a way to report this so I'm making a note in the issue itself.
