diagnose() crashes when passed the name of an existing directory

Bug #2007344 reported by adashwy
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Beautiful Soup
Fix Released
Undecided
Unassigned

Bug Description

Hi, I find a small bug in diagnose() method from diagnose.py.
I have listed below the payload, crash information, crash causes, and fix suggestions.

The code is
from bs4.diagnose import diagnose
diagnose(".")

The crash information is
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    diagnose(".")
  File "/home/server1/.cache/pypoetry/virtualenvs/pyfuzzgen-A1dD_9Bu-py3.8/lib/python3.8/site-packages/bs4/diagnose.py", line 70, in diagnose
    with open(data) as fp:
IsADirectoryError: [Errno 21] Is a directory: '.'

The crash cause is,
the diagnose function in the bs4 package does not handle the input data correctly and will report an error. In the following code, it only determines whether the path exists, but not whether the path is a file or a directory.
try:
  if os.path.exists(data):
    print(('"%s" looks like a filename. Reading data from the file.' % data))
    with open(data) as fp:
      data = fp.read()
except ValueError:
  # This can happen on some platforms when the 'filename' is
  # too long. Assume it's data and not a filename.
  pass

The fix suggestion is
try:
- if os.path.exists(data):
+ if os.path.isfile(data):
    print(('"%s" looks like a filename. Reading data from the file.' % data))
    with open(data) as fp:
      data = fp.read()
except ValueError:
  # This can happen on some platforms when the 'filename' is
  # too long. Assume it's data and not a filename.
  pass

Revision history for this message
Leonard Richardson (leonardr) wrote :

This code can simply be removed. I added very similar checking to the BeautifulSoup constructor, and I also decided not to prevent a user from parsing any string they really want to parse, so markup that looks like a URL or filename now only results in a warning.

021d430 is the revision that removes this redundant check.

Changed in beautifulsoup:
status: New → Fix Committed
summary: - IsADirectoryError
+ diagnose() crashes when passed the name of an existing directory
Changed in beautifulsoup:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.