diagnose() crashes when passed the name of an existing directory
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
Hi, I found a small bug in the diagnose() function in diagnose.py.
I have listed below the payload, crash information, crash causes, and fix suggestions.
The code is
from bs4.diagnose import diagnose
diagnose(".")
The crash information is
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    diagnose(".")
  File "/home/
    with open(data) as fp:
IsADirectoryError: [Errno 21] Is a directory: '.'
The cause of the crash is:
the diagnose() function in the bs4 package does not validate its input before trying to open it. In the following code, it checks only whether the path exists, not whether the path is a file or a directory, so a directory name such as "." reaches open() and raises IsADirectoryError.
try:
    if os.path.
        print(('"%s" looks like a filename. Reading data from the file.' % data))
        with open(data) as fp:
            data = fp.read()
except ValueError:
    # This can happen on some platforms when the 'filename' is
    # too long. Assume it's data and not a filename.
    pass
The suggested fix is
 try:
-    if os.path.
+    if os.path.
         print(('"%s" looks like a filename. Reading data from the file.' % data))
         with open(data) as fp:
             data = fp.read()
 except ValueError:
     # This can happen on some platforms when the 'filename' is
     # too long. Assume it's data and not a filename.
     pass
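For illustration, here is a minimal standalone sketch of the kind of check being suggested. The diff above is cut off, so the exact call is not visible; this sketch assumes the intended test is os.path.isfile(). The helper name read_if_regular_file is hypothetical and is not part of bs4.

```python
import os
import tempfile


def read_if_regular_file(data):
    """Hypothetical helper: if `data` names a regular file, return its
    contents; otherwise return `data` unchanged."""
    try:
        # Assumption: os.path.isfile() is the stricter check. It is False
        # for directories, so open() is never called on one.
        if os.path.isfile(data):
            with open(data) as fp:
                return fp.read()
    except ValueError:
        # Kept from the original code: some platforms may reject an
        # unusual 'filename'. Treat the input as data in that case.
        pass
    return data


# A directory name is passed through instead of raising IsADirectoryError.
assert read_if_regular_file(".") == "."

# A regular file is still read as before.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")
    with open(path, "w") as fp:
        fp.write("<p>hello</p>")
    assert read_if_regular_file(path) == "<p>hello</p>"
```

With a check like this, diagnose(".") would treat "." as markup rather than trying to open it.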
Changed in beautifulsoup:
status: Fix Committed → Fix Released
This code can simply be removed. I added very similar checking to the BeautifulSoup constructor, and I also decided not to prevent a user from parsing any string they really want to parse, so markup that looks like a URL or filename now only results in a warning.
021d430 is the revision that removes this redundant check.
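As a rough illustration of the "warn, don't block" behavior described in this comment, here is a standalone sketch using only the standard library. It is not Beautiful Soup's actual implementation: the function name warn_if_path_like and the warning text are hypothetical, and the real constructor may use different heuristics.

```python
import os
import warnings


def warn_if_path_like(markup):
    """Hypothetical check: warn when `markup` names an existing path,
    but never open it. The string is always returned for parsing."""
    try:
        looks_like_path = os.path.exists(markup)
    except ValueError:
        # Some platforms may reject unusual strings as paths; treat
        # those as ordinary markup.
        looks_like_path = False
    if looks_like_path:
        warnings.warn(
            '"%s" looks like a filename, not markup. '
            "It will be parsed as a string." % markup,
            UserWarning,
            stacklevel=2,
        )
    return markup


# Passing a directory name produces a warning, not a crash.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = warn_if_path_like(".")

assert result == "."
assert len(caught) == 1
```

The point of this design is that the caller keeps control: a string that happens to match a path on disk is still parsed if that is what the user wants.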