SyntaxError is no longer helpful when running unconverted code on Python 3
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Fix Released | Undecided | Unassigned |
Bug Description
BS4 is not compatible with Python 3.3.
First, there are some problems with print usage (for example, at the end of __init__.py):
if __name__ == '__main__':
    import sys
    soup = BeautifulSoup(
    print soup.prettify()
I tried running 2to3 on it and still got the following error:
Traceback (most recent call last):
  File "C:\Users\
    main()
  File "C:\Users\
    print(
  File "C:\Users\
    soup = bs4.BeautifulSo
  File "C:\Python33\
    self.
  File "C:\Python33\
    dammit = UnicodeDammit(
  File "C:\Python33\
    self.
  File "C:\Python33\
    xml_
TypeError: expected string or buffer
Running Python 3.3.2 (64-bit) on Windows 8:
Python 3.3.2 (v3.3.2:
D64)] on win32
description: updated
Changed in beautifulsoup: status: Fix Committed → Fix Released
This code is supposed to trigger a SyntaxError if you run the code under Python 3 without converting it:
# The very first thing we do is give a useful error if someone is
# running this code under Python 3 without converting it.
syntax_error = u'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work. You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).'
The u'' prefix is valid again as of Python 3.3 (PEP 414). That is good in general, but it means this guard no longer fires: running the unconverted code under Python 3.3 gives an unhelpful SyntaxError at the print statement instead.
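A quick way to see why the guard stopped working is to ask the interpreter which fragments are syntactically valid. The sketch below feeds two lines from this report to the built-in compile() without executing them; the helper name `compiles` is purely illustrative, and the guard string is shortened.

```python
def compiles(source):
    """Return True if `source` is syntactically valid for the running interpreter."""
    try:
        compile(source, "<example>", "exec")  # parse only; nothing is executed
        return True
    except SyntaxError:
        return False

# The u'' prefix was dropped in Python 3.0 and restored in 3.3 (PEP 414),
# so on 3.3+ this guard line compiles and never raises a SyntaxError.
guard = "syntax_error = u'You are trying to run the Python 2 version of Beautiful Soup under Python 3.'"

# The Python 2 print statement is a SyntaxError in every Python 3 release.
print_statement = "print soup.prettify()"

print(compiles(guard))            # True on Python 3.3 and later
print(compiles(print_statement))  # False on any Python 3
```

On Python 3.0 through 3.2 the first call would return False, which is what the guard originally relied on.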
The other error ("TypeError: expected string or buffer") usually happens because the user passed something other than a string into Beautiful Soup. It's usually an HTTP response object or a parse tree from some other library. Examples:
http://stackoverflow.com/questions/12478965/cannot-run-beautifulsoup-using-requests-geturl
http://stackoverflow.com/questions/886381/beautiful-soup-and-utidy
In this case, "expected string or buffer" is an accurate error message, although it could be better. Please let me know if I'm wrong and "page" in your code example is a string. If this is the case, please show me the string.
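The traceback ends inside UnicodeDammit with the same kind of TypeError that Python's `re` module raises when it is handed something other than text. The sketch below reproduces that class of error with only the standard library; `FakeResponse` and the regex are hypothetical stand-ins, not Beautiful Soup internals. The exact message wording differs between Python versions (recent releases say "expected string or bytes-like object").

```python
import re

class FakeResponse:
    """Hypothetical stand-in for an HTTP response object (not a real library class)."""
    status_code = 200
    text = "<html><body><p>Hello</p></body></html>"

# Any pattern will do; this one loosely resembles an XML-declaration check.
xml_declaration = re.compile(r"^<\?xml")

def try_match(markup):
    """Run the regex; return the match result, or the TypeError message as a string."""
    try:
        return xml_declaration.match(markup)
    except TypeError as exc:
        return "TypeError: %s" % exc

response = FakeResponse()
print(try_match(response))       # a TypeError message: the object is not a string
print(try_match(response.text))  # None: a string is accepted, it just has no XML declaration
```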
I'm leaving this bug open for purposes of improving the error reporting.
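One possible direction for better reporting is to check the input type up front and say what to pass instead. This is a minimal sketch under that assumption; `ensure_markup` is a hypothetical helper, not part of Beautiful Soup's API, and the `response.text` hint refers to the requests library's `Response.text` attribute.

```python
def ensure_markup(markup):
    """Hypothetical pre-check: return usable markup or raise a clearer TypeError.

    str and bytes are returned unchanged, file-like objects are read,
    and anything else is rejected with a message naming its type.
    """
    if isinstance(markup, (str, bytes)):
        return markup
    if hasattr(markup, "read"):
        # Open files and similar objects: use their contents.
        return markup.read()
    raise TypeError(
        "Markup must be a string, bytes, or a file-like object, not %s. "
        "If this is an HTTP response object, pass its body instead "
        "(for example, response.text with the requests library)."
        % type(markup).__name__
    )

if __name__ == "__main__":
    import io
    print(ensure_markup("<p>ok</p>"))                     # <p>ok</p>
    print(ensure_markup(io.StringIO("<p>from a file</p>")))  # <p>from a file</p>
    try:
        ensure_markup(42)
    except TypeError as exc:
        print(exc)  # the message names the offending type (int)
```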