SyntaxError is no longer helpful when running unconverted code on Python 3

Bug #1213387 reported by Augusto Santos on 2013-08-17
This bug affects 1 person
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

BS4 is not compatible with Python 3.3.
First, there are some problems with print usage (for example, at the end of __init__.py):

if __name__ == '__main__':
    import sys
    soup = BeautifulSoup(sys.stdin)
    print soup.prettify()

I tried running 2to3 on it and still got the following error:

Traceback (most recent call last):
  File "C:\Users\AntonioAugusto\Google Drive\Migrado Dropbox\Game Searcher - Shell\busca_jogos.py", line 537, in <module>
    main()
  File "C:\Users\AntonioAugusto\Google Drive\Migrado Dropbox\Game Searcher - Shell\busca_jogos.py", line 510, in main
    print(getCotacao())
  File "C:\Users\AntonioAugusto\Google Drive\Migrado Dropbox\Game Searcher - Shell\busca_jogos.py", line 40, in getCotacao
    soup = bs4.BeautifulSoup(page)
  File "C:\Python33\lib\site-packages\bs4\__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "C:\Python33\lib\site-packages\bs4\builder\_htmlparser.py", line 141, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 228, in __init__
    self._detectEncoding(markup, is_html)
  File "C:\Python33\lib\site-packages\bs4\dammit.py", line 397, in _detectEncoding
    xml_encoding_match = xml_encoding_re.match(xml_data)
TypeError: expected string or buffer

Running 64-bit Python 3.3.2 on Windows 8:
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32

Augusto Santos (mkhaos7) on 2013-08-17
description: updated
Leonard Richardson (leonardr) wrote:

This code is supposed to trigger a SyntaxError if you run it under Python 3 without converting it:

# The very first thing we do is give a useful error if someone is
# running this code under Python 3 without converting it.
syntax_error = u'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work. You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).'

The u'' construction is valid again in Python 3 as of Python 3.3 (PEP 414). This is good in general, but it means the line above no longer fails, so running the unconverted code under Python 3 gives an unhelpful error at the print statement instead.
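A quick way to see the difference is to compile snippets without running them. In this minimal sketch the `compiles` helper and the snippet strings are hypothetical, written only for illustration. On Python 3.3 or later it should report that the u'' literal parses while the Python 2 print statement does not:

```python
def compiles(src: str) -> bool:
    """Return True if src parses under the running interpreter (hypothetical helper)."""
    try:
        compile(src, "<snippet>", "exec")  # parse and compile only; nothing is executed
    except SyntaxError:
        return False
    return True

# The old guard: a u'' literal. A SyntaxError on Python 3.0-3.2, legal again from 3.3 (PEP 414).
print(compiles("syntax_error = u'Run 2to3 first.'"))
# A Python 2 print statement: a SyntaxError on every Python 3 release.
print(compiles("print soup.prettify()"))
```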

The other error (the TypeError) usually happens because the user passed something other than a string into Beautiful Soup, typically an HTTP response object or a parse tree from some other library. Examples:

http://stackoverflow.com/questions/12478965/cannot-run-beautifulsoup-using-requests-geturl
http://stackoverflow.com/questions/886381/beautiful-soup-and-utidy

In this case, "expected string or buffer" is an accurate error message, although it could be better. If I'm wrong and "page" in your code example is actually a string, please let me know and show me the string.
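The traceback above ends in a regular-expression match, and Python's re module raises a TypeError when it is handed an object that is neither str nor bytes. The following minimal sketch reproduces that failure and the usual fix of passing the response body instead of the response object. It assumes a hypothetical `FakeResponse` class as a stand-in for an HTTP response and an illustrative pattern that is not Beautiful Soup's actual regex:

```python
import re

# Illustrative pattern only; it is NOT Beautiful Soup's real encoding regex.
xml_encoding_re = re.compile(rb'^<\?xml[^>]*encoding="([^"]+)"')

class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    def __init__(self, body: bytes) -> None:
        self.content = body  # raw bytes of the response body

def detect_encoding(markup):
    # re raises TypeError when markup is neither str nor bytes-like
    # (Python 3.3 worded it "expected string or buffer").
    match = xml_encoding_re.match(markup)
    return match.group(1).decode("ascii") if match else None

page = FakeResponse(b'<?xml version="1.0" encoding="utf-8"?><root/>')

try:
    detect_encoding(page)  # wrong: passing the response object itself
    wrong_input_error = None
except TypeError as exc:
    wrong_input_error = str(exc)
print("passing the object:", wrong_input_error)

print("passing the body:", detect_encoding(page.content))  # passing the body: utf-8
```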

I'm leaving this bug open for purposes of improving the error reporting.

summary: - Print function on __init__.py not compatible with Python 3
+ SyntaxError is no longer helpful when running unconverted code on Python 3
Leonard Richardson (leonardr) wrote:

Due to other work, the error message will now be more helpful, for example "TypeError: object of type 'Response' has no len()". This makes it clear that you are passing the wrong kind of object into Beautiful Soup.
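That message is the one CPython's built-in len() produces for an object without a length, and it names the offending type. The sketch below shows the effect of calling len() on the input early. It is an illustration, not the actual Beautiful Soup change: `FakeResponse` and `check_markup` are hypothetical names.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object (defines no __len__)."""
    def __init__(self, body: str) -> None:
        self.text = body

def check_markup(markup):
    # Calling len() first fails fast with a TypeError that names the type
    # of the object that was passed in.
    len(markup)
    return markup

try:
    check_markup(FakeResponse("<p>hi</p>"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # object of type 'FakeResponse' has no len()

print(check_markup("<p>hi</p>"))  # a real string passes through unchanged
```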

That still leaves the problem of an unhelpful error when the unconverted Python 2 code is imported under Python 3.

Leonard Richardson (leonardr) wrote:

I found another syntax error: the <> operator is present in Python 2 but not in Python 3, so using it in the guard line makes unconverted code fail with a SyntaxError again. So this can be fixed now.
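A minimal sketch of why <> works as a guard: a line that uses it fails to compile on Python 3, and the resulting SyntaxError carries the offending source line (`exc.text`), which is the line a traceback shows the user. The guard strings and the `check` helper here are hypothetical, not Beautiful Soup's actual code; the != version is included only for contrast:

```python
# Hypothetical guard lines; only the comparison operator differs.
NE_GUARD = "'You are running the Python 2 code under Python 3.'<>'Convert it first.'"
PY3_GUARD = "'You are running the Python 2 code under Python 3.'!='Convert it first.'"

def check(src: str) -> str:
    """Compile src (without running it) and report what the parser says."""
    try:
        compile(src, "<guard>", "exec")
    except SyntaxError as exc:
        # exc.text is the offending source line that a traceback would display.
        return f"SyntaxError on line: {exc.text!r}"
    return "compiles (guard is silent)"

for guard in (NE_GUARD, PY3_GUARD):
    print(check(guard))
```

Because the explanatory text sits on the same line as the invalid operator, the user sees it in the error output without any code having run.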

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released