UnicodeEncodeError on passing a unicode URL with non-ASCII chars

Bug #1640853 reported by Tobias Krönke
This bug affects 1 person
Affects: Beautiful Soup
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Very simple to replicate (bug is in the warning call of _check_markup_is_url):

>>> BeautifulSoup(u'http://ü', features='lxml')
...Traceback...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 8: ordinal not in range(128)

>>> bs4.__version__
'4.5.1'

Python 2.7

Tobias Krönke (tobias-kroenke) wrote :

Might this be a bug in Python? I don't get the error with 2.7.11 (only with <=2.7.10), although there the warning is "swallowed". More interestingly, in an interactive shell only the first call fails; subsequent calls with the same string do not.

Leonard Richardson (leonardr) wrote :

This error happens when your default Python encoding is 'ascii' and you try to write non-ASCII characters to an output source that uses the default encoding (which the warnings module does). It's possible that Python 2.7.11 changes the default behavior, though I don't see anything in the changelog.
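The mechanism described above can be sketched as follows. This is only an illustration, not the code path inside the warnings module: Python 2 performed the ASCII encoding implicitly, while here it is spelled out with an explicit `.encode('ascii')` so the snippet also runs on Python 3. The shortened message text is an assumption based on the traceback. Note that the failing position is 8 rather than 7, which is consistent with the opening quote of `"%s"` preceding the URL in the warning message.

```python
# The markup from the report; \xfc is the character "ü".
markup = u'http://\xfc'

# Shortened stand-in for the warning text, with the URL wrapped in quotes.
message = u'"%s" looks like a URL.' % markup

error = None
try:
    # Explicit stand-in for the implicit ASCII encoding done by Python 2.
    message.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc

# The offending character sits at index 8 because of the leading quote.
print(error.start, repr(error.object[error.start]))
```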

I hate this error, and although it's not Beautiful Soup's problem, I'm not opposed to changing Beautiful Soup to avoid it. But after running some experiments, I wasn't able to get a solution that worked on both Python 2 and Python 3. I'm going to mark this issue "Won't Fix" but if you can come up with a solution that works I'll revisit it.

Changed in beautifulsoup:
status: New → Won't Fix
Tobias Krönke (tobias-kroenke) wrote :

Would you consider using `%r` instead of `"%s"`? That would fix it for me.

                warnings.warn(
                    '%r looks like a URL. Beautiful Soup is not an'
                    ' HTTP client. You should probably use an HTTP client like'
                    ' requests to get the document behind the URL, and feed'
                    ' that document to Beautiful Soup.' % decoded_markup
                )
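One caveat with `%r` is that it behaves differently across versions: Python 2 escapes non-ASCII characters in the repr of a unicode string, whereas Python 3 leaves printable non-ASCII characters as they are. A possible alternative, offered here only as a sketch and not as anything Beautiful Soup does, is to escape the markup explicitly before formatting. The helper names `ascii_safe` and `warn_if_url` are hypothetical, and the message text is shortened.

```python
import warnings


def ascii_safe(text):
    """Replace non-ASCII characters with backslash escapes (hypothetical helper)."""
    return text.encode('ascii', 'backslashreplace').decode('ascii')


def warn_if_url(markup):
    """Hypothetical stand-in for the warning call; returns the message it emitted."""
    message = (
        u'"%s" looks like a URL. Beautiful Soup is not an'
        u' HTTP client.' % ascii_safe(markup)
    )
    warnings.warn(message)
    return message


# The resulting message contains only ASCII characters, so encoding it succeeds.
message = warn_if_url(u'http://\xfc')
message.encode('ascii')
```

The `backslashreplace` error handler is available for encoding in both Python 2 and Python 3, so the escaped form does not depend on how each version implements `repr`.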
