BeautifulSoup Selector Doesn't Support Non-ASCII characters
Bug #1455778 reported by Lumit
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
Platform: Windows 8.1 64-bit, Spyder IDE with the Anaconda distribution
To reproduce the bug:
1. Open the IPython pane in the Spyder IDE.
2. Type the following code:
>>> import requests, bs4
>>> res = requests.get('http://
>>> res.raise_
>>> noStarchSoup = bs4.BeautifulSo
>>> noStarchSoup.
...(lots of traceback )
...UnicodeDecod
Thanks for your bug report. This is potentially a very serious bug. Unfortunately there's not enough information here for me to solve the problem.
1. I need the actual markup that caused the problem. Websites change all the time. I can't duplicate your bug and I don't know if it's because http://nostarch.com has changed or if there's some other problem.
I realize that the instructions for filing bugs against Beautiful Soup said "at least mention the URL to the web page", so this is my fault. As of this bug I've changed those instructions to insist on the actual HTML.
If you still encounter this issue on http://nostarch.com, or you ever encounter it again, please upload an attachment containing the actual HTML you're feeding to Beautiful Soup.
2. I need to know which version of Beautiful Soup you're using (bs4.__version__). Maybe Anaconda has an old version of Beautiful Soup packaged and you're encountering a problem I've already fixed? I don't know.
3. Since you edited out the traceback, I have no idea where in Beautiful Soup the problem happened, so I can't go through the code to try and spot an obvious problem.
Thanks again for filing this bug.
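For anyone hitting a similar error, the three items requested above (the exact markup, the Beautiful Soup version, and the unedited traceback) could be gathered in one step. The following is only a rough sketch, not something taken from this report: `collect_bug_report`, its parameters, and the output file names are hypothetical, and it assumes the failing code can be wrapped in a callable that takes the markup.

```python
import platform
import sys
import traceback
from pathlib import Path


def collect_bug_report(markup, action, out_dir="bs4_bug_report"):
    """Save the markup and capture the full traceback raised by `action`.

    `markup` is the raw bytes (or str) fed to Beautiful Soup; `action` is a
    callable that takes the markup and triggers the bug. Both names are
    hypothetical and exist only for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # 1. Keep the exact markup so it can be attached to the bug report.
    raw = markup if isinstance(markup, bytes) else markup.encode("utf-8")
    (out / "markup.html").write_bytes(raw)

    # 2. Record the Beautiful Soup version, if the package is importable.
    try:
        import bs4
        bs4_version = bs4.__version__
    except ImportError:
        bs4_version = "not installed"

    # 3. Run the failing code and keep the complete, unedited traceback.
    try:
        action(markup)
        tb = "no exception raised"
    except Exception:
        tb = traceback.format_exc()

    report = (
        f"Beautiful Soup: {bs4_version}\n"
        f"Python: {sys.version}\n"
        f"Platform: {platform.platform()}\n\n"
        f"{tb}"
    )
    (out / "report.txt").write_text(report, encoding="utf-8")
    return report
```

With `requests`, the bytes in `res.content` could be passed as `markup`, and `action` could be a small function that builds the soup and runs the selector that fails. The resulting `markup.html` and `report.txt` are then the files to attach.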