Unknown Status Keyword in ParserBase raises Python Not Implemented
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Beautiful Soup | Fix Released | Undecided | Unassigned | |
Bug Description
I am using Python 3.6 and have lxml 3.8.0 installed.
I'm using requests to get the response content and parsing it with Beautiful Soup.
Soup is set to use "html.parser".
I'm sorry I don't know what URL caused the issue because it runs as a batch process that I'm using to crawl some web sites. I will try different parser settings to see if I get the same error or not.
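One way to find the offending URL in a batch run is to wrap the parse call and log the URL whenever parsing fails. This is only a minimal sketch: the helper name `parse_or_log` and the logger name `crawler` are hypothetical, and the parse step is passed in as a callable so the sketch does not depend on how the crawler builds its soup.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("crawler")  # hypothetical logger name


def parse_or_log(url: str, markup: str, parse: Callable[[str], T]) -> Optional[T]:
    """Call parse(markup); if it raises NotImplementedError, log the URL.

    Returns the parse result, or None when parsing failed, so the batch
    can continue with the next URL.
    """
    try:
        return parse(markup)
    except NotImplementedError:
        # log.exception records the traceback together with the URL.
        log.exception("parser failed on %s", url)
        return None
```

In the crawler, the callable could be something like `lambda m: BeautifulSoup(m, "html.parser")`, so that the next failure leaves the URL in the log.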
Here is the code snippet that creates the soup:
response = requests.get(url,
if response.
soup = BeautifulSoup(
-------
Traceback (most recent call last):
File "/usr/local/
"__main__", mod_spec)
File "/usr/local/
exec(code, run_globals)
File "crawler/
main()
File "crawler/
crawl_
File "crawler/
start_
File "crawler/
result = get_domain_
File "/home/
soup = BeautifulSoup(
File "/usr/local/
self._feed()
File "/usr/local/
self.
File "/usr/local/
parser.
File "/usr/local/
self.goahead(0)
File "/usr/local/
k = self.parse_
File "/usr/local/
return self.parse_
File "/usr/local/
self.
File "/usr/local/
"subclasses of ParserBase must override error()")
NotImplementedE
Changed in beautifulsoup:
status: Fix Committed → Fix Released
A number of others have reported this problem in the past year (https://groups.google.com/forum/#!topic/beautifulsoup/EFNH2oxOX4A, https://stackoverflow.com/questions/49786893/python-beautifulsoup-error-while-scraping), but none of the reports included the specific markup that caused the problem, and I haven't been able to duplicate it. However, the solution is pretty clear: BeautifulSoupHTMLParser should implement error() and do something with the error message rather than raise an exception.
This change is in revision 454. Since I can't reproduce the issue, I can't guarantee that Beautiful Soup will turn such a document into anything useful, but it will no longer raise an exception.
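For illustration, the approach described above (an `error()` override that reports the message instead of raising) might look roughly like the sketch below. It is not the code from revision 454: the class name `TolerantHTMLParser` is a hypothetical stand-in for BeautifulSoupHTMLParser, and turning the message into a warning is one assumed way of "doing something" with it.

```python
import warnings
from html.parser import HTMLParser


class TolerantHTMLParser(HTMLParser):
    """Hypothetical stand-in for BeautifulSoupHTMLParser."""

    def error(self, message):
        # Assumption: surface the parser's complaint as a warning rather
        # than letting the base-class NotImplementedError propagate.
        warnings.warn(message)


parser = TolerantHTMLParser()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Simulate the parser reporting a problem; the markup that triggers
    # this in practice is unknown.
    parser.error("unknown status keyword")
```

With the override in place, a call to `error()` produces a warning that the caller can log or filter, and parsing of the batch continues.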