Thanks for taking the time to file this issue. The problem here is in Python 3's html.parser library. Here's a script that duplicates the error without using any Beautiful Soup code:
---
from html.parser import HTMLParser
import warnings

bad_markup = '\nht\\ntouchendml><Body>\x7fÿÿÿd>\x00\x02i>-ulYt<trt><<dt>><Õre</li><![- -<\x10</hlre><hr>mlonreset>\n<dt><p®e><hr>'

class MyParser(HTMLParser):
    def error(self, msg):
        warnings.warn(msg)

parser = MyParser()
parser.feed(bad_markup)
---
Someone else filed this issue against Python last year: https://bugs.python.org/issue37747
I've updated it with a link to this ticket and a copy of my duplication script.
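Until that's fixed upstream, calling code can guard the parse itself. Here's a minimal sketch, assuming you only want to survive bad markup rather than recover a partial parse. `feed_safely` and `WarningParser` are hypothetical names for illustration, not part of html.parser or Beautiful Soup, and the broad `except` is deliberate because I'm assuming the exact exception type may differ between Python versions:

```python
from html.parser import HTMLParser
import warnings


class WarningParser(HTMLParser):
    # Same idea as the script above: report parser errors as warnings.
    def error(self, msg):
        warnings.warn(msg)


def feed_safely(parser, markup):
    """Hypothetical helper: True if feed() completed, False if it raised."""
    try:
        parser.feed(markup)
    except Exception as exc:  # assumption: exception type varies by version
        warnings.warn(f"html.parser gave up on this markup: {exc!r}")
        return False
    return True
```

A caller would then check the return value, e.g. `if not feed_safely(WarningParser(), bad_markup): ...`, and decide whether to skip the document or retry with a different parser.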