Comment 2 for bug 1883104

Leonard Richardson (leonardr) wrote :

Thanks for taking the time to file this issue. The problem here is in Python 3's html.parser module, not in Beautiful Soup. Here's a script that reproduces the error without using any Beautiful Soup code:

---
from html.parser import HTMLParser
import warnings

bad_markup = '\nht\\ntouchendml><Body>\x7fÿÿÿd>\x00\x02i>-ulYt<trt><<dt>><Õre</li><![- -<\x10</hlre><hr>mlonreset>\n<dt><p®e><hr>'

class MyParser(HTMLParser):
    # Report parser errors as warnings instead of raising.
    def error(self, msg):
        warnings.warn(msg)

parser = MyParser()
parser.feed(bad_markup)
---
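
To check which exception (if any) a particular interpreter raises, the script above can be wrapped in a small helper. This is only a sketch: `WarningParser` and `try_parse` are hypothetical names that do not appear in the ticket, and the result depends on the Python version in use, so no particular exception is assumed here.

```python
from html.parser import HTMLParser
import warnings


class WarningParser(HTMLParser):
    # Same override as in the script above: warn instead of raising.
    def error(self, msg):
        warnings.warn(msg)


def try_parse(markup):
    """Feed markup to the parser; return the exception class name, or None."""
    parser = WarningParser()
    try:
        with warnings.catch_warnings():
            # Silence the warnings so only a real exception is reported.
            warnings.simplefilter("ignore")
            parser.feed(markup)
    except Exception as exc:
        return type(exc).__name__
    return None


# Well-formed markup parses without an exception.
print(try_parse("<p>ok</p>"))
```

Calling `try_parse(bad_markup)` with the string from the script above shows whether the installed Python is affected.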

Someone else filed this issue against Python last year: https://bugs.python.org/issue37747

I've updated that issue with a link to this ticket and a copy of my reproduction script.