Beautiful Soup

Comment 2 for bug 1883104

Leonard Richardson (leonardr) wrote on 2020-06-11:

Thanks for taking the time to file this issue. The problem here is in Python 3's html.parser library. Here's a script that duplicates the error without using any Beautiful Soup code:

---
from html.parser import HTMLParser
import warnings

bad_markup = '\nht\\ntouchendml><Body>\x7fÿÿÿd>\x00\x02i>-ulYt<trt><<dt>><Õre</li><![- -<\x10</hlre><hr>mlonreset>\n<dt><p®e><hr>'

class MyParser(HTMLParser):
    def error(self, msg):
        warnings.warn(msg)

parser = MyParser()
parser.feed(bad_markup)
---
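
To check how a particular Python version behaves, the script above can be wrapped so that it reports the exception type instead of crashing. This is only a sketch: `SUSPECT_MARKUP`, `WarningParser`, and `describe_failure` are hypothetical names, and the shortened input is an assumption based on the `<![- -<` fragment in the markup above, not a confirmed minimal case. The result depends on the interpreter version, and some releases may parse the input without raising at all.

```python
from html.parser import HTMLParser
import warnings

# Hypothetical, shortened input (an assumption, not from the original report):
# a marked section opener followed by characters that are not a valid keyword.
SUSPECT_MARKUP = "<p>before</p><![- -<\x10</p>"


class WarningParser(HTMLParser):
    def error(self, msg):
        # Same override as in the script above: warn instead of raising.
        warnings.warn(msg)


def describe_failure(markup):
    """Return the name of the exception raised while parsing, or None."""
    parser = WarningParser()
    with warnings.catch_warnings():
        # Silence warnings issued by the error() override.
        warnings.simplefilter("ignore")
        try:
            parser.feed(markup)
            parser.close()
        except Exception as exc:
            return type(exc).__name__
    return None


print(describe_failure(SUSPECT_MARKUP))
```

Well-formed input such as `describe_failure("<p>ok</p>")` returns `None`, so any other result points at the parser rather than at the wrapper.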

Someone else filed this issue against Python last year: https://bugs.python.org/issue37747

I've updated it with a link to this ticket and a copy of my duplication script.