Comment 2 for bug 2058508

Revision history for this message
Leonard Richardson (leonardr) wrote :

This is a strange failure, since superficially similar tests like test_entities_converted_on_the_way_out don't have the problem.

Are you running this against Python 3.9.19, which was released yesterday? Are you also running it against any other Python releases, with or without this problem? I can't duplicate your setup exactly, but I created a fresh 3.9.19 environment that resembles yours, and ran the test suite successfully.

Here's a diagnostic I'd like you to run:

---
data = b"<p>\x91Foo\x92</p>"
from bs4.diagnose import diagnose
diagnose(data)

print("\nBEGINNING HTMLPARSER TRACE")
from bs4.diagnose import htmlparser_trace
from bs4.dammit import UnicodeDammit
u = UnicodeDammit(data).unicode_markup
print(u)
htmlparser_trace(u)
---

This will show what markup the html.parser parser is receiving and how it handles that markup.