4.12.3: pytest fails in bs4/tests/test_htmlparser.py::TestHTMLParserTreeBuilder::test_smart_quotes_converted_on_the_way_in

Bug #2058508 reported by Tomasz Kloczko
This bug affects 1 person
Affects: Beautiful Soup
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I'm packaging your module as an RPM package, so I'm using the typical PEP 517-based build, install, and test cycle used when building packages from a non-root account:
- `python3 -sBm build -w --no-isolation`
- because `build` is called with `--no-isolation`, only locally installed modules are used throughout the whole process
- install the .whl file in </install/prefix> using the `installer` module
- run pytest with $PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>
- the build is performed in an environment which is *cut off from access to the public network* (pytest is executed with `-m "not network"`)
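The last two steps amount to staging the wheel under a root directory and pointing PYTHONPATH at the staged site-packages directories. A minimal sketch of computing that value, assuming a DESTDIR-style staged install (as with the `installer` CLI's `--destdir` option), so that the running interpreter's `platlib` (sitearch) and `purelib` (sitelib) paths reappear under the root. The helper name `staged_pythonpath` is hypothetical, and distro-patched `sysconfig` schemes may not match the paths an RPM spec file computes:

```python
import os
import sysconfig


def staged_pythonpath(root: str) -> str:
    """Return a PYTHONPATH value for packages staged under `root`.

    Hypothetical helper; assumes POSIX paths and a DESTDIR-style staging
    area, so each site-packages directory of the running interpreter
    reappears underneath `root`.
    """
    paths = []
    # platlib corresponds to "sitearch", purelib to "sitelib".
    for key in ("platlib", "purelib"):
        system_path = sysconfig.get_path(key)
        # Strip the leading separator so os.path.join keeps `root` in front.
        staged = os.path.join(root, system_path.lstrip(os.sep))
        if staged not in paths:  # purelib == platlib on many distributions
            paths.append(staged)
    return os.pathsep.join(paths)


if __name__ == "__main__":
    print(staged_pythonpath("/tmp/buildroot"))
```

On a Fedora-style layout this yields two entries (`lib64` and `lib`); where both keys resolve to the same directory, only one entry is emitted.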

```console
+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-beautifulsoup4-4.12.3-3.fc36.x86_64/usr/lib64/python3.9/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-beautifulsoup4-4.12.3-3.fc36.x86_64/usr/lib/python3.9/site-packages
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.9.18, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/tkloczko/rpmbuild/BUILD/beautifulsoup4-4.12.3
configfile: pyproject.toml
collected 664 items

bs4/tests/test_builder.py ..... [ 0%]
bs4/tests/test_builder_registry.py ........... [ 2%]
bs4/tests/test_css.py .................................................. [ 9%]
.......... [ 11%]
bs4/tests/test_dammit.py .................................. [ 16%]
bs4/tests/test_element.py ..... [ 17%]
bs4/tests/test_formatter.py .............. [ 19%]
bs4/tests/test_fuzz.py ssssssssssssssssss [ 22%]
bs4/tests/test_html5lib.py sssssssssssssssssssssssssssssssssssssssssssss [ 28%]
ssssssssssssssssssssssssssssssssssssss [ 34%]
bs4/tests/test_htmlparser.py ........................................... [ 41%]
...........F.................... [ 45%]
bs4/tests/test_lxml.py ................................................. [ 53%]
.................................................... [ 61%]
bs4/tests/test_navigablestring.py ........ [ 62%]
bs4/tests/test_pageelement.py .................................... [ 67%]
bs4/tests/test_soup.py ................................................. [ 75%]
......... [ 76%]
bs4/tests/test_tag.py ........................ [ 80%]
bs4/tests/test_tree.py ................................................. [ 87%]
........................................................................ [ 98%]
........... [100%]

=================================== FAILURES ===================================
_____ TestHTMLParserTreeBuilder.test_smart_quotes_converted_on_the_way_in ______

self = <bs4.tests.test_htmlparser.TestHTMLParserTreeBuilder object at 0x7fb138646460>

    def test_smart_quotes_converted_on_the_way_in(self):
        # Microsoft smart quotes are converted to Unicode characters during
        # parsing.
        quote = b"<p>\x91Foo\x92</p>"
        soup = self.soup(quote)
> assert soup.p.string == "\N{LEFT SINGLE QUOTATION MARK}Foo\N{RIGHT SINGLE QUOTATION MARK}"
E AttributeError: 'NoneType' object has no attribute 'string'

bs4/tests/__init__.py:808: AttributeError
=========================== short test summary info ============================
SKIPPED [2] bs4/tests/test_fuzz.py:68: Prerequisites for fuzz tests are not installed.
SKIPPED [4] bs4/tests/test_fuzz.py:86: Prerequisites for fuzz tests are not installed.
SKIPPED [3] bs4/tests/test_fuzz.py:105: Prerequisites for fuzz tests are not installed.
SKIPPED [2] bs4/tests/test_fuzz.py:115: Prerequisites for fuzz tests are not installed.
SKIPPED [6] bs4/tests/test_fuzz.py:131: Prerequisites for fuzz tests are not installed.
SKIPPED [1] bs4/tests/test_fuzz.py:161: Prerequisites for fuzz tests are not installed.
SKIPPED [4] bs4/tests/__init__.py:282: html5lib seems not to be present, not testing its tree builder.
SKIPPED [2] bs4/tests/__init__.py:291: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:301: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:694: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:344: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:374: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:380: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:385: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:403: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:407: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:410: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:414: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:435: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:448: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:477: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:485: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:495: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:505: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:515: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:518: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:535: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:550: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:568: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:579: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:606: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:618: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:626: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:634: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:637: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:645: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:651: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:663: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:670: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:677: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:681: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:687: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:701: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:712: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:729: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:742: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:754: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:769: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:773: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:777: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:785: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:793: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:796: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:803: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:810: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:814: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:820: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:848: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:865: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:878: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:906: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:930: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:953: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:958: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:966: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:1147: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:1152: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:1157: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:1165: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/__init__.py:1172: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:26: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:38: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:58: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:72: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:79: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:85: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:96: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:112: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:118: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:125: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:130: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:148: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:170: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:184: html5lib seems not to be present, not testing its tree builder.
SKIPPED [1] bs4/tests/test_html5lib.py:190: html5lib seems not to be present, not testing its tree builder.
FAILED bs4/tests/test_htmlparser.py::TestHTMLParserTreeBuilder::test_smart_quotes_converted_on_the_way_in
================== 1 failed, 562 passed, 101 skipped in 2.15s ==================
```
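For context on what the failing test expects: in the windows-1252 code page, bytes 0x91 and 0x92 are the left and right single quotation marks, so the assertion only holds if encoding detection settles on windows-1252 (or a compatible codec) for this input. The stdlib-only sketch below shows just the codec facts involved; it is not Beautiful Soup's detection logic (that lives in `bs4.dammit.UnicodeDammit`), and the function name `decode_candidates` is hypothetical. Note that an unexpected codec can remove the `<p>` tag entirely, which would produce the same `'NoneType' object has no attribute 'string'` error; whether that happens here is what a diagnostic run would have to show.

```python
SMART_QUOTES = "\N{LEFT SINGLE QUOTATION MARK}Foo\N{RIGHT SINGLE QUOTATION MARK}"
MARKUP = b"<p>\x91Foo\x92</p>"


def decode_candidates(data: bytes,
                      encodings_to_try=("windows-1252", "latin-1", "utf-8", "utf-16-le")):
    """Decode `data` with each codec; record the text or the error's class name."""
    results = {}
    for codec in encodings_to_try:
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError as exc:
            results[codec] = type(exc).__name__
    return results


# windows-1252: 0x91/0x92 become U+2018/U+2019 (what the test asserts).
# latin-1:      0x91/0x92 become the C1 control characters U+0091/U+0092.
# utf-8:        0x91 is a stray continuation byte, so decoding fails.
# utf-16-le:    the 12 bytes become 6 code units and no "<" survives.
for codec, text in decode_candidates(MARKUP).items():
    print(f"{codec:>12}: {text!r}")
```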

List of modules installed in the build environment:

```console
Package Version
----------------------------- -----------
alabaster 0.7.16
attrs 23.2.0
Babel 2.14.0
build 1.1.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
distro 1.9.0
dnf 4.19.0
docutils 0.20.1
editables 0.5
exceptiongroup 1.1.3
gpg 1.23.2
hatch-fancy-pypi-readme 24.1.0
hatch-vcs 0.4.0
hatchling 1.21.1
hypothesis 6.99.6
idna 3.6
imagesize 1.4.1
importlib_metadata 7.0.1
importlib_resources 6.3.1
incremental 22.10.0
iniconfig 2.0.0
installer 0.7.0
Jinja2 3.1.3
libdnf 0.73.0
lxml 5.1.0
markdown-it-py 3.0.0
MarkupSafe 2.1.3
mdit-py-plugins 0.4.0
mdurl 0.1.2
myst-parser 2.0.0
packaging 24.0
pathspec 0.12.1
pluggy 1.4.0
Pygments 2.17.2
pyproject_hooks 1.0.0
pytest 8.1.1
python-dateutil 2.9.0.post0
PyYAML 6.0.1
requests 2.31.0
setuptools 69.1.1
setuptools-scm 8.0.4
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soupsieve 2.5
Sphinx 7.2.6
sphinxcontrib-applehelp 1.0.8
sphinxcontrib-devhelp 1.0.5
sphinxcontrib-htmlhelp 2.0.5
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.7
sphinxcontrib-serializinghtml 1.1.10
sphinxcontrib-towncrier 0.4.0a0
tokenize_rt 5.2.0
tomli 2.0.1
towncrier 23.11.0
trove-classifiers 2024.3.13
typing_extensions 4.10.0
urllib3 1.26.18
wheel 0.43.0
zipp 3.17.0
zope.event 5.0
zope.interface 6.2
```
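The list includes lxml but not html5lib, which is consistent with the `html5lib seems not to be present` skip messages in the test log. A small stdlib-only sketch for checking which parser backends an environment can import, without actually importing them; the module list is an assumption covering the three tree builders the test files are named after (`html.parser` ships with Python), and `available_backends` is a hypothetical name:

```python
import importlib.util


def available_backends(names=("html.parser", "lxml", "html5lib")):
    """Map each module name to True if it can be found on sys.path."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent package of a dotted name, which can fail.
            status[name] = False
    return status


print(available_backends())
```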

Please let me know if you need more details or want me to perform some diagnostics.

Tomasz Kloczko (kloczek) wrote:

BTW, would it be possible to move the VCS and bug tracking system to GitHub or one of the GitLab instances?
The Launchpad interface really sucks. :)

Leonard Richardson (leonardr) wrote:

This is a strange failure, since superficially similar tests like test_entities_converted_on_the_way_out don't have the problem.

Are you running this against Python 3.9.19, which was released yesterday? Are you also running it against any other Python releases, with or without this problem? I can't duplicate your setup exactly, but I created a fresh 3.9.19 environment that resembles yours, and ran the test suite successfully.
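One way to answer the version questions precisely is to report the interpreter version together with the installed versions of the relevant distributions. A stdlib-only sketch, assuming Python 3.8+ for `importlib.metadata`; the distribution list is an assumption (charset-normalizer appears in the package list above and is one of the optional encoding detectors Beautiful Soup can use), and `environment_report` is a hypothetical name:

```python
import platform
from importlib import metadata


def environment_report(dists=("beautifulsoup4", "soupsieve", "lxml",
                              "html5lib", "charset-normalizer")):
    """Return the Python version plus each distribution's version (None if absent)."""
    report = {"python": platform.python_version()}
    for dist in dists:
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None  # not installed in this environment
    return report


for name, version in environment_report().items():
    print(f"{name}: {version}")
```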

Here's a diagnostic I'd like you to run:

---
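from bs4.dammit import UnicodeDammit
from bs4.diagnose import diagnose, htmlparser_trace

data = b"<p>\x91Foo\x92</p>"

# Report version information and try each installed parser on the raw bytes.
diagnose(data)

print("\nBEGINNING HTMLPARSER TRACE")
# Decode the bytes with UnicodeDammit, Beautiful Soup's encoding-detection class,
# and show the Unicode markup that html.parser will actually receive.
u = UnicodeDammit(data).unicode_markup
print(u)
# Print each event html.parser emits while parsing that markup.
htmlparser_trace(u)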
---

This will show what markup the html.parser parser is receiving and how it handles that markup.
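To see the same kind of event stream without Beautiful Soup in the loop, the sketch below approximates the trace step with the standard library's `html.parser.HTMLParser`, recording start tags, text, and end tags. It is an illustrative stand-in (the names `EventRecorder` and `trace` are hypothetical), not bs4's `htmlparser_trace`, and it assumes the markup has already been decoded to `str`:

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record (event, value) pairs in the order html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so the result does not depend on buffering.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def trace(markup: str):
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    return parser.events


print(trace("<p>\N{LEFT SINGLE QUOTATION MARK}Foo\N{RIGHT SINGLE QUOTATION MARK}</p>"))
```

With correctly decoded input the stream is a `p` start tag, one text event, and a `p` end tag; if the decoded markup contained no `<` at all, the whole input would arrive as a single text event and no `<p>` tag would ever be created.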
