Ubuntu
soupsieve package

FTBFS 2.2.1-1 HTML CDATA handling

Bug #1945788 reported by Athos Ribeiro on 2021-10-01

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	soupsieve (Debian)	Fix Released	Unknown	debbugs #995459
	soupsieve (Ubuntu)	Fix Released	Undecided	Unassigned	Ubuntu ubuntu-21.10

Bug Description

soupsieve FTBFS due to test suite failures [1].

Upstream's bug report [2] points to [3], which identifies the root cause: lxml is built against libxml2 >= 2.9.11, which no longer strips CDATA, causing parsing inconsistencies in BeautifulSoup.

[4] has been reported for lxml and, once it is fixed, rebuilding this package should be enough to close this bug. In the meantime, skipping the tests based on the libxml2 version, as suggested in [2], should be safe.

[1] https://launchpad.net/ubuntu/+archive/test-rebuild-20210927-impish/+build/22213221
[2] https://github.com/facelessuser/soupsieve/issues/220
[3] https://bugs.launchpad.net/beautifulsoup/+bug/1930164
[4] https://bugs.launchpad.net/lxml/+bug/1930224
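
For illustration, a version-gated skip along the lines suggested in [2] might look like the sketch below. This is not the patch that was applied upstream or in Debian. The helper name keeps_html_cdata and the constant LIBXML2_VERSION are made up for this example, and it assumes lxml's etree.LIBXML_VERSION tuple (the libxml2 version in use at runtime) is the right thing to check; etree.LIBXML_COMPILED_VERSION would be the alternative if the build-time version matters more.

```python
import unittest

try:
    # lxml exposes the libxml2 version it is running against as a tuple,
    # for example (2, 9, 12).
    from lxml import etree
    LIBXML2_VERSION = etree.LIBXML_VERSION
except ImportError:
    # Assumption for this sketch: without lxml there is nothing to skip.
    LIBXML2_VERSION = None


def keeps_html_cdata(version):
    """Return True if libxml2 is >= 2.9.11, where CDATA is no longer stripped."""
    return version is not None and tuple(version[:3]) >= (2, 9, 11)


class TestSoupContains(unittest.TestCase):
    """Hypothetical stand-in for the real test class."""

    @unittest.skipIf(
        keeps_html_cdata(LIBXML2_VERSION),
        "libxml2 >= 2.9.11 no longer strips CDATA in HTML",
    )
    def test_contains_cdata_html(self):
        # The original test body would go here.
        pass
```

Keeping the version comparison in a small helper makes the threshold easy to check without a particular libxml2 build installed.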

Failed tests report:

=================================== FAILURES ===================================
__________________ TestSoupContains.test_contains_cdata_html ___________________

self = <tests.test_extra.test_soup_contains.TestSoupContains testMethod=test_contains_cdata_html>

    def test_contains_cdata_html(self):
        """Test contains CDATA in HTML5."""

        markup = """
        <body><div id="1">Testing that <span id="2"><![CDATA[that]]></span>contains works.</div></body>
        """

>       self.assert_selector(
            markup,
            'body *:-soup-contains("that")',
            ['1'],
            flags=util.HTML
        )

tests/test_extra/test_soup_contains.py:154:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/util.py:122: in assert_selector
self.assertEqual(sorted(ids), sorted(expected_ids))
E AssertionError: Lists differ: ['1', '2'] != ['1']
E
E First list contains 1 additional elements.
E First extra element 1:
E '2'
E
E - ['1', '2']
E + ['1']
----------------------------- Captured stdout call -----------------------------
----Running Selector Test----
PATTERN: body *:-soup-contains("that")
## PARSING: 'body *:-soup-contains("that")'
TOKEN: 'tag' --> 'body' at position 0
TOKEN: 'combine' --> ' ' at position 4
TOKEN: 'tag' --> '*' at position 5
TOKEN: 'pseudo_contains' --> ':-soup-contains("that")' at position 6
## END PARSING

====PARSER: html5lib
TAG: div

====PARSER: lxml
TAG: div
TAG: span
_______________ TestSoupContainsOwn.test_contains_own_cdata_html _______________

self = <tests.test_extra.test_soup_contains_own.TestSoupContainsOwn testMethod=test_contains_own_cdata_html>

    def test_contains_own_cdata_html(self):
        """Test contains CDATA in HTML5."""

        markup = """
        <body><div id="1">Testing that <span id="2"><![CDATA[that]]></span>contains works.</div></body>
        """

>       self.assert_selector(
            markup,
            'body *:-soup-contains-own("that")',
            ['1'],
            flags=util.HTML
        )

tests/test_extra/test_soup_contains_own.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/util.py:122: in assert_selector
self.assertEqual(sorted(ids), sorted(expected_ids))
E AssertionError: Lists differ: ['1', '2'] != ['1']
E
E First list contains 1 additional elements.
E First extra element 1:
E '2'
E
E - ['1', '2']
E + ['1']
----------------------------- Captured stdout call -----------------------------
----Running Selector Test----
PATTERN: body *:-soup-contains-own("that")
## PARSING: 'body *:-soup-contains-own("that")'
TOKEN: 'tag' --> 'body' at position 0
TOKEN: 'combine' --> ' ' at position 4
TOKEN: 'tag' --> '*' at position 5
TOKEN: 'pseudo_contains' --> ':-soup-contains-own("that")' at position 6
## END PARSING

====PARSER: html5lib
TAG: div

====PARSER: lxml
TAG: div
TAG: span
=========================== short test summary info ============================
FAILED tests/test_extra/test_soup_contains.py::TestSoupContains::test_contains_cdata_html
FAILED tests/test_extra/test_soup_contains_own.py::TestSoupContainsOwn::test_contains_own_cdata_html
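
The parser difference shown in the captured output can probably be reproduced outside the test suite with something like the sketch below. It assumes beautifulsoup4 with a soupsieve version that supports :-soup-contains, plus html5lib and lxml, are installed; the function names matching_ids and compare_parsers are made up for this example and are not part of the soupsieve test suite.

```python
MARKUP = """
<body><div id="1">Testing that <span id="2"><![CDATA[that]]></span>contains works.</div></body>
"""
SELECTOR = 'body *:-soup-contains("that")'


def matching_ids(markup, selector, parser):
    """Parse markup with the given parser and return the sorted ids of matches."""
    # Imported lazily so that loading this sketch does not itself require bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(markup, parser)
    # Tag.select() hands CSS selector matching over to soupsieve.
    return sorted(el.get("id", "") for el in soup.select(selector))


def compare_parsers(parsers=("html5lib", "lxml")):
    """Return a dict mapping each parser name to the ids it matches."""
    return {parser: matching_ids(MARKUP, SELECTOR, parser) for parser in parsers}
```

Calling compare_parsers() on an affected system should show the mismatch reported above: according to the captured output, html5lib matches only the div, while lxml built against libxml2 >= 2.9.11 also matches the span.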

Bug Watch Updater (bug-watch-updater) on 2021-10-01

Changed in soupsieve (Debian):
status:	Unknown → New

Bug Watch Updater (bug-watch-updater) on 2021-10-03

Changed in soupsieve (Debian):
status:	New → Fix Released

Paride Legovini (paride) wrote on 2021-10-04:

Fixed in Debian in version 2.2.1-2 (same upstream version).

Paride Legovini (paride) on 2021-10-04

Changed in soupsieve (Ubuntu):
assignee:	nobody → Paride Legovini (paride)

Paride Legovini (paride) wrote on 2021-10-04:

My proposal is to:

syncpackage -r impish-proposed -d unstable -v soupsieve

considering that at the moment in Debian we have:

I'm tagging this server-next to get another pair of ubuntu-server-dev eyes on it.

tags:	added: server-next
Changed in soupsieve (Ubuntu):
assignee:	Paride Legovini (paride) → nobody

Paride Legovini (paride) on 2021-10-04

tags:

added: needs-sync
removed: server-next

Paride Legovini (paride) on 2021-10-05

Changed in soupsieve (Ubuntu):
milestone:	none → ubuntu-21.10

Bryce Harrington (bryce) wrote on 2021-10-05:

I've confirmed the sync is a targeted bug-fix for just this specific issue:

soupsieve (2.2.1-2) unstable; urgency=medium

  * Patch test suite to XFAIL test_contains_cdata_html, due to CDATA behaviour
    change in libxml2 >= 2.9.11. Fixing autopkgtests and FTBFS. Thanks Athos
    Ribeiro for the investigation and bug report. (Closes: #995459)
  * Bump Standards-Version to 4.6.0, no changes needed.
  * Migrate to Build-Depend on dh-sequence-python3.

-- Stefano Rivera <email address hidden> Sat, 02 Oct 2021 12:25:16 -0700

The patch itself is:

https://salsa.debian.org/python-team/packages/soupsieve/-/commit/7a94c302e39f8b86ac24e287678b7c12dd41dad1

The sync command I am using is:

syncpackage -v -b 1945788 soupsieve

Bryce Harrington (bryce) wrote on 2021-10-06:

This bug was fixed in the package soupsieve - 2.2.1-2

---------------
soupsieve (2.2.1-2) unstable; urgency=medium

-- Stefano Rivera <email address hidden> Sat, 02 Oct 2021 12:25:16 -0700

Changed in soupsieve (Ubuntu):
status:	New → Fix Released

Remote bug watches

debbugs #995459
[done serious ftbfs sid bookworm]
