FTBFS 2.2.1-1 HTML CDATA handling

Bug #1945788 reported by Athos Ribeiro
Affects             Status        Importance  Assigned to  Milestone
soupsieve (Debian)  Fix Released  Unknown
soupsieve (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

soupsieve FTBFS due to test suite failures [1].

Upstream's bug report [2] points to [3], which traces the root cause to lxml being built against libxml2 >= 2.9.11: that version no longer strips CDATA, which causes parsing inconsistencies in BeautifulSoup.

[4] has been reported for lxml and, once it is fixed, rebuilding this package should be enough to close this bug. In the meantime, skipping the affected tests based on the libxml2 version, as suggested in [2], should be safe.
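
A minimal sketch of such a version-based skip, for illustration only. It assumes lxml exposes the libxml2 version as the tuple etree.LIBXML_VERSION; the helper keeps_cdata, the constant CDATA_CHANGE, and the placeholder test body are hypothetical names and are not taken from [2] or from the soupsieve test suite.

```python
import unittest

try:
    # Assumption: lxml is installed; LIBXML_VERSION is a tuple such as (2, 9, 12).
    from lxml import etree
    LIBXML_VERSION = tuple(etree.LIBXML_VERSION)
except ImportError:
    LIBXML_VERSION = None  # lxml missing: nothing to guard against

# First libxml2 version with the CDATA behaviour change, per the description above.
CDATA_CHANGE = (2, 9, 11)


def keeps_cdata(version):
    """Return True if this libxml2 version is at or past the CDATA change."""
    return version is not None and tuple(version) >= CDATA_CHANGE


class TestSoupContains(unittest.TestCase):
    @unittest.skipIf(keeps_cdata(LIBXML_VERSION),
                     "libxml2 >= 2.9.11 no longer strips CDATA in HTML")
    def test_contains_cdata_html(self):
        # Placeholder body; the real test calls self.assert_selector(...).
        pass
```

Comparing version tuples rather than version strings avoids ordering mistakes such as "2.9.9" sorting after "2.9.11".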

[1] https://launchpad.net/ubuntu/+archive/test-rebuild-20210927-impish/+build/22213221
[2] https://github.com/facelessuser/soupsieve/issues/220
[3] https://bugs.launchpad.net/beautifulsoup/+bug/1930164
[4] https://bugs.launchpad.net/lxml/+bug/1930224

Failed tests report:

=================================== FAILURES ===================================
__________________ TestSoupContains.test_contains_cdata_html ___________________

self = <tests.test_extra.test_soup_contains.TestSoupContains testMethod=test_contains_cdata_html>

    def test_contains_cdata_html(self):
        """Test contains CDATA in HTML5."""

        markup = """
        <body><div id="1">Testing that <span id="2"><![CDATA[that]]></span>contains works.</div></body>
        """

>       self.assert_selector(
            markup,
            'body *:-soup-contains("that")',
            ['1'],
            flags=util.HTML
        )

tests/test_extra/test_soup_contains.py:154:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/util.py:122: in assert_selector
    self.assertEqual(sorted(ids), sorted(expected_ids))
E AssertionError: Lists differ: ['1', '2'] != ['1']
E
E First list contains 1 additional elements.
E First extra element 1:
E '2'
E
E - ['1', '2']
E + ['1']
----------------------------- Captured stdout call -----------------------------
----Running Selector Test----
PATTERN: body *:-soup-contains("that")
## PARSING: 'body *:-soup-contains("that")'
TOKEN: 'tag' --> 'body' at position 0
TOKEN: 'combine' --> ' ' at position 4
TOKEN: 'tag' --> '*' at position 5
TOKEN: 'pseudo_contains' --> ':-soup-contains("that")' at position 6
## END PARSING

====PARSER: html5lib
TAG: div

====PARSER: lxml
TAG: div
TAG: span
_______________ TestSoupContainsOwn.test_contains_own_cdata_html _______________

self = <tests.test_extra.test_soup_contains_own.TestSoupContainsOwn testMethod=test_contains_own_cdata_html>

    def test_contains_own_cdata_html(self):
        """Test contains CDATA in HTML5."""

        markup = """
        <body><div id="1">Testing that <span id="2"><![CDATA[that]]></span>contains works.</div></body>
        """

>       self.assert_selector(
            markup,
            'body *:-soup-contains-own("that")',
            ['1'],
            flags=util.HTML
        )

tests/test_extra/test_soup_contains_own.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/util.py:122: in assert_selector
    self.assertEqual(sorted(ids), sorted(expected_ids))
E AssertionError: Lists differ: ['1', '2'] != ['1']
E
E First list contains 1 additional elements.
E First extra element 1:
E '2'
E
E - ['1', '2']
E + ['1']
----------------------------- Captured stdout call -----------------------------
----Running Selector Test----
PATTERN: body *:-soup-contains-own("that")
## PARSING: 'body *:-soup-contains-own("that")'
TOKEN: 'tag' --> 'body' at position 0
TOKEN: 'combine' --> ' ' at position 4
TOKEN: 'tag' --> '*' at position 5
TOKEN: 'pseudo_contains' --> ':-soup-contains-own("that")' at position 6
## END PARSING

====PARSER: html5lib
TAG: div

====PARSER: lxml
TAG: div
TAG: span
=========================== short test summary info ============================
FAILED tests/test_extra/test_soup_contains.py::TestSoupContains::test_contains_cdata_html
FAILED tests/test_extra/test_soup_contains_own.py::TestSoupContainsOwn::test_contains_own_cdata_html

Changed in soupsieve (Debian):
status: Unknown → New
Changed in soupsieve (Debian):
status: New → Fix Released
Paride Legovini (paride) wrote :

Fixed in Debian in version 2.2.1-2 (same upstream version).

Paride Legovini (paride)
Changed in soupsieve (Ubuntu):
assignee: nobody → Paride Legovini (paride)
Paride Legovini (paride) wrote :

My proposal is to:

  syncpackage -r impish-proposed -d unstable -v soupsieve

considering that at the moment in Debian we have:

soupsieve | 2.2.1-1 | testing | source
soupsieve | 2.2.1-2 | unstable | source

I'm tagging this server-next to get another pair of ubuntu-server-dev eyes on it.

tags: added: server-next
Changed in soupsieve (Ubuntu):
assignee: Paride Legovini (paride) → nobody
Paride Legovini (paride)
tags: added: needs-sync
removed: server-next
Paride Legovini (paride)
Changed in soupsieve (Ubuntu):
milestone: none → ubuntu-21.10
Bryce Harrington (bryce) wrote :

I've confirmed the sync is a targeted bug-fix for just this specific issue:

soupsieve (2.2.1-2) unstable; urgency=medium

  * Patch test suite to XFAIL test_contains_cdata_html, due to CDATA behaviour
    change in libxml2 >= 2.9.11. Fixing autopkgtests and FTBFS. Thanks Athos
    Ribeiro for the investigation and bug report. (Closes: #995459)
  * Bump Standards-Version to 4.6.0, no changes needed.
  * Migrate to Build-Depend on dh-sequence-python3.

 -- Stefano Rivera <email address hidden> Sat, 02 Oct 2021 12:25:16 -0700

The patch itself is:

  https://salsa.debian.org/python-team/packages/soupsieve/-/commit/7a94c302e39f8b86ac24e287678b7c12dd41dad1
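
For readers unfamiliar with XFAIL, a rough sketch of the idea using only the standard library follows. This is not the Debian patch linked above; the helper xfail_if, the flag NEW_LIBXML2, and the Demo class are hypothetical, and the test body simply replays the list comparison from the failure report.

```python
import unittest


def xfail_if(condition):
    """Return unittest.expectedFailure if condition is true, else a no-op decorator."""
    if condition:
        return unittest.expectedFailure
    return lambda func: func


# Hypothetical flag; a real check would look at the libxml2 version in use.
NEW_LIBXML2 = True


class Demo(unittest.TestCase):
    @xfail_if(NEW_LIBXML2)
    def test_contains_cdata_html(self):
        # Mirrors the reported mismatch: lxml matched ['1', '2'], expected ['1'].
        self.assertEqual(['1', '2'], ['1'])


def run_demo():
    """Run the Demo case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Unlike a skip, an expected failure still runs the test, so the run will flag an unexpected success if the behaviour changes back.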

The sync command I am using is:

  syncpackage -v -b 1945788 soupsieve

Bryce Harrington (bryce) wrote :

This bug was fixed in the package soupsieve - 2.2.1-2


Changed in soupsieve (Ubuntu):
status: New → Fix Released