serialisation with libxml2 2.9.12 returns extra closing tags

Bug #1928795 reported by Michał Górny
This bug affects 7 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| lxml | Fix Released | Undecided | Unassigned | |

Bug Description

See also: https://gitlab.gnome.org/GNOME/libxml2/-/issues/255

The linked bug suggests that lxml uses libxml2 incorrectly. Because of internal implementation changes in 2.9.12, serialising an element also writes extra content after it: the element's following siblings and the parent's closing tag. This causes both real-life breakage and many test failures such as:

```
======================================================================
FAIL: test_multiple_elementrees (lxml.tests.test_xslt.ETreeXSLTTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.9/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.9/unittest/case.py", line 593, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
  File "/tmp/lxml/src/lxml/tests/test_xslt.py", line 674, in test_multiple_elementrees
    self.assertEqual(self._rootstring(b_tree),
  File "/usr/lib/python3.9/unittest/case.py", line 831, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib/python3.9/unittest/case.py", line 824, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: b'<b>B</b><c>C</c></a>' != b'<b>B</b>'
```

I can reproduce reliably with git master (1ea55a8550ca123d9adb4ab9ebc82fa1527f0149) using 'tox -e py39'.

```
Python : sys.version_info(major=3, minor=9, micro=5, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
```
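The important detail in this output is that the libxml2 loaded at runtime (2.9.12) differs from the one lxml was compiled against (2.9.10). A small sketch for collecting the same fields in your own environment follows. It uses the version constants that `lxml.etree` exposes. `version_report` is a hypothetical helper name, and the printed layout only approximates the block above.

```python
import sys

from lxml import etree


def version_report() -> dict:
    """Collect the version tuples that lxml exposes at runtime."""
    return {
        "Python": tuple(sys.version_info[:3]),
        "lxml.etree": etree.LXML_VERSION,
        # Version of the libxml2 library actually loaded at runtime...
        "libxml used": etree.LIBXML_VERSION,
        # ...versus the libxml2 headers lxml was compiled against.
        "libxml compiled": etree.LIBXML_COMPILED_VERSION,
        "libxslt used": etree.LIBXSLT_VERSION,
        "libxslt compiled": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    report = version_report()
    for name, version in report.items():
        print(f"{name:<17}: {version}")
    if report["libxml used"] != report["libxml compiled"]:
        print("note: the runtime libxml2 differs from the one lxml was built against")
```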

scoder (scoder) wrote:

Confirmed. This is not something to change lightly, so I'd stick with 2.9.10 for now (and possibly for a while longer).

Changed in lxml:
status: New → Confirmed
scoder (scoder) wrote:

I guess an import-time warning would help for now.
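An import-time warning of this kind could look roughly like the sketch below. It is hypothetical, not lxml code: `warn_if_affected` and `AFFECTED_LIBXML2` are invented names, and the affected set (2.9.11 and 2.9.12) is taken from this thread. In practice the version tuple would come from `lxml.etree.LIBXML_VERSION`.

```python
import warnings

# Assumption: the regression is limited to these releases, per this report.
AFFECTED_LIBXML2 = {(2, 9, 11), (2, 9, 12)}


def warn_if_affected(libxml_version: tuple) -> bool:
    """Emit a RuntimeWarning if *libxml_version* is a known-bad libxml2 release."""
    version = tuple(libxml_version[:3])
    affected = version in AFFECTED_LIBXML2
    if affected:
        warnings.warn(
            "libxml2 %s may write extra content after serialised elements; "
            "consider a different libxml2 version" % ".".join(map(str, version)),
            RuntimeWarning,
            stacklevel=2,
        )
    return affected
```

An application could call `warn_if_affected(etree.LIBXML_VERSION)` right after importing lxml. Later in this thread, lxml itself opts for a warning in the build output instead.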

Jeremy (jmgore75) wrote:

I can add that this issue hit me directly when calling `etree.tostring()` on an HTML element, and that it starts with libxml2 2.9.11 (2.9.10 is the last version that works properly). This happens with a conda installation, where the current libxml2 dependency version is 2.9.12. Some documentation about this issue would have helped.

scoder (scoder) wrote:

The corresponding libxml2 ticket is at
https://gitlab.gnome.org/GNOME/libxml2/-/issues/255

Note that libxml2 is currently unmaintained and it is unclear when a release will be made.

lxml now special-cases a static build against libxml2 2.9.12 and uses a more recent git revision instead. This was first applied in lxml 4.7.1.

https://github.com/lxml/lxml/commit/7b941e58ab088a25a8e0a7f6e13e4e5b9dd93c37

summary: - Breakage/failing tests with libxml 2.9.12
+ serialisation with libxml2 2.9.12 returns extra closing tags
scoder (scoder) wrote:

> Note that libxml2 is currently unmaintained and it is unclear when a release will be made.

Development has apparently resumed. Let's see when a new release comes out.
https://mail.gnome.org/archives/xml/2022-January/msg00001.html

scoder (scoder) wrote:

I added a warning to the build output when libxml2 2.9.11 or 2.9.12 is detected. (It won't show in pip installs, but failing the build completely might be a bit tough on our users.)

https://github.com/lxml/lxml/commit/d56997b270c120893fbcfb777e170bf61691f262

There will also be a warning in the docs.

https://github.com/lxml/lxml/commit/5a5c7fb01d15af58def4bab2ba7b15c937042835
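A build-time check along these lines might be sketched as follows. The linked commits are not reproduced here. This sketch assumes the libxml2 version is read from `xml2-config --version`, the helper script installed with libxml2. `parse_version`, `libxml2_build_version` and `build_warning` are hypothetical names. Matching the comment above, it only prints a warning and does not fail the build.

```python
import shutil
import subprocess
import sys

# Assumption: the releases reported as broken in this thread.
AFFECTED = {(2, 9, 11), (2, 9, 12)}


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.9.12' into (2, 9, 12)."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def libxml2_build_version():
    """Ask xml2-config which libxml2 the build would use; None if it is not installed."""
    if shutil.which("xml2-config") is None:
        return None
    result = subprocess.run(
        ["xml2-config", "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def build_warning(version):
    """Return a warning message for affected versions, otherwise None."""
    if version in AFFECTED:
        return ("WARNING: building against libxml2 %s, which may write extra "
                "content after serialised elements" % ".".join(map(str, version)))
    return None


if __name__ == "__main__":
    message = build_warning(libxml2_build_version())
    if message:
        print(message, file=sys.stderr)  # warn only; do not abort the build
```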

scoder (scoder) wrote:

A workaround has been applied in libxml2 2.9.13.
The binary wheels for lxml 4.8.0 include this version.

Changed in lxml:
status: Confirmed → Fix Released