IO_ENCODER error received during Serialisation

Bug #1873306 reported by Chad Dombrowski
This bug affects 2 people
Affects: lxml
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

Overview:
This issue does not occur in lxml 4.4.1, only in 4.4.2 through 4.5.0.

System information:
Python : sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
lxml.etree : (4, 5, 0, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

Error:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER

scoder (scoder) wrote :

Most likely a difference in the version of the libxml2 library, which implements the parser.

What do you need the "recover=True" for? The file seems to be correct XML.

Changed in lxml:
status: New → Triaged
Chad Dombrowski (cdombrowski) wrote :

In this particular test file, "recover=True" may not be necessary but we do run this code against thousands of XML files. This is the only example (so far) where this error occurs.

scoder (scoder) wrote :

The Linux binary wheels of lxml 4.4.2 use libxml2 2.9.10, whereas 4.4.1 used 2.9.9. That's the most likely cause for this. There was no code change in between that looks related.
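
Since the suspected cause is the bundled libxml2 version, it can help to confirm which versions a given environment actually loads. A minimal sketch, using the version tuples that lxml exposes on `lxml.etree` (the helper name `describe_versions` is made up for illustration):

```python
from lxml import etree


def describe_versions():
    """Return the lxml and libxml2 versions as dotted strings."""
    def dotted(version):
        # Version attributes are tuples such as (2, 9, 10).
        return ".".join(str(part) for part in version)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2 (runtime)": dotted(etree.LIBXML_VERSION),
        "libxml2 (compiled)": dotted(etree.LIBXML_COMPILED_VERSION),
    }


if __name__ == "__main__":
    for name, version in describe_versions().items():
        print(f"{name}: {version}")
```

This reports the same values as the "System information" block in the bug description.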

I could reproduce this locally with libxml2 2.9.10, but I noticed that it works if I pass "utf-8" as the encoding to "tostring()". Only encoding="ascii" fails. I can't say why, though. Further investigation would help.

Xuan (aaronday) wrote :

I ran into the same problem: it works in 4.4.1 but fails in 4.4.2, 4.4.3 and 4.5.0.

    return etree.tostring(xml, pretty_print=pretty_print).decode(encoding).strip()
  File "src/lxml/etree.pyx", line 3385, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER

j (jmb06m) wrote :

I ran into this problem as well and ended up using a different encoding:

  et.tostring(xml, encoding='UTF-8')

before decoding the string. This worked for me.
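
The workaround above can be wrapped in a small helper. This is only a sketch: the name `serialise_to_text` and its `pretty_print` parameter are illustrative and not part of lxml. It requests UTF-8 output explicitly instead of relying on the default ASCII serialisation, then decodes the bytes to `str`.

```python
from lxml import etree


def serialise_to_text(element, pretty_print=False):
    """Serialise an element to str via explicit UTF-8 output.

    Passing encoding="UTF-8" avoids the ASCII encoder path that
    raises SerialisationError: IO_ENCODER in this report.
    """
    data = etree.tostring(element, encoding="UTF-8", pretty_print=pretty_print)
    return data.decode("utf-8").strip()


if __name__ == "__main__":
    root = etree.fromstring("<a>caf\u00e9</a>")
    print(serialise_to_text(root))
```

Note that with UTF-8 output, non-ASCII characters are written literally rather than as numeric character references, so the result can differ from what the ASCII default produced.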

nle (nle-odoo) wrote :

The issue seems solved in libxml2 as of this commit:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/a697ed1e24234a9e6a4a4639555dcca230f752c1

If I use lxml with libxml2 built at this commit, the test script works. With the preceding commit, the SerialisationError still occurs.
