lxml does not support Unicode when building with Python 3 on OS X

Bug #1687236 reported by Matt Bachmann on 2017-04-30

Bug Description


While working on https://bugs.launchpad.net/lxml/+bug/1658169, I noticed that Mac builds fail when building with Python 3. The whole log is below, but here is what I found in my investigation.

I'll help in any way I can, but I'm afraid I'm a bit out of my element.

iconv versions:

Mac 10.12.2
iconv (GNU libiconv 1.11) (Though the same result under libiconv 1.15)

iconv (Ubuntu EGLIBC 2.15-0ubuntu10.18) 2.15

I also inspected the function _setupPythonUnicode in parser.pxi to see what the enc value was on various Python versions.

2.7.13 = UTF-16LE
3.3.6 = UCS-4LE

2.7.13 = UCS-4LE
3.3.6 = UCS-4LE
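
For anyone who wants to compare these values on their own interpreter without rebuilding lxml, here is a rough sketch. It is not what _setupPythonUnicode does internally; the function name guess_unicode_encoding is made up for illustration, and it only approximates the result from sys.maxunicode and sys.byteorder.

```python
import sys


def guess_unicode_encoding():
    """Guess an iconv-style name for the interpreter's widest text layout.

    Hypothetical helper, not part of lxml: narrow builds (Python < 3.3)
    report sys.maxunicode == 0xFFFF, wide builds and Python 3.3+ report
    0x10FFFF.
    """
    width = "UCS-4" if sys.maxunicode > 0xFFFF else "UTF-16"
    endian = "LE" if sys.byteorder == "little" else "BE"
    return width + endian


# Prints a line in the same "version = encoding" shape as the list above.
print(sys.version.split()[0], "=", guess_unicode_encoding())
```

On Python 3.3 and later, sys.maxunicode is always 0x10FFFF, so this sketch reports a UCS-4 name there regardless of platform.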

What perplexes me is that libiconv should be able to handle this (to my... limited understanding):

iconv -l | grep UTF-16LE
UTF-16LE

iconv -l | grep UCS-4LE
UCS-4LE

I've seen this behavior on my machine and on Travis CI.

Matt Bachmann (bachmann.matt) wrote :
scoder (scoder) wrote :

This isn't due to libiconv; it's an incomplete implementation in lxml. See the difference between




This isn't easy to fix, because the incremental parser can receive arbitrary Unicode strings in different memory buffer formats (PEP 393) across its lifetime. The data might therefore need copying into a 4-byte format before being passed to libxml2, as we cannot repeatedly switch encodings at a per-byte level while parsing.
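
To illustrate the point about differing buffer formats, here is a minimal Python sketch. It is not lxml code, and the helper names pep393_kind and to_ucs4le are invented for this example. It applies the PEP 393 rule that a string is stored with 1, 2, or 4 bytes per character depending on its highest code point, and shows one way chunks could be normalised to a fixed 4-byte layout before feeding.

```python
def pep393_kind(text):
    """Return the bytes per character CPython 3.3+ uses to store text.

    Hypothetical helper: derives the width from the highest code point,
    following the rule described in PEP 393.
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def to_ucs4le(text):
    """Copy text into a fixed 4-byte little-endian layout (no BOM)."""
    return text.encode("utf-32-le")


# Chunks an incremental parser might receive over its lifetime.
chunks = ["<root>", "\u20ac", "\U0001F600", "</root>"]
for chunk in chunks:
    # The storage width varies per chunk; the copy is always 4 bytes/char.
    print(pep393_kind(chunk), len(to_ucs4le(chunk)))
```

The copy costs extra memory and time for ASCII-only input, which is part of why this is a trade-off rather than an obvious fix.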

Changed in lxml:
importance: Undecided → Medium
status: New → Confirmed