lxml can't find encodings on Mac OS X

Bug #707396 reported by Pablo Hoffman
This bug affects 4 people
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Python : (2, 5, 1, 'final', 0)
lxml.etree : (2, 3, -99, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

Mac OS X version:

$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.5.5
BuildVersion: 9F33

lxml can't find cp1252 encoding:

$ python
Python 2.5.1 (r251:54863, Apr 15 2008, 22:57:26)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> etree.HTMLParser(encoding='cp1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parser.pxi", line 1401, in lxml.etree.HTMLParser.__init__ (src/lxml/lxml.etree.c:78208)
  File "parser.pxi", line 721, in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:73153)
LookupError: unknown encoding: 'cp1252'
>>> ^D

But it's available on iconv:

$ iconv -l | grep -i cp1252
CP1252 MS-ANSI WINDOWS-1252
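
One way to narrow this down is to compare what Python's own codec registry knows with what the parser accepts. The following is only a sketch: `python_knows`, `probe_encodings` and the `parser_factory` argument are names made up for illustration. With lxml installed you would pass `etree.HTMLParser`, which raises `LookupError` for unknown names as in the traceback above.

```python
import codecs


def python_knows(encoding):
    """Return True if Python's codec registry can resolve the name."""
    try:
        codecs.lookup(encoding)
    except LookupError:
        return False
    return True


def probe_encodings(parser_factory, encodings):
    """Map each name to (known to Python, accepted by the parser).

    parser_factory is any callable that takes encoding=... and raises
    LookupError for names it cannot resolve, e.g. lxml.etree.HTMLParser.
    """
    results = {}
    for name in encodings:
        try:
            parser_factory(encoding=name)
            accepted = True
        except LookupError:
            accepted = False
        results[name] = (python_knows(name), accepted)
    return results


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed; nothing to probe")
    else:
        names = ["utf-8", "iso-8859-1", "cp1252", "CP1252", "windows-1252"]
        report = probe_encodings(etree.HTMLParser, names)
        for name, (py_ok, lxml_ok) in report.items():
            print("%-14s python=%s lxml=%s" % (name, py_ok, lxml_ok))
```

If an encoding is known to Python and to iconv but rejected by the parser, the libxml2 that lxml loaded is the likely place to look.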

scoder (scoder) wrote : Re: [Bug 707396] [NEW] lxml can't find encodings on Mac OS X

Is this using the binary build from PyPI?

Does the same problem occur when you request upper case "CP1252"?

Stefan

Pablo Hoffman (pablohoffman) wrote :

Yes, using the binary build installed with: easy_install -U lxml

And yes, the same thing happens with upper case "CP1252".
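
Until the underlying library is fixed, a possible stopgap (not something proposed in this report) is to let Python do the decoding and hand the parser UTF-8, which libxml2 handles without iconv. A minimal sketch, assuming the input really is cp1252; `transcode_to_utf8` is a hypothetical helper name:

```python
def transcode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode with Python's codec and re-encode the text as UTF-8 bytes."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Sample page with two cp1252-encoded characters (0xE9 and 0x80).
    page = b"<html><body><p>caf\xe9 \x80 5</p></body></html>"
    utf8_page = transcode_to_utf8(page)
    try:
        from lxml import etree
    except ImportError:
        print(utf8_page)
    else:
        # Tell the parser explicitly that the bytes are now UTF-8.
        parser = etree.HTMLParser(encoding="utf-8")
        root = etree.fromstring(utf8_page, parser)
        print(etree.tostring(root, encoding="unicode"))
```

This sidesteps the lookup failure rather than explaining it, so it is no substitute for the diagnosis in the rest of this thread.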

scoder (scoder) wrote :

Pascal, can you comment on this?

Changed in lxml:
assignee: nobody → Pascal (p-oberndoerfer)
Pascal (p-oberndoerfer) wrote :

Sorry for the belated reply!

iconv caused grief on the Mac due to "include" problems. But it turns out that the workaround used to build the binaries only silenced the problem instead of fixing it...

Currently working on this again.

Pascal (p-oberndoerfer) wrote :

Strange BTW, this bug was assigned to me, but does not appear in my assigned list under my profile?

Changed in lxml:
status: New → In Progress
Tim Arnold (a-jtim) wrote :

This behavior also occurs on FreeBSD 8.2 (amd64).
> Python 2.7.1 (r271:86832, Apr 5 2011, 13:19:14) [GCC 4.2.1 20070719 [FreeBSD]] on freebsd8

> from lxml import etree
> parser = etree.HTMLParser(encoding='cp1252')
>
> Traceback (most recent call last):
> File "lxml_bug.py", line 11, in <module>
> parser = etree.HTMLParser(encoding='cp1252')
> File "parser.pxi", line 1423, in lxml.etree.HTMLParser.__init__ (src/lxml/lxml.etree.c:81303)
> File "parser.pxi", line 743, in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:76172)
> LookupError: unknown encoding: 'cp1252'

Here are my details:
Python : sys.version_info(major=2, minor=7, micro=1, releaselevel='final', serial=0)
lxml.etree : (2, 3, 1, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

platform.architecture()
('64bit', 'ELF')

Also, this info on libxml2 and libiconv.
ldd libxml2.so
libxml2.so:
        libz.so.5 => /lib/libz.so.5 (0x800889000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x800e4c000)
        libm.so.5 => /lib/libm.so.5 (0x801046000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)

ldd /usr/local/lib/libiconv.so.3
/usr/local/lib/libiconv.so.3:
        libc.so.7 => /lib/libc.so.7 (0x800647000)

iconv -l | grep -i cp1252
CP1252 MS-ANSI WINDOWS-1252

scoder (scoder) wrote :

Thanks. Two more questions right away:

I assume you already tried the upper case name for the encoding and it didn't work?

Are you using a static build of lxml? What does ldd tell you on the lxml/etree.so file? Does it actually depend on the libxml2 library that you presented above?
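
The ldd check asked for here can be scripted so that it always inspects the etree.so that Python actually imports. This is a rough sketch for Linux/FreeBSD, where `ldd` is available (on Mac OS X the comparable tool is `otool -L`, which prints a different format). `parse_ldd`, `find_deps` and `shared_deps` are illustrative names, not part of lxml.

```python
import re
import subprocess

# Matches lines such as
#   "libxml2.so.9 => /AppDocs/local/lib/libxml2.so.9 (0x800f77000)"
# and skips headers and "not found" entries.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd(output):
    """Return {library name: resolved path} from ldd output text."""
    deps = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def find_deps(deps, prefix):
    """Select the dependencies whose name starts with prefix."""
    return {name: path for name, path in deps.items() if name.startswith(prefix)}


def shared_deps(path):
    """Run ldd on a shared object and parse its output."""
    output = subprocess.check_output(["ldd", path], universal_newlines=True)
    return parse_ldd(output)


if __name__ == "__main__":
    try:
        import lxml.etree
        etree_deps = shared_deps(lxml.etree.__file__)
    except (ImportError, OSError, subprocess.CalledProcessError) as exc:
        print("could not inspect lxml.etree:", exc)
    else:
        # No libxml2 entry here suggests a static build of lxml.
        print("libxml2 :", find_deps(etree_deps, "libxml2"))
        for lib_path in find_deps(etree_deps, "libxml2").values():
            # The important question: does *this* libxml2 link libiconv?
            print("libiconv:", find_deps(shared_deps(lib_path), "libiconv"))
```

Running ldd on the resolved path, rather than on whichever libxml2.so happens to be in the current directory or library search path, avoids inspecting the wrong copy.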

Tim Arnold (a-jtim) wrote :

Yes I tried the uppercase name CP1252.
I don't know how to determine whether lxml is statically built; I don't remember changing any defaults when I compiled.

ldd ./etree.so
./etree.so:
        libxslt.so.2 => /AppDocs/local/lib/libxslt.so.2 (0x800889000)
        libexslt.so.8 => /AppDocs/local/lib/libexslt.so.8 (0x800e5e000)
        libxml2.so.9 => /AppDocs/local/lib/libxml2.so.9 (0x800f77000)
        libz.so.5 => /lib/libz.so.5 (0x801238000)
        libm.so.5 => /lib/libm.so.5 (0x80134d000)
        libthr.so.3 => /lib/libthr.so.3 (0x80146d000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)
        libgcrypt.so.17 => /usr/local/lib/libgcrypt.so.17 (0x801586000)
        libgpg-error.so.0 => /usr/local/lib/libgpg-error.so.0 (0x8016fb000)
        libintl.so.9 => /usr/local/lib/libintl.so.9 (0x8017fe000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x801907000)

Oh, here is the difference. When I ran 'ldd libxml2.so' I got the results above. When I run 'ldd ./libxml2.so' I get this:
ldd ./libxml2.so
./libxml2.so:
        libthr.so.3 => /lib/libthr.so.3 (0x800889000)
        libz.so.5 => /lib/libz.so.5 (0x800ec1000)
        libm.so.5 => /lib/libm.so.5 (0x800fd6000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)

pwd
/AppDocs/local/lib

No, the libxml2 that etree depends on is not the one I originally reported.
etree.so depends on the libiconv.so in /usr/local/lib, but the libxml2 that etree depends on does not.

scoder (scoder) wrote :

That explains it then. I have no idea where your /AppDocs/local/lib/libxml2.so.9 came from, but clearly, it was built without libiconv support.

And your etree.so isn't statically built, thus the dependency on libxml2 and friends.
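
For a dynamically linked setup like this one, the library can also be asked directly, bypassing lxml. The sketch below loads a libxml2 shared object with ctypes and calls its `xmlFindCharEncodingHandler()` C function, which returns NULL for an encoding it has no handler for. The wrapper name `libxml2_knows_encoding` and the fallback to `ctypes.util.find_library` are choices made for this example.

```python
import ctypes
import ctypes.util


def libxml2_knows_encoding(encoding, library_path=None):
    """Ask a libxml2 shared library whether it has a handler for encoding.

    Returns True or False, or None if the library could not be loaded.
    """
    path = library_path or ctypes.util.find_library("xml2")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        find_handler = lib.xmlFindCharEncodingHandler
    except (OSError, AttributeError):
        return None
    find_handler.argtypes = [ctypes.c_char_p]
    # With c_void_p as the result type, a NULL pointer comes back as None.
    find_handler.restype = ctypes.c_void_p
    return find_handler(encoding.encode("ascii")) is not None


if __name__ == "__main__":
    # Path taken from the ldd output in this thread; adjust as needed.
    for lib in (None, "/AppDocs/local/lib/libxml2.so.9"):
        print(lib or "(default libxml2)", "->", libxml2_knows_encoding("cp1252", lib))
```

Pointing it at the exact path that ldd reports for etree.so tests the copy that lxml will load, not a different one elsewhere on the system.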

I also think it's good that this ticket collects information on what users who run into this problem should do in order to analyse it. That will allow them to either solve it themselves or add further information for cases that this discussion does not yet cover.

Tim Arnold (a-jtim) wrote :

My situation is now fixed. I installed everything to /AppDocs/local.

Using the following environment variables:
    export PATH="/AppDocs/local/bin:$PATH"
    export LD_LIBRARY_PATH="/AppDocs/local/lib"
    export LD_RUN_PATH="/AppDocs/local/lib"
    export CFLAGS="-fPIC -L/AppDocs/local/lib -R/AppDocs/local/lib -I/AppDocs/local/include"

Installed
    libiconv-1.14
    I believe this was the key that was missing when I first installed the libxml2/libxslt/lxml libraries.
    The libiconv library exists in /usr/local/lib but was not used; I think because I had no *.h files. When I installed it in /AppDocs/local, the *.h files were included and used in the subsequent library build/installs.

Then reinstalled
    libxml2-2.7.8
    libxslt-1.1.26
    lxml-2.3.3

Now the test code above (encoding='cp1252') works with no errors.
ldd ./etree.so
./etree.so:
        libxslt.so.2 => /AppDocs/local/lib/libxslt.so.2 (0x800889000)
        libexslt.so.8 => /AppDocs/local/lib/libexslt.so.8 (0x800e5e000)
        libxml2.so.9 => /AppDocs/local/lib/libxml2.so.9 (0x800f77000)
        libz.so.5 => /lib/libz.so.5 (0x801235000)
        libm.so.5 => /lib/libm.so.5 (0x80134a000)
        libthr.so.3 => /lib/libthr.so.3 (0x80146a000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)
        libiconv.so.7 => /AppDocs/local/lib/libiconv.so.7 (0x801583000)
        libgcrypt.so.17 => /usr/local/lib/libgcrypt.so.17 (0x801773000)
        libgpg-error.so.0 => /usr/local/lib/libgpg-error.so.0 (0x8018e8000)
        libintl.so.9 => /usr/local/lib/libintl.so.9 (0x8019eb000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x801af4000)

scoder (scoder)
Changed in lxml:
status: In Progress → Triaged
scoder (scoder) wrote :

I'm closing this as "invalid" (sorry, can't find a better tag) because the lxml project no longer provides binaries, so this is a mere installation problem.

Changed in lxml:
assignee: Pascal (p-oberndoerfer) → nobody
status: Triaged → Invalid