"latin_1" encoding unknown on Windows
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
I just migrated a project from Linux to Windows 10. Here `etree.
Since the docs don't specify which strings are accepted as encoding names, I assumed the spelling from the codecs standard library would work. But it didn't. lxml doesn't find `"latin-1"` (with a hyphen) either. Setting the encoding to `"latin1"` solved my problem. However, this is not satisfactory, as I couldn't find any guidance on how to spell the encoding or even which encodings are supported. I spent more than an hour searching.
Python : sys.version_
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 10, 3)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 37)
libxslt compiled : (1, 1, 35)
I have the same issue from a Python:3.10 Docker image, but not from my laptop (macOS).
>>> import lxml.html
>>> lxml.html.HTMLParser(encoding='latin_1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/lxml/html/__init__.py", line 1910, in __init__
    super(HTMLParser, self).__init__(**kwargs)
  File "src/lxml/parser.pxi", line 1728, in lxml.etree.HTMLParser.__init__
  File "src/lxml/parser.pxi", line 840, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'latin_1''
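A possible workaround, sketched under assumptions rather than taken from the lxml docs: since the reports above suggest that the accepted spellings differ between platforms, one option is to normalize a Python codec alias to an IANA-style name before handing it to the parser. The helper name `to_iana_name` and the small `_IANA_NAMES` table are hypothetical and only cover a few common encodings. Whether a given name is accepted still depends on the libxml2 build in use.

```python
import codecs

# Hypothetical, hand-maintained table: Python's canonical codec name -> IANA-style name.
# Extend it for the encodings your project actually uses.
_IANA_NAMES = {
    "iso8859-1": "ISO-8859-1",
    "utf-8": "UTF-8",
    "ascii": "US-ASCII",
}


def to_iana_name(encoding):
    """Resolve a Python codec alias (e.g. 'latin_1') to an IANA-style name.

    codecs.lookup() raises LookupError for names Python itself does not know,
    so typos are caught before the name reaches the parser.
    """
    canonical = codecs.lookup(encoding).name  # 'latin_1' resolves to 'iso8859-1'
    # Fall back to Python's canonical name when the table has no entry.
    return _IANA_NAMES.get(canonical, canonical)


# Assumed usage (requires lxml; not verified on every platform):
#   parser = lxml.html.HTMLParser(encoding=to_iana_name("latin_1"))
```

With this, `"latin_1"`, `"latin-1"` and `"latin1"` all resolve to the same name, so the call site no longer depends on which alias the underlying library happens to recognize.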