"latin_1" encoding unknown on Windows

Bug #2001209 reported by Lukas Schreiber
This bug affects 1 person
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I just migrated a project from Linux to Windows 10. On Windows, `etree.iterparse(path, remove_blank_text=True, encoding="latin_1")` raises `LookupError: unknown encoding: 'b'latin_1''`.

Since the docs don't specify which strings are accepted as encoding names, I assumed the spelling used by the standard library's codecs module would work, but it didn't. lxml doesn't accept `"latin-1"` (with a hyphen) either. Setting the encoding to `"latin1"` solved my problem. However, this is not satisfactory: I couldn't find any guidance on how to spell encoding names, or even on which encodings are supported, and I spent more than an hour searching.
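For anyone hitting the same thing, here is a rough sketch of the workaround as a helper that tries several spellings of an encoding name and returns the first one that is accepted. The function names (`candidate_spellings`, `first_accepted`, `lxml_accepts`) are made up for this illustration and are not part of lxml. It assumes that constructing `etree.XMLParser(encoding=...)` raises `LookupError` for an unknown name, as the parser constructor does in the traceback below. Which spellings are actually accepted depends on the platform and the libxml2 build, so nothing here guarantees a match.

```python
import codecs


def candidate_spellings(name):
    """Return spelling variants of an encoding name, original first."""
    variants = [
        name,
        name.replace("_", "-"),                   # e.g. latin_1 -> latin-1
        name.replace("_", "").replace("-", ""),   # e.g. latin_1 -> latin1
    ]
    try:
        # Python's own canonical name for the codec, if Python knows it.
        variants.append(codecs.lookup(name).name)
    except LookupError:
        pass
    unique = []
    for variant in variants:
        if variant not in unique:
            unique.append(variant)
    return unique


def first_accepted(name, accepts):
    """Return the first variant for which accepts(variant) is true."""
    for candidate in candidate_spellings(name):
        if accepts(candidate):
            return candidate
    raise LookupError("no accepted spelling found for %r" % name)


def lxml_accepts(name):
    """Hypothetical predicate: does lxml accept this encoding name?"""
    from lxml import etree  # third-party; imported lazily on purpose
    try:
        etree.XMLParser(encoding=name)
    except LookupError:
        return False
    return True
```

With lxml installed, `first_accepted("latin_1", lxml_accepts)` would return whichever variant the local build recognises, or raise `LookupError` if none does. Passing the predicate in as an argument keeps the spelling logic testable without lxml.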

Python : sys.version_info(major=3, minor=9, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 10, 3)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 37)
libxslt compiled : (1, 1, 35)

Hugo Bauer (hugoobauer) wrote :

I have the same issue in a `python:3.10` Docker image, but not on my laptop (macOS).

>>> import lxml.html
>>> lxml.html.HTMLParser(encoding='latin_1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/lxml/html/__init__.py", line 1910, in __init__
    super(HTMLParser, self).__init__(**kwargs)
  File "src/lxml/parser.pxi", line 1728, in lxml.etree.HTMLParser.__init__
  File "src/lxml/parser.pxi", line 840, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'latin_1''
