lxml

"latin_1" encoding unknown on Windows

Bug #2001209 reported by Lukas Schreiber on 2023-01-03

This bug affects 1 person

Affects    Status    Importance    Assigned to    Milestone
lxml       New       Undecided     Unassigned

Bug Description

I just migrated a project from Linux to Windows 10. Here `etree.iterparse(path, remove_blank_text=True, encoding="latin_1")` raises `LookupError: unknown encoding: 'b'latin_1''`.

Since the docs don't specify which strings are accepted as encoding names, I assumed the spelling from the `codecs` standard library module would work, but it didn't. lxml doesn't recognize `"latin-1"` (with a hyphen) either. Setting the encoding to `"latin1"` solved my problem. However, this is not satisfactory: I couldn't find any guidance on how to spell the encoding, or even which encodings are supported, despite spending more than an hour searching.

Python : sys.version_info(major=3, minor=9, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 10, 3)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 37)
libxslt compiled : (1, 1, 35)
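
As a stopgap until the accepted spellings are documented, something like the following might help. It is only a sketch, not lxml API: `candidate_spellings` and `first_accepted` are hypothetical helper names, and the list of variants is an assumption based on the three spellings tried above. The idea is to derive a few spellings from the Python codec name and keep the first one the parser constructor does not reject with `LookupError`.

```python
import codecs


def candidate_spellings(encoding):
    """Return plausible spellings of a Python codec name, original first.

    Raises LookupError if Python itself does not know the codec.
    """
    # Python's own canonical name, e.g. "latin_1" -> "iso8859-1".
    canonical = codecs.lookup(encoding).name
    variants = (
        encoding,                                    # as given
        canonical,                                   # Python's canonical name
        encoding.replace("_", "-"),                  # hyphenated
        encoding.replace("_", "").replace("-", ""),  # no separator
    )
    unique = []
    for name in variants:
        if name not in unique:
            unique.append(name)
    return unique


def first_accepted(encoding, make_parser):
    """Return the first spelling that make_parser accepts.

    make_parser is any callable taking an encoding name and raising
    LookupError when it does not recognise that name.
    """
    for name in candidate_spellings(encoding):
        try:
            make_parser(name)
        except LookupError:
            continue
        return name
    raise LookupError("no accepted spelling found for %r" % encoding)
```

With lxml installed, the probe could be `lambda name: etree.XMLParser(encoding=name)`, assuming the constructor raises `LookupError` for unknown names as the error above suggests. The returned name can then be passed to `etree.iterparse`.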


Hugo Bauer (hugoobauer) wrote on 2023-03-02:

I see the same issue in a python:3.10 Docker image, but not on my laptop (macOS).

>>> import lxml.html
>>> lxml.html.HTMLParser(encoding='latin_1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/lxml/html/__init__.py", line 1910, in __init__
    super(HTMLParser, self).__init__(**kwargs)
  File "src/lxml/parser.pxi", line 1728, in lxml.etree.HTMLParser.__init__
  File "src/lxml/parser.pxi", line 840, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'latin_1''
