Doc omission: XMLParser encoding is an iconv encoding name, not a Python encoding name
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Fix Released | Low | Unassigned | | |
Bug Description
The documentation on https:/
> encoding - override the document encoding
It doesn't specify what encodings are valid, so it's reasonable to assume that it's the [list supported by Python](https:/
>>> lxml.etree.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/
File "src/lxml/
LookupError: unknown encoding: 'b'utf_8_sig''
From what I can tell, the encoding must be one of the ones reported by `iconv -l`, which is surprising to Python developers, so it should at least be documented.
If there's some way to accept Python encodings here too, then that would of course be even better.
Experimentally I found that `utf-8` has the effect I intended (ignoring any UTF-8 encoded BOM), but `utf8` without a hyphen fails:
>>> lxml.etree.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "<string>", line 1
lxml.etree.
I'm not sure what the deal is there, and it may be a separate issue.
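A possible workaround, sketched under assumptions: resolve the Python codec name to its canonical spelling with `codecs.lookup()` before handing it to `XMLParser`, and handle the Python-only `utf_8_sig` codec by stripping the BOM by hand. The helper names `to_parser_encoding` and `parse_xml` are hypothetical and not part of lxml. Only `utf-8` is reported above as working, so whether the canonical Python name of any other codec is accepted by the parser is an assumption that needs checking against `iconv -l`.

```python
import codecs


def to_parser_encoding(data, python_encoding):
    """Map a Python codec name to one the parser is more likely to accept.

    Returns (data, encoding_name). Hypothetical helper, not part of lxml.
    """
    # codecs.lookup() resolves Python aliases, e.g. 'utf8' -> 'utf-8'.
    name = codecs.lookup(python_encoding).name
    if name == "utf-8-sig":
        # 'utf-8-sig' is a Python-only codec: drop the BOM manually and
        # fall back to plain 'utf-8', which the report found to work.
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        name = "utf-8"
    return data, name


def parse_xml(data, python_encoding):
    """Parse XML bytes with lxml, given a Python-style encoding name."""
    from lxml import etree  # third-party; imported lazily

    data, name = to_parser_encoding(data, python_encoding)
    parser = etree.XMLParser(encoding=name)
    return etree.fromstring(data, parser)
```

With this sketch, `parse_xml(raw_bytes, "utf_8_sig")` and `parse_xml(raw_bytes, "utf8")` would both end up passing `encoding="utf-8"` to the parser.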
Version info:
Python : sys.version_
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 8)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 32)
description: | updated |
Changed in lxml: status: Fix Committed → Fix Released
Right. I'll add a docstring note. Thanks.