etree.fromstring with a UTF-32 encoded string fails

Bug #1703810 reported by Dale P on 2017-07-12
This bug affects 1 person
Affects: lxml
Importance: Low
Assigned to: Unassigned

Bug Description

I am trying to pass a UTF-32 encoded string (with the XML encoding declaration) to lxml.etree.fromstring. This raises an lxml.etree.XMLSyntaxError exception ("Document is empty").

Is this expected? Following the same process for all other encodings that I have tested works fine (UTF-8, UTF-16, ASCII, ISO-8859-1, ISO-8859-2, BIG5, EUC-JP).

I have not tested this in libxml2 as I do not know how to.

Python : sys.version_info(major=3, minor=6, micro=0, releaselevel='final', serial=0)
lxml.etree : (3, 8, 0, 0)
libxml used : (2, 9, 4)
libxml compiled : (2, 9, 4)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

from lxml import etree
foo = """<?xml version='1.0' encoding='utf-32'?>\n<tag attrib='123'></tag>"""
etree.fromstring(foo.encode('utf-32'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/lxml.etree.pyx", line 3228, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79594)
  File "src/lxml/parser.pxi", line 1848, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:119113)
  File "src/lxml/parser.pxi", line 1736, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:117793)
  File "src/lxml/parser.pxi", line 1102, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:112037)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107589)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:106443)
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
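
For anyone stuck on an affected version, one possible workaround is to transcode the input to UTF-8 before parsing, since the report notes that UTF-8 input parses fine. The sketch below is not part of the lxml fix. The helper name utf32_xml_to_utf8 and the regular expression are illustrative assumptions, and the standard library's xml.etree.ElementTree is used here only to check that the transcoded bytes are well-formed.

```python
import re
import xml.etree.ElementTree as ET  # stdlib parser, used only to sanity-check the output

# Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
_DECL_ENCODING = re.compile(
    r"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2"""
)


def utf32_xml_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: transcode UTF-32 XML bytes to UTF-8.

    The declared encoding is rewritten to 'utf-8' so that the declaration
    matches the actual byte encoding of the returned document.
    """
    # Decoding with the generic 'utf-32' codec consumes a leading BOM if present.
    text = data.decode("utf-32")
    # Rewrite only the first (leading) declaration, keeping the original quote style.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")


# Same document as in the report, encoded as UTF-32 and then transcoded.
foo = """<?xml version='1.0' encoding='utf-32'?>\n<tag attrib='123'></tag>"""
utf8_bytes = utf32_xml_to_utf8(foo.encode("utf-32"))
root = ET.fromstring(utf8_bytes)
```

The resulting utf8_bytes can be passed to etree.fromstring in place of the original UTF-32 bytes. The declaration is rewritten rather than left alone because a document whose declared encoding disagrees with its actual bytes may be rejected by the parser.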

scoder (scoder) on 2017-08-12
Changed in lxml:
importance: Undecided → Low
status: New → Confirmed
scoder (scoder) wrote :
Changed in lxml:
milestone: none → 3.9.0
status: Confirmed → Fix Committed
scoder (scoder) on 2017-09-19
Changed in lxml:
status: Fix Committed → Fix Released