etree.fromstring with a UTF-32 encoded string fails

Bug #1703811 reported by Dale P
This bug affects 1 person
Affects  Status  Importance  Assigned to  Milestone
lxml     New     Undecided   Unassigned

Bug Description

I am trying to pass a UTF-32 encoded byte string (including the XML encoding declaration) to lxml.etree.fromstring. This raises an lxml.etree.XMLSyntaxError exception ("Document is empty").

Is this expected? The same process works fine for every other encoding I have tested (UTF-8, UTF-16, ASCII, ISO-8859-1, ISO-8859-2, BIG5, EUC-JP).

I have not tested this against libxml2 directly, as I do not know how to.
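One way to check libxml2 without lxml might be to write the same bytes to a file and run xmllint (the command-line tool shipped with libxml2) on it. The sketch below is only a suggestion: the helper names and the file name utf32_sample.xml are made up for illustration, and the xmllint found on PATH may be linked against a different libxml2 version than the one lxml uses.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Same document as in the report below.
foo = "<?xml version='1.0' encoding='utf-32'?>\n<tag attrib='123'></tag>"


def write_utf32_sample(directory):
    """Write the UTF-32 encoded document (BOM included) to a file."""
    path = Path(directory) / "utf32_sample.xml"  # hypothetical file name
    path.write_bytes(foo.encode("utf-32"))
    return path


def run_xmllint(path):
    """Run `xmllint --noout` on the file; return None if xmllint is missing."""
    exe = shutil.which("xmllint")
    if exe is None:
        return None
    return subprocess.run(
        [exe, "--noout", str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = write_utf32_sample(tmp)
    result = run_xmllint(sample)
    if result is None:
        print("xmllint not found on PATH")
    else:
        # A non-zero exit code means xmllint reported an error.
        print("exit code:", result.returncode)
        print(result.stderr.decode("utf-8", errors="replace"))
```

If xmllint reports an error for this file, that would suggest the behaviour comes from libxml2 rather than from lxml itself.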

Python : sys.version_info(major=3, minor=6, micro=0, releaselevel='final', serial=0)
lxml.etree : (3, 8, 0, 0)
libxml used : (2, 9, 4)
libxml compiled : (2, 9, 4)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

from lxml import etree
foo = """<?xml version='1.0' encoding='utf-32'?>\n<tag attrib='123'></tag>"""
etree.fromstring(foo.encode('utf-32'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/lxml.etree.pyx", line 3228, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79594)
  File "src/lxml/parser.pxi", line 1848, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:119113)
  File "src/lxml/parser.pxi", line 1736, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:117793)
  File "src/lxml/parser.pxi", line 1102, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:112037)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107589)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:106443)
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
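
A possible workaround, independent of what causes the error: since UTF-8 input parses fine, the bytes can be transcoded in Python before they reach the parser. This is a minimal sketch using only the standard library. utf32_xml_to_utf8 is a hypothetical helper, not part of lxml, and it assumes the input is valid UTF-32 with a BOM, which is what str.encode('utf-32') produces.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
# Group 1 is everything up to the quote, group 2 is the quote character.
_DECL_ENCODING = re.compile(
    r"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])[A-Za-z][A-Za-z0-9._-]*\2"""
)


def utf32_xml_to_utf8(data):
    """Hypothetical helper: transcode UTF-32 XML bytes to UTF-8 bytes.

    The encoding declaration, if present, is rewritten to utf-8 so that
    it matches the new byte encoding.
    """
    text = data.decode("utf-32")  # consumes the BOM
    text = _DECL_ENCODING.sub(r"\1\2utf-8\2", text, count=1)
    return text.encode("utf-8")


foo = "<?xml version='1.0' encoding='utf-32'?>\n<tag attrib='123'></tag>"
converted = utf32_xml_to_utf8(foo.encode("utf-32"))
print(converted)
```

The result would then be passed on as etree.fromstring(converted). The declaration has to be rewritten rather than left alone, because otherwise it would still claim utf-32 while the bytes are UTF-8.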
