Unicode emoji raises etree.XMLSyntaxError in etree.fromstring()

Bug #1538213 reported by Minho Ryang
This bug affects 4 people
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

OS X 10.11.2 (15C50)

```text
Python : sys.version_info(major=3, minor=5, micro=0, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 2)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
```

I want U+1F576 Sunglasses!
But this test.py doesn't work.

```python
#!/usr/bin/env python3
import sys
from lxml import html, etree

print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

uni = "<p>Unicode! \U0001F576 Sunglasses!</p>"
#t = html.fragment_fromstring(uni) # XXX: lxml.etree.ParserError: Document is empty
t = etree.fromstring(uni, parser=etree.XMLParser(encoding='unicode'))
print("B", etree.tostring(t))
print("U", etree.tostring(t, encoding='unicode'))
```

```pytb
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    t = etree.fromstring(uni, parser=etree.XMLParser(encoding='unicode'))
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:82934)
  File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:124533)
  File "src/lxml/parser.pxi", line 1700, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:122964)
  File "src/lxml/parser.pxi", line 1040, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:116705)
  File "src/lxml/parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:110510)
  File "src/lxml/parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:112276)
  File "src/lxml/parser.pxi", line 613, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:111124)
```

Blake Winton (bwinton) wrote :

I'm running into the same problem, but can verify that the same code works with Python 2.7.10, which should hopefully help narrow it down a little… :)

David D Lowe (flimm) wrote :

I also experience this bug with these version numbers, although the error message is a bit more helpful.

```text
Python : sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
lxml.etree : (3, 7, 3, 0)
libxml used : (2, 9, 4)
libxml compiled : (2, 9, 4)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
```

```pytb
Traceback (most recent call last):
  File "hi.py", line 14, in <module>
    t = etree.fromstring(uni, parser=etree.XMLParser(encoding='unicode'))
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79010)
  File "src/lxml/parser.pxi", line 1848, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:118341)
  File "src/lxml/parser.pxi", line 1729, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:116899)
  File "src/lxml/parser.pxi", line 1063, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:110886)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105109)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106817)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105671)
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Char 0x0 out of allowed range, line 1, column 2
```
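
One possible reading of "Char 0x0 ... column 2", not confirmed anywhere in this report: CPython 3.3 and later store a string that contains a character above U+FFFF with four bytes per character, so if such a buffer were ever decoded as a narrower encoding, the byte right after `<` would look like a NUL, which XML 1.0 does not allow. The sketch below only illustrates that byte layout with the standard library; it does not call lxml, and the variable names are made up for illustration.

```python
# Illustration only: shows a 4-bytes-per-character layout of the test string.
# This is NOT lxml's code path; it is a guess at where a 0x0 at column 2 could come from.
text = "<p>Unicode! \U0001F576 Sunglasses!</p>"

# U+1F576 lies outside the Basic Multilingual Plane, so it needs more than 16 bits.
assert ord("\U0001F576") > 0xFFFF

# Encode with four bytes per character, little-endian, without a byte order mark.
wide = text.encode("utf-32-le")
print(wide[:8])  # b'<\x00\x00\x00p\x00\x00\x00'

# Read one byte at a time, the second "character" is NUL.
second_byte = wide[1]
print(hex(second_byte))  # 0x0
```

If this guess is right, it would also be consistent with the failure depending on the platform's character conversion library rather than on lxml alone, but that remains an assumption.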

scoder (scoder) wrote :

Example code works for me on Linux.

Changed in lxml:
status: New → Triaged
scoder (scoder) wrote :

Closing, can't reproduce.

Changed in lxml:
status: Triaged → Invalid
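
For readers who hit the same error, a workaround that may be worth trying is to hand the parser UTF-8 bytes instead of a `str`, so that the parser decodes the input itself. This is a minimal sketch, not a fix confirmed in this report: it was not verified on the reporter's OS X setup, and it drops the `encoding='unicode'` parser argument from the original test.py without establishing whether that argument contributes to the failure. The fallback import is only there so the sketch runs where lxml is not installed.

```python
# Prefer lxml; fall back to the standard library so the sketch still runs without it.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

uni = "<p>Unicode! \U0001F576 Sunglasses!</p>"

# Assumption: parsing UTF-8 bytes avoids the str-input path shown in the traceback.
data = uni.encode("utf-8")
t = etree.fromstring(data)

# Serialize back to a str; the emoji should survive the round trip.
round_tripped = etree.tostring(t, encoding="unicode")
print(round_tripped)
```

An XML document without a declaration is treated as UTF-8 by default, which is why no parser encoding is passed here.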