Comment 14 for bug 1949271

Mike Edmunds (medmunds) wrote :

A workaround seems to be forcing the input to ASCII, with each non-ASCII character replaced by an XML numeric character reference (Python's "xmlcharrefreplace" error handler):

>>> from lxml import etree
>>> root = etree.fromstring("<p>🐻</p>")
...
lxml.etree.XMLSyntaxError: Char 0x0 out of allowed range, line 1, column 2

>>> root = etree.fromstring("<p>🐻</p>".encode("ascii", "xmlcharrefreplace").decode("ascii"))
>>> root.text
'🐻'

(lxml 4.9.2, Python 3.11.1)
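
In case it helps anyone else, here is the same workaround wrapped in a small helper. This is only a sketch: the name `to_ascii_charrefs` is made up for illustration and is not part of lxml. It uses only the standard `str.encode` call from the transcript above, so it does not need lxml to run.

```python
def to_ascii_charrefs(markup: str) -> str:
    """Return `markup` as pure ASCII text.

    Hypothetical helper (not an lxml API). Every non-ASCII character is
    replaced by a decimal XML character reference via Python's built-in
    "xmlcharrefreplace" error handler; ASCII characters, including the
    markup itself, pass through unchanged.
    """
    return markup.encode("ascii", "xmlcharrefreplace").decode("ascii")


# U+1F43B (the bear emoji from the report) is 128059 in decimal.
safe = to_ascii_charrefs("<p>\U0001F43B</p>")
print(safe)  # <p>&#128059;</p>
```

The result can then be passed to `etree.fromstring(...)` as in the transcript above. One caveat, as far as I understand the XML rules: character references are only expanded in element text and attribute values, so non-ASCII characters inside CDATA sections, comments, or tag names would not survive this treatment intact.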