A workaround seems to be to force the input to ASCII, escaping any non-ASCII characters as numeric XML character references (so 🐻 becomes &#128059;):
>>> from lxml import etree
>>> root = etree.fromstring("<p>🐻</p>")
...
lxml.etree.XMLSyntaxError: Char 0x0 out of allowed range, line 1, column 2
>>> root = etree.fromstring("<p>🐻</p>".encode("ascii", "xmlcharrefreplace").decode("ascii"))
>>> root.text
'🐻'
(lxml 4.9.2, Python 3.11.1)
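If you need this in more than one place, the escaping step can be wrapped in a small helper. This is only a sketch: the names to_ascii_xml and parse_text are hypothetical, and the fallback to the stdlib xml.etree.ElementTree is there so the snippet runs without lxml installed. It is not part of lxml's API.

```python
# Sketch of the workaround: escape every non-ASCII character as a numeric
# XML character reference before handing the string to the parser.
try:
    from lxml import etree  # the parser the workaround targets
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def to_ascii_xml(text: str) -> str:
    """Return `text` with non-ASCII characters replaced by &#NNNN; references.

    Hypothetical helper; it only wraps str.encode(..., "xmlcharrefreplace").
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def parse_text(text: str):
    """Parse an XML string via its ASCII-escaped form (hypothetical helper)."""
    return etree.fromstring(to_ascii_xml(text))


if __name__ == "__main__":
    print(to_ascii_xml("<p>🐻</p>"))  # <p>&#128059;</p>
    root = parse_text("<p>🐻</p>")
    # The parser decodes the character reference back to the original emoji.
    print(root.text == "🐻")  # True
```

One caveat with this approach: XML only recognizes character references in text content and attribute values. If non-ASCII characters appear in element or attribute names, comments, CDATA sections, or processing instructions, escaping them this way changes or breaks the document. So the trick is safest when the non-ASCII data is confined to text and attribute values.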