UnicodeDecodeError on Mac OS X with Python 3.10

Bug #1981134 reported by alan
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
lxml      New      Undecided    Unassigned

Bug Description

Python: 3.10.5
lxml: 4.9.1
OS: Mac OS X

When the parsed text contains an emoji character, a UnicodeDecodeError is raised. This happens only with Python 3.10 on Mac OS X, not on Linux or Windows.
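A minimal reproduction sketch that bypasses readtime, assuming (as the traceback suggests) that readtime renders the markdown to HTML, parses it with lxml, and then reads an element's `.text`. The `read_emoji_text` helper and the `<p>` markup are hypothetical illustrations, not code from readtime. It needs lxml installed; on an unaffected setup it prints the emoji, while on an affected setup reading `.text` is expected to raise the same UnicodeDecodeError.

```python
# Hypothetical reproduction sketch: parse HTML containing an emoji with lxml
# and read the element text, which is the step that fails in the traceback.
import lxml.html


def read_emoji_text(markup: str = "<p>\U0001F47D</p>") -> str:
    """Parse a small HTML fragment and return the root element's text."""
    # For a single-element fragment, fromstring() returns that element (<p>).
    element = lxml.html.fromstring(markup)
    # Accessing .text goes through lxml's text-collection and decoding code.
    return element.text


if __name__ == "__main__":
    try:
        print(repr(read_emoji_text()))
    except UnicodeDecodeError as exc:
        # On an affected Python/lxml/OS combination, this branch is expected.
        print("affected:", exc)
```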

Full traceback:

Python 3.10.4 (main, May 18 2022, 22:24:47) [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import readtime
>>> result = readtime.of_markdown("👽")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/note/workspace/readtime/readtime/api.py", line 40, in of_markdown
    return utils.read_time(markdown, format='markdown', wpm=wpm)
  File "/Users/note/workspace/readtime/readtime/utils.py", line 49, in read_time
    text, images = parse_html(el)
  File "/Users/note/workspace/readtime/readtime/utils.py", line 115, in parse_html
    add_text(tag, no_tail=True)
  File "/Users/note/workspace/readtime/readtime/utils.py", line 105, in add_text
    if tag.text and not isinstance(tag, lxml.etree._Comment):
  File "src/lxml/etree.pyx", line 1035, in lxml.etree._Element.text.__get__
  File "src/lxml/apihelpers.pxi", line 707, in lxml.etree.collectText
  File "src/lxml/apihelpers.pxi", line 1507, in lxml.etree.funicode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 9: unexpected end of data
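For context, the stdlib-only sketch below shows what "unexpected end of data" means: a UTF-8 multibyte sequence (an emoji takes four bytes) that ends before all of its bytes arrive. The helper name `truncated_decode_error` is hypothetical. This only illustrates the error message; it does not establish why lxml hands incomplete data to the decoder on Mac OS X.

```python
# Sketch: reproduce Python's "unexpected end of data" UTF-8 error directly.
from typing import Optional


def truncated_decode_error(data: bytes) -> Optional[UnicodeDecodeError]:
    """Decode data as UTF-8 and return the error instead of raising it."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


alien = "\U0001F47D".encode("utf-8")  # b'\xf0\x9f\x91\xbd'
print(len(alien))  # 4

# Drop the final byte so the four-byte sequence is incomplete at end of input.
err = truncated_decode_error(b"text " + alien[:-1])
print(err.reason)  # unexpected end of data
print(err.start, err.end)  # 5 8

# A lone four-byte lead byte such as 0xf4 (the byte in the report) fails alike.
print(truncated_decode_error(b"\xf4"))
# 'utf-8' codec can't decode byte 0xf4 in position 0: unexpected end of data
```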

alan (alan-hamlett) wrote :