lxml.html.html5parser crashes when given non-Unicode (bytes) input in Python 3
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Fix Released | Medium | Ondergetekende | | |
Bug Description
The first thing html5parser.
Between this bug and #1654544, there is no possible way to use html5parser.
## Version info
Python : sys.version_
lxml.etree : (3, 7, 3, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
## Reproduction:
lxml.html.
-------
TypeError Traceback (most recent call last)
<ipython-
----> 1 lxml.html.
/home/koert/
149 # document starts with doctype or <html>, full document!
150 start = html[:50]
--> 151 if start.startswit
152 return doc
153
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
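
The traceback shows a bytes value being tested with `startswith` against a `str` prefix, which Python 3 rejects. The following is a minimal sketch of that failure and of one way to guard against it. It is not the actual lxml patch: the helper name `looks_like_full_document` and the exact prefixes checked are assumptions for illustration, since the original line 151 is truncated in this report.

```python
def looks_like_full_document(html):
    """Hypothetical helper: report whether the input starts with a
    doctype or <html> tag, for either str or bytes input."""
    start = html[:50]
    if isinstance(start, bytes):
        # bytes.startswith() only accepts bytes (or a tuple of bytes).
        prefixes = (b"<!doctype", b"<html")
    else:
        # str.startswith() only accepts str (or a tuple of str).
        prefixes = ("<!doctype", "<html")
    return start.lstrip().lower().startswith(prefixes)


# Reproduce the failure mode from the traceback without lxml:
# a bytes value checked against a str prefix raises TypeError.
try:
    b"<html></html>".startswith("<html")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(looks_like_full_document(b"<html><body></body></html>"))
print(looks_like_full_document("<p>fragment</p>"))
```

Choosing the prefix tuple by input type keeps a single code path for both `str` and `bytes` callers; decoding the bytes first would be an alternative, but it requires knowing the encoding.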
Changed in lxml:
assignee: nobody → Ondergetekende (kvdveer)
importance: Undecided → Medium
status: New → Fix Committed
Patched & added unit tests