invalid UTF-8 characters cause error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Triaged | Undecided | scoder |
Bug Description
The failing code is part of an email corrector:
```python
payload = msg.get_
parser = etree.HTMLParse
dom_tree = etree.fromstrin
#fails
etree.dump(
#fails
output = etree.tostring(
```
```
File "/usr/local/
etree.
File "lxml.etree.pyx", line 3070, in lxml.etree.dump (src/lxml/
File "lxml.etree.pyx", line 3157, in lxml.etree.tostring (src/lxml/
File "serializer.pxi", line 135, in lxml.etree.
File "serializer.pxi", line 195, in lxml.etree.
lxml.etree.
```
Please help me get started: where should I begin debugging? I think the � characters are the cause.
Thank you!
Debian squeeze, 32-bit

```
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
```

```
$ pip freeze | grep lxml
lxml==3.3.5
```
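Since the suspicion is that the � characters are to blame, one place to start is to check whether the raw payload really contains invalid UTF-8 byte sequences, and at which offsets. The following is a minimal sketch written for Python 3, assuming two things the report does not state: that the payload is available as raw bytes, and that it is meant to be UTF-8. The helper names `find_invalid_utf8` and `to_clean_text` are hypothetical.

```python
# Debugging sketch (Python 3). ASSUMPTION: `data` is the raw email payload as
# bytes and is supposed to be UTF-8. Helper names are hypothetical.


def find_invalid_utf8(data, context=10):
    """Return (offset, bad_bytes, surrounding_bytes) for each invalid UTF-8 run."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly, nothing more to report
        except UnicodeDecodeError as exc:
            # exc.start / exc.end are relative to data[pos:], so shift them.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end],
                             data[max(0, start - context):end + context]))
            pos = end  # resume scanning just after the offending bytes
    return problems


def to_clean_text(data):
    # Replace each invalid byte sequence with U+FFFD ("\ufffd") so that a parser
    # receives well-formed Unicode text instead of broken bytes.
    return data.decode("utf-8", "replace")


if __name__ == "__main__":
    sample = b"caf\xc3\xa9 \xff ok"  # valid "é", then a stray 0xFF byte
    for offset, bad, around in find_invalid_utf8(sample):
        print(offset, bad, around)
    print(to_clean_text(sample))
```

If offsets are reported, the text returned by `to_clean_text()` could be passed to the HTML parser in place of the raw bytes. Whether that avoids the failure inside the serializer seen in the traceback has not been tested here.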