Comment 3 for bug 898072

scoder (scoder) wrote : Re: lxml.html.parse treats encoding as Latin1 when reading from file-objects directly

I assume that your system's default encoding (which CPython uses when opening the file) is not Latin-1, and that the HTML page is in exactly that encoding? If so, pass the encoding to the parser explicitly.
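A minimal sketch of what passing the encoding explicitly could look like. It assumes the page is UTF-8 and declares no charset of its own; the sample markup and the in-memory `io.BytesIO` stand-in for a real file are made up for illustration.

```python
import io
from lxml import html

# Hypothetical sample: UTF-8 bytes without any <meta charset> declaration.
raw = "<html><body><p>Grüße</p></body></html>".encode("utf-8")

# Name the encoding on the parser instead of leaving the parser to guess.
# "utf-8" is an assumption here; use whatever the file is really encoded in.
parser = html.HTMLParser(encoding="utf-8")

# io.BytesIO stands in for a file object opened in binary mode.
tree = html.parse(io.BytesIO(raw), parser=parser)

text = tree.findtext(".//p")
print(text)
```

With a real file, the same parser object would be passed as `html.parse(f, parser=parser)`, where `f` is the file opened in binary mode.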

Rejecting this ticket because lxml (or libxml2) cannot possibly know which encoding your file uses if the file itself does not declare one.