lxml

Comment 5 for bug 898072

scoder (scoder) wrote on 2011-11-30: Re: lxml.html.parse treats encoding as Latin1 when reading from file-objects directly

No, Python 3.x does not magically "handle" this. It *guesses* the encoding based on platform parameters. libxml2 guesses something different, which is just as good a guess and simply happens to be the wrong assumption for this specific file. Just because the encoding of the file happens to match the default encoding on your platform does not mean that this is the case for all files that you (or someone else) want to parse. So doing what Python does would work in this specific case, but fail in others.
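
To illustrate the point, here is a minimal sketch using only the standard library. The HTML snippet and the variable names are made up for illustration. It shows that the same bytes decode to different text depending on which encoding is assumed, and that Python's own default is just a platform-dependent guess.

```python
import locale

# Hypothetical file content: a short HTML snippet stored as UTF-8 bytes.
raw = "<p>café</p>".encode("utf-8")

# The encoding Python 3 would assume for text-mode open() when none is given.
# This depends on the platform and locale settings, not on the file itself.
platform_guess = locale.getpreferredencoding(False)

# Decoding with a Latin-1 assumption succeeds, but yields the wrong text.
as_latin1 = raw.decode("latin-1")

# Decoding with the encoding the file was actually written in.
as_utf8 = raw.decode("utf-8")

print("platform default:", platform_guess)
print("assumed Latin-1: ", as_latin1)
print("explicit UTF-8:  ", as_utf8)
```

Neither guess is reliable in general; only naming the encoding explicitly is. With lxml, that should amount to passing a parser created with an explicit encoding, e.g. `lxml.html.HTMLParser(encoding="utf-8")`, as the `parser` argument of `lxml.html.parse`, assuming you know the encoding of the file in advance.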