What I mean is that Python 3.x decodes the file using the specified encoding or, if none is given, the system default. I did not pass `encoding='utf-8'` to the `open()` call only because UTF-8 happens to be my system default (my fault), so the attached HTML file is decoded correctly anyway.
lxml (or libxml2) *needn't* guess anything at all about a *text file object*. Python has already taken care of the decoding, and lxml should not ignore that. Why set the `encoding` argument of `htmlCtxtReadIO` to NULL (around parser.pxi:331) when you've already encoded the data as UTF-8 (around parser.pxi:380)?
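To illustrate the point about text file objects, here is a minimal sketch using only the standard library (no lxml involved). The names `SAMPLE_HTML`, `read_as_text`, and `demo` are made up for this example, and the sample markup is a stand-in, not the attached HTML file. It only shows that a file opened in text mode hands back already-decoded `str` data and reports which encoding Python used, so a consumer has nothing left to guess.

```python
import os
import tempfile

# Hypothetical stand-in document; NOT the HTML file attached to this report.
SAMPLE_HTML = "<html><body><p>caf\u00e9 \u4e2d\u6587</p></body></html>"


def read_as_text(path, encoding=None):
    """Read `path` in text mode and return (decoded text, encoding used).

    With encoding=None, open() falls back to the system default, which is
    the situation described above.
    """
    with open(path, "r", encoding=encoding) as f:
        return f.read(), f.encoding


def demo():
    # Write the sample to disk as UTF-8 bytes, then read it back as text.
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(SAMPLE_HTML.encode("utf-8"))
        return read_as_text(path, encoding="utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    text, used = demo()
    print(type(text).__name__, used)
```

The text returned here is a `str`, and the file object's `encoding` attribute says how it was decoded, which is the information I would expect lxml to rely on instead of letting libxml2 sniff the encoding.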