lxml

Bug #1284809
Comment #5

scoder (scoder) wrote on 2014-02-26: Re: [Bug 1284809] Re: Doc: In lxml.html.tostring() encoding "unicode" for Python 3

>> But there is Unicode in Python 3.
> And it is called str()

Unicode is actually called Unicode.

http://www.unicode.org/

The Python 2.x *type* "unicode" was renamed to "str" in Py3.

> This text is from lxml.html.tostring.__doc__
> It uses unicode() as if it worked, but it doesn't work in Python 3, because unicode() no longer exists there; it was replaced by str().
> Therefore, this example is misleading.
>
> >>> html.tostring(root, method='text', encoding=unicode)
> 'Helloworld!'

Ah, right. That wasn't updated. Thanks for bringing it up. It should read
encoding="unicode".

https://github.com/lxml/lxml/commit/477fa0b36c5ecd6c26d0ea5190f518ad2f7b196f
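
For illustration, here is a minimal sketch of what the string value "unicode" selects. It is an assumption-laden stand-in: it uses the standard library's xml.etree.ElementTree, whose tostring() accepts the same encoding="unicode" value, rather than lxml.html itself, and the input is written as well-formed XML (<br/>) because ElementTree has no HTML parser. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Well-formed XML stand-in for the docstring's '<p>Hello<br>world!</p>' fragment.
root = ET.fromstring("<p>Hello<br/>world!</p>")

# Default: the result is an encoded byte string.
as_bytes = ET.tostring(root, method="text")

# encoding="unicode": no encoding step, the result is text (str in Python 3).
as_text = ET.tostring(root, method="text", encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
```

Passing the string "unicode" instead of the Python 2 type object is what lets the same call be written identically on both major versions.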

>> "str" would be ambiguous.
> Ambiguous with what?

If I were to read encoding="str" somewhere, I'd be puzzled about what it
might mean. Even encoding="unicode" isn't ideal, because Unicode is not an
encoding. But practicality beats purity here, and it's certainly more
obvious than encoding="str".