Comment 0 for bug 1934687

danny0838 (danny0838) wrote :

demo:

    list(lxml.html.fromstring('<p class="中 文">test</p>').classes)

expected:

    ['中 文']

actual:

    ['中', '文']

According to the HTML spec, class names should be separated only by ASCII whitespace (defined as U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE). Other Unicode spaces, such as U+3000 (the fullwidth space, "　"), should not be treated as class separators.

ref: https://html.spec.whatwg.org/multipage/dom.html#global-attributes:classes-2
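
A minimal sketch of the splitting rule described above, for illustration only. `split_classes` is a hypothetical helper, not part of lxml, and I am assuming the current behaviour comes from a bare `str.split()`, which splits on all Unicode whitespace including U+3000:

```python
import re

# ASCII whitespace as defined by the HTML spec: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def split_classes(value):
    """Split a class attribute value on ASCII whitespace only.

    Hypothetical helper for illustration; not an lxml API.
    """
    return [token for token in _ASCII_WHITESPACE.split(value) if token]


# U+3000 is written as an escape so the separator is unambiguous.
value = "\u4e2d\u3000\u6587"

print(split_classes(value))  # keeps the value as a single class name
print(value.split())         # bare str.split() also splits on U+3000
```

With this rule, a value containing only U+3000 between the two characters stays a single class, while tabs, line feeds, form feeds, carriage returns and ordinary spaces still separate classes.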