2021-07-05 18:50:33 |
danny0838 |
description |
demo:
list(lxml.html.fromstring('<p class="中\u3000文">test</p>').classes)
expected:
['中\u3000文']
actual:
['中', '文']
According to the HTML spec, class names are separated only by ASCII whitespace, which is defined as U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, and U+0020 SPACE. Other Unicode spaces, such as U+3000 (IDEOGRAPHIC SPACE, the fullwidth space), should not be treated as class separators.
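For illustration, here is a minimal sketch of splitting a class attribute value on ASCII whitespace only. `split_html_classes` is a hypothetical helper, not part of the lxml API. The sketch also shows that Python's built-in `str.split()` with no arguments treats U+3000 as whitespace. That produces exactly the "actual" output above, though this report does not confirm whether lxml uses it internally.

```python
import re

# ASCII whitespace as defined by the HTML spec:
# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
_ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def split_html_classes(value):
    """Split a class attribute value on ASCII whitespace only (hypothetical helper)."""
    # Filtering out empty strings drops the pieces produced by
    # leading or trailing separators.
    return [token for token in _ASCII_WHITESPACE.split(value) if token]


value = "中\u3000文 foo\tbar"
print(value.split())              # str.split() also splits on U+3000: ['中', '文', 'foo', 'bar']
print(split_html_classes(value))  # ['中\u3000文', 'foo', 'bar']
```

The explicit character class keeps the separator set identical to the spec's list. By contrast, `str.split()` and the regex `\s` both match any Unicode whitespace in Python 3, including U+3000.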
ref: https://html.spec.whatwg.org/multipage/dom.html#global-attributes:classes-2 |
|