Comment 9 for bug 1518150

John Lenton (chipaca) wrote :

I'm not sure why it seems to treat anything outside the first plane (the BMP) as a "special character". The YAML spec says:
"The allowed character range explicitly excludes the surrogate block #xD800-#xDFFF, DEL #x7F, the C0 control block #x0-#x1F (except for #x9, #xA, and #xD), the C1 control block #x80-#x9F, #xFFFE, and #xFFFF."

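For reference, here is a minimal sketch of the exclusions listed in that sentence, written as a Python check. The function name `is_excluded` is made up for illustration, and it only encodes the quoted sentence, not the spec's full grammar or whatever PyYAML actually does internally.

```python
def is_excluded(cp: int) -> bool:
    """Return True if code point `cp` falls in a range the quoted sentence excludes.

    Hypothetical helper: it mirrors the quoted sentence only.
    """
    if 0xD800 <= cp <= 0xDFFF:  # surrogate block
        return True
    if cp == 0x7F:  # DEL
        return True
    if cp <= 0x1F and cp not in (0x09, 0x0A, 0x0D):  # C0 controls except tab, LF, CR
        return True
    if 0x80 <= cp <= 0x9F:  # C1 control block
        return True
    if cp in (0xFFFE, 0xFFFF):
        return True
    return False


if __name__ == "__main__":
    # U+20021 is outside the first plane but in none of the excluded ranges.
    print(is_excluded(0x20021))
```

By that reading, nothing above U+FFFF is excluded, so a supplementary-plane character should be acceptable.
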
I've tried a few characters that need 3 or 4 bytes in UTF-8 (because I was looking at UTF-16 bugs in a YAML parser in a different language) and found that PyYAML aborts on them. At first I thought it was confusing the UTF-8 encoding with the code point, but U+20021 (𠀡) does not fit that explanation.
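
To make the encoding side concrete, here is a small sketch that prints how a character is laid out in UTF-8 and UTF-16. The helper name `describe` is made up for illustration; it uses only the standard `str.encode` and does not involve PyYAML at all.

```python
def describe(ch: str) -> dict:
    """Summarise one character: code point, UTF-8 bytes, UTF-16 code units (hex)."""
    cp = ord(ch)
    return {
        "codepoint": f"U+{cp:04X}",
        "utf8": ch.encode("utf-8").hex(),
        "utf16": ch.encode("utf-16-be").hex(),  # big-endian, no BOM
        "in_bmp": cp <= 0xFFFF,
    }


if __name__ == "__main__":
    for ch in ("\u20ac", "\U00020021"):  # a 3-byte and a 4-byte UTF-8 character
        print(describe(ch))
```

Note that a 3-byte UTF-8 sequence always encodes a code point inside the first plane; only the 4-byte ones lie outside it and need a surrogate pair in UTF-16.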