Comment 23 for bug 191199

Daniel Berlin (dberlin) wrote:

Sigh, I was up too late that night.
What I meant is that you still have to remove the unrepresentable characters.
There are some characters that XML simply doesn't allow.
However, they can still make it into the db occasionally.
I believe they are 0x00-0x08, 0x0b-0x0c, and 0x0e-0x1f. Section 2.2 of the XML spec says:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

Characters outside these ranges simply need to be dropped from the output.

i.e. $var =~ s/[\x00-\x08\x0b\x0c\x0e-\x1f]//g;
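
If you want to cover the whole Char production rather than just the low control characters, something like the following might work. This is only a sketch: strip_invalid_xml_chars is a made-up name, not anything in the existing code, and it assumes the value is already a decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the existing code): drop every
# character that falls outside the XML 1.0 Char production.
# Assumes $text is a decoded character string, not raw bytes.
sub strip_invalid_xml_chars {
    my ($text) = @_;
    return $text unless defined $text;
    # The negated class matches anything the Char production does not list.
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;
    return $text;
}

my $dirty = "ok\x01\x0b text\x1f\n";
my $clean = strip_invalid_xml_chars($dirty);   # "ok text\n"
```

The negated class means tab, newline and carriage return are kept, while the remaining control characters below 0x20, the surrogate range, and 0xFFFE/0xFFFF are dropped.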