>+ # substitute &amp; for & unless it is already
>+ # used in a character entity.
>+ $var =~ s/&(?![#A-Za-z][0-9A-Za-z]+;)/&amp;/g;
That's getting too complex. The way the filter is used, it should be displaying the literal text "&trade;" if somebody writes "&trade;".
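A minimal sketch of the simpler behavior, assuming the filter should escape every ampersand without looking at what follows it. The helper name `escape_amp` is hypothetical and not part of the patch:

```perl
use strict;
use warnings;

# Hypothetical helper name; not part of the patch under review.
sub escape_amp {
    my ($var) = @_;
    # Escape every ampersand, including ones that already start an entity,
    # so that user-typed "&trade;" is shown literally instead of as a glyph.
    $var =~ s/&/&amp;/g;
    return $var;
}

print escape_amp('AT&T &trade;'), "\n";   # prints: AT&amp;T &amp;trade;
```

With no lookahead, the substitution has no special cases, and text that already looks like an entity is escaped like everything else.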
>+ # the following nukes characters disallowed by the XML 1.0
>+ # spec, Production 2.2. 1.0 declares that only the following
>+ # are valid:
>+ # (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF])
>+ $var =~ s/([\x{0001}-\x{0008}]|
>+ [\x{000B}-\x{000C}]|
>+ [\x{00E}-\x{0019}]|
>+ [\x{D800}-\x{DFFF}]|
>+ [\x{FFFE}-\x{FFFF}])//gx;
I'd rather replace them with HTML entities. Is that possible? People export data via XML, and theoretically some of these characters could be in comments (as unlikely as it seems).
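A rough sketch of what replacing instead of deleting could look like, assuming a numeric `&#xNN;` form is acceptable. The helper name `entity_encode_invalid` and the output format are assumptions, not part of the patch. The sketch also covers \x{00} and runs the control range up to \x{1F}, since the quoted production makes #x20 the first valid character in that range, whereas the patch stops at \x{0019}:

```perl
use strict;
use warnings;

# Hypothetical helper; the name and the "&#xNN;" output format are assumptions.
sub entity_encode_invalid {
    my ($var) = @_;
    # Code points outside the XML 1.0 Char production: C0 controls other
    # than tab, LF and CR; the surrogate range; and U+FFFE / U+FFFF.
    $var =~ s{([\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{D800}-\x{DFFF}\x{FFFE}\x{FFFF}])}
             {sprintf('&#x%X;', ord($1))}ge;
    return $var;
}

print entity_encode_invalid("bad\x{0B}char"), "\n";   # prints: bad&#xB;char
```

One caveat: XML 1.0 also forbids character references to code points outside Char, so a reference such as `&#xB;` would still make an XML export ill-formed. This form is probably only usable for HTML display, and the XML export would need some other representation.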
Comment on attachment 354687
V1