The proposed patch addresses two issues, both related to generating well-formed XML:
1) It replaces instances of '&' with '&amp;' while skipping over valid character entities (so that '&trade;' would *not* become '&amp;trade;', for example).
2) It knocks out characters from disallowed character ranges (control characters, etc.) per the XML 1.0 spec (section 2.2, production [2], Char). Note that the regex uses the more verbose form of expressing the hex characters, because the abbreviated form inside character classes evidently makes the lexers of certain Perl versions cry bitter tears of failure.
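For anyone who wants to see the two steps without opening the attachment, here is a rough sketch of the same idea. It is written in Python purely for illustration and is not the patch's Perl code. The function names are made up, and the rule used here for what counts as a "valid character entity" (anything shaped like &name;, &#123; or &#x1F;) is an assumption; the patch may be stricter or looser.

```python
import re

# Assumption: an '&' is left alone only when it starts something shaped like
# a named entity (&name;), a decimal reference (&#123;) or a hex reference
# (&#x1F;). Every other '&' is treated as bare and gets escaped.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

# Complement of the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def escape_bare_ampersands(text):
    """Replace '&' with '&amp;' unless it already begins an entity-like token."""
    return BARE_AMP.sub("&amp;", text)


def strip_invalid_xml_chars(text):
    """Drop every character that the XML 1.0 Char production does not allow."""
    return INVALID_XML_CHARS.sub("", text)


def sanitize_for_xml(text):
    """Apply both steps: strip disallowed characters, then escape bare '&'."""
    return escape_bare_ampersands(strip_invalid_xml_chars(text))


if __name__ == "__main__":
    print(sanitize_for_xml("AT&T &trade; \x00ok"))
```

Stripping before escaping is a choice made for this sketch. It means a control character sitting inside an entity-like token is removed before the ampersand check looks at it.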
Created attachment 354687
V1